Programming project¶
Preamble¶
- Names: Daniel Steinle
- Matrikelnummer: 12206327
DISCLAIMER:
I initially wanted to do this as a group project with Felix Gruber, and did the full 01 to 04 part (which was not a good idea). After reading in the data for him, we agreed that he would start examining the data. We wanted to meet on Tuesday but he told me he couldn't come. When I asked him some days later, on last Friday, how far he had come, I received no answer until Saturday evening, when he said he hadn't done anything so far.
Since by that time I had already started to do the whole project part myself as well, I decided to upload my programming project on my own, since I have already done everything in it and find it unfair to gift it to someone who did not show any initiative to contribute to the project.
I'm unsure whether he will now upload my solution from parts 01-04 as his own, though. The files I read in for him contained 500 days of data from the weather stations Augsburg and Kempten.
I know that this is supposed to be a group project and my uploading alone may not satisfy that criterion. Still, I cannot upload this for someone who did not participate in it at all.
01 - An energy balance model with hysteresis¶
The planetary albedo $\alpha$ is in fact changing with climate change. As the temperature drops, sea ice and ice sheets extend (increasing the albedo); conversely, the albedo decreases as temperature rises. The planetary albedo of our simple energy balance model follows the following equation:
$$ \alpha = \begin{cases} 0.3,& \text{if } T \gt 280\\ 0.7,& \text{if } T \lt 250\\ a T + b, & \text{otherwise} \end{cases} $$
***01-01**: Compute the parameters $a$ and $b$ so that the equation is continuous at T=250K and T=280K.*¶
To evaluate $a$ and $b$ one has to solve a system consisting of two linear equations.
$$ 250\,a + b = 0.7 \\ 280\,a + b = 0.3 $$
After simple calculations a and b can be determined as
$$ a = - {1 \over 75} \\ b = {121 \over 30} $$
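As a quick cross-check, the same 2×2 linear system can also be solved numerically (a small sketch using `numpy.linalg.solve`):

```python
import numpy as np

# Solve the linear system:  250*a + b = 0.7,  280*a + b = 0.3
A = np.array([[250., 1.], [280., 1.]])
rhs = np.array([0.7, 0.3])
a_num, b_num = np.linalg.solve(A, rhs)
print(a_num, -1 / 75)    # both are -0.01333...
print(b_num, 121 / 30)   # both are 4.03333...
```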
This parametrisation results in the following straight line:
```python
import numpy as np
import matplotlib.pyplot as plt

# Configuring the data:
a = - 1 / 75
b = 121 / 30
x = np.linspace(250, 280, 20)
y = a * x + b

# Creating the plot:
fig, ax = plt.subplots()
ax.vlines(250, .27, .72, color="lightgrey")
ax.vlines(280, .27, .72, color="lightgrey")
ax.plot([230, 250, 265, 280, 300], [.7, .7, np.nan, .3, .3], label="constant albedo")
ax.plot(x, a * x + b, color="red", label="linear connection")

# Some plot beautifying
ax.set_ylim(.27, .72)
ax.set_xlabel("Temperature (K)")
ax.set_ylabel("Albedo")
ax.set_title("Temperature dependence of earth's albedo")
ax.legend();
```
***01-02**: Now write a function called `alpha_from_temperature` which accepts a single positional parameter `T` as input (a scalar) and returns the corresponding albedo. Test your function using doctests to make sure that it complies to the instructions above.*¶
```python
def alpha_from_temperature(t):
    """Approximates earth's albedo for a given temperature in K.

    Parameters
    ----------
    t : array-like
        temperature in Kelvin

    Returns
    -------
    albedo : ndarray
        earth's albedo (unitless)

    Examples
    --------
    >>> alpha_from_temperature(250)
    array(0.7)
    >>> alpha_from_temperature(300)
    array(0.3)
    >>> from numpy.testing import assert_allclose
    >>> albedo = alpha_from_temperature(np.arange(250, 290, 10))
    >>> assert_allclose(albedo, np.array([0.7, 0.56666667, 0.43333333, 0.3]), rtol=0, atol=1e-7)
    """
    a = - 1 / 75
    b = 121 / 30
    albedo = np.where(t < 250, .7, np.where(t > 280, .3, a * t + b))
    return albedo
```
```python
import doctest
doctest.testmod()
```
TestResults(failed=0, attempted=5)
***01-03**: Adapt the existing code from week 07 to write a function called `temperature_change_with_hysteresis` which accepts `t0` (the starting temperature in K) and `n_years` (the number of simulation years) as positional arguments and `tau` (the atmosphere transmissivity) as keyword argument (default value 0.611).* Verify that:¶
- the stabilization temperature with `t0 = 292` and default `tau` is approximately 288K
- the stabilization temperature with `t0 = 265` and default `tau` is approximately 233K
```python
# From the seventh assignment - the doctests were cut so that the code stays lucid
def asr(alpha=.3):
    """Absorbed Solar Radiation (W m-2).

    Parameters
    ----------
    alpha : float, optional
        the planetary albedo

    Returns
    -------
    the asr (float)
    """
    s0 = 1362
    return (1 - alpha) * s0 / 4


def olr(t, tau=.611):
    """Outgoing Longwave Radiation (W m-2).

    Parameters
    ----------
    t : float
        the atmosphere temperature (K)
    tau : float, optional
        the atmosphere transmissivity (-)

    Returns
    -------
    the olr (float)
    """
    sigma = 5.67e-8
    return sigma * tau * t ** 4


# From here on, this was adapted from the original temperature_change function
def temperature_change_with_hysteresis(t0, n_years, tau=.611):
    """Temperature change scenario after change of transmissivity.

    Parameters
    ----------
    t0 : float
        the starting temperature (K)
    n_years : int
        the number of simulation years
    tau : float, optional
        the atmosphere transmissivity (-)

    Returns
    -------
    temperature : ndarray of size n_years + 1
    """
    # Standard variables
    c = 4.0e+08
    dt = 60 * 60 * 24 * 365

    # Series computation
    temperature = np.zeros(n_years + 1)
    temperature[0] = t0
    for i in range(n_years):
        temperature[i + 1] = temperature[i] + dt / c * (asr(alpha=alpha_from_temperature(temperature[i]))
                                                        - olr(temperature[i], tau=tau))
    return temperature
```
```python
print("For t0 = 292K, does it converge to 288K? ",
      np.isclose(temperature_change_with_hysteresis(292, 20)[20], 288, rtol=0, atol=.2))
print("For t0 = 265K, does it converge to 233K? ",
      np.isclose(temperature_change_with_hysteresis(265, 60)[60], 233, rtol=0, atol=.2))
```
For t0 = 292K, does it converge to 288K?  True
For t0 = 265K, does it converge to 233K?  True
***01-04**: Realize a total of N simulations with starting temperatures regularly spaced between `t0`=206K and `t0`=318K and plot them on a single plot for `n_years`=50. The plot should look somewhat similar to this example for N=21.*¶
```python
import pandas as pd


def temp_change_sim(t0=206, t1=318, n_sim=21, n_years=50, coloured=False):
    """Plots the temperature change scenarios for a given number of simulations
    between a certain starting temperature range. Does not return anything!

    Parameters
    ----------
    t0 : float (optional)
        Start of the starting temperature range (K)
    t1 : float (optional)
        End of the starting temperature range (K)
    n_sim : int (optional)
        the number of simulations to be run
    n_years : int (optional)
        number of years the simulation should run
    coloured : bool (optional)
        Whether to use a colormap
    """
    t_start = np.linspace(t0, t1, n_sim)
    years = np.arange(0, n_years + 1, 1)

    df = pd.DataFrame(index=years)
    for t in t_start:
        df[t] = temperature_change_with_hysteresis(t, n_years)

    # Plotting (black by default, with a colormap if coloured=True)
    if coloured:
        df.plot(figsize=(8, 5), cmap=plt.get_cmap("coolwarm"), grid=True, xlim=(0, n_years + 2),
                legend=None, xlabel="Years", ylabel="Temperature (K)",
                title="Climate change scenario with hysteresis")
    else:
        df.plot(figsize=(8, 5), color="black", grid=True, xlim=(0, n_years + 2),
                legend=None, xlabel="Years", ylabel="Temperature (K)",
                title="Climate change scenario with hysteresis")
    return None
```
```python
temp_change_sim()
```
***Bonus**: only if you want (and if time permits), you can try to increase N and add colors to your plot to create a graph similar to this one.*
```python
# Looks similar (this colourmap was just too daring for a temperature scenario...)
temp_change_sim(n_sim=100, coloured=True)
```
02 - Weather station data files¶
I downloaded 10 min data from the recently launched ZAMG data hub. The data file contains selected parameters from the "INNSBRUCK-FLUGPLATZ (ID: 11804)" weather station.
You can download the data from the following links (right-click + "Save as..."):
- station data
- parameter metadata
- station list from the ZAMG (in a better format than last time)
Let me open the data for you and display its content:
```python
df = pd.read_csv('INNSBRUCK-FLUGPLATZ_Datensatz_20150101_20211231.csv', index_col=1, parse_dates=True)
df = df.drop('station', axis=1)
df
```
DD | FF | GSX | P | RF | RR | SO | TB1 | TB2 | TB3 | TL | TP | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
time | ||||||||||||
2015-01-01 00:00:00 | 232.0 | 1.0 | 0.0 | 965.7 | 96.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.7 | -3.4 |
2015-01-01 00:10:00 | 248.0 | 1.6 | 0.0 | 965.6 | 97.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.4 | -2.9 |
2015-01-01 00:20:00 | 252.0 | 1.6 | 0.0 | 965.6 | 97.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.4 | -2.9 |
2015-01-01 00:30:00 | 267.0 | 1.3 | 0.0 | 965.7 | 97.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.7 | -3.2 |
2015-01-01 00:40:00 | 279.0 | 1.4 | 0.0 | 965.7 | 97.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.5 | -3.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021-12-31 23:10:00 | 257.0 | 1.1 | 0.0 | 959.8 | 98.0 | 0.0 | 0.0 | 3.1 | 3.4 | 3.4 | -1.7 | -1.7 |
2021-12-31 23:20:00 | 106.0 | 0.3 | 0.0 | 960.0 | 100.0 | 0.0 | 0.0 | 3.0 | 3.4 | 3.4 | -2.0 | -2.0 |
2021-12-31 23:30:00 | 141.0 | 0.5 | 0.0 | 960.2 | 100.0 | 0.0 | 0.0 | 3.0 | 3.4 | 3.4 | -1.7 | -1.7 |
2021-12-31 23:40:00 | 167.0 | 0.5 | 0.0 | 960.2 | 100.0 | 0.0 | 0.0 | 3.0 | 3.4 | 3.4 | -1.6 | -1.6 |
2021-12-31 23:50:00 | 275.0 | 0.3 | 0.0 | 960.1 | 100.0 | 0.0 | 0.0 | 3.0 | 3.4 | 3.4 | -1.6 | -1.7 |
368208 rows × 12 columns
***02-01**: after reading the documentation of the respective functions (and maybe try a few things yourself), explain in plain sentences:*¶
- what am I asking pandas to do with the `index_col=1, parse_dates=True` keyword arguments? Why am I doing this?
- what am I asking pandas to do with `.drop()`? Why `axis=1`?
`index_col=1`: pandas uses the column at position 1 as the index of the DataFrame (here this is the time column). Without this argument (the default is `index_col=None`), pandas would create a plain integer index.

`parse_dates=True`: tells pandas to parse the index values and convert them into datetime objects (use `dayfirst=True` for day-first date formats).

`df.drop()`: removes rows or columns by specifying label names and the corresponding axis, or by specifying index or column names directly. `axis=0` drops the specified index rows, `axis=1` drops the specified columns - here the constant 'station' column.
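As a minimal illustration of both keyword arguments (a sketch on a small made-up CSV, not on the station file):

```python
import io
import pandas as pd

# A tiny made-up CSV with the columns: station, time, TL
csv = io.StringIO("station,time,TL\nIBK,2015-01-01 00:00,-2.7\nIBK,2015-01-01 00:10,-2.4\n")

demo = pd.read_csv(csv, index_col=1, parse_dates=True)  # column at position 1 ('time') becomes a DatetimeIndex
demo = demo.drop('station', axis=1)                     # axis=1 drops a column (axis=0 would drop index rows)
print(demo.index)   # DatetimeIndex with two timestamps
print(demo)
```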
Now let me do something else for you:
```python
dfmeta = pd.read_csv('ZEHNMIN Parameter-Metadaten.csv', index_col=0)
dfmeta.loc[df.columns]
```
Kurzbeschreibung | Beschreibung | Einheit | |
---|---|---|---|
DD | Windrichtung | Windrichtung, vektorielles Mittel über 10 Minuten | ° |
FF | vektorielle Windgeschwindigkeit | Windgeschwindigkeit, vektorielles Mittel über ... | m/s |
GSX | Globalstrahlung | Globalstrahlung, arithmetisches Mittel über 10... | W/m² |
P | Luftdruck | Luftdruck, Basiswert zur Minute10 | hPa |
RF | Relative Feuchte | Relative Luftfeuchte, Basiswert zur Minute10 | % |
RR | Niederschlag | 10 Minuten Summe des Niederschlags, Summe der ... | mm |
SO | Sonnenscheindauer | Sonnenscheindauer, Sekundensumme über 10 Minuten | s |
TB1 | Erdbodentemperatur in 10cm Tiefe | Erdbodentemperatur in 10cm Tiefe, Basiswert zu... | °C |
TB2 | Erdbodentemperatur in 20cm Tiefe | Erdbodentemperatur in 20cm Tiefe, Basiswert zu... | °C |
TB3 | Erdbodentemperatur in 50cm Tiefe | Erdbodentemperatur in 50cm Tiefe, Basiswert zu... | °C |
TL | Lufttemperatur in 2m | Lufttemperatur in 2m Höhe, Basiswert zur Minute10 | °C |
TP | Taupunkt | Taupunktstemperatur, Basiswert zur Minute10 | °C |
***02-02**: again, explain in plain sentences what the `dfmeta.loc[df.columns]` is doing, and why it works that way.*¶
`dfmeta.loc`: accesses a group of rows (and columns) by label(s).

`df.columns`: the column labels of the DataFrame `df`, i.e. exactly the variables that are "used" in `df`.

Altogether, `dfmeta.loc[df.columns]` selects from `dfmeta` only those rows whose labels match the columns used in `df` and displays them.
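A minimal sketch of the same label-based selection on a toy metadata table (made-up labels, not the actual files):

```python
import pandas as pd

meta = pd.DataFrame({"Kurzbeschreibung": ["wind direction", "wind speed", "air temperature"]},
                    index=["DD", "FF", "TL"])
data_columns = ["TL", "DD"]       # stand-in for df.columns
print(meta.loc[data_columns])     # .loc with a list of labels returns exactly those rows, in that order
```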
Finally, let me do a last step for you before you start coding:
```python
dfh = df.resample('H').mean()
# dfh
```
***02-03**: explore the `dfh` dataframe. Explain, in plain words, what the purpose of `.resample('H')` followed by `mean()` is. Explain what `.resample('H').max()` and `.resample('H').sum()` would do.*¶
`.resample("H")`: groups the DataFrame into hourly bins. `df.resample("H").mean()` then takes, for every column, the mean of all values within each hour and returns one value per hour. (In pandas' datetime frequency aliases, "H" stands for hourly.)

`.resample()` only makes sense in combination with an aggregation such as `.mean()`, `.sum()` or similar, which states how the 10-minute data should be merged into the hourly value - e.g. the sum over one hour, the average within one hour, etc.

`.resample('H').max()`: would take the maximum of the 10-minute data within every hour, while

`.resample('H').sum()`: would take the sum of the 10-minute data within one hour and use that as the hourly value.
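A small self-contained example of the three aggregations on dummy 10-minute data (arbitrary values):

```python
import pandas as pd

s = pd.Series([0.1, 0.0, 0.2, 0.0, 0.3, 0.1],
              index=pd.date_range("2015-01-01 00:00", periods=6, freq="10min"))

print(s.resample("H").mean())   # one value for 00:00-00:59 -> the hourly average (~0.117)
print(s.resample("H").max())    # -> the largest 10-minute value (0.3)
print(s.resample("H").sum())    # -> the hourly sum (0.7)
```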
***02-04**: Using `np.allclose`, make sure that the average of the first hour (that you'll compute yourself from `df`) is indeed equal to the first row of `dfh`.*¶
```python
# df.iloc[:6] selects the six 10-minute records of the first hour;
# dfh.iloc[0].to_numpy() converts the first hourly row into a numpy array.
if np.allclose(dfh.iloc[0].to_numpy(), df.iloc[:6].mean()):
    print("The dataframe 'dfh' is indeed identical to the manually computed average.")
```
The dataframe 'dfh' is indeed identical to the manually computed average.
Now, two variables in the dataframe have units that aren't suitable for averaging. Please convert the following variables to the correct units:
- `RR` needs to be converted from the average of 10 min sums to mm/h
- `SO` needs to be converted from the average of 10 min sums to s/h
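A quick way to sanity-check the `RR` conversion (a sketch using the 10-minute `df` from above): for hours in which all six 10-minute values are present, the hourly sum must equal six times the hourly mean.

```python
# For 10-minute sums, the hourly sum equals 6 * the hourly mean (when no values are missing)
counts = df['RR'].resample('H').count()
rr_sum = df['RR'].resample('H').sum()
rr_from_mean = df['RR'].resample('H').mean() * 6

full_hours = counts == 6   # only hours with all six 10-minute values present
print(np.allclose(rr_sum[full_hours], rr_from_mean[full_hours]))
```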
```python
# Split the DataFrame into three DataFrames using either sum or mean and merge them back together.
# This is a design choice so that the columns stay in their current order.
dfh = pd.concat([df.resample('H').mean()[['DD', 'FF', 'GSX', 'P', 'RF']],
                 df[['RR', 'SO']].resample('H').sum(),
                 df.resample('H').mean()[['TB1', 'TB2', 'TB3', 'TL', 'TP']]], axis=1)
dfh
```
DD | FF | GSX | P | RF | RR | SO | TB1 | TB2 | TB3 | TL | TP | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
time | ||||||||||||
2015-01-01 00:00:00 | 257.333333 | 1.383333 | 0.0 | 965.666667 | 97.000000 | 0.0 | 0.0 | 3.200000 | 3.600000 | 5.2 | -2.533333 | -3.066667 |
2015-01-01 01:00:00 | 160.666667 | 0.766667 | 0.0 | 965.966667 | 97.666667 | 0.0 | 0.0 | 3.150000 | 3.600000 | 5.2 | -3.250000 | -3.716667 |
2015-01-01 02:00:00 | 227.000000 | 0.816667 | 0.0 | 966.316667 | 98.500000 | 0.0 | 0.0 | 3.100000 | 3.600000 | 5.2 | -3.583333 | -3.933333 |
2015-01-01 03:00:00 | 258.833333 | 1.933333 | 0.0 | 966.466667 | 97.000000 | 0.0 | 0.0 | 3.100000 | 3.600000 | 5.2 | -3.250000 | -3.766667 |
2015-01-01 04:00:00 | 241.500000 | 1.266667 | 0.0 | 966.416667 | 95.166667 | 0.0 | 0.0 | 3.100000 | 3.600000 | 5.2 | -3.116667 | -3.883333 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021-12-31 19:00:00 | 283.500000 | 1.166667 | 0.0 | 957.766667 | 98.833333 | 0.0 | 0.0 | 3.316667 | 3.300000 | 3.4 | 0.900000 | 0.716667 |
2021-12-31 20:00:00 | 237.333333 | 0.716667 | 0.0 | 958.100000 | 99.500000 | 0.0 | 0.0 | 3.283333 | 3.300000 | 3.4 | -0.200000 | -0.350000 |
2021-12-31 21:00:00 | 149.833333 | 1.083333 | 0.0 | 958.783333 | 99.666667 | 0.0 | 0.0 | 3.200000 | 3.300000 | 3.4 | -0.800000 | -0.916667 |
2021-12-31 22:00:00 | 175.833333 | 0.516667 | 0.0 | 959.433333 | 99.166667 | 0.0 | 0.0 | 3.100000 | 3.383333 | 3.4 | -1.083333 | -1.200000 |
2021-12-31 23:00:00 | 198.833333 | 0.600000 | 0.0 | 960.016667 | 99.500000 | 0.0 | 0.0 | 3.033333 | 3.400000 | 3.4 | -1.683333 | -1.716667 |
61368 rows × 12 columns
***From now on, we will use the hourly data only** (and further aggregations when necessary). The 10 mins data are great but require a little bit more of pandas kung fu (the chinese term, not the sport) to be used efficiently.*
Spend some time exploring the `dfh` dataframe we just created. What time period does it cover? What variables does it contain?
```python
print(f"The data stored ranges from {dfh.index[0]} UTC until {dfh.index[-1]} UTC "
      f"which is a time span of {dfh.index[-1] - dfh.index[0]} hours.")
```
The data stored ranges from 2015-01-01 00:00:00 UTC until 2021-12-31 23:00:00 UTC which is a time span of 2556 days 23:00:00 hours.
Please note that the longer part of the metadata description was cut out below, because it no longer reflects the stored data (hourly data instead of 10-minute data).
The DataFrame contains the following variables:
```python
dfmeta.loc[dfh.columns].iloc[:, :1]
```
Kurzbeschreibung | |
---|---|
DD | Windrichtung |
FF | vektorielle Windgeschwindigkeit |
GSX | Globalstrahlung |
P | Luftdruck |
RF | Relative Feuchte |
RR | Niederschlag |
SO | Sonnenscheindauer |
TB1 | Erdbodentemperatur in 10cm Tiefe |
TB2 | Erdbodentemperatur in 20cm Tiefe |
TB3 | Erdbodentemperatur in 50cm Tiefe |
TL | Lufttemperatur in 2m |
TP | Taupunkt |
03 - Precipitation¶
In this section, we will focus on precipitation only.
***03-01**: Compute the average annual precipitation (m/year) over the 7-year period.*¶
```python
annual_precip = pd.DataFrame()
annual_precip["Average Precipitation"] = df["RR"].resample("7D").sum().resample("Y").mean()

annual_precip.set_index(annual_precip.index.year)  # As DataFrame for the looks!
```
Average Precipitation | |
---|---|
time | |
2015 | 16.237736 |
2016 | 17.407692 |
2017 | 19.890385 |
2018 | 14.926923 |
2019 | 19.898077 |
2020 | 17.381132 |
2021 | 17.532692 |
***03-02**: What is the smallest non-zero precipitation measured at the station? What is the maximum hourly precipitation measured at the station? When did this occur?*¶
```python
# Some general data sorting:
precip = pd.DataFrame()
precip["RR"] = dfh["RR"]
rr = dfh[precip > 0]["RR"]

# Minimum precipitation:
min_rr = precip[precip == rr.min()]
print(f"The smallest non-zero precipitation measured was: {rr.min()} mm/h, which happened {min_rr.size} times, "
      f"first on {min_rr.index[0]} UTC and last on {min_rr.index[-1]} UTC")

# Maximum precipitation:
max_rr = precip[precip == rr.max()]
print(f"The greatest precipitation measured was: {rr.max()} mm/h, which happened on {max_rr.index[0]}.")
```
The smallest non-zero precipitation measured was: 0.1 mm/h, which happened 61368 times, first on 2015-01-01 00:00:00 UTC and last on 2021-12-31 23:00:00 UTC
The greatest precipitation measured was: 22.2 mm/h, which happened on 2015-01-01 00:00:00.
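Note that the boolean masks above keep the full hourly index (values failing the condition become NaN), so `min_rr.size` and `max_rr.index[0]` refer to the whole dataframe rather than only to the matching rows. A minimal alternative sketch (assuming `dfh` as defined above) that selects the matching timestamps directly:

```python
# Alternative sketch: select only the matching timestamps
rr_nonzero = dfh.loc[dfh["RR"] > 0, "RR"]

smallest = rr_nonzero.min()
smallest_times = rr_nonzero[rr_nonzero == smallest].index
print(f"Smallest non-zero precipitation: {smallest} mm/h, "
      f"measured {len(smallest_times)} times, first on {smallest_times[0]}")

# idxmax() returns the timestamp of the (first) maximum value
print(f"Maximum hourly precipitation: {rr_nonzero.max()} mm/h on {rr_nonzero.idxmax()}")
```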
```python
ax = rr.plot.hist(bins=125, range=(0, 25), grid=True, xlim=(0, 25), xlabel="Precipitation (mm/h)",
                  ylabel="Number of occurrences", title="Hourly precipitation in Innsbruck from 2015 to 2021");

# To check whether the 'bins' value is set correctly (the vertical line and the first bin width should match):
# ax.vlines(0.2, 0, 1600, color="black");
```
Plot the same data, but this time with a logarithmic y-axis.
```python
ax = rr.plot.hist(bins=125, range=(0, 25), grid=True, xlim=(0, 25), log=True, xlabel="Precipitation (mm/h)",
                  ylabel="Number of occurrences", title="Hourly precipitation in Innsbruck from 2015 to 2021")
```
Compute the 99th percentile (or quantile) of hourly precipitation.
```python
quantile = precip["RR"].quantile(q=.99)  # stored for use in exercise 03-06 below
print(f"99% of the precipitation was less than {quantile:.2f} mm/h")
```
99% of the precipitation was less than 2.33 mm/h
***03-04**: Compute daily sums (mm/d) of precipitation (tip: use `.resample` again). Compute the average number of rain days per year in Innsbruck (a "rain day" is a day with at least 0.1 mm/d of measured precipitation).*¶
```python
# Daily sums (mm/day)
rday = precip["RR"].resample("d").sum()

# Computing the number of rain days:
print(f"From 2015 until the end of 2021, Innsbruck had an average of "
      f"{np.ceil(rday[rday > 0].resample('y').size().mean()):.0f} rain days per year.")
```
From 2015 until the end of 2021, Innsbruck had an average of 171 rain days per year.
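For completeness, a sketch that applies the ≥ 0.1 mm/d definition literally; it should give the same count here, since the gauge resolution is 0.1 mm:

```python
# Rain days defined strictly as days with at least 0.1 mm of precipitation
rain_days_per_year = (rday >= 0.1).resample("y").sum()
print(rain_days_per_year)
print(f"Average: {rain_days_per_year.mean():.0f} rain days per year")
```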
***03-05**: Now select (subset) the daily dataframe to keep only daily data in the months of December, January, February (DJF). To do this, note that `dfh.index.month` exists and can be used to subset the data efficiently. Compute the average precipitation in DJF (mm/d), and the average number of rainy days in DJF. Repeat with the months of June, July, August (JJA).*¶
```python
# The two data subsets to analyse:
djf, jja = (12, 1, 2), (6, 7, 8)

# A translation dictionary to convert the month numbers into proper column names
translate = {
    12: "December", 1: "January", 2: "February",
    6: "June", 7: "July", 8: "August"
}


def season_subset(subset, hourly=False, adv=False):
    """Subsets the precipitation data to the given months.

    Parameters
    ----------
    subset : array-like
        the month indices to keep (e.g. (12, 1, 2) for DJF)
    hourly : bool (optional)
        Whether to subset the hourly values instead of the daily sums
    adv : bool (optional)
        Whether to return the data merged into a single column

    Returns
    -------
    dfDJF : pd.DataFrame
        The sliced data as a pandas DataFrame
    """
    dfDJF = pd.DataFrame()
    if hourly is False:
        dfr = rday
    else:
        dfr = precip["RR"].resample("H").sum()

    if adv is False:
        for m in subset:
            # This ensures that columns of any length can be added to the DataFrame
            dfDJF = pd.concat([dfDJF, pd.DataFrame({translate[m]: dfr[dfr.index.month == m].values})], axis=1)
        return dfDJF
    else:
        # for a merged DataFrame (1 column)
        for m in subset:
            dfDJF = pd.concat([dfDJF, pd.DataFrame(dfr[dfr.index.month == m])], axis=0)
        return dfDJF


def subset_stats(subset):
    """Prints the mean daily precipitation and rain days in a given subset.
    Does not return anything.

    Parameters
    ----------
    subset : array-like
        The month indices for the data slicing
    """
    df = season_subset(subset)
    print(f"The average daily precipitation was {df.mean().mean():.2f} mm/d and there were "
          f"an average of {np.ceil((df[df > 0].count() / 7).mean()):.0f} rain days per month.")
    return None
```
Now, the subset statistics of daily data for the months of December, January and February (DJF):
```python
subset_stats(djf)
```
The average daily precipitation was 1.75 mm/d and there were an average of 13 rain days per month.
And, separately, the subset statistics of daily data for the months of June, July and August (JJA):
```python
subset_stats(jja)
```
The average daily precipitation was 4.02 mm/d and there were an average of 18 rain days per month.
***03-06**: Repeat the DJF and JJA subsetting, but this time with hourly data. Count the total number of times that hourly precipitation in DJF is above the 99th percentile computed in exercise 03-03. Repeat with JJA.*¶
```python
# The two seasons to analyse:
season = (djf, jja)

# Another translation dictionary, again for proper naming
tlsea = {
    (12, 1, 2): "winter",
    (6, 7, 8): "summer"
}

for s in season:
    subset_h = season_subset(s, hourly=True)
    print(f"In the {tlsea[s]} season, the 99% quantile was hit "
          f"{subset_h[subset_h > quantile].count().sum()} times in the 7 years range.")
```
In the winter season, the 99% quantile was hit 68 times in the 7 years range.
In the summer season, the 99% quantile was hit 308 times in the 7 years range.
***03-07**: Compute and plot the average daily cycle of hourly precipitation in DJF and JJA.*¶
I expect a plot similar to this example. To compute the daily cycle, I recommend combining two very useful tools. First, start by noticing that `ds.index.hour` exists and can be used to categorize data. Then, note that `df.groupby` exists and can be used exactly for that (documentation).
```python
# Seasonal data:
daily_avg = pd.DataFrame()
for s in season:
    season_hourly = season_subset(s, hourly=True, adv=True)
    daily_avg[tlsea[s]] = season_hourly.groupby(season_hourly.index.hour).mean()

# Plotting:
daily_avg.plot(grid=True, title="Precipitation daily cycle Innsbruck 2015-2021",
               xlabel="Hour of day (UTC)", ylabel="Hourly precipitation (mm/h)");
```
04 - A few other variables¶
In this section, we will continue to analyze the weather station data.
***04-01**: Verify that the three soil temperatures have approximately the same average value over the entire period.*¶
```python
# Get all soil temperatures for easier use later on
soil = dfh[["TB1", "TB2", "TB3"]]

# Prints a confirmation if the three averages are nearly the same
if np.allclose(soil.mean(), 11.4, rtol=0, atol=.12):
    print("The average of the three soil temperatures in the whole time range is approximately the same with 11.4°C ±0.12°C")
```
The average of the three soil temperatures in the whole time range is approximately the same with 11.4°C ±0.12°C
Now plot the three soil temperature timeseries in hourly resolution over the course of the year 2020 (example). Repeat the plot with the month of May 2020.
```python
soil_hourly = soil.resample("h").mean()
soil_hourly.loc["2020"].plot(grid=True, xlabel="", ylabel="Temperature (°C)", title="Soil temperature in 2020");
```
Now, only for May 2020:
```python
soil_hourly.loc["2020-05"].plot(grid=True, xlabel="", ylabel="Temperature (°C)", title="Soil temperature in May 2020");
```
***04-02**: Plot the average daily cycle of all three soil temperatures.*¶
```python
soil_hourly.groupby(soil_hourly.index.hour).mean().plot(grid=True, xlabel="Hour of day (UTC)",
                                                        ylabel="Temperature (°C)",
                                                        title="Average daily soil temperature in 2015-2021");
```
***04-03**: Compute the difference (in °C) between the air temperature and the dewpoint temperature. Now plot this difference on a scatter plot (x-axis: relative humidity, y-axis: temperature difference).*¶
```python
# The temperature difference
diff = dfh["TL"] - dfh["TP"]

# The plot:
fig, ax = plt.subplots(figsize=(9, 5))
ax.scatter(dfh["RF"], diff, s=2, alpha=.2, color="darkcyan")
ax.set_xlabel("Relative humidity (%)")
ax.set_ylabel("Temperature difference (K)")
ax.set_title("Temperature difference between air temperature and dew point temperature");
05 - Free coding project¶
05-00: Introduction¶
For the coding project I chose the three stations Augsburg, Hohenpeißenberg and Seefeld in Tirol. They lie in Bavaria and Tyrol and are up to 125 km apart from each other (Augsburg-Seefeld).
The three stations form an imaginary line from the alpine foreland at Augsburg towards the mountainous Seefeld in Tyrol. My goal with this data analysis is to find possible correlations among the three. Because they are relatively close to each other, they are often affected by the same general weather situation, as will be shown later on.
All map images for this project are from OpenStreetMap. All files and images can be found in their respective folders!
Since the three stations share roughly the same longitude, (most) major synoptic events affect them at about the same time, driven by the general weather situation.
|  | Augsburg | Hohenpeißenberg | Seefeld |
|---|---|---|---|
| Station location (marked orange) |  |  |  |
| Scale |  |  |  |
| Latitude (N) | 48° 25' | 47° 48' | 47.324722 |
| Longitude (E) | 10° 56' | 11° 00' | 11.175555 |
| Height above sea level | 462 m | 977 m | 1182 m |
| In operation since | 1947 | 1781 | 1997 |
| Operator | DWD | DWD | ZAMG - Geosphere Austria |

Data from the DWD weather stations overview and the ZAMG station data.
05-01: Reading and preparing the data¶
```python
# Imports (repeated here so that this part also runs on its own)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```
Reading the data files, which are split up by temperature, precipitation, wind and sunshine hours. The files contain data from the last 500 days, counted backwards from June 8th, 2023.
Since the station data come from different service providers, there are small differences in the data format, which are corrected during the read-in.
The Seefeld in Tirol data come from the Geosphere Austria Data Hub, the Augsburg and Hohenpeißenberg data from the DWD's Climate Data Center.
```python
# Preparing to read each station's data
data = {
    "tu": ["MESS_DATUM", "PP_10", "TT_10", "RF_10", "TD_10"],
    "ff": ["MESS_DATUM", "FF_10", "DD_10"],
    "rr": ["MESS_DATUM", "RWS_DAU_10", "RWS_10"],
    "sd": ["MESS_DATUM", "GS_10", "SD_10"]
}
a, hpb, sf = pd.DataFrame(), pd.DataFrame(), pd.DataFrame()

# Reading Augsburg's data (DWD)
for d in data:
    values = pd.read_csv(f"station_data/produkt_zehn_min_{d}_20211206_20230608_00232.txt", sep=";",
                         usecols=data[d], index_col="MESS_DATUM", parse_dates=True, na_values=-999)
    a = pd.concat([a, values], axis=1)

# Reading Hohenpeißenberg's data (DWD)
for d in data:
    values = pd.read_csv(f"station_data/produkt_zehn_min_{d}_20211206_20230608_02290.txt", sep=";",
                         usecols=data[d], index_col="MESS_DATUM", parse_dates=True, na_values=-999)
    hpb = pd.concat([hpb, values], axis=1)

del(values, data)  # free some memory that is no longer needed

# Convert J/cm² to W/m² for the DWD stations
for s in (a, hpb):
    s["GS_10"] = s["GS_10"] * 50 / 3

# Reading Seefeld's data (ZAMG)
sf = pd.read_csv("station_data/ZEHNMIN Datensatz_20211206T0000_20230608T2350.csv",
                 index_col=0, parse_dates=True).drop("station", axis=1)
sf.index = pd.to_datetime(sf.index.strftime("%Y-%m-%d %H:%M:%S"), format="%Y-%m-%d %H:%M:%S")  # For a matching time format
```
Because the station data came from different service providers, the data for incoming short-wave radiation and sunshine duration differ in format.

Quick overview of the read data:

|  | Wind speed | Wind direction | Pressure | Relative humidity | Precipitation | Minutes of precip. per 10 min | Sunshine duration per 10 min | Incoming short-wave radiation | Air temperature (2 m) | Dew point temperature |
|---|---|---|---|---|---|---|---|---|---|---|
| DWD stations | FF_10 | DD_10 in (0, 360] | PP_10 | RF_10 | RWS_10 | RWS_DAU_10 | SD_10 in hours/10min | s.a. | TT_10 | TD_10 |
| ZAMG station | FF | DD in [0, 360] | P | RF | RR | RRM | SO in sec/10min | s.a. | TL | TP |

*s.a.: see above*
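One option (not used here, just a sketch) would be to rename the ZAMG columns to the DWD naming right after reading, so that later code does not have to distinguish the two conventions; note that the units would still differ for some variables:

```python
# Optional sketch: unify the column names (ZAMG -> DWD convention)
zamg_to_dwd = {"FF": "FF_10", "DD": "DD_10", "P": "PP_10", "RF": "RF_10",
               "RR": "RWS_10", "RRM": "RWS_DAU_10", "SO": "SD_10",
               "GSX": "GS_10", "TL": "TT_10", "TP": "TD_10"}
sf_renamed = sf.rename(columns=zamg_to_dwd)
# Caution: units still differ, e.g. SO is in seconds per 10 min while SD_10 is in hours per 10 min.
```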
05-02: Temperature comparison of the last 500 days¶
```python
# Preparing the plotting - temperature 7-day mean
a_mon = a["TT_10"].resample("7d").mean()
hpb_mon = hpb["TT_10"].resample("7d").mean()
sf_mon = sf["TL"].resample("7d").mean()
```
```python
# Plotting
fig, ax = plt.subplots(figsize=(10, 5))
ax = a_mon.plot(color="green", ylabel="Temperature (°C)", title="7 days average temperature over last 500 days",
                label="Augsburg (462m)")
hpb_mon.plot(ax=ax, color="blue", label="Hohenpeißenberg (977m)")
sf_mon.plot(ax=ax, color="red", label="Seefeld (1182m)")

# Gridding
ax.grid(which="major", color="gray")
ax.grid(which="minor", axis="both")
ax.legend();
```
The weekly averages of all three stations followed nearly the same course over the last year, with Seefeld staying a few degrees below the other two and Hohenpeißenberg about a degree below Augsburg's curve. To confirm this, the mean temperature over the year 2022 is calculated. The edges of the record (December 2021 and the months of 2023) are excluded, so that the 500-day average is not mixed up with the annual mean temperature.
```python
print(f"In 2022, the measured mean temperature in Augsburg was {a['TT_10'].loc['2022'].mean():.2f}°C, in Hohenpeißenberg it was "
      f"{hpb['TT_10'].loc['2022'].mean():.2f}°C while in Seefeld it was {sf['TL'].loc['2022'].mean():.2f}°C")
print(f"For comparison, the mean temperature over the 500 days span was {a['TT_10'].mean():.2f}°C in Augsburg, "
      f"{hpb['TT_10'].mean():.2f}°C in Hohenpeißenberg and {sf['TL'].mean():.2f}°C in Seefeld.")
```
In 2022, the measured mean temperature in Augsburg was 10.32°C, in Hohenpeißenberg it was 9.32°C while in Seefeld it was 6.53°C
For comparison, the mean temperature over the 500 days span was 8.94°C in Augsburg, 7.69°C in Hohenpeißenberg and 4.97°C in Seefeld.
Besides, the graph suggests only minor weekly differences in the warmer months, while in winter the temperatures tend to drift further apart. For a more precise analysis, however, a climate analysis over a few decades would be preferable; this exercise limits itself to the 500-day span.
Altogether, since all curves follow roughly the same course, the local properties around the measurement sites affect the weekly temperature only on a minor scale (they matter more on daily and shorter time scales). The weekly mean temperature is therefore dominated by the general weather situation, which, due to the sites' close proximity, affects all three stations at around the same time.
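To put a number on how similar the weekly curves are, the pairwise correlation of the weekly means can be computed (a minimal sketch, assuming `a_mon`, `hpb_mon` and `sf_mon` from above):

```python
# Pairwise Pearson correlation of the weekly mean temperatures
weekly = pd.DataFrame({"Augsburg": a_mon, "Hohenpeißenberg": hpb_mon, "Seefeld": sf_mon})
print(weekly.corr())
```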
Next up, a more detailed analysis of the temperature follows.
05-03: 10 min temperature data and effect of local differences on it¶
As announced, here is a plot of the full 10-minute data over the whole date range (coloured), together with the weekly trend line (black). This trend line is the same as in the plot above.
```python
fig, (t1, t2, t3) = plt.subplots(1, 3, figsize=(16, 3))
a["TT_10"].plot(ax=t1, color="green", linewidth=1, ylabel="Temperature (°C)",
                title="Temperature over last 500 days in Augsburg")
hpb["TT_10"].plot(ax=t2, color="blue", linewidth=1, ylabel="Temperature (°C)",
                  title="Temperature over last 500 days in Hohenpeißenberg")
sf["TL"].plot(ax=t3, color="red", linewidth=1, ylabel="Temperature (°C)",
              title="Temperature over last 500 days in Seefeld")

# Temperature trend line
a_mon.plot(ax=t1, color="black")
hpb_mon.plot(ax=t2, color="black")
sf_mon.plot(ax=t3, color="black");

# Grid:
for axis in (t1, t2, t3):
    axis.set_ylim(-18, 37)
    axis.grid(which="major", color="gray")
    axis.grid(which="minor", axis="both")
```
Following the curves, a clear temperature minimum around mid-December stands out, followed by a relatively strong positive temperature gradient. To investigate it further, the date range is sliced around December 2022.
```python
# Unfortunately it is quite hard to simplify this with for-loops - issues with specifying the axes.
fig, ((t1, p1), (t2, p2), (t3, p3)) = plt.subplots(3, 2, figsize=(16, 9), width_ratios=[4, 3])

# Left column - whole Dec 2022
a["TT_10"].loc["2022-12"].plot(ax=t1, color="green", linewidth=1, ylabel="Temperature (°C)",
                               title="Temperature & dewpoint in Dec 2022 in Augsburg (462m)")
hpb["TT_10"].loc["2022-12"].plot(ax=t2, color="blue", linewidth=1, ylabel="Temperature (°C)",
                                 title="Temperature & dewpoint in Dec 2022 in Hohenpeißenberg (977m)")
sf["TL"].loc["2022-12"].plot(ax=t3, color="red", linewidth=1, ylabel="Temperature (°C)",
                             title="Temperature & dewpoint in Dec 2022 in Seefeld (1182m)")
a["TD_10"].loc["2022-12"].plot(ax=t1, color="green", linewidth=1, style=":", alpha=.6)
hpb["TD_10"].loc["2022-12"].plot(ax=t2, color="blue", linewidth=1, style=":", alpha=.6)
sf["TP"].loc["2022-12"].plot(ax=t3, color="red", linewidth=1, style=":", alpha=.6)

# Right column - Dec 17th until Dec 19th
a["TT_10"].loc["2022-12-17":"2022-12-19"].plot(ax=p1, color="green", linewidth=1, ylabel="Temperature (°C)",
                                               title="Temperature & dewpoint around 18th Dec in Augsburg (462m)")
hpb["TT_10"].loc["2022-12-17":"2022-12-19"].plot(ax=p2, color="blue", linewidth=1, ylabel="Temperature (°C)",
                                                 title="Temperature & dewpoint around 18th Dec in Hohenpeißenberg (977m)")
sf["TL"].loc["2022-12-17":"2022-12-19"].plot(ax=p3, color="red", linewidth=1, ylabel="Temperature (°C)",
                                             title="Temperature & dewpoint around 18th Dec in Seefeld (1182m)")
a["TD_10"].loc["2022-12-17":"2022-12-19"].plot(ax=p1, color="green", linewidth=1, style=":", alpha=.6)
hpb["TD_10"].loc["2022-12-17":"2022-12-19"].plot(ax=p2, color="blue", linewidth=1, style=":", alpha=.6)
sf["TP"].loc["2022-12-17":"2022-12-19"].plot(ax=p3, color="red", linewidth=1, style=":", alpha=.6)

# Make the plots share the same y-range
for axis in (t1, t2, t3):
    axis.set_ylim(-20, 20)
for axis in (p1, p2, p3):
    axis.set_ylim(-17, 12)

# Gridding for all plots
for axis in (t1, t2, t3, p1, p2, p3):
    axis.grid(which="major")
    axis.grid(which="minor", axis="both", color="lightgray")
```
The data shows some discrepancies between the stations on Dec 18th. In Augsburg, the temperature barely changes during this period, staying around -7°C. This suggests a continuous cloud cover over the city, inhibiting any significant warming of the ground.
At Hohenpeißenberg, the temperature increases during the whole day. Webcam images near the Hohenpeißenberg weather station show a retreating inversion, with the station lying above the cloud layer after the early morning. The cloud layer continues to dissipate during the night towards the 19th.
In Seefeld, however, there is a large temperature range over the course of the day, suggesting a rather sunny day on which the temperature peaks at midday due to solar radiation and drops again at night, when cloud-free conditions allow long-wave radiation to escape to the atmosphere.
To further examine the local weather situation, the incoming short-wave radiation as well as the sunshine duration is consulted below.
```python
# Plotting SW radiation data
fig, ((ax1, ax2, ax3), (s1, s2, s3)) = plt.subplots(2, 3, figsize=(16, 6), height_ratios=[2, 1])
a["GS_10"].loc["2022-12-18"].plot(ax=ax1, color="gold", title="Augsburg on Dec 18th", ylabel="short-wave radiation (W/m²)")
hpb["GS_10"].loc["2022-12-18"].plot(ax=ax2, color="gold", title="Hohenpeißenberg on Dec 18th")
sf["GSX"].loc["2022-12-18"].plot(ax=ax3, color="gold", title="Seefeld on Dec 18th")

# Plotting sunshine duration data
(a["SD_10"].loc["2022-12-18"] * 60).plot(ax=s1, ylabel="Sunshine duration (min/10min)")
(hpb["SD_10"].loc["2022-12-18"] * 60).plot(ax=s2)
(sf["SO"].loc["2022-12-18"] / 60).plot(ax=s3)
s1.set_ylim(s2.get_ylim());

# Deco
for axis in (ax1, ax2, ax3):
    axis.set_ylim(ax2.get_ylim())
    axis.set_xlabel("")
    axis.grid(which="both")
```
As seen in the plots, in Augsburg there is little incoming short-wave radiation that could warm the air at ground level. The data further suggests that this is due to missing sunshine (the sunshine duration is zero during the whole day), which would be caused by extensive cloud coverage. A quick look at webcam images from the Rathausplatz, 9 km away, supports this.
For both Hohenpeißenberg and Seefeld, the short-wave radiation and the sunshine duration are incompatible with an overcast sky; they rather indicate a day with minimal cloud coverage over the stations. Interestingly, the sunshine in Seefeld starts later than in Hohenpeißenberg, which would be caused by a mountain obstructing the sunlight. The relatively sharp rise at sunrise, compared to Hohenpeißenberg, supports this.
The temperature increase in Seefeld was thus due to direct sunshine (from 9-14 h), and the sunshine duration data suggests clear-sky conditions in Seefeld, which also explains the subsequent temperature drop during the night hours.
Because webcam images are available for the sites, the sky and cloud conditions can be visualized once again. They support the weather conditions described above.
|  | Augsburg-Rathausplatz | Hohenpeißenberg-Station | Seefeld Sports Arena |
|---|---|---|---|
| Webcam image |  |  |  |
| Source | Bergfex.at | foto-webcam.eu | seefeld.com |
| Additional information | Webcam is located 9 km away from the station; the webcam at the airport only shows up-to-date images | Weather station and webcam are on a hill | Weather station and webcam are near the sports arena |
05-04: Solar cycle and pressure¶
```python
# Sunshine hours per day
a_sol = a["SD_10"].resample("d").sum().resample("7d")          # initial data: sunshine hours per 10 min
hpb_sol = hpb["SD_10"].resample("d").sum().resample("7d")      # initial data: sunshine hours per 10 min
sf_sol = (sf["SO"].resample("d").sum() / 3600).resample("7d")  # initial data: sunshine seconds per 10 min
```
```python
translate = {
    "Augsburg": a_sol,
    "Hohenpeißenberg": hpb_sol,
    "Seefeld": sf_sol
}

# daily sunshine hours - weekly min/max
sol = pd.DataFrame()
for s in ("Augsburg", "Hohenpeißenberg", "Seefeld"):
    sol[s + " max"] = translate[s].max()
    sol[s + " min"] = translate[s].min()
```
```python
# The plot with min/max sunshine hours
ax = sol.plot(figsize=(16, 5), color=["green", "green", "blue", "blue", "red", "red"],
              style=["-", ":", "-", ":", "-", ":"], grid=True, ylim=(0, 16.1), alpha=.7,
              ylabel="Sunshine duration (hours)",
              title="Average daily sunshine duration in hours per week in the last 500 days")
ax.legend(loc="lower center", bbox_to_anchor=(0.5, -0.3), ncols=6);
```
Each station's maximum sunshine hours follow the length of the day. For Seefeld, the day length is about two hours shorter than at the other stations. Again, this is due to mountains blocking the direct sunlight at the station's location; the effective day length can therefore differ quite a lot depending on the exact location within a valley.
Pressure
```python
# As a function for later re-use
def plotpressure500(aug, hpberg, sfeld, info="Pressure"):
    """Plots the pressure for the stations 'Augsburg', 'Hohenpeißenberg' and 'Seefeld'.
    Does not return anything.

    Parameters
    ----------
    aug : array-like
        Pressure data in Augsburg
    hpberg : array-like
        Pressure data in Hohenpeißenberg
    sfeld : array-like
        Pressure data in Seefeld
    info : str (optional)
        Information about what is plotted
    """
    # Actual plot, data axis
    fig, pp = plt.subplots(figsize=(16, 4))
    aug.plot(ax=pp, color="green", linewidth=1, ylabel="Pressure (hPa)", label="Augsburg (462m)",
             title=f"{info} over the last 500 days")
    hpberg.plot(ax=pp, color="blue", linewidth=1, ylabel="Pressure (hPa)", label="Hohenpeißenberg (977m)")
    sfeld.plot(ax=pp, color="red", linewidth=1, ylabel="Pressure (hPa)", label="Seefeld (1182m)");

    # Gridding
    pp.grid(which="major")
    pp.grid(which="minor", axis="both", color="lightgray")
    pp.legend(loc="lower center", bbox_to_anchor=(0.5, -0.3), ncols=3)
    return None
```
```python
plotpressure500(a["PP_10"], hpb["PP_10"], sf["P"])
```
Looking at the pressure curves of the three stations, it is clear that they hardly differ in shape - they mainly differ in their mean pressure. The mean pressure at each station is:
```python
p = {
    "Augsburg": [a['PP_10'].mean(), a['PP_10']],
    "Hohenpeißenberg": [hpb['PP_10'].mean(), hpb['PP_10']],
    "Seefeld": [sf['P'].mean(), sf['P']]
}

print(f"Augsburg: {p['Augsburg'][0]:.2f} hPa\nHohenpeißenberg: {p['Hohenpeißenberg'][0]:.2f} hPa\nSeefeld: {p['Seefeld'][0]:.2f} hPa")
```
Augsburg: 962.64 hPa
Hohenpeißenberg: 904.91 hPa
Seefeld: 883.90 hPa
When looking at the whole data range, it is reasonable to assume an exponential decrease of pressure with height with a scale height of $H \approx 8.65\,km$. Based on this assumption, the atmospheric pressure at each station height $z$ can be calculated from the barometric formula
$$ p(z) = p_0 \, e^{-z/H}, \qquad p_0 = 1013.25\ \mathrm{hPa}. $$
```python
height = {
    "Augsburg": [.462, 9],
    "Hohenpeißenberg": [.977, 8.65],
    "Seefeld": [1.182, 8.65]
}

for z in height:
    print(f"Expected Pressure in {z}: {1013.25 * np.exp(- height[z][0] / 8.65):.2f} hPa")
```
Expected Pressure in Augsburg: 960.55 hPa
Expected Pressure in Hohenpeißenberg: 905.03 hPa
Expected Pressure in Seefeld: 883.84 hPa
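Where does $H \approx 8.65\,km$ come from? One way to check it is to invert the barometric formula with the measured mean pressures, $H = -z / \ln(p/p_0)$ (a small sketch using the `p` and `height` dictionaries defined above):

```python
# Estimate the scale height H from each station's mean pressure: H = -z / ln(p / p0)
p0 = 1013.25
for z in height:
    h_est = - height[z][0] / np.log(p[z][0] / p0)
    print(f"Estimated scale height from {z}: {h_est:.2f} km")
```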
For Augsburg, however, $H \approx 9\,km$ fits a bit better. When the measured station pressures are reduced to sea level using $H = 8.65\,km$ (and $H_{Augsburg} = 9\,km$) and the plot is made again, all three curves collapse onto the same pressure level.
```python
z = "Augsburg"
p[z][0] * np.exp(height[z][0] / height[z][1])
```
1013.3497568495574
```python
# Reduce pressure to sea level
pressure = pd.DataFrame()
for z in ("Augsburg", "Hohenpeißenberg", "Seefeld"):
    pressure[z] = p[z][1] * np.exp(height[z][0] / height[z][1])

# Plotting
plotpressure500(pressure["Augsburg"], pressure["Hohenpeißenberg"], pressure["Seefeld"], info="Sea-level-pressure")
```
The graph above shows that there is little difference between the sea-level pressures of the three stations. A scatter matrix, however, highlights the remaining local differences more clearly.
```python
pd.plotting.scatter_matrix(pressure, figsize=(7, 7), color="orange", alpha=.2, s=2);
```
The largest discrepancies are between Augsburg and Seefeld (roughly $\pm 10\,hPa$), while the differences to Hohenpeißenberg are smaller (about $\pm 5\,hPa$). This emphasises the local "uniformity" of pressure: nearby stations measure around the same sea-level pressure, and the difference grows with distance (Augsburg-Seefeld).
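To quantify this, the pairwise correlation and the mean absolute differences of the sea-level pressure series can be computed (a sketch, using the `pressure` DataFrame from above):

```python
# Pairwise Pearson correlation of the reduced (sea-level) pressure series
print(pressure.corr())

# Mean absolute pairwise differences (hPa)
print((pressure["Augsburg"] - pressure["Seefeld"]).abs().mean())
print((pressure["Augsburg"] - pressure["Hohenpeißenberg"]).abs().mean())
```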
05-05: Precipitation and wind¶
```python
# Precipitation sum over 72 hours
a_rr = a["RWS_10"].resample("3d").sum()
hpb_rr = hpb["RWS_10"].resample("3d").sum()
sf_rr = sf["RR"].resample("3d").sum()
```
```python
from matplotlib.dates import DateFormatter, DayLocator

# Precipitation sum plot
ax = hpb_rr.plot.bar(figsize=(16, 4), color="blue", alpha=.6, ylabel="Precipitation sum (mm)", label="Hohenpeißenberg",
                     title="Precipitation sum over 72 hours (mm)")
a_rr.plot.bar(ax=ax, color="green", alpha=.6, label="Augsburg")
sf_rr.plot.bar(ax=ax, color="purple", alpha=.6, label="Seefeld")
ax.grid()
ax.legend()

# Tick locator
ax.xaxis.set_major_locator(DayLocator(interval=18))
```
For precipitation there is no longer a clear connection beyond the general weather situation, which determines the timing and rough quantity of precipitation. However, due to the nearby Alps, the precipitation sums for Seefeld and Hohenpeißenberg are relatively high compared to Augsburg. The annual sums for 2022 were:
```python
print(f'Augsburg: {a["RWS_10"].loc["2022"].sum()}mm\n'
      f'Hohenpeißenberg: {hpb["RWS_10"].loc["2022"].sum()}mm\n'
      f'Seefeld: {sf["RR"].loc["2022"].sum():.2f}mm')
```
Augsburg: 572.07mm
Hohenpeißenberg: 1099.02mm
Seefeld: 1067.60mm
Wind
Note that the DWD wind direction data only contains values in (0, 360], while the ZAMG data ranges over [0, 360].
```python
# Reformat direction into (0, 360]:
sf["DD"] = np.where(sf["DD"] == 0, 360, sf["DD"])
```
```python
# from A08:
def sd_to_uv(speed, direction):
    """Converts wind speed and direction to (u, v) vector components.

    Parameters
    ----------
    speed : ndarray-like
        wind speed in m/s
    direction : ndarray-like
        wind direction in degrees, meteorological convention (0° = North)

    Returns
    -------
    (u, v) : wind u and v vector components (unit: m/s)
    """
    math_dir = np.deg2rad(270 - np.fmod(np.asarray(direction) + 360, 360))
    u = speed * np.cos(math_dir)
    v = speed * np.sin(math_dir)
    return u, v


def uv_to_sd(u, v):
    """Converts (u, v) wind vector components to wind speed and direction.

    Parameters
    ----------
    u : ndarray-like
        u-component of the wind speed in m/s
    v : ndarray-like
        v-component of the wind speed in m/s

    Returns
    -------
    (speed, direction) : wind speed (unit: m/s) and direction (° in the meteorological convention)
    """
    u = np.asarray(u)
    v = np.asarray(v)
    speed = np.sqrt(u**2 + v**2)
    direction = 270 - np.rad2deg(np.arctan2(v, u))
    return speed, np.fmod(direction + 360, 360)
```
```python
# Calculate u_wind & v_wind
a["u_wind"], a["v_wind"] = sd_to_uv(a["FF_10"], a["DD_10"])
hpb["u_wind"], hpb["v_wind"] = sd_to_uv(hpb["FF_10"], hpb["DD_10"])
sf["u_wind"], sf["v_wind"] = sd_to_uv(sf["FF"], sf["DD"])

# Print the average wind direction
print(f'In Augsburg, the wind came from {uv_to_sd(a["u_wind"].loc["2022"].mean(), a["v_wind"].loc["2022"].mean())[1]:.0f}°\n'
      f'In Hohenpeißenberg, the wind came from {uv_to_sd(hpb["u_wind"].loc["2022"].mean(), hpb["v_wind"].loc["2022"].mean())[1]:.0f}°\n'
      f'In Seefeld, the wind came from {uv_to_sd(sf["u_wind"].loc["2022"].mean(), sf["v_wind"].loc["2022"].mean())[1]:.0f}°')
```
In Augsburg, the wind came from 246°
In Hohenpeißenberg, the wind came from 243°
In Seefeld, the wind came from 89°
The windroses plotted below show the wind characteristics in 2022. The mountainous terrain around Seefeld forces the wind at station level into a SE-NW direction. At Hohenpeißenberg this channelling is less pronounced, as the measurement site is on a flatter hillside, and in Augsburg the effect is not visible at all due to the almost flat terrain at the location.
```python
from windrose import WindroseAxes

wind = {
    "Augsburg": [a["DD_10"].loc["2022"], a["FF_10"].loc["2022"]],
    "Hohenpeißenberg": [hpb["DD_10"].loc["2022"], hpb["FF_10"].loc["2022"]],
    "Seefeld": [sf["DD"].loc["2022"], sf["FF"].loc["2022"]]
}

for s in wind:
    ax = WindroseAxes.from_ax()
    ax.contourf(wind[s][0], wind[s][1], bins=np.arange(0, 8, 1), normed=True)
    ax.set_title(f"Windrose for {s} in 2022")
    ax.legend(title="Wind Speed (m/s)", loc="upper left")
```