Programming project¶

Preamble¶

Names: Jakob Mazzurana, Michael Steger
Student ID numbers: 12023693, 11933323

01 - An energy balance model with hysteresis¶

Based on the code we wrote in week 07, code the following extension. For this exercise it's OK to copy-paste my solutions and start from there. Only copy the necessary (no useless code)!

The planetary albedo $\alpha$ is in fact changing with climate change. As the temperature drops, sea-ice and ice sheets are extending (increasing the albedo). Inversely, the albedo decreases as temperature rises. The planetary albedo of our simple energy balance model follows the following equation:

$$ \alpha = \begin{cases} 0.3,& \text{if } T \gt 280\\ 0.7,& \text{if } T \lt 250\\ a T + b, & \text{otherwise} \end{cases} $$

01-01: Compute the parameters $a$ and $b$ so that the equation is continuous at T=250K and T=280K.

01-02: Now write a function called alpha_from_temperature which accepts a single positional parameter T as input (a scalar) and returns the corresponding albedo. Test your function using doctests to make sure that it complies with the instructions above.

01-03: Adapt the existing code from week 07 to write a function called temperature_change_with_hysteresis which accepts t0 (the starting temperature in K), n_years (the number of simulation years) as positional arguments and tau (the atmosphere transmissivity) as keyword argument (default value 0.611). Verify that:

the stabilization temperature with t0 = 292 and default tau is approximately 288K
the stabilization temperature with t0 = 265 and default tau is approximately 233K

01-04: Realize a total of N simulations with starting temperatures regularly spaced between t0=206K, and t0=318K and plot them on a single plot for n_years=50. The plot should look somewhat similar to this example for N=21.

Bonus: only if you want (and if time permits), you can try to increase N and add colors to your plot to create a graph similar to this one.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from astral.sun import sun
from astral import LocationInfo
from datetime import datetime
m = np.array([[280, 1], [250, 1]])
l = np.array([0.3, 0.7])
s = np.linalg.solve(m,l)
a, b = s
print(f'a = {a}, b = {b}')

a = -0.01333333333333334, b = 4.033333333333335
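The same values follow by hand from the two continuity conditions, which is a quick check on the numerical solution:

$$ \begin{aligned} 280\,a + b &= 0.3 \\ 250\,a + b &= 0.7 \end{aligned} \quad\Rightarrow\quad a = \frac{0.3 - 0.7}{280 - 250} = -\frac{1}{75} \approx -0.01333, \qquad b = 0.7 - 250\,a = \frac{121}{30} \approx 4.0333 $$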

def alpha_from_temperature(T):
    """Planetary albedo as a function of temperature.

    Parameters
    ----------
    T : float
        the temperature (K)

    Returns
    -------
    alpha : float
        the planetary albedo (-)

    Examples
    --------
    >>> print(f'{alpha_from_temperature(250):.2f}')
    0.70
    >>> print(f'{alpha_from_temperature(280):.2f}')
    0.30
    >>> print(f'{alpha_from_temperature(265):.2f}')
    0.50
    """
    # compute the parameters of the linear ramp (continuity at 250 K and 280 K)
    m = np.array([[280, 1], [250, 1]])
    l = np.array([0.3, 0.7])
    s = np.linalg.solve(m,l)
    a, b = s
    # compute alpha: constant plateaus outside [250, 280], linear in between
    if T < 250:
        alpha = 0.7
    elif T > 280:
        alpha = 0.3
    else:
        alpha = T*a + b
    return alpha
# Testing
import doctest
doctest.testmod()

TestResults(failed=0, attempted=3)
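Because the albedo is a linear ramp between two plateaus, the same curve can probably also be written with np.interp, which holds the end values constant outside the given range. This is only a sketch for comparison; alpha_interp is a hypothetical name and not part of the assignment, and unlike the scalar version it also accepts arrays.

```python
import numpy as np

def alpha_interp(T):
    """Albedo via np.interp (hypothetical helper, not part of the assignment)."""
    # np.interp needs increasing x-coordinates and keeps the first/last value
    # outside [250, 280], which reproduces the two plateaus.
    return np.interp(T, [250.0, 280.0], [0.7, 0.3])

print(alpha_interp(265.0))  # midpoint of the ramp
print(alpha_interp(np.array([200.0, 250.0, 280.0, 300.0])))
```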

def asr(alpha):
    s0 = 1362
    return (1-alpha)*s0/4
def olr(t, tau=0.611):
    sigma = 5.67E-8
    return sigma * tau * t**4

def temperature_change_with_hysteresis(t0, n_years, tau=0.611):
    """Temperature evolution with a temperature-dependent albedo.

    Parameters
    ----------
    t0 : float
        the starting temperature (K)
    n_years : int
        the number of simulation years
    tau : float, optional
        the atmosphere transmissivity (-)

    Returns
    -------
    (time, temperature) : ndarrays of size n_years + 1

    Examples
    --------
    >>> y, t = temperature_change_with_hysteresis(292, 50)
    >>> np.isclose(t[50], 288, atol=10e-04)
    True
    >>> y, t = temperature_change_with_hysteresis(265, 100)
    >>> np.isclose(t[100], 233, atol=10e-02)
    True
    """

    dt = 60*60*24*365  # one year in seconds
    C = 4.0e+08  # heat capacity (J m-2 K-1)
    years = np.arange(n_years + 1)
    temperature = np.zeros(n_years + 1)
    temperature[0] = t0
    for i in range(n_years):
        alpha = alpha_from_temperature(temperature[i])
        net_flux = asr(alpha=alpha) - olr(temperature[i], tau=tau)
        temperature[i + 1] = temperature[i] + dt / C * net_flux
    return years, temperature


# Testing
import doctest
doctest.testmod()

TestResults(failed=0, attempted=7)
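The two stabilization temperatures can be cross-checked analytically: on each albedo plateau the model is at rest when the absorbed solar radiation equals the outgoing longwave radiation, i.e. T = ((1 - alpha) * S0 / (4 * sigma * tau))^(1/4). The sketch below uses the same constants as asr() and olr(); equilibrium_temperature is a hypothetical helper name.

```python
S0 = 1362.0       # solar constant (W m-2), same value as in asr()
SIGMA = 5.67e-8   # Stefan-Boltzmann constant (W m-2 K-4), same value as in olr()

def equilibrium_temperature(alpha, tau=0.611):
    """Temperature (K) at which ASR equals OLR for a fixed albedo."""
    return ((1 - alpha) * S0 / (4 * SIGMA * tau)) ** 0.25

t_warm = equilibrium_temperature(0.3)   # low-albedo (warm) plateau
t_cold = equilibrium_temperature(0.7)   # high-albedo (cold) plateau
print(f'{t_warm:.1f} K, {t_cold:.1f} K')
```

Both values fall inside the temperature range of their own plateau (above 280 K and below 250 K), which is why the model has two stable states and the final temperature depends on t0.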

a = np.linspace(206, 318, 29)
for i in a:
    plt.plot(*temperature_change_with_hysteresis(i, 50))
plt.title('Temperature change with different starting temperatures')
plt.xlabel('Years')
plt.ylabel('Temperature (K)')
plt.show()


02 - Weather station data files¶

I downloaded 10 min data from the recently launched ZAMG data hub. The data file contains selected parameters from the "INNSBRUCK-FLUGPLATZ (ID: 11804)" weather station.

You can download the data from the following links (right-click + "Save as..."):

station data
parameter metadata
station list from the ZAMG (in a better format than last time)

Let me open the data for you and display its content:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv('INNSBRUCK-FLUGPLATZ_Datensatz_20150101_20211231.csv', index_col=1, parse_dates=True)
df = df.drop('station', axis=1)

02-01: after reading the documentation of the respective functions (and maybe try a few things yourself), explain in plain sentences:

what am I asking pandas to do with the index_col=1, parse_dates=True keyword arguments? Why am I doing this?
what am I asking pandas to do with .drop()? Why axis=1?

Answer: pd.read_csv is the pandas function that reads a CSV file into a DataFrame. By default, pandas gives every DataFrame an integer index starting at 0. For a time series it is more useful to have the timestamps as index, so index_col=1 tells pandas to use the second column of the file (counting starts at 0) as the index. Without parse_dates=True the index would hold plain strings (dtype object). With parse_dates=True pandas parses them into datetime64 values, which gives a DatetimeIndex. This is useful because it allows us to select, resample and group the data by time.

.drop() is a DataFrame method. Its documentation lists the following parameters:

Parameters

labels : single label or list-like. Index or column labels to drop. A tuple will be used as a single label and not treated as a list-like. This means that the first parameter specifies which rows or columns should be dropped.

axis : {0 or 'index', 1 or 'columns'}, default 0. Whether to drop labels from the index (0 or 'index') or columns (1 or 'columns'). This means that to drop a column we have to pass axis=1.

Further down, the documentation also lists:

index : single label or list-like. Alternative to specifying axis (labels, axis=0 is equivalent to index=labels).

columns : single label or list-like. Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).

This means that the following code does the same: df = df.drop(columns='station'). The station column is presumably dropped because the file contains a single station, so the column carries no information.
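The effect of these keyword arguments can be tried on a tiny in-memory file. The sample below is made up (the real file's columns and timestamp format may differ); it only assumes the layout described above, with the station id in the first column and the timestamp in the second.

```python
import io
import pandas as pd

# Made-up sample in the assumed layout: station id first, timestamp second.
csv_text = """station,time,TL,RR
11804,2015-01-01 00:00,1.5,0.0
11804,2015-01-01 00:10,1.4,0.1
11804,2015-01-01 00:20,1.2,0.0
"""

raw = pd.read_csv(io.StringIO(csv_text))
print(type(raw.index).__name__)      # default integer index

sample = pd.read_csv(io.StringIO(csv_text), index_col=1, parse_dates=True)
print(type(sample.index).__name__)   # timestamps parsed into a datetime index

# Both calls remove the 'station' column and return a new DataFrame.
by_axis = sample.drop('station', axis=1)
by_name = sample.drop(columns='station')
print(by_axis.equals(by_name))
```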

Now let me do something else for you:

dfmeta = pd.read_csv('ZEHNMIN Parameter-Metadaten.csv', index_col=0)
dfmeta.loc[df.columns]

	Kurzbeschreibung	Beschreibung	Einheit
DD	Windrichtung	Windrichtung, vektorielles Mittel über 10 Minuten	°
FF	vektorielle Windgeschwindigkeit	Windgeschwindigkeit, vektorielles Mittel über ...	m/s
GSX	Globalstrahlung	Globalstrahlung, arithmetisches Mittel über 10...	W/m²
P	Luftdruck	Luftdruck, Basiswert zur Minute10	hPa
RF	Relative Feuchte	Relative Luftfeuchte, Basiswert zur Minute10	%
RR	Niederschlag	10 Minuten Summe des Niederschlags, Summe der ...	mm
SO	Sonnenscheindauer	Sonnenscheindauer, Sekundensumme über 10 Minuten	s
TB1	Erdbodentemperatur in 10cm Tiefe	Erdbodentemperatur in 10cm Tiefe, Basiswert zu...	°C
TB2	Erdbodentemperatur in 20cm Tiefe	Erdbodentemperatur in 20cm Tiefe, Basiswert zu...	°C
TB3	Erdbodentemperatur in 50cm Tiefe	Erdbodentemperatur in 50cm Tiefe, Basiswert zu...	°C
TL	Lufttemperatur in 2m	Lufttemperatur in 2m Höhe, Basiswert zur Minute10	°C
TP	Taupunkt	Taupunktstemperatur, Basiswert zur Minute10	°C

02-02: again, explain in plain sentences what the dfmeta.loc[df.columns] is doing, and why it works that way.

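.loc selects rows (and optionally columns) by label. dfmeta uses the parameter short names (DD, FF, ...) as its index, because it was read with index_col=0, and df.columns contains exactly these short names as column labels. Passing df.columns to dfmeta.loc therefore returns the metadata rows of the parameters that are present in df, in the same order as the columns of df.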
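A minimal sketch of this label-based selection, using two made-up tables (the names meta and data, the English descriptions and the extra 'SH' row are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical metadata table indexed by parameter short name.
meta = pd.DataFrame(
    {'description': ['wind direction', 'air temperature', 'snow depth', 'precipitation'],
     'unit': ['deg', 'degC', 'cm', 'mm']},
    index=['DD', 'TL', 'SH', 'RR'],
)
# Hypothetical data table: its column names are a subset of the metadata index.
data = pd.DataFrame({'TL': [1.5, 1.4], 'DD': [270.0, 265.0]})

# A list-like of labels returns the matching rows in the order of the labels.
subset = meta.loc[data.columns]
print(subset)
```

Rows of the metadata table whose label is not a column of the data table are left out, and the result follows the column order of the data table, not the row order of the metadata table.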

Finally, let me do a last step for you before you start coding:

dfh = df.resample('H').mean()
# RR and SO are 10 min sums: replace their hourly means by hourly sums
dfh = dfh.drop(['RR','SO'], axis=1)
dfh['RR'] = df['RR'].resample('H').sum()
dfh['SO'] = df['SO'].resample('H').sum()

02-03: explore the dfh dataframe. Explain, in plain words, what the purpose of .resample('H') followed by mean() is. Explain what .resample('H').max() and .resample('H').sum() would do.
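As a small illustration of 02-03: resampling groups the 10 min values into hourly bins, and the method that follows decides how each bin is reduced to one value (mean: average of the values in the hour, max: largest value, sum: total). The sketch uses a made-up precipitation series and writes the hourly frequency as '60min', which should behave like 'H' here and avoids alias differences between pandas versions.

```python
import pandas as pd

index = pd.date_range('2015-01-01 00:00', periods=12, freq='10min')
rr = pd.Series([0.0, 0.1, 0.0, 0.3, 0.0, 0.2,   # first hour
                0.0, 0.0, 0.0, 0.0, 0.0, 0.6],  # second hour
               index=index)

hourly = rr.resample('60min')   # one bin per hour, labelled by its start time
print(hourly.mean())            # average 10 min value in each hour
print(hourly.max())             # largest 10 min value in each hour
print(hourly.sum())             # hourly total
```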

02-04: Using np.allclose, make sure that the average of the first hour (that you'll compute yourself from df) is indeed equal to the first row of dfh. Now, two variables in the dataframe have units that aren't suitable for averaging. Please convert the following variables to the correct units:

RR needs to be converted from the average of 10 min sums to mm/h
SO needs to be converted from the average of 10 min sums to s/h

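# first row of the hourly means (the nine averaged variables up to TL)
a = dfh.iloc[0, 0:9]
# manual average of the first six 10 min rows for the same variables
b = df.iloc[0:6, [0,1,2,3,4,7,8,9,10]].mean()

for i in range(len(a)):
    # .iloc selects by position; plain a[i] would be a label lookup
    print(f'{a.index[i]} is close to? {np.isclose(a.iloc[i], b.iloc[i])}')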

DD is close to? True
FF is close to? True
GSX is close to? True
P is close to? True
RF is close to? True
TB1 is close to? True
TB2 is close to? True
TB3 is close to? True
TL is close to? True
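The same check can be written as a single np.allclose call, and the unit conversion asked for in 02-04 follows from the fact that an hour holds six 10 min values. The sketch below uses a made-up frame and assumes complete hours (no missing 10 min values); with gaps, multiplying the mean by 6 and summing would no longer agree.

```python
import numpy as np
import pandas as pd

index = pd.date_range('2015-01-01', periods=12, freq='10min')
frame = pd.DataFrame({'TL': np.linspace(1.0, 2.1, 12),
                      'RR': [0.0, 0.1, 0.0, 0.3, 0.0, 0.2] * 2}, index=index)

hourly_mean = frame.resample('60min').mean()
first_hour = frame.iloc[0:6].mean()   # manual average of the first six rows
print(np.allclose(hourly_mean.iloc[0], first_hour))

# With six complete 10 min sums per hour, mean * 6 equals the hourly sum (mm/h).
rr_mm_per_h = hourly_mean['RR'] * 6
print(np.allclose(rr_mm_per_h, frame['RR'].resample('60min').sum()))
```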

From now on, we will use the hourly data only (and further aggregations when necessary). The 10 min data are great but require a little bit more pandas kung fu (the Chinese term, not the sport) to be used efficiently.

Spend some time exploring the dfh dataframe we just created. What time period does it cover? What variables does it contain?

Note on pandas: all the exercises below can be done with or without pandas. Each question can be answered with very few lines of code (often one or two) with pandas, and I recommend to use it as much as possible. If you want, you can always use numpy in case of doubt: you can access the data as a numpy array with: df[column_name].values.

03 - Precipitation¶

In this section, we will focus on precipitation only.

03-01: Compute the average annual precipitation (m/year) over the 7-year period.

03-02: What is the smallest non-zero precipitation measured at the station? What is the maximum hourly precipitation measured at the station? When did this occur?

03-03: Plot a histogram of hourly precipitation, with bins of size 0.2 mm/h, starting at 0.1 mm/h and ending at 25 mm/h. Plot the same data, but this time with a logarithmic y-axis. Compute the 99th percentile (or quantile) of hourly precipitation.

03-04: Compute daily sums (mm/d) of precipitation (tip: use .resample again). Compute the average number of rain days per year in Innsbruck (a "rain day" is a day with at least 0.1 mm / d of measured precipitation).

03-05: Now select (subset) the daily dataframe to keep only daily data in the months of December, January, February (DJF). To do this, note that dfh.index.month exists and can be used to subset the data efficiently. Compute the average precipitation in DJF (mm / d), and the average number of rainy days in DJF. Repeat with the months of June, July, August (JJA).

03-06: Repeat the DJF and JJA subsetting, but this time with hourly data. Count the total number of times that hourly precipitation in DJF is above the 99th percentile computed in exercise 03-03. Repeat with JJA.

03-07: Compute and plot the average daily cycle of hourly precipitation in DJF and JJA. I expect a plot similar to this example. To compute the daily cycle, I recommend to combine two very useful tools. First, start by noticing that ds.index.hour exists and can be used to categorize data. Then, note that df.groupby exists and can be used exactly for that (documentation).

annual_mean = dfh['RR'].sum()/7000  # total in mm over 7 years -> m per year
print(f'The average annual precipitation is: {annual_mean:.2} m\n')
rr_max = dfh['RR'].max()
t_max = dfh['RR'].idxmax()
print(f'The max precipitation within an hour was: {rr_max} mm\nThis occurred on {t_max}\n')
# np.nan instead of np.NaN: the capitalized alias was removed in NumPy 2.0
rr_min = dfh['RR'].replace(0, np.nan).min()
t_min = dfh[dfh['RR'] == rr_min].index
print(f'The min non-zero precipitation within an hour was: {rr_min} mm\nThis occurred {len(t_min)} times over the 7-year period')

The average annual precipitation is: 0.92 m

The max precipitation within an hour was: 22.2 mm
This occurred on 2021-09-16 22:00:00

The min non-zero precipitation within an hour was: 0.1 mm
This occurred 1653 times over the 7-year period

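# bins of 0.2 mm/h starting at 0.1 mm/h (the last edge lies just below 25 mm/h)
bins = np.arange(0.1, 25, 0.2)
rr_wet = dfh['RR'].replace(0, np.nan)

# left: linear y-axis, right: logarithmic y-axis
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
rr_wet.plot.hist(bins=bins, ax=ax1)
rr_wet.plot.hist(bins=bins, ax=ax2, log=True)
for ax in (ax1, ax2):
    ax.set_xlabel('Hourly precipitation (mm/h)')
plt.show()
# 99th percentile of the hours with precipitation (dry hours are excluded)
pr_99 = rr_wet.quantile(0.99)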
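Note that pr_99 is computed after replacing zeros by NaN, and quantile() skips NaN, so it is the 99th percentile of the wet hours only. Including the dry hours gives a lower threshold. The sketch below shows the difference on a synthetic series (the rain frequency and distribution are arbitrary assumptions, not station data).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic hourly series: mostly dry hours, occasional rain in 0.1 mm steps.
wet = rng.random(10_000) < 0.1
rr = pd.Series(np.where(wet, np.round(rng.exponential(1.0, 10_000), 1), 0.0))

q_all = rr.quantile(0.99)           # all hours, dry ones included
q_wet = rr[rr > 0].quantile(0.99)   # wet hours only
print(q_all, q_wet)
```

Which definition is meant should be stated next to the result, since the counts of hours above the threshold depend on it.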

a = df['RR'].resample('D').sum()
# True counts as 1, so the sum is the number of days with precipitation
b = (a != 0).sum()/7
print(f'On average there are {round(b, 0)} rain days per year')

On average there are 170.0 rain days per year

winter_d = a[a.index.month.isin([12, 1, 2])]
b = winter_d.mean()
c = (winter_d != 0).sum()/7
print(f'The average daily precipitation in DJF is {b:.2} mm')
print(f'The average number of rain days in DJF is {c:.0f}')

The average daily precipitation in DJF is 1.8 mm
The average number of rain days in DJF is 38

summer_d = a[a.index.month.isin([6, 7, 8])]
b = summer_d.mean()
c = (summer_d != 0).sum()/7
print(f'The average daily precipitation in JJA is {b:.2} mm')
print(f'The average number of rain days in JJA is {c:.0f}')

The average daily precipitation in JJA is 4.0 mm
The average number of rain days in JJA is 53

a = df['RR'].resample('H').sum()
winter_h = a[a.index.month.isin([12, 1, 2])]
b = (winter_h > pr_99).sum()
print(f'There are {b} hours in DJF with precipitation above the 99th percentile')

There are 1 hours in DJF with precipitation above the 99th percentile

a = df['RR'].resample('H').sum()
summer_h = a[a.index.month.isin([6, 7, 8])]
b = (summer_h > pr_99).sum()
print(f'There are {b} hours in JJA with precipitation above the 99th percentile')

There are 52 hours in JJA with precipitation above the 99th percentile

def averege_dail_cycle(df):
    return df.groupby([df.index.hour]).mean()

plt.plot(averege_dail_cycle(winter_h), label='DJF')
plt.plot(averege_dail_cycle(summer_h), label='JJA')
plt.xlabel('Hour of day')
plt.ylabel('Mean precipitation (mm/h)')
plt.title('Average daily cycle of hourly precipitation')
plt.legend()
plt.tight_layout()
plt.show()
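The grouping step behind the daily cycle can be checked on a synthetic series whose value depends only on the hour of day: grouping by index.hour and averaging should then return exactly that hourly pattern. The three-day series below is made up for illustration.

```python
import numpy as np
import pandas as pd

index = pd.date_range('2020-06-01', periods=72, freq='60min')  # three full days
hour = index.hour.to_numpy()
# Synthetic signal that depends only on the hour of day.
values = pd.Series(np.sin(2 * np.pi * hour / 24), index=index)

# All values sharing the same hour (0..23) are averaged across the days.
cycle = values.groupby(values.index.hour).mean()
print(cycle.head())
```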

04 - A few other variables¶

In this section, we will continue to analyze the weather station data.

04-01: Verify that the three soil temperatures have approximately the same average value over the entire period. Now plot the three soil temperature timeseries in hourly resolution over the course of the year of 2020 (example). Repeat the plot with the month of May 2020.

04-02: Plot the average daily cycle of all three soil temperatures.

04-03: Compute the difference (in °C) between the air temperature and the dewpoint temperature. Now plot this difference on a scatter plot (x-axis: relative humidity, y-axis: temperature difference).

ground_temp = ['TB1', 'TB2', 'TB3']
mean_temp = []
for i in ground_temp:
    mean = dfh[i].mean()
    mean_temp.append(mean)
print(mean_temp)

mask = (dfh.index.year == 2020)
for i in ground_temp:
    plt.plot(dfh[i][mask], label=i)
    
plt.xlabel('Date')
plt.ylabel('Temperature (°C)')
plt.title('Comparison of soil temperatures in 2020 at 10, 20 and 50 cm depth')
plt.legend()
plt.tight_layout()
plt.show()

[11.393769492976766, 11.514406669059351, 11.379132772780602]

mask = (dfh.index.year == 2020) & (dfh.index.month == 5)
for i in ground_temp:
    plt.plot(dfh[i][mask], label=i)
    
plt.xlabel('Date')
plt.ylabel('Temperature (°C)')
plt.title('Comparison of soil temperatures in May 2020 at 10, 20 and 50 cm depth')

plt.legend()
plt.tight_layout()
plt.show()

for i in ground_temp:
    plt.plot(averege_dail_cycle(dfh[i]), label=i)
plt.xlabel('Hour of day')
plt.ylabel('Temperature (°C)')
plt.title('Mean daily cycle over 7 years of soil temperatures at 10, 20 and 50 cm depth')

plt.legend()
plt.tight_layout()
plt.show()

temp_diff = df['TL'] - df['TP']
plt.scatter(df['RF'], temp_diff, s=3)
plt.xlabel('Relative humidity (%)')
plt.ylabel('Temperature difference (°C)')
plt.title('Difference between air and dewpoint temperature vs. relative humidity')
plt.tight_layout()
plt.show()
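The shape of this scatter plot is expected from the definition of the dewpoint: relative humidity is the ratio of the saturation vapour pressure at the dewpoint to that at the air temperature. The sketch below uses a Magnus-type approximation for the saturation vapour pressure; the coefficients (6.112, 17.62, 243.12) are one commonly used set and an assumption here, not something taken from the station metadata.

```python
import numpy as np

def saturation_vapour_pressure(t_celsius):
    """Magnus-type approximation over water (hPa); coefficients are assumed."""
    return 6.112 * np.exp(17.62 * t_celsius / (243.12 + t_celsius))

def relative_humidity(t_air, t_dew):
    """Relative humidity (%) from air and dewpoint temperature in degC."""
    return 100 * saturation_vapour_pressure(t_dew) / saturation_vapour_pressure(t_air)

spread = np.array([0.0, 2.0, 5.0, 10.0, 20.0])   # air minus dewpoint temperature
rh = relative_humidity(15.0, 15.0 - spread)
print(rh.round(1))
```

A spread of zero corresponds to 100 % humidity, and the humidity drops monotonically as the spread grows, which matches the falling curve in the scatter plot.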

05 - Free coding project¶

The last part of this semester project is up to you! You are free to explore whatever interests you. I however add three requirements:

This section should have at least 5 original plots in it. They are the output of your analysis.
This section should also use additional data that you downloaded yourself. The easiest way would probably be to download another station(s) from the ZAMG database, or data from the same station but for another time period (e.g. for trend or change analysis). You can, however, decide to do something completely different if you prefer (as long as you download and read one more file).
this section should contain at least one regression or correlation analysis between two parameters. Examples:
- between two different variables at the same station (like we did with the dewpoint above)
- between different stations (for example, average temperature as a function of station elevation)
- between average temperature and time (trends analysis)
- etc.

That's it! Here are a few ideas:

detection of trends and changes at the station Innsbruck for 1993-2021
comparison of 5-yr climatologies at various stations in Tirol, taking elevation or location into account
compute the theoretical day length from the station's longitude and latitude (you can find solutions for this online, just let me know the source if you used a solution online), and use these computations to compare the measured sunshine duration to the maximum day length. This can be used to classify "sunshine days" for example.
use the python "windrose" library to plot a windrose at different locations and time of day.
etc.

If you have your own idea but are unsure about whether this is too much or not enough, come to see me in class! In general, the three requirements above should be enough.

My goal with this section is to let you formulate a programming goal and implement it.

df = pd.read_csv('STD Datensatz_19950101T0000_20221231T2300.csv', index_col=0, parse_dates=True)
# select, for each station, the period in which it was offline
mask1 = (df['station'] == 15001) & (df.index >= '2007-08-31')
mask2 = (df['station'] == 15002) & (df.index < '2007-09-01')
mask = mask1 | mask2
# keep only the rows that are not selected
df = df[~mask]

def averege_dail_cycle(df):
    return df.groupby([df.index.hour]).mean()
def averege_seasonal_cycle(df):
    return df.groupby([df.index.month]).mean()
def analysis(df):
    df_min = df.min()
    df_max = df.max()
    df_mean = df.mean()
    std_dev = df.std()
    variance = df.var()
    data_range = df_max-df_min
    percentiles = df.quantile([0.25, 0.5, 0.75])
    # print(f'Min:\n{df_min}\n')
    # print(f'Max:\n{df_max}\n')
    # print(f'Mean:\n{df_mean}\n')
    # print(f'Standard deviation:\n{std_dev}\n')
    # print(f'Variance:\n{variance}\n')
    # print(f'Data range:\n{data_range}\n')
    # print(f'Percentiles:\n{percentiles}\n')
    return df_min, df_max, df_mean, std_dev, variance, data_range, percentiles
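def rise_set(df, lat, long, timezone):
    # LocationInfo takes (name, region, timezone, latitude, longitude), so the
    # coordinates are passed by keyword; passed positionally they would be read
    # as name and region and the default coordinates would be used.
    # 'station' and 'Austria' are placeholder labels.
    loc = LocationInfo(name='station', region='Austria', timezone=timezone,
                       latitude=lat, longitude=long)

    sunrise = []
    sunset = []

    for date in df.index:
        # sun() returns its times in UTC unless tzinfo is given
        s = sun(loc.observer, date=date)
        sunrise.append(s['sunrise'])
        sunset.append(s['sunset'])

    df['sunrise'] = sunrise
    df['sunset'] = sunset
    return df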
def split_day_night(df):
    # daytime: timestamps from sunrise (inclusive) to sunset (exclusive)
    time_of_day = df.index.time
    after_rise = time_of_day >= df['sunrise'].dt.time.values
    before_set = time_of_day < df['sunset'].dt.time.values
    is_day = np.asarray(after_rise & before_set, dtype=bool)
    # night is everything else: before sunrise OR after sunset
    # (requiring both at once, as before, always gave an empty selection)
    return df[is_day], df[~is_day]
    

latitude = 47.165563
longitude = 11.862072
timezone = "Europe/Vienna"
df_rise_set = rise_set(df, latitude, longitude, timezone)
day, night = split_day_night(df_rise_set)

# daily temperature cycle in 5-year blocks and mean temperature per year

fig, ax = plt.subplots(nrows=2, ncols=1, figsize=(20,15))

ax1 = ax[0]
ax2 = ax[1]

dic = {
    '1995-1999' : [1995, 1996, 1997, 1998, 1999],
    '2000-2004' : [2000, 2001, 2002, 2003, 2004],
    '2005-2009' : [2005, 2006, 2007, 2008, 2009],
    '2010-2014' : [2010, 2011, 2012, 2013, 2014],
    '2015-2019' : [2015, 2016, 2017, 2018, 2019]
}
#changes the dic, so that keys stay the same and values are df of specified years
for i, j in dic.items():
    dic[i] = df[df.index.year.isin(j)]
#uses key and value to plot averege daily cycle
for i, j in dic.items():
    ax1.plot(averege_dail_cycle(j['TTX']), label=i)
ax1.legend()
ax1.set_title('Daily Mean Temperature in 5 year blocks')
ax1.set_xlabel('hour')
ax1.set_ylabel('Temperature (°C)')


df_mean_5Y = df.resample('5Y').mean()
df_mean_Y = df.resample('Y').mean() 
ax2.plot(df_mean_5Y.index, df_mean_5Y['TTX'], label='Mean Temp. grouped by 5 Years')
ax2.plot(df_mean_Y.index, df_mean_Y['TTX'], label='Mean Temp. grouped by Years')
ax2.legend()
ax2.set_title('Average temperature per year')
ax2.set_xlabel('Year')
ax2.set_ylabel('Temperature (°C)')

plt.tight_layout()
plt.show()

fig, ax = plt.subplots(nrows=2, ncols=1, figsize=(20,15))

ax1 = ax[0]
ax2 = ax[1]

for i, j in dic.items():
    ax1.plot(averege_dail_cycle(j['FFX']), label=i)
ax1.legend()
ax1.set_title('Daily mean relative humidity in 5 year blocks')
ax1.set_xlabel('hour')
ax1.set_ylabel('Relative humidity (%)')


df_mean_5Y = df.resample('5Y').mean()
df_mean_Y = df.resample('Y').mean()
ax2.plot(df_mean_5Y.index, df_mean_5Y['FFX'], label='Mean rel. humidity grouped by 5 years')
ax2.plot(df_mean_Y.index, df_mean_Y['FFX'], label='Mean rel. humidity grouped by years')
ax2.legend()
ax2.set_title('Average relative humidity per year')
ax2.set_xlabel('Year')
ax2.set_ylabel('Relative humidity (%)')

plt.tight_layout()
plt.show()

temp_dif = df['TTX'] - df['TDX']
# x: relative humidity, y: temperature difference (matches the axis labels)
plt.scatter(df['FFX'], temp_dif, s=1)
plt.xlabel('Relative humidity (%)')
plt.ylabel('Temperature difference (°C)')
plt.title('Difference between air and dewpoint temperature vs. relative humidity')
plt.tight_layout()
plt.show()

# daily sums of sunshine duration and precipitation
df_sunshine = df['SUX'].resample('D').sum()
df_rain = df['RSX'].resample('D').sum()

plt.scatter(df_sunshine, df_rain, s=0.5)

plt.xlabel('Hours of sunshine')
plt.ylabel('Accumulated amount of rain over a day')
plt.title('Amount of rain compared to sunshine duration')
plt.tight_layout()
plt.show()

# linear fit...
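One possible way to fill in the linear fit is a least-squares line with np.polyfit together with the Pearson correlation from np.corrcoef. The sketch below runs on synthetic stand-ins for the two daily series (the numbers are invented, not station results). With the real data, days with missing values would have to be dropped first, since neither function ignores NaN.

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic stand-ins for daily sunshine duration (h) and daily rain (mm).
sunshine = rng.uniform(0, 14, 500)
rain = np.clip(8 - 0.5 * sunshine + rng.normal(0, 2, 500), 0, None)

slope, intercept = np.polyfit(sunshine, rain, deg=1)   # least-squares line
r = np.corrcoef(sunshine, rain)[0, 1]                  # Pearson correlation
print(f'rain = {slope:.2f} * sunshine + {intercept:.2f}, r = {r:.2f}')
```

The fitted line can be added to the scatter plot with plt.plot(x, slope * x + intercept) for a sorted array x of sunshine values.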

annual_mean = df['RSX'].sum()/26000  # total in mm over 26 years -> m per year
print(f'The average annual precipitation is: {annual_mean:.2} m\n')
rr_max = df['RSX'].max()
t_max = df['RSX'].idxmax()
print(f'The max precipitation within an hour was: {rr_max} mm\nThis occurred on {t_max}\n')
# np.nan instead of np.NaN: the capitalized alias was removed in NumPy 2.0
rr_min = df['RSX'].replace(0, np.nan).min()
t_min = df[df['RSX'] == rr_min].index
print(f'The min non-zero precipitation within an hour was: {rr_min} mm\nThis occurred {len(t_min)} times over the 26-year period')

The average annual precipitation is: 1.2 m

The max precipitation within an hour was: 39.9 mm
This occurred on 2017-08-05 16:00:00+00:00

The min non-zero precipitation within an hour was: 0.1 mm
This occurred 7986 times over the 26-year period