Programming project¶
- Names: Florian Schaeffer, Felix Kerschner
- Matrikelnummern: 11944702, 12022541
```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
```
01 - An energy balance model with hysteresis¶
Based on the code we wrote in week 07, code the following extension. For this exercise it's OK to copy-paste my solutions and start from there. Only copy what is necessary (no useless code)!
The planetary albedo $\alpha$ is in fact changing with climate change. As the temperature drops, sea-ice and ice sheets are extending (increasing the albedo). Inversely, the albedo decreases as temperature rises. The planetary albedo of our simple energy balance model follows the following equation:
$$ \alpha = \begin{cases} 0.3,& \text{if } T \gt 280\\ 0.7,& \text{if } T \lt 250\\ a T + b, & \text{otherwise} \end{cases} $$
Exercise 1 - Solutions¶
01-01: Compute the parameters $a$ and $b$ so that the equation is continuous at T=250K and T=280K.
Equations to solve: (1) $250a + b = 0.7$, (2) $280a + b = 0.3$
```python
a = np.array([[250, 1], [280, 1]])
a
```
array([[250, 1], [280, 1]])
```python
a_inv = np.linalg.inv(a)

# Check that the inverse worked
np.testing.assert_allclose(a @ a_inv, np.identity(2), atol=1e-7)
np.round(a @ a_inv, decimals=7)
```
array([[1., 0.], [0., 1.]])
```python
solution = a_inv @ [0.7, 0.3]
solution
```
array([-0.01333333, 4.03333333])
This means the result is: $a = -0.01333333$, $b = 4.03333333$.
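As a quick sanity check (this block is an addition, not part of the graded solution), the linear system can also be solved directly with `np.linalg.solve`, and the continuity conditions verified:

```python
import numpy as np

# Solve 250a + b = 0.7 and 280a + b = 0.3 directly, without forming the inverse
A = np.array([[250, 1], [280, 1]])
a, b = np.linalg.solve(A, [0.7, 0.3])

# The linear branch must meet the constant branches at both breakpoints
assert np.isclose(a * 250 + b, 0.7)  # continuous at T = 250 K
assert np.isclose(a * 280 + b, 0.3)  # continuous at T = 280 K
print(a, b)  # approximately -0.0133333 and 4.0333333
```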
01-02: Now write a function called alpha_from_temperature
which accepts a single positional parameter T
as input (a scalar) and returns the corresponding albedo. Test your function using doctests to make sure that it complies to the instructions above.
Defining a function which derives the albedo in the range of 250 K to 280 K. Any temperature below 250 K or above 280 K results in a constant albedo of 0.7 or 0.3, respectively.
```python
def alpha_from_temperature(T):
    """
    Testing values for
    (1) T < 250 K
    (2) T > 280 K
    (3) 250 < T < 280

    Examples
    --------
    >>> alpha_from_temperature(230)
    0.7
    >>> alpha_from_temperature(290)
    0.3
    >>> alpha_from_temperature(260)
    0.5667
    """
    a = -0.01333333
    b = 4.03333333
    alpha = a * T + b
    if T < 250:
        return 0.7
    elif T > 280:
        return 0.3
    else:
        return round(alpha, 4)
```
```python
import doctest
doctest.testmod()
```
TestResults(failed=0, attempted=3)
01-03: Adapt the existing code from week 07 to write a function called temperature_change_with_hysteresis which accepts t0 (the starting temperature in K) and n_years (the number of simulation years) as positional arguments, and tau (the atmosphere transmissivity) as keyword argument (default value 0.611). Verify that:
- the stabilization temperature with t0 = 292 and default tau is approximately 288 K
- the stabilization temperature with t0 = 265 and default tau is approximately 233 K
```python
def asr(alpha):
    # Computes the absorbed solar radiation
    return (1 - alpha) * 340


def olr(T, tau):
    # Computes the outgoing longwave radiation
    return tau * 5.67e-8 * T ** 4


def temperature_change_with_hysteresis(t0, n_years, tau=0.611):
    time = np.arange(n_years + 1)
    temperature = np.zeros(n_years + 1)
    temperature[0] = t0
    dt = 60 * 60 * 24 * 365
    C = 4.0e+08
    for i in range(n_years):
        alpha = alpha_from_temperature(temperature[i])
        temperature[i + 1] = temperature[i] + dt / C * (asr(alpha) - olr(temperature[i], tau))
    return time, temperature
```
```python
temperature_change_with_hysteresis(292, 1000, tau=0.611)
```
(array([ 0, 1, 2, ..., 998, 999, 1000]), array([292. , 290.90743873, 290.11039754, ..., 287.89768452, 287.89768452, 287.89768452]))
01-04: Realize a total of N simulations with starting temperatures regularly spaced between t0 = 206 K and t0 = 318 K, and plot them on a single plot for n_years = 50. The plot should look somewhat similar to this example for N = 21.
```python
N = 21
simulations = np.linspace(206, 318, N)
for t0 in simulations:
    # Run each simulation once and reuse both outputs
    time, temperature = temperature_change_with_hysteresis(t0, 50)
    plt.plot(time, temperature)
plt.title('Climate change scenario with hysteresis')  # Add a title to the plot
plt.xlabel('years')
plt.ylabel('temperature (K)')
```
Text(0, 0.5, 'temperature (K)')
02 - Weather station data files¶
I downloaded 10 min data from the recently launched ZAMG data hub. The data file contains selected parameters from the "INNSBRUCK-FLUGPLATZ (ID: 11804)" weather station.
You can download the data from the following links (right-click + "Save as..."):
- station data
- parameter metadata
- station list from the ZAMG (in a better format than last time)
Let me open the data for you and display its content:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```
```python
df = pd.read_csv('data/INNSBRUCK-FLUGPLATZ_Datensatz_20150101_20211231.csv',
                 index_col=1, parse_dates=True)
df = df.drop('station', axis=1)
display(df)
```
| time | DD | FF | GSX | P | RF | RR | SO | TB1 | TB2 | TB3 | TL | TP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2015-01-01 00:00:00 | 232.0 | 1.0 | 0.0 | 965.7 | 96.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.7 | -3.4 |
| 2015-01-01 00:10:00 | 248.0 | 1.6 | 0.0 | 965.6 | 97.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.4 | -2.9 |
| 2015-01-01 00:20:00 | 252.0 | 1.6 | 0.0 | 965.6 | 97.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.4 | -2.9 |
| 2015-01-01 00:30:00 | 267.0 | 1.3 | 0.0 | 965.7 | 97.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.7 | -3.2 |
| 2015-01-01 00:40:00 | 279.0 | 1.4 | 0.0 | 965.7 | 97.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.5 | -3.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2021-12-31 23:10:00 | 257.0 | 1.1 | 0.0 | 959.8 | 98.0 | 0.0 | 0.0 | 3.1 | 3.4 | 3.4 | -1.7 | -1.7 |
| 2021-12-31 23:20:00 | 106.0 | 0.3 | 0.0 | 960.0 | 100.0 | 0.0 | 0.0 | 3.0 | 3.4 | 3.4 | -2.0 | -2.0 |
| 2021-12-31 23:30:00 | 141.0 | 0.5 | 0.0 | 960.2 | 100.0 | 0.0 | 0.0 | 3.0 | 3.4 | 3.4 | -1.7 | -1.7 |
| 2021-12-31 23:40:00 | 167.0 | 0.5 | 0.0 | 960.2 | 100.0 | 0.0 | 0.0 | 3.0 | 3.4 | 3.4 | -1.6 | -1.6 |
| 2021-12-31 23:50:00 | 275.0 | 0.3 | 0.0 | 960.1 | 100.0 | 0.0 | 0.0 | 3.0 | 3.4 | 3.4 | -1.6 | -1.7 |
368208 rows × 12 columns
Exercise 2 - Solutions¶
02-01: after reading the documentation of the respective functions (and maybe try a few things yourself), explain in plain sentences:
- what am I asking pandas to do with the index_col=1, parse_dates=True keyword arguments? Why am I doing this?
- what am I asking pandas to do with .drop()? Why axis=1?
02-01 Solutions
index_col=1 : takes the second column of the .csv file and makes it the index column (which is date and time in this case).
parse_dates=True : automatically parses the index column as dates (a datetime format).
.drop() : removes the column called "station" from the dataframe.
axis=1 : makes sure to drop the 'station' column and not a row of that name (rows would be axis=0).
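To illustrate the axis convention, here is a tiny made-up dataframe (the values are placeholders, not taken from the real file):

```python
import pandas as pd

toy = pd.DataFrame({'station': [11804, 11804], 'TL': [-2.7, -2.4]})

# axis=1 drops the *column* labelled 'station'; axis=0 would look for a row label
dropped = toy.drop('station', axis=1)
print(list(dropped.columns))  # ['TL']
```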
Now let me do something else for you:
```python
dfmeta = pd.read_csv('data/ZEHNMIN Parameter-Metadaten.csv', index_col=0)
dfmeta.loc[df.columns]
```
| | Kurzbeschreibung | Beschreibung | Einheit |
|---|---|---|---|
| DD | Windrichtung | Windrichtung, vektorielles Mittel über 10 Minuten | ° |
| FF | vektorielle Windgeschwindigkeit | Windgeschwindigkeit, vektorielles Mittel über ... | m/s |
| GSX | Globalstrahlung | Globalstrahlung, arithmetisches Mittel über 10... | W/m² |
| P | Luftdruck | Luftdruck, Basiswert zur Minute10 | hPa |
| RF | Relative Feuchte | Relative Luftfeuchte, Basiswert zur Minute10 | % |
| RR | Niederschlag | 10 Minuten Summe des Niederschlags, Summe der ... | mm |
| SO | Sonnenscheindauer | Sonnenscheindauer, Sekundensumme über 10 Minuten | s |
| TB1 | Erdbodentemperatur in 10cm Tiefe | Erdbodentemperatur in 10cm Tiefe, Basiswert zu... | °C |
| TB2 | Erdbodentemperatur in 20cm Tiefe | Erdbodentemperatur in 20cm Tiefe, Basiswert zu... | °C |
| TB3 | Erdbodentemperatur in 50cm Tiefe | Erdbodentemperatur in 50cm Tiefe, Basiswert zu... | °C |
| TL | Lufttemperatur in 2m | Lufttemperatur in 2m Höhe, Basiswert zur Minute10 | °C |
| TP | Taupunkt | Taupunktstemperatur, Basiswert zur Minute10 | °C |
02-02: again, explain in plain sentences what dfmeta.loc[df.columns] is doing, and why it works that way.
02-02 Solutions
.loc[] : locates and returns the rows (or columns) whose labels are given in the brackets. In our case (dfmeta.loc[df.columns]), the metadata rows belonging to all columns of df are returned, in the same order as the columns.
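A minimal sketch of why this works, with made-up metadata (only the mechanism matters here, not the values): .loc with a list-like of labels returns those rows, in the requested order.

```python
import pandas as pd

# Toy metadata table indexed by parameter name, and a toy data table
meta = pd.DataFrame({'unit': ['°', 'm/s', '°C']}, index=['DD', 'FF', 'TL'])
data = pd.DataFrame({'TL': [1.0], 'DD': [230.0]})

# data.columns is a list-like of labels -> .loc selects those rows of meta,
# in the same order as the columns of data
units = meta.loc[data.columns, 'unit']
print(units.tolist())  # ['°C', '°']
```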
Finally, let me do a last step for you before you start coding:
```python
dfh = df.resample('H').mean()
display(dfh)
```
| time | DD | FF | GSX | P | RF | RR | SO | TB1 | TB2 | TB3 | TL | TP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2015-01-01 00:00:00 | 257.333333 | 1.383333 | 0.0 | 965.666667 | 97.000000 | 0.0 | 0.0 | 3.200000 | 3.600000 | 5.2 | -2.533333 | -3.066667 |
| 2015-01-01 01:00:00 | 160.666667 | 0.766667 | 0.0 | 965.966667 | 97.666667 | 0.0 | 0.0 | 3.150000 | 3.600000 | 5.2 | -3.250000 | -3.716667 |
| 2015-01-01 02:00:00 | 227.000000 | 0.816667 | 0.0 | 966.316667 | 98.500000 | 0.0 | 0.0 | 3.100000 | 3.600000 | 5.2 | -3.583333 | -3.933333 |
| 2015-01-01 03:00:00 | 258.833333 | 1.933333 | 0.0 | 966.466667 | 97.000000 | 0.0 | 0.0 | 3.100000 | 3.600000 | 5.2 | -3.250000 | -3.766667 |
| 2015-01-01 04:00:00 | 241.500000 | 1.266667 | 0.0 | 966.416667 | 95.166667 | 0.0 | 0.0 | 3.100000 | 3.600000 | 5.2 | -3.116667 | -3.883333 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2021-12-31 19:00:00 | 283.500000 | 1.166667 | 0.0 | 957.766667 | 98.833333 | 0.0 | 0.0 | 3.316667 | 3.300000 | 3.4 | 0.900000 | 0.716667 |
| 2021-12-31 20:00:00 | 237.333333 | 0.716667 | 0.0 | 958.100000 | 99.500000 | 0.0 | 0.0 | 3.283333 | 3.300000 | 3.4 | -0.200000 | -0.350000 |
| 2021-12-31 21:00:00 | 149.833333 | 1.083333 | 0.0 | 958.783333 | 99.666667 | 0.0 | 0.0 | 3.200000 | 3.300000 | 3.4 | -0.800000 | -0.916667 |
| 2021-12-31 22:00:00 | 175.833333 | 0.516667 | 0.0 | 959.433333 | 99.166667 | 0.0 | 0.0 | 3.100000 | 3.383333 | 3.4 | -1.083333 | -1.200000 |
| 2021-12-31 23:00:00 | 198.833333 | 0.600000 | 0.0 | 960.016667 | 99.500000 | 0.0 | 0.0 | 3.033333 | 3.400000 | 3.4 | -1.683333 | -1.716667 |
61368 rows × 12 columns
```python
# Resample df again, this time with sums (needed later for the RR and SO conversions)
dfh_sum = df.resample('H').sum()
display(dfh_sum)
```
| time | DD | FF | GSX | P | RF | RR | SO | TB1 | TB2 | TB3 | TL | TP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2015-01-01 00:00:00 | 1544.0 | 8.3 | 0.0 | 5794.0 | 582.0 | 0.0 | 0.0 | 19.2 | 21.6 | 31.2 | -15.2 | -18.4 |
| 2015-01-01 01:00:00 | 964.0 | 4.6 | 0.0 | 5795.8 | 586.0 | 0.0 | 0.0 | 18.9 | 21.6 | 31.2 | -19.5 | -22.3 |
| 2015-01-01 02:00:00 | 1362.0 | 4.9 | 0.0 | 5797.9 | 591.0 | 0.0 | 0.0 | 18.6 | 21.6 | 31.2 | -21.5 | -23.6 |
| 2015-01-01 03:00:00 | 1553.0 | 11.6 | 0.0 | 5798.8 | 582.0 | 0.0 | 0.0 | 18.6 | 21.6 | 31.2 | -19.5 | -22.6 |
| 2015-01-01 04:00:00 | 1449.0 | 7.6 | 0.0 | 5798.5 | 571.0 | 0.0 | 0.0 | 18.6 | 21.6 | 31.2 | -18.7 | -23.3 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2021-12-31 19:00:00 | 1701.0 | 7.0 | 0.0 | 5746.6 | 593.0 | 0.0 | 0.0 | 19.9 | 19.8 | 20.4 | 5.4 | 4.3 |
| 2021-12-31 20:00:00 | 1424.0 | 4.3 | 0.0 | 5748.6 | 597.0 | 0.0 | 0.0 | 19.7 | 19.8 | 20.4 | -1.2 | -2.1 |
| 2021-12-31 21:00:00 | 899.0 | 6.5 | 0.0 | 5752.7 | 598.0 | 0.0 | 0.0 | 19.2 | 19.8 | 20.4 | -4.8 | -5.5 |
| 2021-12-31 22:00:00 | 1055.0 | 3.1 | 0.0 | 5756.6 | 595.0 | 0.0 | 0.0 | 18.6 | 20.3 | 20.4 | -6.5 | -7.2 |
| 2021-12-31 23:00:00 | 1193.0 | 3.6 | 0.0 | 5760.1 | 597.0 | 0.0 | 0.0 | 18.2 | 20.4 | 20.4 | -10.1 | -10.3 |
61368 rows × 12 columns
02-03: explore the dfh dataframe. Explain, in plain words, what the purpose of .resample('H') followed by .mean() is. Explain what .resample('H').max() and .resample('H').sum() would do.
02-03 Solutions
.resample('H').mean() : looks at the date-time index, groups all measurements taken within one hour ('H' = hour), and averages them (.mean()).
.resample('H').max() : returns each column's highest value measured per hour.
.resample('H').sum() : sums up all measurements in a column taken within one hour.
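A toy example (made-up values for a single hour of 10 min data) showing the three aggregations side by side:

```python
import pandas as pd

# Six 10 min values = one hour of data
idx = pd.date_range('2015-01-01 00:00', periods=6, freq='10min')
s = pd.Series([0, 1, 2, 0, 3, 0], index=idx)

hourly = s.resample('H')
print(hourly.mean().iloc[0])  # 1.0 -> average of the six values
print(hourly.max().iloc[0])   # 3   -> largest of the six values
print(hourly.sum().iloc[0])   # 6   -> total over the hour
```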
02-04: Using np.allclose, make sure that the average of the first hour (that you'll compute yourself from df) is indeed equal to the first row of dfh. Now, two variables in the dataframe have units that aren't suitable for averaging. Please convert the following variables to the correct units:
- RR needs to be converted from the average of 10 min sums to mm/h
- SO needs to be converted from the average of 10 min sums to s/h
From now on, we will use the hourly data only (and further aggregations when necessary). The 10 min data are great but require a little bit more pandas kung fu (the Chinese term, not the sport) to be used efficiently.
Spend some time exploring the dfh dataframe we just created. What time period does it cover? What variables does it contain?
Note on pandas: all the exercises below can be done with or without pandas. Each question can be answered with very few lines of code (often one or two) with pandas, and I recommend using it as much as possible. If you want, you can always use numpy in case of doubt: you can access the data as a numpy array with df[column_name].values.
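For reference, the .values escape hatch mentioned above looks like this (toy column, made-up numbers):

```python
import pandas as pd

toy = pd.DataFrame({'TL': [-2.7, -2.4, -2.4]})
arr = toy['TL'].values  # plain numpy array, no index attached
print(type(arr).__name__, round(float(arr.mean()), 2))  # ndarray -2.5
```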
```python
# Picking df data for the first hour and converting the RR and SO columns
first_hour = df.iloc[0:6, :]
first_hour_mean = first_hour.mean()
sum1 = first_hour.sum()
first_hour_mean['RR'] = sum1['RR']
first_hour_mean['SO'] = sum1['SO']

# Conversion of dfh
dfh['RR'] = dfh_sum['RR']
dfh['SO'] = dfh_sum['SO']

# Comparison of the first hour of df with the first hour of dfh
hour1_dfh = dfh.iloc[0]
compare = np.allclose(first_hour_mean, hour1_dfh)
print(compare)
print(hour1_dfh)
print(first_hour_mean)
```
True DD 257.333333 FF 1.383333 GSX 0.000000 P 965.666667 RF 97.000000 RR 0.000000 SO 0.000000 TB1 3.200000 TB2 3.600000 TB3 5.200000 TL -2.533333 TP -3.066667 Name: 2015-01-01 00:00:00, dtype: float64 DD 257.333333 FF 1.383333 GSX 0.000000 P 965.666667 RF 97.000000 RR 0.000000 SO 0.000000 TB1 3.200000 TB2 3.600000 TB3 5.200000 TL -2.533333 TP -3.066667 dtype: float64
03 - Precipitation¶
In this section, we will focus on precipitation only.
03-01: Compute the average annual precipitation (m/year) over the 7-year period.
03-02: What is the smallest non-zero precipitation measured at the station? What is the maximum hourly precipitation measured at the station? When did this occur?
03-03: Plot a histogram of hourly precipitation, with bins of size 0.2 mm/h, starting at 0.1 mm/h and ending at 25 mm/h. Plot the same data, but this time with a logarithmic y-axis. Compute the 99th percentile (or quantile) of hourly precipitation.
03-04: Compute daily sums (mm/d) of precipitation (tip: use .resample again). Compute the average number of rain days per year in Innsbruck (a "rain day" is a day with at least 0.1 mm/d of measured precipitation).
03-05: Now select (subset) the daily dataframe to keep only daily data in the months of December, January, February (DJF). To do this, note that dfh.index.month exists and can be used to subset the data efficiently. Compute the average precipitation in DJF (mm/d), and the average number of rainy days in DJF. Repeat with the months of June, July, August (JJA).
03-06: Repeat the DJF and JJA subsetting, but this time with hourly data. Count the total number of times that hourly precipitation in DJF is above the 99th percentile computed in exercise 03-03. Repeat with JJA.
03-07: Compute and plot the average daily cycle of hourly precipitation in DJF and JJA. I expect a plot similar to this example. To compute the daily cycle, I recommend combining two very useful tools. First, notice that ds.index.hour exists and can be used to categorize data. Then, note that df.groupby exists and can be used exactly for that (documentation).
03 Solutions¶
03-01: Compute the average annual precipitation (m/year) over the 7-year period.
```python
dfh_yearly = dfh.resample('Y').sum()
dfh_RR = dfh_yearly.iloc[:, 5:6]
RR_total = dfh_RR.sum()
display(dfh_RR)

print('Average precipitation per year (m/year):', RR_total / 7 / 1000)
```
| time | RR |
|---|---|
| 2015-12-31 | 852.4 |
| 2016-12-31 | 911.8 |
| 2017-12-31 | 1026.5 |
| 2018-12-31 | 783.0 |
| 2019-12-31 | 1037.3 |
| 2020-12-31 | 920.7 |
| 2021-12-31 | 912.2 |
Average precipitation per year (m/year): RR 0.920557 dtype: float64
03-02: What is the smallest non-zero precipitation measured at the station? What is the maximum hourly precipitation measured at the station? When did this occur?
```python
minValue = dfh[dfh.gt(0)].min()
minValueRR = minValue.iloc[5]
min_time = dfh.loc[dfh['RR'] == minValueRR]
min_time = min_time.round(2)
min_time_only = min_time.iloc[:, 5]

maxValue = dfh.max()
maxValueRR = maxValue.iloc[5]
max_time = dfh.loc[dfh['RR'] == maxValueRR]
max_time = max_time.round(2)
max_time_only = max_time.iloc[:, 5]

print('Min. hourly precipitation:', minValueRR, 'mm/hr')
print(min_time_only)
print('Max. hourly precipitation:', maxValueRR, 'mm/hr')
print(max_time_only)
```
Min. hourly precipitation: 0.1 mm/hr time 2015-01-02 16:00:00 0.1 2015-01-11 10:00:00 0.1 2015-01-14 16:00:00 0.1 2015-01-18 04:00:00 0.1 2015-01-24 04:00:00 0.1 ... 2021-12-13 05:00:00 0.1 2021-12-13 06:00:00 0.1 2021-12-13 07:00:00 0.1 2021-12-30 09:00:00 0.1 2021-12-30 11:00:00 0.1 Name: RR, Length: 1653, dtype: float64 Max. hourly precipitation: 22.2 mm/hr time 2021-09-16 22:00:00 22.2 Freq: H, Name: RR, dtype: float64
The minimum non-zero precipitation is 0.1 mm/h and was measured in 1653 hours. The maximum hourly precipitation was 22.2 mm/h, measured on September 16th, 2021.
03-03: Plot a histogram of hourly precipitation, with bins of size 0.2 mm/h, starting at 0.1 mm/h and ending at 25 mm/h. Plot the same data, but this time with a logarithmic y-axis. Compute the 99th percentile (or quantile) of hourly precipitation.
```python
# Percentile calculation
quantile99 = dfh['RR'].quantile(0.99)
print('99th quantile:', quantile99)

# Removing zeros from dfh's 'RR' column
RR_no_zeros = dfh['RR']
RR_no_zeros = RR_no_zeros[RR_no_zeros > 0]
print(RR_no_zeros)

# Number of bins of size 0.2 mm/h between 0.1 and 25 mm/h
bins_nr = int(round((25 - 0.1) / 0.2, 0))
print(bins_nr)

# Plot histogram
plt.hist(RR_no_zeros, bins=bins_nr)
plt.title('Histogram of precipitation occurrences')
plt.xlabel('Precipitation in mm/h')
plt.ylabel('Occurrences')
```
99th quantile: 2.3330000000001747 time 2015-01-02 16:00:00 0.1 2015-01-02 17:00:00 0.2 2015-01-02 18:00:00 0.3 2015-01-03 07:00:00 0.2 2015-01-03 15:00:00 0.5 ... 2021-12-30 01:00:00 0.5 2021-12-30 07:00:00 0.4 2021-12-30 08:00:00 0.5 2021-12-30 09:00:00 0.1 2021-12-30 11:00:00 0.1 Name: RR, Length: 7191, dtype: float64 124
Text(0, 0.5, 'Occurrences')
```python
# Plot histogram with logarithmic y-axis
plt.hist(RR_no_zeros, bins=bins_nr)
plt.yscale('log')
plt.title('Histogram of precipitation occurrences')
plt.xlabel('Precipitation in mm/h')
plt.ylabel('Occurrences (log)')
```
Text(0, 0.5, 'Occurrences (log)')
03-04: Compute daily sums (mm/d) of precipitation (tip: use .resample again). Compute the average number of rain days per year in Innsbruck (a "rain day" is a day with at least 0.1 mm/d of measured precipitation).
```python
# Resampling dfh to daily data. Since only RR is needed, we ignore that the
# other columns are sums instead of means.
dfh_daily = dfh.resample('D').sum()
display(dfh_daily['RR'])

# Rain days
rain_d_only = dfh_daily['RR']
rain_d_only = rain_d_only[rain_d_only > 0]
rain_d_average = len(rain_d_only)
print('Average rainy days per year in Innsbruck:', round(rain_d_average / 7, 2))
```
time 2015-01-01 0.0 2015-01-02 0.6 2015-01-03 12.2 2015-01-04 7.7 2015-01-05 0.0 ... 2021-12-27 0.0 2021-12-28 0.0 2021-12-29 15.1 2021-12-30 2.1 2021-12-31 0.0 Freq: D, Name: RR, Length: 2557, dtype: float64
Average rainy days per year in Innsbruck: 170.29
03-05: Now select (subset) the daily dataframe to keep only daily data in the months of December, January, February (DJF). To do this, note that dfh.index.month exists and can be used to subset the data efficiently. Compute the average precipitation in DJF (mm/d), and the average number of rainy days in DJF. Repeat with the months of June, July, August (JJA).
```python
# For DJF
DJF = dfh_daily.loc[dfh_daily.index.month.isin([12, 1, 2])]
DJF_average = round(DJF['RR'].sum() / len(DJF), 2)  # len(DJF) = 632 days in the record
print('Average precipitation for a day in December, January and February (Innsbruck) in mm/d:', DJF_average)
DJF_d_rainy = DJF['RR']
DJF_d_rainy = DJF_d_rainy[DJF_d_rainy > 0]
print('Average count of rainy days in December, January and February (Innsbruck):', round(len(DJF_d_rainy) / 7, 2))

# For JJA (divide by the actual number of JJA days, 644)
JJA = dfh_daily.loc[dfh_daily.index.month.isin([6, 7, 8])]
JJA_average = round(JJA['RR'].sum() / len(JJA), 2)
print('Average precipitation for a day in June, July and August (Innsbruck) in mm/d:', JJA_average)
JJA_d_rainy = JJA['RR']
JJA_d_rainy = JJA_d_rainy[JJA_d_rainy > 0]
print('Average count of rainy days in June, July and August (Innsbruck):', round(len(JJA_d_rainy) / 7, 2))
```
Average precipitation for a day in December, January and February (Innsbruck) in mm/d: 1.77 Average count of rainy days in December, January and February (Innsbruck) 37.57 Average precipitation for a day in June, July and August (Innsbruck) in mm/d: 4.1 Average count of rainy days in June, July and August (Innsbruck) 53.14
03-06: Repeat the DJF and JJA subsetting, but this time with hourly data. Count the total number of times that hourly precipitation in DJF is above the 99th percentile computed in exercise 03-03. Repeat with JJA.
```python
# For DJF
DJF_h = dfh.loc[dfh.index.month.isin([12, 1, 2])]
DJF_h_RR = DJF_h['RR']
DJF_h_average = round(DJF_h['RR'].sum() / len(DJF_h), 2)
print('Average precipitation per hour in December, January and February (Innsbruck) in mm/h:', DJF_h_average)
DJF_h_rainy = DJF_h['RR']
DJF_h_rainy = DJF_h_rainy[DJF_h_rainy > 0]
print('Average count of rainy hours in December, January and February (Innsbruck):', round(len(DJF_h_rainy) / 7, 2))
DJF_99 = DJF_h_RR[DJF_h_RR > quantile99].count()
print('Above 99th percentile:', DJF_99)

# For JJA
JJA_h = dfh.loc[dfh.index.month.isin([6, 7, 8])]
JJA_h_RR = JJA_h['RR']
JJA_h_average = round(JJA_h['RR'].sum() / len(JJA_h), 2)
print('Average precipitation per hour in June, July and August (Innsbruck) in mm/h:', JJA_h_average)
JJA_h_rainy = JJA_h['RR']
JJA_h_rainy = JJA_h_rainy[JJA_h_rainy > 0]
print('Average count of rainy hours in June, July and August (Innsbruck):', round(len(JJA_h_rainy) / 7, 2))
JJA_99 = JJA_h_RR[JJA_h_RR > quantile99].count()
print('Above 99th percentile:', JJA_99)
```
Average precipitation per hour in December, January and February (Innsbruck) in mm/h: 0.07 Average count of rainy hours in December, January and February (Innsbruck) 242.57 Above 99th percentile: 68 Average precipitation per hour in June, July and August (Innsbruck) in mm/h: 0.17 Average count of rainy hours in June, July and August (Innsbruck) 288.57 Above 99th percentile: 308
03-07: Compute and plot the average daily cycle of hourly precipitation in DJF and JJA. I expect a plot similar to this example. To compute the daily cycle, I recommend combining two very useful tools. First, notice that ds.index.hour exists and can be used to categorize data. Then, note that df.groupby exists and can be used exactly for that (documentation).
```python
# Group hourly precipitation by hour of day; the mean gives the average cycle in mm/h
plt.plot(DJF_h_RR.groupby(DJF_h_RR.index.hour).mean(), label='winter (DJF)')
plt.plot(JJA_h_RR.groupby(JJA_h_RR.index.hour).mean(), label='summer (JJA)')
plt.title('Average daily cycle of precipitation in summer and winter')  # Add a title to the plot
plt.xlabel('time of day')
plt.ylabel('mm/h')
plt.legend()
```
<matplotlib.legend.Legend at 0x1e88ccf9c00>
04 - A few other variables¶
04-01: Verify that the three soil temperatures have approximately the same average value over the entire period. Now plot the three soil temperature time series in hourly resolution over the course of the year 2020 (example). Repeat the plot with the month of May 2020.
```python
TB1 = df['TB1'].resample('H').mean()
TB2 = df['TB2'].resample('H').mean()
TB3 = df['TB3'].resample('H').mean()
print(TB1.mean())
print(TB2.mean())
print(TB3.mean())

# np.allclose compares two values at a time, so check both pairs
atol = 0.2
compare_soil = (np.allclose(TB1.mean(), TB2.mean(), atol=atol) and
                np.allclose(TB2.mean(), TB3.mean(), atol=atol))
print(compare_soil)

TB1_2020 = TB1.loc[TB1.index.year.isin([2020])]
TB2_2020 = TB2.loc[TB2.index.year.isin([2020])]
TB3_2020 = TB3.loc[TB3.index.year.isin([2020])]

# Plot 2020
plt.plot(TB1_2020, label='TB1')
plt.plot(TB2_2020, label='TB2')
plt.plot(TB3_2020, label='TB3')
plt.title('Soil temperatures in 2020')  # Add a title to the plot
plt.xlabel('Month')
plt.ylabel('Temperature (°C)')
plt.legend()
```
11.393769492976766 11.514406669059351 11.379132772780602 True
<matplotlib.legend.Legend at 0x1e88bc48bb0>
```python
# Plot May 2020
TB1_2020_may = TB1_2020.loc[TB1_2020.index.month.isin([5])]
TB2_2020_may = TB2_2020.loc[TB2_2020.index.month.isin([5])]
TB3_2020_may = TB3_2020.loc[TB3_2020.index.month.isin([5])]

plt.plot(TB1_2020_may, label='TB1')
plt.plot(TB2_2020_may, label='TB2')
plt.plot(TB3_2020_may, label='TB3')
plt.title('Soil temperatures in May 2020')  # Add a title to the plot
plt.xlabel('Day')
plt.ylabel('Temperature (°C)')
plt.xticks(rotation=45)
plt.legend()
```
<matplotlib.legend.Legend at 0x1e88cb91f60>
04-02: Plot the average daily cycle of all three soil temperatures.
```python
# Creating the columns needed for the plot: group the hourly data by hour of day
TB1_cycle = dfh['TB1'].groupby(dfh.index.hour).mean()
TB2_cycle = dfh['TB2'].groupby(dfh.index.hour).mean()
TB3_cycle = dfh['TB3'].groupby(dfh.index.hour).mean()

# Creating the 3 plots
plt.plot(TB1_cycle, label='TB1')
plt.plot(TB2_cycle, label='TB2')
plt.plot(TB3_cycle, label='TB3')
plt.title('Daily cycle of soil temperatures')
plt.xlabel('time of day')
plt.ylabel('Soil temperature (°C)')

# Creating manual 3 h ticks
ticks_array = np.arange(0, 25, 3)
plt.xticks(ticks=ticks_array)
plt.legend()
```
<matplotlib.legend.Legend at 0x1e88dfbf3d0>
04-03: Compute the difference (in °C) between the air temperature and the dewpoint temperature. Now plot this difference on a scatter plot (x-axis: relative humidity, y-axis: temperature difference).
```python
# Difference between air temperature and dewpoint temperature
# (computed directly, without adding an extra column to dfh)
temp_diff = dfh['TL'] - dfh['TP']
hum = dfh['RF']

# Creating the plot
plt.scatter(hum, temp_diff, color='fuchsia', s=0.1)
plt.title('Scatter plot of air-dewpoint temperature difference and relative humidity')
plt.xlabel('Humidity (%)')
plt.ylabel('Temperature-difference (°C)')
```
Text(0, 0.5, 'Temperature-difference (°C)')
05 - Free coding project¶
The last part of this semester project is up to you! You are free to explore whatever interests you. However, I add three requirements:
- This section should have at least 5 original plots in it. They are the output of your analysis.
- This section should also use additional data that you downloaded yourself. The easiest way would probably be to download another station(s) from the ZAMG database, or data from the same station but for another time period (e.g. for trend or change analysis). You can, however, decide to do something completely different if you prefer (as long as you download and read one more file).
- This section should contain at least one regression or correlation analysis between two parameters. Examples:
- between two different variables at the same station (like we did with the dewpoint above)
- between different stations (for example, average temperature as a function of station elevation)
- between average temperature and time (trends analysis)
- etc.
That's it! Here are a few ideas:
- detection of trends and changes at the station Innsbruck for 1993-2021
- comparison of 5-yr climatologies at various stations in Tirol, taking elevation or location into account
- compute the theoretical day length from the station's longitude and latitude (you can find solutions for this online, just let me know the source if you used a solution online), and use these computations to compare the measured sunshine duration to the maximum day length. This can be used to classify "sunshine days" for example.
- use the python "windrose" library to plot a windrose at different locations and time of day.
- etc.
If you have your own idea but are unsure about whether this is too much or not enough, come to see me in class! In general, the three requirements above should be enough.
My goal with this section is to let you formulate a programming goal and implement it.
Free Coding project: Paleo-climate analysis on speleothems¶
Info for our project
In the following, the isotopic signature of a speleothem from Verijovkina cave is used to create a paleoclimate reconstruction, which is then compared to European data covering the same time period. First, the oxygen and carbon isotope data are plotted to gain an overview of the data. Next, we investigate which parameters control the isotopic signature (this study mainly focuses on the oxygen isotopes): data from two weather stations in Russia and Georgia are used to check for a possible correlation between the oxygen isotope values and temperature. In addition, the oxygen and carbon isotope values are plotted against each other to check for kinetic fractionation. Lastly, the resulting climate reconstructions (regarding temperature and aridity) are compared to temperature anomalies in Europe during the same time period.
```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
from matplotlib.pyplot import figure
from matplotlib.patches import Polygon
```
```python
# Reading data and preparing variables

# Cave data
V4 = pd.read_csv('data/isotopes_V4.csv', index_col=0)
d13C = V4.drop('del18O', axis=1)
d18O = V4.drop('del13C', axis=1)

# Correlation 1
cor = pd.read_csv('data/18O_temp.csv', index_col=0)
corx = cor[' AirTemp.']
cory = cor.index
corx_re = np.array(corx).reshape((-1, 1))

# Correlation 2
cor2 = pd.read_csv('data/18O_temp2.csv', index_col=0)
corx2 = cor2[' AirTemp.']
cory2 = cor2.index
corx2_re = np.array(corx2).reshape((-1, 1))

# Correlation 3
corx3 = V4['del13C'].dropna()
cory3 = d18O.dropna()
corx3_re = np.array(corx3).reshape((-1, 1))

# Additional tree-ring data for plot 5
tree = pd.read_csv('data/europeTemp.csv', index_col=0)
tree = tree.drop(['weg1', 'weg2', 'weg3'], axis=1)
treex = tree.index
treey = tree['TempAnoms']
```
```python
# Descriptive statistics for plot 1
d18O_mean = round(d18O.mean(), 2)
d18O_min = d18O.min()
d18O_max = d18O.max()
```
```python
# Plot d18O
f = plt.figure()
f.set_figwidth(10)
f.set_figheight(2)

plt.plot(d18O.index, d18O['del18O'], linewidth=0.5)
plt.title('\u03B418O-isotope data from V4 cave')  # Add a title to the plot
plt.xlabel('years BP')
plt.ylabel('\u03B418O (‰)')
plt.figtext(0.82, 0.13, f" Min: {d18O_min[0]} \n Max: {d18O_max[0]} \n Mean: {d18O_mean[0]}", fontsize=8)
plt.figtext(0.1, -0.2, 'Fig 1: shows the measured values for \u03B418O in ‰ plotted against the age in years BP')
```
Text(0.1, -0.2, 'Fig 1: shows the measured values for δ18O in ‰ plotted against the age in years BP ')
```python
# Descriptive statistics for plot 2
d13C_mean = round(d13C.mean(), 2)
d13C_min = d13C.min()
d13C_max = d13C.max()
```
```python
# Plot d13C
f = plt.figure()
f.set_figwidth(10)
f.set_figheight(2)

plt.plot(d13C.index, d13C['del13C'], linewidth=0.5, color="m")
plt.title('\u03B413C-isotope data from V4 cave')  # Add a title to the plot
plt.xlabel('years BP')
plt.ylabel('\u03B413C (‰)')
plt.figtext(0.82, 0.13, f" Min: {d13C_min[0]} \n Max: {d13C_max[0]} \n Mean: {d13C_mean[0]}", fontsize=8)
plt.figtext(0.1, -0.2, 'Fig 2: shows the measured values for \u03B413C in ‰ plotted against the age in years BP')
```
Text(0.1, -0.2, 'Fig 2: shows the measured values for δ13C in ‰ plotted against the age in years BP ')
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 | # model fitting and regression calculation using the scikit-learn package from sklearn.linear_model import LinearRegression #model 1 model = LinearRegression() model.fit(corx_re, cory) #model 2 model_2 = LinearRegression() model_2.fit(corx2_re, cory2) #model 3 model_3 = LinearRegression() model_3.fit(corx3_re, cory3) #regression 1 r_sq = round((model.score(corx_re, cory)), 2) r_sq = str(r_sq) #regression 2 r_sq_2 = round((model_2.score(corx2_re, cory2)), 2) r_sq_2 = str(r_sq_2) #regression 3 r_sq_3 = round((model_3.score(corx3_re, cory3)), 2) r_sq_3 = str(r_sq_3) print("coefficient of determination d18O_1:", r_sq) print("coefficient of determination d18O_2:", r_sq_2) print("coefficient of determination d18O-d13C:", r_sq_3) |
coefficient of determination d18O_1: 0.78 coefficient of determination d18O_2: 0.91 coefficient of determination d18O-d13C: 0.44
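For a simple linear regression with one predictor, the coefficient of determination returned by `model.score` equals the squared Pearson correlation between the two variables. A self-contained sketch of that relation on synthetic data (the isotope files are not needed), using only NumPy:

```python
import numpy as np

# Synthetic linear data with noise, standing in for an isotope/temperature pair
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, x.size)

# Least-squares line and R^2 computed by hand: 1 - SS_res / SS_tot
m, b = np.polyfit(x, y, 1)
ss_res = np.sum((y - (m * x + b)) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_sq = 1.0 - ss_res / ss_tot

# With a single predictor, R^2 is exactly the squared Pearson correlation
r = np.corrcoef(x, y)[0, 1]
print(np.isclose(r_sq, r ** 2))  # True
```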
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 | #plot correlation #set size of plot f = plt.figure() f.set_figwidth(11) f.set_figheight(3) #subplot 1 scatter plot plt.subplot(1, 2, 1) plt.scatter(corx, cory, linewidth=0.5, color="hotpink") plt.title('Correlation \u03B418O and Temperature (Georgia)') # Add a title to the plot plt.xlabel('Air temp. (°C)') plt.ylabel('\u03B418O (‰)') #show regression R^2 plt.figtext(0.15, 0.75, '$R^{2}$ =' + r_sq) #correlation line m, b = np.polyfit(corx, cory, 1) plt.plot(corx, m*corx+b, linewidth=0.5, color="skyblue") #subplot 2 scatter plt.subplot(1, 2, 2) plt.scatter(corx2, cory2, linewidth=0.5, color="yellowgreen") plt.title('Correlation \u03B418O and Temperature (Russia)') # Add a title to the plot plt.xlabel('Air temp. (°C)') plt.ylabel('\u03B418O (‰)') #show regression R^2 plt.figtext(0.57, 0.75, '$R^{2}$ =' + r_sq_2) #add figure description text plt.figtext(0.1, -0.17, 'Fig 3: Correlation between temperature and \u03B418O values from two weather stations in Georgia and Russia. Here, the R2 values \nsuggest that temperature is the controlling parameter for \u03B418O values. ') #correlation line m, b = np.polyfit(corx2, cory2, 1) plt.plot(corx2, m*corx2+b, linewidth=1, color="skyblue") |
[<matplotlib.lines.Line2D at 0x1e890c2b250>]
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | #plot correlation #set size of plot f = plt.figure() f.set_figwidth(11) f.set_figheight(3) #subplot 1 scatter plot plt.subplot(1, 2, 1) plt.scatter(corx3_re, cory3, linewidth=0.5, color="slateblue", s=20) plt.title('Correlation \u03B418O and \u03B413C') # Add a title to the plot plt.xlabel('\u03B413C (‰)') plt.ylabel('\u03B418O (‰)') #show regression R^2 plt.figtext(0.15, 0.75, '$R^{2}$ =' + r_sq_3) #correlation line m, b = np.polyfit(corx3, cory3, 1) plt.plot(corx3, m*corx3+b, linewidth=1, color="skyblue") plt.figtext(0.07, -0.3, 'Fig 4: shows the correlation between \u03B418O and \u03B413C regarding\nkinetic fractionation. Here, the R2 value of 0.44 \ncan be interpreted as some correlation and therefore possibly \nlittle kinetic fractionation. ') |
Text(0.07, -0.3, 'Fig 4: shows the correlation between δ18O and δ13C regarding\nkinetic fractionation. Here, the R2 value of 0.44 \ncan be interpreted as some correlation and therefore possibly \nlittle kinetic fractionation. ')
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 | #plot 5 - comparing 3 climate reconstructions f = plt.figure() f.set_figwidth(10) f.set_figheight(6) #plot 1 - tree rings ax = plt.subplot(3,1,1) plt.title('Comparison Tree Rings, \u03B418O and \u03B413C ') # Add a title to the plot plt.plot(treex, treey, linewidth=0.5, color="red") plt.ylabel('Temp. Anomalies') ax.get_xaxis().set_visible(True) ax.spines['top'].set_visible(True) ax.spines['right'].set_visible(False) ax.spines['bottom'].set_visible(False) ax.spines['left'].set_visible(True) ax.xaxis.tick_top() ax.xaxis.set_label_position("top") #rectangles subplot 1 rectangle = plt.Rectangle((1985, -3), 12, 5, fc='teal', alpha = 0.4) plt.gca().add_patch(rectangle) rectangle2 = plt.Rectangle((2260, -3), 60, 5, fc='coral', alpha = 0.4) plt.gca().add_patch(rectangle2) rectangle3 = plt.Rectangle((1680, -3), 20, 5, fc='aqua', alpha = 0.4) plt.gca().add_patch(rectangle3) #plot 2 - d18O ax2 = plt.subplot(3,1,2) plt.plot(d18O.index, d18O['del18O'], linewidth=0.5, color = 'green') plt.ylabel('\u03B418O (‰)') ax2.get_xaxis().set_visible(False) ax2.spines['top'].set_visible(False) ax2.spines['right'].set_visible(True) ax2.spines['bottom'].set_visible(False) ax2.spines['left'].set_visible(False) ax2.yaxis.tick_right() ax2.yaxis.set_label_position("right") #rectangles subplot 2 rectangle = plt.Rectangle((2000, -11), 12, 1.8, fc='teal', alpha = 0.4) plt.gca().add_patch(rectangle) rectangle2 = plt.Rectangle((2190, -11), 70, 1.8, fc='coral', alpha = 0.4) plt.gca().add_patch(rectangle2) rectangle3 = plt.Rectangle((1660, -11), 20, 1.8, fc='aqua', alpha = 0.4) plt.gca().add_patch(rectangle3) #plot 3 - d13C ax3 = plt.subplot(3,1,3) plt.plot(d13C.index, d13C['del13C'], linewidth=0.5, color="m") plt.xlabel('years BP') plt.ylabel('\u03B413C (‰)') ax3.spines['top'].set_visible(False) ax3.spines['right'].set_visible(False) #rectangles subplot 3 rectangle = plt.Rectangle((2000, -4.8), 12, 2.5, fc='teal', alpha = 0.4) plt.gca().add_patch(rectangle) rectangle2 = plt.Rectangle((2195, -4.8), 65, 2.5, fc='coral', alpha = 0.4) plt.gca().add_patch(rectangle2) rectangle3 = plt.Rectangle((1658, -4.8), 33, 2.5, fc='aqua', alpha = 0.4) plt.gca().add_patch(rectangle3) plt.figtext(0.07, -0.05, 'Fig 5: shows the comparison between the isotopic data of V4 and the measured temperature anomalies in Europe (Büntgen et al., 2011). \nAquablue = Rapid Cooling/Drying Event, Teal-blue = Cool Event, Orange = Fluctuations during Celtic Expansion') |
Text(0.07, -0.05, 'Fig 5: shows the comparison between the isotopic data of V4 and the measured temperature anomalies in Europe (Büntgen et al., 2011). \nAquablue = Rapid Cooling/Drying Event, Teal-blue = Cool Event, Orange = Fluctuations during Celtic Expansion')
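The figure above repeats the same `spines[...].set_visible(...)` pattern for every axis; a small helper can hide a given set of spines in one call. A sketch of that idea (using the non-interactive Agg backend so it also runs outside the notebook; `hide_spines` is our own helper, not a matplotlib function):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; the notebook renders inline instead
import matplotlib.pyplot as plt

def hide_spines(ax, sides):
    """Hide the listed spines ('top', 'right', 'bottom', 'left') of an Axes."""
    for side in sides:
        ax.spines[side].set_visible(False)

fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(10, 6))
hide_spines(ax1, ['right', 'bottom'])        # x-axis drawn along the top
hide_spines(ax2, ['top', 'bottom', 'left'])  # y-axis moved to the right
hide_spines(ax3, ['top', 'right'])           # classic open frame

print(ax1.spines['right'].get_visible())  # False
```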
Note
These plots give only a brief look at the climate reconstruction from the stalagmite V4. For further information and the complete interpretation, see our Bachelor's thesis, from which this study originates.