# Programming project
```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
import pandas as pd
```
## Instructions
The programming tasks described below have to be done in groups of two. It's totally okay to ask others (including me!) for help or advice, but copy-pasting from your classmates is strictly forbidden. You will see that some tasks require internet searches that go beyond what you've learned in my class notes. This is expected: my goal is for you to be able to solve new problems more or less independently.
Project due date: Sunday 11.06.2023, 23:59 (the return folders on OLAT will close at precisely this time - no exceptions!)
Please return two files:
- the full notebook as `.ipynb`, executed with all outputs (I recommend executing "Restart kernel and run all cells" one last time before submission)
- any additional file you may have downloaded for your project (as a zip or tar file containing several files if too large)
Please return only one project per group (one of the two has to upload the files).
Grading (30% of the final VU grade) will be based on the following criteria:
- correctness of the code (the notebook should theoretically run on my laptop as well, assuming that the data files are in the folder)
- correctness of the solutions (correct plot, correct values, etc.)
- for the plots: correct use of title, axis labels, units, etc.
- clarity of the code (no unused lines of code, presence of inline comments, etc.)
- clarity and layout of the overall document (correct use of the Jupyter Notebook markdown format)
- originality (for the open project)
You can download this notebook to get started. Delete this "Instructions" section, add your name and Matrikelnummer and start coding! For the tasks that ask for a written answer (e.g. some values, answers, etc.), please use the notebook's markdown format to write them down!
## Preamble
- Names: Christian Mauntz, Daniel Panzanini
- Matrikelnummer: 1208373, 12003894
## 01 - An energy balance model with hysteresis
Based on the code we wrote in week 07, code the following extension. For this exercise it's OK to copy-paste my solutions and start from there. Only copy the necessary (no useless code)!
The planetary albedo $\alpha$ is in fact changing with climate change. As the temperature drops, sea-ice and ice sheets are extending (increasing the albedo). Inversely, the albedo decreases as temperature rises. The planetary albedo of our simple energy balance model follows the following equation:
$$ \alpha = \begin{cases} 0.3,& \text{if } T \gt 280\\ 0.7,& \text{if } T \lt 250\\ a T + b, & \text{otherwise} \end{cases} $$
01-01: Compute the parameters $a$ and $b$ so that the equation is continuous at T=250K and T=280K.
01-02: Now write a function called `alpha_from_temperature` which accepts a single positional parameter `T` as input (a scalar) and returns the corresponding albedo. Test your function using doctests to make sure that it complies with the instructions above.
01-03: Adapt the existing code from week 07 to write a function called `temperature_change_with_hysteresis` which accepts `t0` (the starting temperature in K) and `n_years` (the number of simulation years) as positional arguments and `tau` (the atmosphere transmissivity) as keyword argument (default value 0.611). Verify that:

- the stabilization temperature with `t0 = 292` and default tau is approximately 288 K
- the stabilization temperature with `t0 = 265` and default tau is approximately 233 K
01-04: Realize a total of N simulations with starting temperatures regularly spaced between `t0` = 206 K and `t0` = 318 K and plot them on a single plot for `n_years` = 50. The plot should look somewhat similar to this example for N = 21.
Bonus: only if you want (and if time permits), you can try to increase N and add colors to your plot to create a graph similar to this one.
01-01: Compute the parameters $a$ and $b$ so that the equation is continuous at T=250K and T=280K.
```python
def linear_regression(x, y):
    """Least-squares line through (x, y).

    Note: ``b`` is the slope and ``a`` the intercept, i.e. y = a + b*x
    (the naming is swapped with respect to the task's alpha = a*T + b).
    """
    x = np.asarray(x)
    y = np.asarray(y)
    b = (np.size(x)*sum(x*y) - sum(x)*sum(y)) / (np.size(x)*sum(x**2) - sum(x)**2)
    a = np.mean(y) - b*np.mean(x)
    return a, b
```
```python
a, b = linear_regression([250, 280], [0.7, 0.3])
print(a)
print(b)
```
4.033333333333333 -0.013333333333333334
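In the task's own notation $\alpha = aT + b$ (note that the helper above returns the intercept first, so its `a` is the task's $b$ and vice versa), the two continuity conditions can also be solved directly by hand; a quick cross-check:

```python
import numpy as np

# Continuity conditions: a*250 + b = 0.7 and a*280 + b = 0.3
a = (0.3 - 0.7) / (280 - 250)   # slope: a = -1/75
b = 0.7 - a * 250               # intercept
print(a, b)

# the linear branch indeed meets both constant branches
assert np.isclose(a * 250 + b, 0.7)
assert np.isclose(a * 280 + b, 0.3)
```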
01-02: Now write a function called `alpha_from_temperature` which accepts a single positional parameter `T` as input (a scalar) and returns the corresponding albedo. Test your function using doctests to make sure that it complies with the instructions above.
```python
def alpha_from_temperature(T):
    """Return the albedo corresponding to the temperature T.

    Parameters
    ----------
    T : float
        Temperature (K)

    Returns
    -------
    alpha : float
        Planetary albedo (dimensionless)

    Examples
    --------
    >>> alpha_from_temperature(281)
    0.3
    >>> alpha_from_temperature(260)
    0.5666666666666664
    >>> alpha_from_temperature(292)
    0.3
    """
    if T > 280:
        alpha = 0.3
    elif T < 250:
        alpha = 0.7
    else:
        # linear transition between (250 K, 0.7) and (280 K, 0.3)
        a, b = linear_regression([250, 280], [0.7, 0.3])
        alpha = a + b*T
    return alpha

# Testing
import doctest
doctest.testmod()
```
TestResults(failed=0, attempted=3)
01-03: Adapt the existing code from week 07 to write a function called `temperature_change_with_hysteresis` which accepts `t0` (the starting temperature in K) and `n_years` (the number of simulation years) as positional arguments and `tau` (the atmosphere transmissivity) as keyword argument (default value 0.611). Verify that:

- the stabilization temperature with `t0 = 292` and default tau is approximately 288 K
- the stabilization temperature with `t0 = 265` and default tau is approximately 233 K
```python
C = 4.0e+08               # heat capacity (J m-2 K-1)
dt = 60 * 60 * 24 * 365   # time step: one year in seconds
```
```python
def olr(t, tau=0.611):
    """Outgoing longwave radiation (W m-2)."""
    sigma = 5.67e-8  # Stefan-Boltzmann constant (W m-2 K-4)
    return sigma * tau * t**4

def asr(t):
    """Absorbed solar radiation (W m-2), with temperature-dependent albedo."""
    alpha = alpha_from_temperature(t)
    s0 = 1362  # solar constant (W m-2)
    return (1 - alpha) * s0 / 4

def temperature_change_with_hysteresis(t0, n_years, tau=0.611):
    """Integrate the energy balance model for n_years, starting at t0 (K)."""
    years = np.arange(n_years + 1)
    t = np.zeros(n_years + 1)
    t[0] = t0
    for i in range(n_years):
        t[i + 1] = t[i] + dt / C * (asr(t[i]) - olr(t[i], tau=tau))
    return years, t
```
```python
print(temperature_change_with_hysteresis(292, 100, tau=0.611))
print('we can see the temp converges against 288K for the 100 Years')
```
(array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]), array([292. , 290.93503273, 290.15816267, 289.58946707, 289.17210602, 288.86524238, 288.63931706, 288.47281713, 288.35002266, 288.25941271, 288.19252534, 288.14313541, 288.10665784, 288.0797126 , 288.05980638, 288.04509907, 288.03423218, 288.02620251, 288.02026909, 288.01588456, 288.01264452, 288.01025019, 288.00848081, 288.00717325, 288.00620696, 288.00549287, 288.00496516, 288.00457518, 288.00428698, 288.004074 , 288.00391661, 288.00380029, 288.00371433, 288.00365081, 288.00360386, 288.00356917, 288.00354353, 288.00352458, 288.00351058, 288.00350023, 288.00349259, 288.00348694, 288.00348276, 288.00347967, 288.00347739, 288.00347571, 288.00347446, 288.00347354, 288.00347286, 288.00347236, 288.00347199, 288.00347171, 288.00347151, 288.00347136, 288.00347125, 288.00347117, 288.00347111, 288.00347106, 288.00347103, 288.003471 , 288.00347099, 288.00347097, 288.00347096, 288.00347095, 288.00347095, 288.00347095, 288.00347094, 288.00347094, 288.00347094, 288.00347094, 288.00347094, 288.00347094, 288.00347094, 288.00347094, 288.00347094, 288.00347093, 288.00347093, 288.00347093, 288.00347093, 288.00347093, 288.00347093, 288.00347093, 288.00347093, 288.00347093, 288.00347093, 288.00347093, 288.00347093, 288.00347093, 288.00347093, 288.00347093, 288.00347093, 288.00347093, 288.00347093, 288.00347093, 288.00347093, 288.00347093, 288.00347093, 288.00347093, 288.00347093, 288.00347093, 288.00347093])) we can see the temp converges against 288K for the 100 Years
```python
print(temperature_change_with_hysteresis(265, 100, tau=0.611))
print('we can see the temp converges against 233K for the 100 Years')
```
(array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]), array([265. , 264.95291988, 264.89855772, 264.83578084, 264.76327829, 264.67953203, 264.58278336, 264.47099349, 264.34179726, 264.19244871, 264.01975681, 263.82000934, 263.58888255, 263.32133333, 263.01147009, 262.65239742, 262.2360281 , 261.75285442, 261.19166823, 260.53921583, 259.77976944, 258.8945909 , 257.86125481, 256.65278639, 255.23655306, 253.57282452, 251.61288239, 249.29651024, 246.80042382, 244.72054608, 242.97796545, 241.51143854, 240.27267703, 239.22310148, 238.33154556, 237.57259474, 236.92535958, 236.37255374, 235.89978988, 235.49503452, 235.14818044, 234.85070776, 234.59541237, 234.37618657, 234.18784041, 234.02595515, 233.88676232, 233.76704335, 233.66404597, 233.57541406, 233.49912881, 233.43345905, 233.37691925, 233.32823391, 233.28630734, 233.25019784, 233.21909583, 233.19230501, 233.16922643, 233.14934466, 233.13221613, 233.11745901, 233.10474456, 233.09378972, 233.08435074, 233.0762177 , 233.06920978, 233.06317125, 233.05796793, 233.05348425, 233.04962065, 233.04629135, 233.04342243, 233.04095021, 233.03881983, 233.03698402, 233.03540203, 233.03403877, 233.03286399, 233.03185164, 233.03097924, 233.03022746, 233.02957961, 233.02902133, 233.02854023, 233.02812564, 233.02776837, 233.02746049, 233.02719517, 233.02696653, 233.0267695 , 233.02659971, 233.02645339, 233.0263273 , 233.02621864, 233.026125 , 233.0260443 , 233.02597477, 233.02591484, 233.0258632 , 233.0258187 ])) we can see the temp converges against 233K for the 100 Years
01-04: Realize a total of N simulations with starting temperatures regularly spaced between t0=206K, and t0=318K and plot them on a single plot for n_years=50. The plot should look somewhat similar to this example for N=21.
Bonus: only if you want (and if time permits), you can try to increase N and add colors to your plot to create a graph similar to this one.
```python
n = 21
t0 = np.linspace(206, 318, n)   # regularly spaced starting temperatures (K)
n_years = 50
for i in range(n):
    years, temp = temperature_change_with_hysteresis(t0[i], n_years)
    plt.plot(years, temp, 'k')
plt.ylabel('Temperature (K)')
plt.xlabel('Years')
plt.title('Climate change scenario with hysteresis')
```
```python
n = 1000
t0 = np.linspace(206, 318, n)
n_years = 50
for i in range(n):
    years, temp = temperature_change_with_hysteresis(t0[i], n_years)
    plt.plot(years, temp, color=cm.jet(i / n))
plt.ylabel('Temperature (K)')
plt.xlabel('Years')
plt.title('Climate change scenario with hysteresis')
```
## 02 - Weather station data files
I downloaded 10 min data from the recently launched ZAMG data hub. The data file contains selected parameters from the "INNSBRUCK-FLUGPLATZ (ID: 11804)" weather station.
You can download the data from the following links (right-click + "Save as..."):
- station data
- parameter metadata
- station list from the ZAMG (in a better format than last time)
Let me open the data for you and display its content:
```python
df = pd.read_csv('INNSBRUCK-FLUGPLATZ_Datensatz_20150101_20211231.csv',
                 index_col=1, parse_dates=True)
df = df.drop('station', axis=1)
df
```
| time | DD | FF | GSX | P | RF | RR | SO | TB1 | TB2 | TB3 | TL | TP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2015-01-01 00:00:00 | 232.0 | 1.0 | 0.0 | 965.7 | 96.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.7 | -3.4 |
| 2015-01-01 00:10:00 | 248.0 | 1.6 | 0.0 | 965.6 | 97.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.4 | -2.9 |
| 2015-01-01 00:20:00 | 252.0 | 1.6 | 0.0 | 965.6 | 97.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.4 | -2.9 |
| 2015-01-01 00:30:00 | 267.0 | 1.3 | 0.0 | 965.7 | 97.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.7 | -3.2 |
| 2015-01-01 00:40:00 | 279.0 | 1.4 | 0.0 | 965.7 | 97.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.5 | -3.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2021-12-31 23:10:00 | 257.0 | 1.1 | 0.0 | 959.8 | 98.0 | 0.0 | 0.0 | 3.1 | 3.4 | 3.4 | -1.7 | -1.7 |
| 2021-12-31 23:20:00 | 106.0 | 0.3 | 0.0 | 960.0 | 100.0 | 0.0 | 0.0 | 3.0 | 3.4 | 3.4 | -2.0 | -2.0 |
| 2021-12-31 23:30:00 | 141.0 | 0.5 | 0.0 | 960.2 | 100.0 | 0.0 | 0.0 | 3.0 | 3.4 | 3.4 | -1.7 | -1.7 |
| 2021-12-31 23:40:00 | 167.0 | 0.5 | 0.0 | 960.2 | 100.0 | 0.0 | 0.0 | 3.0 | 3.4 | 3.4 | -1.6 | -1.6 |
| 2021-12-31 23:50:00 | 275.0 | 0.3 | 0.0 | 960.1 | 100.0 | 0.0 | 0.0 | 3.0 | 3.4 | 3.4 | -1.6 | -1.7 |
368208 rows × 12 columns
02-01: after reading the documentation of the respective functions (and maybe trying a few things yourself), explain in plain sentences:

- what am I asking pandas to do with the `index_col=1, parse_dates=True` keyword arguments? Why am I doing this?
- what am I asking pandas to do with `.drop()`? Why `axis=1`?
- `index_col=1`: use the column at position 1 (the second column, here `time`) as the index of the dataframe instead of the default integer index.
- `parse_dates=True`: parse the index column as dates, so the index becomes a `DatetimeIndex` that supports time-based selection and resampling.
- `.drop('station', axis=1)`: remove the `station` column from the dataframe (it contains the same station ID in every row, so it carries no information here).
- `axis=1`: tells `.drop()` that the label `'station'` refers to a column, not a row (rows would be axis 0).
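To make this concrete, here is a toy example (inline data, not the actual station file) showing what `index_col=1`, `parse_dates=True` and `.drop(..., axis=1)` do:

```python
import io
import pandas as pd

csv = io.StringIO(
    "station,time,TL\n"
    "11804,2015-01-01 00:00:00,-2.7\n"
    "11804,2015-01-01 00:10:00,-2.4\n"
)
toy = pd.read_csv(csv, index_col=1, parse_dates=True)
# the second column ('time') is now the index, parsed as dates
print(type(toy.index).__name__)      # DatetimeIndex
toy = toy.drop('station', axis=1)    # axis=1: drop a column, not a row
print(list(toy.columns))             # ['TL']
```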
Now let me do something else for you:
```python
dfmeta = pd.read_csv('ZEHNMIN Parameter-Metadaten.csv', index_col=0)
dfmeta.loc[df.columns]
```
|  | Kurzbeschreibung | Beschreibung | Einheit |
|---|---|---|---|
| DD | Windrichtung | Windrichtung, vektorielles Mittel über 10 Minuten | ° |
| FF | vektorielle Windgeschwindigkeit | Windgeschwindigkeit, vektorielles Mittel über ... | m/s |
| GSX | Globalstrahlung | Globalstrahlung, arithmetisches Mittel über 10... | W/m² |
| P | Luftdruck | Luftdruck, Basiswert zur Minute10 | hPa |
| RF | Relative Feuchte | Relative Luftfeuchte, Basiswert zur Minute10 | % |
| RR | Niederschlag | 10 Minuten Summe des Niederschlags, Summe der ... | mm |
| SO | Sonnenscheindauer | Sonnenscheindauer, Sekundensumme über 10 Minuten | s |
| TB1 | Erdbodentemperatur in 10cm Tiefe | Erdbodentemperatur in 10cm Tiefe, Basiswert zu... | °C |
| TB2 | Erdbodentemperatur in 20cm Tiefe | Erdbodentemperatur in 20cm Tiefe, Basiswert zu... | °C |
| TB3 | Erdbodentemperatur in 50cm Tiefe | Erdbodentemperatur in 50cm Tiefe, Basiswert zu... | °C |
| TL | Lufttemperatur in 2m | Lufttemperatur in 2m Höhe, Basiswert zur Minute10 | °C |
| TP | Taupunkt | Taupunktstemperatur, Basiswert zur Minute10 | °C |
02-02: again, explain in plain sentences what `dfmeta.loc[df.columns]` is doing, and why it works that way.
`dfmeta.loc[df.columns]` selects rows from `dfmeta` by label. `.loc` looks up each label passed inside the brackets in the index of `dfmeta`; here the labels are `df.columns`, i.e. the column names of `df`. Since `dfmeta` is indexed by parameter name, the result is the metadata rows for exactly those parameters that appear as columns in `df`, in the same order.
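A minimal illustration (with made-up data) of this label-based lookup:

```python
import pandas as pd

meta = pd.DataFrame({'Einheit': ['°', 'm/s', '°C']},
                    index=['DD', 'FF', 'TL'])
wanted = ['TL', 'DD']      # e.g. the columns of another dataframe
sel = meta.loc[wanted]     # rows are returned in the requested order
print(sel)
```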
Finally, let me do a last step for you before you start coding:
```python
dfh = df.resample('H').mean()
```
02-03: explore the `dfh` dataframe. Explain, in plain words, what the purpose of `.resample('H')` followed by `mean()` is. Explain what `.resample('H').max()` and `.resample('H').sum()` would do.
`.resample('H')` groups the dataframe into hourly bins; `mean()` then averages the six 10-min values within each hour and stores the result in the new dataframe. `.resample('H').max()` groups the same way but keeps only the largest of the six 10-min values per hour, while `.resample('H').sum()` adds the six 10-min values together.
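This behaviour can be verified on a small synthetic 10-minute series (a sketch, not the station data):

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2020-01-01 00:00', periods=12, freq='10min')
s = pd.Series(np.arange(12.0), index=idx)   # two full hours of 10-min values

print(s.resample('H').mean().values)   # average of the six values per hour
print(s.resample('H').max().values)    # largest of the six values per hour
print(s.resample('H').sum().values)    # total over each hour
```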
02-04: Using `np.allclose`, make sure that the average of the first hour (that you'll compute yourself from `df`) is indeed equal to the first row of `dfh`. Now, two variables in the dataframe have units that aren't suitable for averaging. Please convert the following variables to the correct units:

- `RR` needs to be converted from the average of 10 min sums to mm/h
- `SO` needs to be converted from the average of 10 min sums to s/h
```python
dfhour = df.iloc[0:6].mean()   # mean of the first six 10-min rows (first hour)
avgdfhour = dfh.iloc[0:1]      # first row of the hourly dataframe
np.allclose(dfhour, avgdfhour)
```
True
Converting `RR` to mm/h:
```python
dfh["RR"] = df["RR"].resample('H').sum()  # mm/h: sum of the six 10-min sums
```
Converting `SO` to s/h:
```python
dfh["SO"] = df["SO"].resample('H').sum()  # s/h: sum of the six 10-min sums
```
From now on, we will use the hourly data only (and further aggregations when necessary). The 10-min data are great but require a little more pandas kung fu (the Chinese term, not the sport) to be used efficiently.
Note on pandas: all the exercises below can be done with or without pandas. Each question can be answered with very few lines of code (often one or two) with pandas, and I recommend using it as much as possible. If you want, you can always use numpy in case of doubt: you can access the data as a numpy array with `df[column_name].values`.
Spend some time exploring the `dfh` dataframe we just created. What time period does it cover? What variables does it contain?
The time period covered is 2015-01-01 00:00 to 2021-12-31 23:00. The variables are DD, FF, GSX, P, RF, RR, SO, TB1, TB2, TB3, TL and TP.
## 03 - Precipitation
In this section, we will focus on precipitation only.
03-01: Compute the average annual precipitation (m/year) over the 7-year period.
03-02: What is the smallest non-zero precipitation measured at the station? What is the maximum hourly precipitation measured at the station? When did this occur?
03-03: Plot a histogram of hourly precipitation, with bins of size 0.2 mm/h, starting at 0.1 mm/h and ending at 25 mm/h. Plot the same data, but this time with a logarithmic y-axis. Compute the 99th percentile (or quantile) of hourly precipitation.
03-04: Compute daily sums (mm/d) of precipitation (tip: use `.resample` again). Compute the average number of rain days per year in Innsbruck (a "rain day" is a day with at least 0.1 mm/d of measured precipitation).

03-05: Now select (subset) the daily dataframe to keep only daily data in the months of December, January, February (DJF). To do this, note that `dfh.index.month` exists and can be used to subset the data efficiently. Compute the average precipitation in DJF (mm/d), and the average number of rainy days in DJF. Repeat with the months of June, July, August (JJA).

03-06: Repeat the DJF and JJA subsetting, but this time with hourly data. Count the total number of times that hourly precipitation in DJF is above the 99th percentile computed in exercise 03-03. Repeat with JJA.

03-07: Compute and plot the average daily cycle of hourly precipitation in DJF and JJA. I expect a plot similar to this example. To compute the daily cycle, I recommend combining two very useful tools. First, start by noticing that `dfh.index.hour` exists and can be used to categorize data. Then, note that `df.groupby` exists and can be used exactly for that (documentation).
03-01: Compute the average annual precipitation (m/year) over the 7-year period.
```python
rain_year = (dfh["RR"] * 24 * 365).resample('Y').mean()  # mm/h -> mm/year
rain_year_avg = np.average(rain_year) / 1000             # mm -> m
print(f'The average annual precipitation over the 7-year period is {rain_year_avg} m/year.')
```
The average annual precipitation over the 7-year period is 0.919841881342701 m/year.
03-02: What is the smallest non-zero precipitation measured at the station? What is the maximum hourly precipitation measured at the station? When did this occur?
```python
rainover0 = dfh.RR[dfh["RR"] > 0]
minRR = rainover0.min()
print(f'the smallest non-zero precipitation at the station is {minRR} mm/hr.')
dfmin = pd.DataFrame(rainover0[rainover0 == minRR].index.values)
print(dfmin)
```
the smallest non-zero precipitation at the station is 0.1 mm/hr. 0 0 2015-01-02 16:00:00 1 2015-01-11 10:00:00 2 2015-01-14 16:00:00 3 2015-01-18 04:00:00 4 2015-01-24 04:00:00 ... ... 1648 2021-12-13 05:00:00 1649 2021-12-13 06:00:00 1650 2021-12-13 07:00:00 1651 2021-12-30 09:00:00 1652 2021-12-30 11:00:00 [1653 rows x 1 columns]
```python
maxRR = rainover0.max()
print(f'the maximum precipitation at the station is {maxRR} mm/hr.')
dfmax = pd.DataFrame(rainover0[rainover0 == maxRR].index.values)
print(dfmax)
```
the maximum precipitation at the station is 22.2 mm/hr. 0 0 2021-09-16 22:00:00
03-03: Plot a histogram of hourly precipitation, with bins of size 0.2 mm/h, starting at 0.1 mm/h and ending at 25 mm/h. Plot the same data, but this time with a logarithmic y-axis. Compute the 99th percentile (or quantile) of hourly precipitation.
```python
binsize = (25 - 0.1) / 0.2   # number of 0.2 mm/h bins between 0.1 and 25 mm/h
dfhRR01 = dfh["RR"][dfh["RR"] >= 0.1]

ax = dfhRR01.plot.hist(bins=round(binsize), label="hourly precipitation")
plt.title('hourly precipitation per amount of precipitation')
plt.xlabel('precipitation (mm/h)')
plt.ylabel('frequency')
plt.legend(loc='upper right')
plt.show()
```
```python
ax = dfhRR01.plot.hist(bins=round(binsize))
plt.yscale("log")   # logarithmic y-axis
plt.title('hourly precipitation per amount of precipitation')
plt.xlabel('precipitation (mm/h)')
plt.ylabel('frequency')
plt.show()
```
```python
per99 = np.percentile(dfh["RR"], 99)
print(f'the 99th percentile is {per99} mm/h')
```
the 99th percentile is 2.3330000000001747 mm/h
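As a sanity check of `np.percentile` on data where the answer is known (synthetic values, not the station series):

```python
import numpy as np

x = np.arange(101.0)         # 0, 1, ..., 100
print(np.percentile(x, 99))  # 99th percentile, by construction
print(np.percentile(x, 50))  # the median
```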
03-04: Compute daily sums (mm/d) of precipitation (tip: use `.resample` again). Compute the average number of rain days per year in Innsbruck (a "rain day" is a day with at least 0.1 mm/d of measured precipitation).
```python
rainday = df["RR"].resample('D').sum()   # daily precipitation sums (mm/d)
print(rainday)
```
time 2015-01-01 0.0 2015-01-02 0.6 2015-01-03 12.2 2015-01-04 7.7 2015-01-05 0.0 ... 2021-12-27 0.0 2021-12-28 0.0 2021-12-29 15.1 2021-12-30 2.1 2021-12-31 0.0 Freq: D, Name: RR, Length: 2557, dtype: float64
```python
realrainday = rainday[rainday >= 0.1]
avgrainday = round(len(realrainday) / 7)   # 7 years of data
print(f'The average number of rain days in IBK is {avgrainday} per year.')
```
The average number of rain days in IBK is 170 per year.
03-05: Now select (subset) the daily dataframe to keep only daily data in the months of December, January, February (DJF). To do this, note that `dfh.index.month` exists and can be used to subset the data efficiently. Compute the average precipitation in DJF (mm/d), and the average number of rainy days in DJF. Repeat with the months of June, July, August (JJA).
```python
dfd = df.resample('D').mean()
dfd["RR"] = df["RR"].resample('D').sum()   # daily precipitation sum (mm/d)
dfd["SO"] = df["SO"].resample('D').sum()   # daily sunshine duration sum (s/d)
```
```python
DJF = dfd[dfd.index.month.isin([12, 1, 2])]
print(f'The average precipitation in DJF is {round(DJF.RR.mean(), 2)} mm/d.')
```
The average precipitation in DJF is 1.77 mm/d.
```python
print(f'The average number of rainy days in DJF is {round(len(DJF.RR[DJF.RR >= 0.1]) / 7)} days.')
```
The average number of rainy days in DJF is 38 days.
```python
JJA = dfd[dfd.index.month.isin([6, 7, 8])]
print(f'The average precipitation in JJA is {round(JJA.RR.mean(), 2)} mm/d.')
```
The average precipitation in JJA is 4.02 mm/d.
```python
print(f'The average number of rainy days in JJA is {round(len(JJA.RR[JJA.RR >= 0.1]) / 7)} days.')
```
The average number of rainy days in JJA is 53 days.
03-06: Repeat the DJF and JJA subsetting, but this time with hourly data. Count the total number of times that hourly precipitation in DJF is above the 99th percentile computed in exercise 03-03. Repeat with JJA.
```python
DJFhour = dfh[dfh.index.month.isin([12, 1, 2])]
DJFhourabove99 = DJFhour.RR[DJFhour.RR > per99]
print(f'Hourly precipitation in DJF was {len(DJFhourabove99)} times above the 99th percentile.')
```
Hourly precipitation in DJF was 68 times above the 99th percentile.
```python
JJAhour = dfh[dfh.index.month.isin([6, 7, 8])]
JJAhourabove99 = JJAhour.RR[JJAhour.RR > per99]
print(f'Hourly precipitation in JJA was {len(JJAhourabove99)} times above the 99th percentile.')
```
Hourly precipitation in JJA was 308 times above the 99th percentile.
03-07: Compute and plot the average daily cycle of hourly precipitation in DJF and JJA. I expect a plot similar to this example. To compute the daily cycle, I recommend combining two very useful tools. First, start by noticing that `dfh.index.hour` exists and can be used to categorize data. Then, note that `df.groupby` exists and can be used exactly for that (documentation).
```python
DJFmean = DJFhour.RR.groupby(DJFhour.index.hour).mean()
JJAmean = JJAhour.RR.groupby(JJAhour.index.hour).mean()

plt.plot(DJFmean, label='DJF')
plt.plot(JJAmean, label='JJA')
plt.xlabel('Hour of day (UTC)')
plt.ylabel('hourly precipitation (mm/h)')
plt.legend()
plt.show()
```
## 04 - A few other variables
In this section, we will continue to analyze the weather station data.
04-01: Verify that the three soil temperatures have approximately the same average value over the entire period. Now plot the three soil temperature timeseries in hourly resolution over the course of the year 2020 (example). Repeat the plot with the month of May 2020.
04-02: Plot the average daily cycle of all three soil temperatures.
04-03: Compute the difference (in °C) between the air temperature and the dewpoint temperature. Now plot this difference on a scatter plot (x-axis: relative humidity, y-axis: temperature difference).
04-01: Verify that the three soil temperatures have approximately the same average value over the entire period. Now plot the three soil temperature timeseries in hourly resolution over the course of the year 2020 (example). Repeat the plot with the month of May 2020.
```python
m1, m2, m3 = df.TB1.mean(), df.TB2.mean(), df.TB3.mean()
# compare the three period means within an absolute tolerance of 1 K
np.allclose([m1, m2], m3, atol=1.0)
```
True
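One `np.allclose` pitfall worth keeping in mind (a sketch with made-up numbers): its third positional argument is `rtol`, not a third array, so three means should be compared as a list against a reference value rather than passed as three positional arguments:

```python
import numpy as np

# The third positional argument is rtol: this compares 1.0 with 2.0
# using a huge relative tolerance, so it returns True
print(np.allclose(1.0, 2.0, 10))

# To compare three values, pass them as an array instead
means = [11.2, 11.4, 11.3]   # hypothetical soil-temperature means
print(np.allclose(means, np.mean(means), atol=0.5))
```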
```python
plt.subplot(title='Soil temperature')
dfh.TB1[dfh.index.year == 2020].plot()
dfh.TB2[dfh.index.year == 2020].plot()
dfh.TB3[dfh.index.year == 2020].plot()
plt.legend(["TB1", "TB2", "TB3"])
plt.ylabel('°C')
plt.xlabel('time')
plt.show()
```
```python
plt.subplot(title='Soil temperature')
dfh.TB1[(dfh.index.year == 2020) & (dfh.index.month == 5)].plot()
dfh.TB2[(dfh.index.year == 2020) & (dfh.index.month == 5)].plot()
dfh.TB3[(dfh.index.year == 2020) & (dfh.index.month == 5)].plot()
plt.legend(["TB1", "TB2", "TB3"])
plt.ylabel('°C')
plt.xlabel('time')
plt.show()
```
04-02: Plot the average daily cycle of all three soil temperatures.
```python
dfhtb1 = dfh.TB1.groupby(dfh.index.hour).mean()
dfhtb2 = dfh.TB2.groupby(dfh.index.hour).mean()
dfhtb3 = dfh.TB3.groupby(dfh.index.hour).mean()
```
```python
plt.plot(dfhtb1, label='TB1')
plt.plot(dfhtb2, label='TB2')
plt.plot(dfhtb3, label='TB3')
plt.xlabel('hour of day')
plt.ylabel('°C')
plt.title('daily cycle of soil temperature')
plt.legend()
plt.show()
```
04-03: Compute the difference (in °C) between the air temperature and the dewpoint temperature. Now plot this difference on a scatter plot (x-axis: relative humidity, y-axis: temperature difference).
```python
diftltp = df["TL"] - df["TP"]   # dewpoint depression (°C)
plt.scatter(df["RF"], diftltp)
plt.xlabel('relative humidity (%)')
plt.ylabel('air temperature minus dewpoint (°C)')
plt.show()
```
## 05 - Free coding project
The last part of this semester project is up to you! You are free to explore whatever interests you. I however add three requirements:
- This section should have at least 5 original plots in it. They are the output of your analysis.
- This section should also use additional data that you downloaded yourself. The easiest way would probably be to download another station(s) from the ZAMG database, or data from the same station but for another time period (e.g. for trend or change analysis). You can, however, decide to do something completely different if you prefer (as long as you download and read one more file).
- This section should contain at least one regression or correlation analysis between two parameters. Examples:
- between two different variables at the same station (like we did with the dewpoint above)
- between different stations (for example, average temperature as a function of station elevation)
- between average temperature and time (trends analysis)
- etc.
That's it! Here are a few ideas:
- detection of trends and changes at the station Innsbruck for 1993-2021
- comparison of 5-yr climatologies at various stations in Tirol, taking elevation or location into account
- compute the theoretical day length from the station's longitude and latitude (you can find solutions for this online, just let me know the source if you used a solution online), and use these computations to compare the measured sunshine duration to the maximum day length. This can be used to classify "sunshine days" for example.
- use the python "windrose" library to plot a windrose at different locations and time of day.
- etc.
If you have your own idea but are unsure about whether this is too much or not enough, come to see me in class! In general, the three requirements above should be enough.
My goal with this section is to let you formulate a programming goal and implement it.
```python
dorf = pd.read_csv('STD Datensatz dorf_20150101T0000_20221231T2300.csv',
                   index_col=0, parse_dates=True)
dorf = dorf.drop('station', axis=1)
```
```python
gletscher = pd.read_csv('STD Datensatz gletscher_20150101T0000_20221231T2300.csv',
                        index_col=0, parse_dates=True)
gletscher = gletscher.drop('station', axis=1)
```
```python
Dorf = dorf.resample('M').mean()
Gletscher = gletscher.resample('M').mean()
t_dorf = Dorf['TTX']
t_gletscher = Gletscher['TTX']
```
Calculation of the temperature difference between St. Leonhard im Pitztal and Pitztaler Gletscher
```python
plt.plot(t_dorf, '-o', label='St. Leonhard')
plt.plot(t_gletscher, '-o', label='Pitztaler Gletscher')
plt.xlabel('monthly values')
plt.ylabel('temperature (°C)')
plt.title('Temperature at St. Leonhard and Pitztaler Gletscher')
plt.legend()
plt.show()
```
```python
diff_t = t_gletscher - t_dorf
```
```python
plt.plot(diff_t, '+')
plt.xlabel('monthly values')
plt.ylabel('temperature difference (°C)')
plt.title('Temperature difference for each month')
plt.show()
```
Calculation of the precipitation difference between St. Leonhard and Pitztaler Gletscher
```python
Dorf = dorf.resample('M').sum()
Gletscher = gletscher.resample('M').sum()
n_dorf = Dorf['RSX']
n_gletscher = Gletscher['RSX']
```
```python
plt.plot(n_dorf, '-o', label='St. Leonhard')
plt.plot(n_gletscher, '-o', label='Pitztaler Gletscher')
plt.xlabel('monthly values')
plt.ylabel('precipitation (mm)')
plt.title('Monthly precipitation at St. Leonhard and Pitztaler Gletscher')
plt.legend()
plt.show()
```
```python
diff_n = n_gletscher - n_dorf
```
```python
plt.plot(diff_n, '+')
plt.xlabel('monthly values')
plt.ylabel('precipitation difference (mm)')
plt.title('Precipitation difference for each month')
plt.show()
```
Linear regression between different variables
```python
plt.scatter(t_dorf, n_dorf)
z = np.polyfit(t_dorf, n_dorf, 1)   # linear (degree 1) fit
p = np.poly1d(z)
plt.plot(t_dorf, p(t_dorf), 'r')
plt.xlabel('temperature (°C)')
plt.ylabel('precipitation (mm)')
plt.title('Linear regression between temperature and precipitation at St. Leonhard')
plt.show()
```
```python
plt.scatter(t_gletscher, n_gletscher)
z = np.polyfit(t_gletscher, n_gletscher, 1)
p = np.poly1d(z)
plt.plot(t_gletscher, p(t_gletscher), 'r')
plt.xlabel('temperature (°C)')
plt.ylabel('precipitation (mm)')
plt.title('Linear regression between temperature and precipitation at Pitztaler Gletscher')
plt.show()
```
```python
plt.scatter(t_dorf, t_gletscher)
z = np.polyfit(t_dorf, t_gletscher, 1)
p = np.poly1d(z)
plt.plot(t_dorf, p(t_dorf), 'r')
plt.xlabel('temperature St. Leonhard (°C)')
plt.ylabel('temperature Pitztaler Gletscher (°C)')
plt.title('Linear regression between temperatures at St. Leonhard and Pitztaler Gletscher')
plt.show()
```
```python
plt.scatter(n_dorf, n_gletscher)
z = np.polyfit(n_dorf, n_gletscher, 1)
p = np.poly1d(z)
plt.plot(n_dorf, p(n_dorf), 'r')
plt.xlabel('precipitation St. Leonhard (mm)')
plt.ylabel('precipitation Pitztaler Gletscher (mm)')
plt.title('Linear regression between precipitation at St. Leonhard and Pitztaler Gletscher')
plt.show()
```