# Programming project

## Preamble
- Names: Vanessa Riegler and Simon Widy
- Matrikelnummern: 12118027, 12208597
```python
# packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```
## 01 - An energy balance model with hysteresis
Based on the code we wrote in week 07, code the following extension. For this exercise it's OK to copy-paste my solutions and start from there. Only copy what is necessary (no useless code)!
The planetary albedo $\alpha$ is in fact changing with climate change. As the temperature drops, sea-ice and ice sheets are extending (increasing the albedo). Inversely, the albedo decreases as temperature rises. The planetary albedo of our simple energy balance model follows the following equation:
$$ \alpha = \begin{cases} 0.3,& \text{if } T \gt 280\\ 0.7,& \text{if } T \lt 250\\ a T + b, & \text{otherwise} \end{cases} $$
01-01: Compute the parameters $a$ and $b$ so that the equation is continuous at T=250K and T=280K:
```python
T1 = 250
T2 = 280
alpha1 = 0.7
alpha2 = 0.3
a = (alpha2 - alpha1) / (T2 - T1)
b = alpha1 - a * T1
print(f"The value of a is: {a:.5f}")
print(f"The value of b is: {b:.5f}")
```

```
The value of a is: -0.01333
The value of b is: 4.03333
```
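As a quick sanity check (a sketch reusing the values computed above), the linear branch should meet the two constant branches exactly at the breakpoints:

```python
# recompute a and b as above and check continuity at both breakpoints
a = (0.3 - 0.7) / (280 - 250)
b = 0.7 - a * 250

# the linear branch meets the cold value at 250 K and the warm value at 280 K
print(round(a * 250 + b, 6))  # 0.7
print(round(a * 280 + b, 6))  # 0.3
```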
01-02: Now write a function called `alpha_from_temperature` which accepts a single positional parameter `T` as input (a scalar) and returns the corresponding albedo. Test your function using doctests to make sure that it complies with the instructions above:
```python
def alpha_from_temperature(T):
    """Compute the planetary albedo based on the temperature.

    Parameters
    ----------
    T : float
        Temperature in Kelvin.

    Returns
    -------
    float
        Planetary albedo.

    Examples
    --------
    >>> alpha_from_temperature(288)
    0.3
    >>> alpha_from_temperature(240)
    0.7
    >>> alpha_from_temperature(265)
    0.5
    >>> alpha_from_temperature(280)
    0.3
    """
    if T > 280:
        return round(0.3, 6)
    elif T < 250:
        return round(0.7, 6)
    else:
        a = (0.3 - 0.7) / (280 - 250)
        b = 0.7 - a * 250
        return round(a * T + b, 6)


import doctest
doctest.testmod()
```

```
TestResults(failed=0, attempted=4)
```
01-03: Adapt the existing code from week 07 to write a function called `temperature_change_with_hysteresis` which accepts `t0` (the starting temperature in K) and `n_years` (the number of simulation years) as positional arguments and `tau` (the atmosphere transmissivity) as keyword argument (default value 0.611). Verify that:

- the stabilization temperature with `t0 = 292` and default tau is approximately 288K
- the stabilization temperature with `t0 = 265` and default tau is approximately 233K
```python
def temperature_change_with_hysteresis(t0, n_years, tau=0.611):
    C = 4.0e+08              # heat capacity
    dt = 60 * 60 * 24 * 365  # one year in seconds

    def asr(alpha=0.3):
        s0 = 1362  # solar constant (W/m2)
        return (1 - alpha) * s0 / 4

    def olr(t, tau=0.611):
        sigma = 5.67E-8  # Stefan-Boltzmann constant
        return sigma * tau * t**4

    temperature = np.zeros(n_years + 1)
    temperature[0] = t0
    for i in range(n_years):
        alpha = alpha_from_temperature(temperature[i])
        olr_current = olr(temperature[i], tau=tau)
        temperature[i + 1] = temperature[i] + dt / C * (asr(alpha=alpha) - olr_current)
    return temperature


t0 = 292
n_years = 100
temperature = temperature_change_with_hysteresis(t0, n_years)
stabilization_temperature = temperature[-1]
print(f"The stabilization temperature with t0 = {t0} and default tau is approximately {stabilization_temperature:.2f}K")

t0 = 265
n_years = 100
temperature = temperature_change_with_hysteresis(t0, n_years)
stabilization_temperature = temperature[-1]
print(f"The stabilization temperature with t0 = {t0} and default tau is approximately {stabilization_temperature:.2f}K")
```

```
The stabilization temperature with t0 = 292 and default tau is approximately 288.00K
The stabilization temperature with t0 = 265 and default tau is approximately 233.03K
```
01-04: Realize a total of N simulations with starting temperatures regularly spaced between `t0`=206K and `t0`=318K and plot them on a single plot for `n_years`=50. The plot should look somewhat similar to this example for N=21.
Bonus: only if you want (and if time permits), you can try to increase N and add colors to your plot to create a graph similar to this one.
```python
N = 21
t0_values = np.linspace(206, 318, N)
n_years = 50

plt.figure(figsize=(10, 6))
for i, t0 in enumerate(t0_values):
    temperature = temperature_change_with_hysteresis(t0, n_years)
    years = np.arange(n_years + 1)
    plt.plot(years, temperature, label=f"t0={t0:.2f}K")

plt.xlabel("Years")
plt.ylabel("Temperature (K)")
plt.title("Temperature Change with Hysteresis")
plt.show()
```
```python
N = 100
t0_values = np.linspace(206, 318, N)
n_years = 50

plt.figure(figsize=(10, 6))
color_map = plt.get_cmap("rainbow")
for i, t0 in enumerate(t0_values):
    temperature = temperature_change_with_hysteresis(t0, n_years)
    years = np.arange(n_years + 1)
    color = color_map(float(i) / N)
    plt.plot(years, temperature, label=f"t0={t0:.2f}K", color=color)

plt.xlabel("Years")
plt.ylabel("Temperature (K)")
plt.title("Temperature Change with Hysteresis")
plt.show()
```
## 02 - Weather station data files
I downloaded 10 min data from the recently launched ZAMG data hub. The data file contains selected parameters from the "INNSBRUCK-FLUGPLATZ (ID: 11804)" weather station.
You can download the data from the following links (right-click + "Save as..."):
- station data
- parameter metadata
- station list from the ZAMG (in a better format than last time)
Let me open the data for you and display its content:
```python
df = pd.read_csv('INNSBRUCK-FLUGPLATZ_Datensatz_20150101_20211231.csv',
                 index_col=1, parse_dates=True)
df = df.drop('station', axis=1)
print(df)
```

```
                        DD   FF  GSX      P     RF   RR   SO  TB1  TB2  TB3   TL   TP
time
2015-01-01 00:00:00  232.0  1.0  0.0  965.7   96.0  0.0  0.0  3.2  3.6  5.2 -2.7 -3.4
2015-01-01 00:10:00  248.0  1.6  0.0  965.6   97.0  0.0  0.0  3.2  3.6  5.2 -2.4 -2.9
...                    ...  ...  ...    ...    ...  ...  ...  ...  ...  ...  ...  ...
2021-12-31 23:40:00  167.0  0.5  0.0  960.2  100.0  0.0  0.0  3.0  3.4  3.4 -1.6 -1.6
2021-12-31 23:50:00  275.0  0.3  0.0  960.1  100.0  0.0  0.0  3.0  3.4  3.4 -1.6 -1.7

[368208 rows x 12 columns]
```
02-01: after reading the documentation of the respective functions (and maybe trying a few things yourself), explain in plain sentences:

- what am I asking pandas to do with the `index_col=1, parse_dates=True` keyword arguments? Why am I doing this?
- what am I asking pandas to do with `.drop()`? Why `axis=1`?
- `index_col=1, parse_dates=True` tells pandas to use the second column of the file (index 1; counting starts at 0, where column 0 is `station`) as the index of the DataFrame, and to parse the dates in that column (which would otherwise be read as plain strings) into actual datetime objects. This gives us a time-indexed DataFrame that can be sliced and resampled by date.
- `.drop('station', axis=1)` deletes the column labelled `station`; `axis=1` tells pandas to look for that label along the columns (with the default `axis=0`, it would look for a row with that index label).
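To illustrate the `axis` keyword, here is a minimal sketch with a toy DataFrame (the labels are made up for the example):

```python
import pandas as pd

# toy DataFrame with a 'station' column, as in the exercise
toy = pd.DataFrame({'station': [11804, 11804], 'TL': [-2.7, -2.4]},
                   index=['r1', 'r2'])

# axis=1 drops a *column* by its label ...
no_col = toy.drop('station', axis=1)
# ... while axis=0 (the default) drops a *row* by its index label
no_row = toy.drop('r1', axis=0)

print(no_col.columns.tolist())  # ['TL']
print(no_row.index.tolist())    # ['r2']
```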
Now let me do something else for you:
```python
dfmeta = pd.read_csv('ZEHNMIN Parameter-Metadaten.csv', index_col=0)
dfmeta.loc[df.columns]
```
|     | Kurzbeschreibung | Beschreibung | Einheit |
| --- | --- | --- | --- |
| DD | Windrichtung | Windrichtung, vektorielles Mittel über 10 Minuten | ° |
| FF | vektorielle Windgeschwindigkeit | Windgeschwindigkeit, vektorielles Mittel über ... | m/s |
| GSX | Globalstrahlung | Globalstrahlung, arithmetisches Mittel über 10... | W/m² |
| P | Luftdruck | Luftdruck, Basiswert zur Minute10 | hPa |
| RF | Relative Feuchte | Relative Luftfeuchte, Basiswert zur Minute10 | % |
| RR | Niederschlag | 10 Minuten Summe des Niederschlags, Summe der ... | mm |
| SO | Sonnenscheindauer | Sonnenscheindauer, Sekundensumme über 10 Minuten | s |
| TB1 | Erdbodentemperatur in 10cm Tiefe | Erdbodentemperatur in 10cm Tiefe, Basiswert zu... | °C |
| TB2 | Erdbodentemperatur in 20cm Tiefe | Erdbodentemperatur in 20cm Tiefe, Basiswert zu... | °C |
| TB3 | Erdbodentemperatur in 50cm Tiefe | Erdbodentemperatur in 50cm Tiefe, Basiswert zu... | °C |
| TL | Lufttemperatur in 2m | Lufttemperatur in 2m Höhe, Basiswert zur Minute10 | °C |
| TP | Taupunkt | Taupunktstemperatur, Basiswert zur Minute10 | °C |
02-02: again, explain in plain sentences what `dfmeta.loc[df.columns]` is doing, and why it works that way.
- `dfmeta.loc[]` selects rows by their index labels. `df.columns` gives the column labels of the station DataFrame `df`, so `dfmeta.loc[df.columns]` returns exactly those rows of the metadata table whose labels appear as columns in the INNSBRUCK-FLUGPLATZ data, in the same order.
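A minimal sketch of this label-based selection (with made-up metadata, not the real file):

```python
import pandas as pd

# tiny stand-in for dfmeta: parameter codes as index
meta = pd.DataFrame({'unit': ['°C', 'hPa', 'mm']}, index=['TL', 'P', 'RR'])

# .loc with a list of labels returns exactly those rows, in that order
subset = meta.loc[['RR', 'TL']]
print(subset['unit'].tolist())  # ['mm', '°C']
```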
Finally, let me do a last step for you before you start coding:
```python
dfh = df.resample('H').mean()
print(dfh)
```

```
                             DD        FF  GSX           P         RF   RR   SO       TB1       TB2  TB3        TL        TP
time
2015-01-01 00:00:00  257.333333  1.383333  0.0  965.666667  97.000000  0.0  0.0  3.200000  3.600000  5.2 -2.533333 -3.066667
2015-01-01 01:00:00  160.666667  0.766667  0.0  965.966667  97.666667  0.0  0.0  3.150000  3.600000  5.2 -3.250000 -3.716667
...                         ...       ...  ...         ...        ...  ...  ...       ...       ...  ...       ...       ...
2021-12-31 22:00:00  175.833333  0.516667  0.0  959.433333  99.166667  0.0  0.0  3.100000  3.383333  3.4 -1.083333 -1.200000
2021-12-31 23:00:00  198.833333  0.600000  0.0  960.016667  99.500000  0.0  0.0  3.033333  3.400000  3.4 -1.683333 -1.716667

[61368 rows x 12 columns]
```
02-03: explore the `dfh` dataframe. Explain, in plain words, what the purpose of `.resample('H')` followed by `.mean()` is. Explain what `.resample('H').max()` and `.resample('H').sum()` would do.
- `df.resample('H').mean()` aggregates the 10-minute data into hourly bins and takes the mean of the six values falling within each hour. `.max()` would instead keep the maximum value of each hour, and `.sum()` would add up the six values of each hour.
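The difference between the three aggregations can be sketched with one hour of made-up 10-minute values (using the lowercase `'h'` alias, which newer pandas versions prefer over `'H'`):

```python
import pandas as pd

# six 10-minute values spanning a single hour
idx = pd.date_range('2020-01-01 00:00', periods=6, freq='10min')
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

print(s.resample('h').mean().iloc[0])  # 3.5  (average of the hour)
print(s.resample('h').max().iloc[0])   # 6.0  (maximum of the hour)
print(s.resample('h').sum().iloc[0])   # 21.0 (sum of the hour)
```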
02-04: Using `np.allclose`, make sure that the average of the first hour (that you'll compute yourself from `df`) is indeed equal to the first row of `dfh`. Now, two variables in the dataframe have units that aren't suitable for averaging. Please convert the following variables to the correct units:

- `RR` needs to be converted from the average of 10 min sums to mm/h
- `SO` needs to be converted from the average of 10 min sums to s/h
```python
np.allclose(df.iloc[0:6].mean(), dfh.head(1))
```

```
True
```
```python
# replace the hourly means of RR and SO by hourly sums (mm/h and s/h)
dfhsum = df.resample('H').sum()
dfh['RR'] = dfhsum['RR']
dfh['SO'] = dfhsum['SO']
```
From now on, we will use the hourly data only (and further aggregations when necessary). The 10 min data are great but require a little bit more pandas kung fu (the Chinese term, not the sport) to be used efficiently.
Spend some time exploring the `dfh` dataframe we just created. What time period does it cover? What variables does it contain?
```python
print(np.allclose(dfh['RR'].mean(), df['RR'].mean() * 6))
period = dfh.index.max() - dfh.index.min()
print("Time period:", period)
dfmeta.loc[dfh.columns]
```

```
True
Time period: 2556 days 23:00:00
```
(Output: the same parameter metadata table as shown in section 02.)
Note on pandas: all the exercises below can be done with or without pandas. Each question can be answered with very few lines of code (often one or two) with pandas, and I recommend using it as much as possible. If you want, you can always use numpy in case of doubt: you can access the data as a numpy array with `df[column_name].values`.
## 03 - Precipitation
In this section, we will focus on precipitation only.
03-01: Compute the average annual precipitation (m/year) over the 7-year period.
```python
# total precipitation (mm) over the record, divided by 7 years
# and by 1000 (mm -> m) to get m/year
avg_ann_precip = dfh['RR'].sum() / 7 / 1000
print(avg_ann_precip)
```

```
0.920557142857143
```
03-02: What is the smallest non-zero precipitation measured at the station? What is the maximum hourly precipitation measured at the station? When did this occur?
```python
print(dfh['RR'][dfh['RR'] > 0].min(), dfh['RR'][dfh['RR'] > 0].idxmin())
print(dfh['RR'].max(), dfh['RR'].idxmax())
```

```
0.1 2015-01-02 16:00:00
22.2 2021-09-16 22:00:00
```
03-03: Plot a histogram of hourly precipitation, with bins of size 0.2 mm/h, starting at 0.1 mm/h and ending at 25 mm/h. Plot the same data, but this time with a logarithmic y-axis. Compute the 99th percentile (or quantile) of hourly precipitation.
```python
# 125 bins of width 0.2 mm/h, starting at 0.1 mm/h
prechist = dfh['RR'][dfh['RR'] > 0].plot.hist(bins=125, range=(0.1, 25.1),
                                              xlabel='Precipitation in mm/h',
                                              ylabel='Frequency')
prechist.set_xlim(0, 25)
plt.title('Histogram of hourly precipitation')
plt.show()
```
```python
prechistlog = dfh['RR'][dfh['RR'] > 0].plot.hist(bins=125, range=(0.1, 25.1), logy=True,
                                                 xlabel='Precipitation in mm/h',
                                                 ylabel='Frequency')
prechistlog.set_xlim(0, 25)
plt.title('Histogram of hourly precipitation (logarithmic y-axis)')
plt.show()
```
```python
p = np.percentile(dfh['RR'], 99)
print(p)
```

```
2.3330000000001747
```
03-04: Compute daily sums (mm/d) of precipitation (tip: use `.resample` again). Compute the average number of rain days per year in Innsbruck (a "rain day" is a day with at least 0.1 mm/d of measured precipitation).
```python
dfd = dfh.resample('D').sum()
# count days with measured precipitation (>= 0.1 mm/d) and average over 7 years
print(len(dfd['RR'][dfd['RR'] > 0]) / 7)
```

```
170.28571428571428
```
03-05: Now select (subset) the daily dataframe to keep only daily data in the months of December, January, February (DJF). To do this, note that `dfh.index.month` exists and can be used to subset the data efficiently. Compute the average precipitation in DJF (mm/d), and the average number of rainy days in DJF. Repeat with the months of June, July, August (JJA).
```python
# DJF
dfddjf = dfd[(dfd.index.month < 3) | (dfd.index.month > 11)]
print('Average precipitation in DJF (mm / d):', dfddjf['RR'].mean())
print('Average number of rainy days in DJF:', len(dfddjf['RR'][dfddjf['RR'] > 0]) / 7)

# JJA
dfdjja = dfd[(dfd.index.month < 9) & (dfd.index.month > 5)]
print('Average precipitation in JJA (mm / d):', dfdjja['RR'].mean())
print('Average number of rainy days in JJA:', len(dfdjja['RR'][dfdjja['RR'] > 0]) / 7)
```

```
Average precipitation in DJF (mm / d): 1.7726265822784812
Average number of rainy days in DJF: 37.57142857142857
Average precipitation in JJA (mm / d): 4.024844720496894
Average number of rainy days in JJA: 53.142857142857146
```
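An equivalent, slightly more readable way to select the seasons is `Index.isin` (a sketch with synthetic daily data standing in for `dfd`):

```python
import numpy as np
import pandas as pd

# synthetic daily series standing in for the real daily dataframe
idx = pd.date_range('2020-01-01', '2020-12-31', freq='D')
df_toy = pd.DataFrame({'RR': np.ones(len(idx))}, index=idx)

# month membership instead of chained </> comparisons
djf = df_toy[df_toy.index.month.isin([12, 1, 2])]
jja = df_toy[df_toy.index.month.isin([6, 7, 8])]

print(sorted(djf.index.month.unique()))  # [1, 2, 12]
print(sorted(jja.index.month.unique()))  # [6, 7, 8]
```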
03-06: Repeat the DJF and JJA subsetting, but this time with hourly data. Count the total number of times that hourly precipitation in DJF is above the 99th percentile computed in exercise 03-03. Repeat with JJA.
```python
# DJF
dfhdjf = dfh[(dfh.index.month < 3) | (dfh.index.month > 11)]
print('DJF hourly precipitation over 99th Percentile:', len(dfhdjf['RR'][dfhdjf['RR'] > p]))

# JJA
dfhjja = dfh[(dfh.index.month < 9) & (dfh.index.month > 5)]
print('JJA hourly precipitation over 99th Percentile:', len(dfhjja['RR'][dfhjja['RR'] > p]))
```

```
DJF hourly precipitation over 99th Percentile: 68
JJA hourly precipitation over 99th Percentile: 308
```
03-07: Compute and plot the average daily cycle of hourly precipitation in DJF and JJA. I expect a plot similar to this example. To compute the daily cycle, I recommend combining two very useful tools. First, start by noticing that `df.index.hour` exists and can be used to categorize data. Then, note that `df.groupby` exists and can be used exactly for that (documentation).
```python
dfhjja['RR'].groupby(dfhjja.index.hour).mean().plot(xlabel='Hour of day (UTC)',
                                                    ylabel='hourly precipitation (mm/h)',
                                                    label='JJA')
dfhdjf['RR'].groupby(dfhdjf.index.hour).mean().plot(xlabel='Hour of day (UTC)',
                                                    ylabel='hourly precipitation (mm/h)',
                                                    label='DJF')
plt.title('average daily cycle of hourly precipitation')
plt.legend()
plt.show()
```
## 04 - A few other variables
In this section, we will continue to analyze the weather station data.
04-01: Verify that the three soil temperatures have approximately the same average value over the entire period. Now plot the three soil temperature timeseries in hourly resolution over the course of the year 2020 (example). Repeat the plot with the month of May 2020.
```python
# verify that the three soil temperatures have approximately the same mean
print(dfh[['TB1', 'TB2', 'TB3']].mean())

# select the year 2020 (as asked in the exercise)
start_date = '2020-01-01'
end_date = '2020-12-31'
df_year = dfh[start_date:end_date]

plt.figure(figsize=(12, 6))
plt.plot(df_year.index, df_year['TB1'], label='TB1')
plt.plot(df_year.index, df_year['TB2'], label='TB2')
plt.plot(df_year.index, df_year['TB3'], label='TB3')
plt.xlabel('Time')
plt.ylabel('Soil Temperature (°C)')
plt.title('Soil Temperature over the Year 2020')
plt.legend()
plt.show()
```
```python
df_may = dfh['2020-05-01':'2020-05-31']

plt.figure(figsize=(12, 6))
plt.plot(df_may.index, df_may['TB1'], label='TB1')
plt.plot(df_may.index, df_may['TB2'], label='TB2')
plt.plot(df_may.index, df_may['TB3'], label='TB3')
plt.xlabel('Time')
plt.ylabel('Soil Temperature (°C)')
plt.title('Soil Temperature for May 2020')
plt.legend()
plt.show()
```
04-02: Plot the average daily cycle of all three soil temperatures.
```python
hours = dfh.index.hour
mean_temperatures = dfh.groupby(hours).mean()

plt.figure(figsize=(10, 6))
plt.plot(mean_temperatures.index, mean_temperatures['TB1'], label='TB1')
plt.plot(mean_temperatures.index, mean_temperatures['TB2'], label='TB2')
plt.plot(mean_temperatures.index, mean_temperatures['TB3'], label='TB3')
plt.xlabel('Hour of the Day')
plt.ylabel('Average Soil Temperature (°C)')
plt.title('Average Daily Cycle of Soil Temperatures')
plt.legend()
plt.show()
```
04-03: Compute the difference (in °C) between the air temperature and the dewpoint temperature. Now plot this difference on a scatter plot (x-axis: relative humidity, y-axis: temperature difference).
```python
temp_difference = dfh['TL'] - dfh['TP']

plt.figure(figsize=(10, 6))
plt.scatter(dfh['RF'], temp_difference, s=5, alpha=0.5)
plt.xlabel('Relative Humidity (%)')
plt.ylabel('Temperature Difference (°C)')
plt.title('Difference between Air Temperature and Dewpoint Temperature')
plt.show()
```
## 05 - Free coding project
The last part of this semester project is up to you! You are free to explore whatever interests you. I however add three requirements:
- This section should have at least 5 original plots in it. They are the output of your analysis.
- This section should also use additional data that you downloaded yourself. The easiest way would probably be to download another station(s) from the ZAMG database, or data from the same station but for another time period (e.g. for trend or change analysis). You can, however, decide to do something completely different if you prefer (as long as you download and read one more file).
- This section should contain at least one regression or correlation analysis between two parameters. Examples:
- between two different variables at the same station (like we did with the dewpoint above)
- between different stations (for example, average temperature as a function of station elevation)
- between average temperature and time (trends analysis)
- etc.
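As a sketch of what such a regression/correlation analysis could look like (purely synthetic data, numpy only; the variable names are made up):

```python
import numpy as np

# synthetic stand-ins for two station variables
rng = np.random.default_rng(0)
x = rng.uniform(20, 100, 500)               # e.g. relative humidity (%)
y = 25 - 0.25 * x + rng.normal(0, 1, 500)   # e.g. dewpoint spread (°C) + noise

# least-squares line y = slope * x + intercept
slope, intercept = np.polyfit(x, y, 1)
# Pearson correlation coefficient
r = np.corrcoef(x, y)[0, 1]

print(f"slope={slope:.3f}, intercept={intercept:.2f}, r={r:.2f}")
```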
That's it! Here are a few ideas:
- detection of trends and changes at the station Innsbruck for 1993-2021
- comparison of 5-yr climatologies at various stations in Tirol, taking elevation or location into account
- compute the theoretical day length from the station's longitude and latitude (you can find solutions for this online, just let me know the source if you used a solution online), and use these computations to compare the measured sunshine duration to the maximum day length. This can be used to classify "sunshine days" for example.
- use the python "windrose" library to plot a windrose at different locations and time of day.
- etc.
If you have your own idea but are unsure about whether this is too much or not enough, come to see me in class! In general, the three requirements above should be enough.
My goal with this section is to let you formulate a programming goal and implement it.
### Project goal
Compare the quality of reanalysis models with station measurement data (temperature, precipitation, wind) in an alpine valley (Innsbruck).

Timespan: 15 years (2008-01-01 to 2022-12-31)

Models (data source: meteoblue):

- ERA5, grid spacing: 30 km, grid point Innsbruck: 47.5°N 11.25°E, 1151 m a.s.l.
- NEMS4, grid spacing: 4 km, grid point Innsbruck: 47.246°N 11.40°E, 823 m a.s.l.

Station (data source: ZAMG data hub):

- Innsbruck-Flugplatz 47.26°N, 11.36°E
Fun fact: we got the idea to compare the models with station data because we won a meteoblue history+ subscription (which provides the data) at the StuMeTa pub quiz and wanted to make use of it :)
Data you need:

- INNSBRUCK-FLUGPLATZ_STD_20080101T0000_20221231T2300.csv
- meteoblue_STD_ERA5_Innsbruck_2008-2022.csv
- meteoblue_STD_NEMS4_Innsbruck_2008-2022.csv

Packages you need:

- numpy
- pandas
- matplotlib
- windrose
### read and adjust the data
```python
dfzamg = pd.read_csv('INNSBRUCK-FLUGPLATZ_STD_20080101T0000_20221231T2300.csv',
                     index_col=0, parse_dates=True)
dfzamg = dfzamg.drop('station', axis=1)
dfera = pd.read_csv('meteoblue_STD_ERA5_Innsbruck_2008-2022.csv',
                    skiprows=9, index_col=0, parse_dates=True)
dfnems = pd.read_csv('meteoblue_STD_NEMS4_Innsbruck_2008-2022.csv',
                     skiprows=9, index_col=0, parse_dates=True)
```
```python
try:
    # make the ZAMG index timezone-naive so it matches the meteoblue data
    dfzamg.index = dfzamg.index.tz_convert(None)
    print('Time has been converted')
except TypeError:
    # tz_convert raises a TypeError on an already tz-naive index
    print('Time is already converted!')
```

```
Time has been converted
```
### temperature analysis
```python
dferamean = dfera.resample('D').mean()
dfnemsmean = dfnems.resample('D').mean()
dfzamgmean = dfzamg.resample('D').mean()

fig, (tempera, tempnems, tempzamg) = plt.subplots(1, 3, figsize=(14, 6))
dfzamgmean['TTX'].plot(ax=tempzamg, label='measured temperature')
dferamean['Innsbruck Temperature [2 m elevation corrected]'].plot(ax=tempera, label='temperature ERA5', color='orange')
dfnemsmean['Innsbruck Temperature [2 m elevation corrected]'].plot(ax=tempnems, label='temperature NEMS4', color='green')
tempera.set_ylabel('temperature (°C)')
tempera.set_title('ERA5')
tempera.set_ylim(dfzamg['TTX'].min(), dfzamg['TTX'].max())
tempnems.set_title('NEMS4')
tempnems.set_ylim(dfzamg['TTX'].min(), dfzamg['TTX'].max())
tempzamg.set_title('station data')
tempzamg.set_ylim(dfzamg['TTX'].min(), dfzamg['TTX'].max())
fig.suptitle('daily mean temperature Innsbruck 2008-2022')
plt.show()
```
```python
dfzamgmean['TTX'][-365:].plot(label='measured temperature')
dferamean['Innsbruck Temperature [2 m elevation corrected]'][-365:].plot(label='temperature ERA5', color='orange')
dfnemsmean['Innsbruck Temperature [2 m elevation corrected]'][-365:].plot(label='temperature NEMS4', color='green')
plt.title('Daily mean temperature 2022 all combined (°C)')
plt.legend()
plt.show()
```
conclusion:

The deviation of both models from the measurements is quite small for the daily mean temperature! The reanalysis data for the mean temperature seem to be quite good, no matter what the grid spacing is.
### precipitation analysis
```python
fig, (precera, precnems, preczamg) = plt.subplots(1, 3, figsize=(14, 6))
dfzamg['RSX'][dfzamg['RSX'] > 0].plot.hist(bins=125, ax=preczamg, label='measured precipitation')
dfera['Innsbruck Precipitation Total'][dfera['Innsbruck Precipitation Total'] > 0].plot.hist(bins=125, ax=precera, label='precipitation ERA5', color='orange')
dfnems['Innsbruck Precipitation Total'][dfnems['Innsbruck Precipitation Total'] > 0].plot.hist(bins=125, ax=precnems, label='precipitation NEMS4', color='green')
precnems.set_xlabel('precipitation (mm/h)')
precera.set_title('ERA5')
precera.set_xlim(0, 10)
precera.set_ylim(0, 8000)
precnems.set_title('NEMS4')
precnems.set_xlim(0, 10)
precnems.set_ylim(0, 8000)
preczamg.set_title('station data')
preczamg.set_xlim(0, 10)
preczamg.set_ylim(0, 8000)
fig.suptitle('histogram of precipitation')
plt.show()

# the data are hourly, so these are counts of rain-free *hours*
norainera = len(dfera['Innsbruck Precipitation Total'][dfera['Innsbruck Precipitation Total'] == 0])
norainnems = len(dfnems['Innsbruck Precipitation Total'][dfnems['Innsbruck Precipitation Total'] == 0])
norainzamg = len(dfzamg['RSX'][dfzamg['RSX'] == 0])
print('Hours without rain: ERA5: ' + str(norainera) + ', NEMS4: ' + str(norainnems) + ', measured: ' + str(norainzamg))
```

```
Hours without rain: ERA5: 92805, NEMS4: 119447, measured: 115896
```
```python
# roughly July 2022 (days -184 to -154 from the end of the record)
dfzamg['RSX'][-184*24:-154*24].plot(label='measured precipitation', ylabel='precipitation (mm/h)')
dfera['Innsbruck Precipitation Total'][-184*24:-154*24].plot(label='precipitation ERA5')
dfnems['Innsbruck Precipitation Total'][-184*24:-154*24].plot(label='precipitation NEMS4')
plt.title('precipitation all combined July 2022')
plt.legend()
plt.show()
```
```python
dfdiffera = dfzamg['RSX'] - dfera['Innsbruck Precipitation Total']
dfdiffnems = dfzamg['RSX'] - dfnems['Innsbruck Precipitation Total']

fig, (differa, diffnems) = plt.subplots(1, 2, figsize=(10, 5))
dfdiffera.plot.box(ax=differa)
dfdiffnems.plot.box(ax=diffnems)
differa.set_title('ERA5')
diffnems.set_title('NEMS4')
fig.suptitle('boxplot of model/station difference in precipitation (mm/h)')
plt.show()
```
conclusion:

In the plots above you can see that both models do reasonably well for precipitation. NEMS4 is closer to the measured number of rain-free hours, but its precipitation amounts are a bit low compared to the station data. A weakness, though, is that extreme (short-term) rain events like the one on July 22nd 2022 cannot be captured by the models.
### wind analysis
```
mamba install windrose
```
```python
# importing windrose registers the 'windrose' projection with matplotlib
from windrose import WindroseAxes

fig, (wrmb, wrmbnems, wrzamg) = plt.subplots(1, 3, figsize=(14, 5), subplot_kw=dict(projection='windrose'))
wrmb.bar(dfera['Innsbruck Wind Direction [10 m]'], dfera['Innsbruck Wind Speed [10 m]'] / 3.6, normed=True, opening=0.8, edgecolor='white')
wrmb.set_title('model data ERA5')
wrmb.set_legend()
wrmbnems.bar(dfnems['Innsbruck Wind Direction [10 m]'], dfnems['Innsbruck Wind Speed [10 m]'] / 3.6, normed=True, opening=0.8, edgecolor='white')
wrmbnems.set_title('model data NEMS4')
wrmbnems.set_legend()
wrzamg.bar(dfzamg['D6X'], dfzamg['VVX'], normed=True, opening=0.8, edgecolor='white')
wrzamg.set_title('station data')
wrzamg.set_legend()
fig.suptitle('windrose comparison')
plt.show()
```
conclusion:

Wind is the meteorological parameter affected the most by the grid spacing of the models:

- The first windrose (ERA5) is completely wrong, for the wind direction as well as the wind speed (maximum speed of 5.6 m/s compared to the measured 18.8 m/s)!
- The second windrose (NEMS4) is closer to the measured data and therefore a much better analysis, but still far from perfect!
### Overall outcome
As expected (and also learned in VO Introduction to Atmospheric Sciences), models with a coarse grid spacing don't work very well in alpine terrain. We assume that the ERA5 model with its 30 km grid spacing doesn't even recognize the Inntal as a valley. In contrast, the data provided by the NEMS4 model are already quite useful, even though not perfectly correct.