Programming project¶
Preamble¶
- Names: Marian Ebenbichler, Kilian Trummer
- Matrikelnummer: 12221149, 12117412
01 - An energy balance model with hysteresis¶
Based on the code we wrote in week 07, code the following extension. For this exercise it's OK to copy-paste my solutions and start from there. Only copy the necessary (no useless code)!
The planetary albedo $\alpha$ is in fact changing with climate change. As the temperature drops, sea ice and ice sheets expand (increasing the albedo). Inversely, the albedo decreases as the temperature rises. The planetary albedo of our simple energy balance model follows the equation:
$$ \alpha = \begin{cases} 0.3,& \text{if } T \gt 280\\ 0.7,& \text{if } T \lt 250\\ a T + b, & \text{otherwise} \end{cases} $$
01-01: Compute the parameters $a$ and $b$ so that the equation is continuous at T=250K and T=280K.
```python
import numpy as np
```
```python
a = np.array([[250, 1], [280, 1]])
a
```
array([[250,   1],
       [280,   1]])
```python
a_inv = np.linalg.inv(a)

# Check that the inverse worked
np.testing.assert_allclose(a @ a_inv, np.identity(2), atol=1e-7)
np.round(a @ a_inv, decimals=7)
```
array([[1., 0.],
       [0., 1.]])
```python
solution = a_inv @ [0.7, 0.3]
a, b = solution
print(a, b)
```
-0.013333333333333338 4.033333333333335
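As a cross-check, the same parameters can be obtained without forming the inverse explicitly, using np.linalg.solve (generally preferred numerically; this sketch just repeats the two continuity conditions above with illustrative variable names):

```python
import numpy as np

# Continuity conditions: a*250 + b = 0.7 and a*280 + b = 0.3
A = np.array([[250.0, 1.0], [280.0, 1.0]])
rhs = np.array([0.7, 0.3])

# Solve the 2x2 linear system directly
a_param, b_param = np.linalg.solve(A, rhs)
```

Both routes give the same a and b; solve avoids the extra rounding error of an explicit inversion.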
01-02: Now write a function called alpha_from_temperature which accepts a single positional parameter T as input (a scalar) and returns the corresponding albedo. Test your function using doctests to make sure that it complies with the instructions above.
```python
def alpha_from_temperature(T):
    """Compute the planetary albedo alpha from the temperature T.

    alpha = 0.3 if T > 280, 0.7 if T < 250, and a * T + b otherwise,
    with a and b chosen so that alpha is continuous at T=250K and T=280K.

    Parameters
    ----------
    T : float
        Temperature (K).

    Returns
    -------
    float
        The planetary albedo.

    Examples
    --------
    >>> alpha_from_temperature(290)
    0.3
    >>> alpha_from_temperature(240)
    0.7
    >>> alpha_from_temperature(249)
    0.7
    >>> alpha_from_temperature(281)
    0.3
    >>> abs(alpha_from_temperature(265) - 0.5) <= 1e-5
    True
    >>> abs(alpha_from_temperature(275) - 0.366666) <= 1e-5
    True
    """
    if T > 280:
        return 0.3
    elif T < 250:
        return 0.7
    else:
        # Parameters so that alpha is continuous at 250K and 280K
        a = (0.7 - 0.3) / (250 - 280)
        b = 0.7 - a * 250
        return a * T + b
```
01-03: Adapt the existing code from week 07 to write a function called temperature_change_with_hysteresis which accepts t0 (the starting temperature in K) and n_years (the number of simulation years) as positional arguments, and tau (the atmosphere transmissivity) as keyword argument (default value 0.611). Verify that:
- the stabilization temperature with t0 = 292 and default tau is approximately 288K
- the stabilization temperature with t0 = 265 and default tau is approximately 233K
```python
def asr(alpha=0.3):
    """Absorbed Solar Radiation (W m-2).

    Parameters
    ----------
    alpha : float, optional
        the planetary albedo

    Returns
    -------
    the asr (float)

    Examples
    --------
    >>> print(f'{asr():.2f}')
    238.35
    >>> print(f'{asr(alpha=0.32):.2f}')
    231.54
    """
    S = 1362  # solar constant (W m-2)
    return (1 - alpha) * S / 4
```
```python
def olr(t, tau=0.611):
    """Outgoing Longwave Radiation (W m-2).

    Parameters
    ----------
    t : float
        the atmosphere temperature (K)
    tau : float, optional
        the atmosphere transmissivity (-)

    Returns
    -------
    the olr (float)

    Examples
    --------
    >>> print(f'{olr(288):.2f}')
    238.34
    >>> print(f'{olr(288, tau=0.63):.2f}')
    245.75
    """
    sbc = 5.67e-8  # Stefan-Boltzmann constant (W m-2 K-4)
    return tau * sbc * t**4
```
```python
def temperature_change_with_hysteresis(t0, n_years, tau=0.611):
    """Simulate the temperature change with hysteresis over a number of years.

    Parameters
    ----------
    t0 : float
        Starting temperature in Kelvin.
    n_years : int
        Number of simulation years.
    tau : float, optional
        Atmosphere transmissivity. Default is 0.611.

    Returns
    -------
    time, T : the simulation years and the corresponding temperatures (K)

    Examples
    --------
    >>> round(float(temperature_change_with_hysteresis(292, 100)[1][-1]), 1)
    288.0
    >>> round(float(temperature_change_with_hysteresis(265, 100)[1][-1]), 1)
    233.0
    """
    dt = 60 * 60 * 24 * 365  # one year in seconds
    C = 4.0e+08              # heat capacity (J m-2 K-1)
    time = np.arange(n_years + 1)
    T = np.zeros(n_years + 1)
    T[0] = t0
    # compute the new albedo and the new temperature for every year
    for i in range(n_years):
        alpha = alpha_from_temperature(T[i])
        T[i + 1] = T[i] + (dt / C) * (asr(alpha=alpha) - olr(T[i], tau=tau))
    return time, T
```
01-04: Realize a total of N simulations with starting temperatures regularly spaced between t0=206K and t0=318K, and plot them on a single plot for n_years=50. The plot should look somewhat similar to this example for N=21.
Bonus: only if you want (and if time permits), you can try to increase N and add colors to your plot to create a graph similar to this one.
```python
import matplotlib.pyplot as plt

N = 21        # Number of simulations
n_years = 50  # Number of simulation years

# Generate regularly spaced starting temperatures
t0_values = np.linspace(206, 318, N)

# Generate time array
t = np.linspace(0, n_years, n_years + 1)

for t0 in t0_values:
    plt.plot(t, temperature_change_with_hysteresis(t0, n_years, tau=0.6)[1],
             color='black', label=str(t0) + 'K')

plt.xlabel('Time (years)')
plt.ylabel('Temperature (K)')
plt.title('Temperature Evolution with Hysteresis')
plt.grid(True)
plt.show()
```
```python
import matplotlib

N = 100       # number of simulations
n_years = 50  # number of simulated years

# Generate regularly spaced starting temperatures
t0_values = np.linspace(206, 318, N)

# Generate time array
t = np.linspace(0, n_years, n_years + 1)

# Create a color fade from blue to red
# (matplotlib.colormaps replaces the deprecated plt.cm.get_cmap)
color_map = matplotlib.colormaps['Spectral_r']
colors = color_map(np.linspace(0, 1, N))

# plot the simulations with the color fade
for i, t0 in enumerate(t0_values):
    temperature = temperature_change_with_hysteresis(t0, n_years, tau=0.6)
    plt.plot(t, temperature[1], color=colors[i], label=str(t0) + 'K')

plt.xlabel('Time (years)')
plt.ylabel('Temperature (K)')
plt.title('Temperature Evolution with Hysteresis')
plt.grid(True)
plt.show()
```
02 - Weather station data files¶
I downloaded 10 min data from the recently launched ZAMG data hub. The data file contains selected parameters from the "INNSBRUCK-FLUGPLATZ (ID: 11804)" weather station.
You can download the data from the following links (right-click + "Save as..."):
- station data
- parameter metadata
- station list from the ZAMG (in a better format than last time)
Let me open the data for you and display its content:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```
```python
df = pd.read_csv('INNSBRUCK-FLUGPLATZ_Datensatz_20150101_20211231.csv',
                 index_col=1, parse_dates=True)
df = df.drop('station', axis=1)
```
02-01: after reading the documentation of the respective functions (and maybe trying a few things yourself), explain in plain sentences:
- what am I asking pandas to do with the index_col=1, parse_dates=True keyword arguments? Why am I doing this?
- what am I asking pandas to do with .drop()? Why axis=1?
What am I asking pandas to do with the index_col=1, parse_dates=True keyword arguments? Why am I doing this?

index_col=1 sets the second column of the CSV file as the index of the DataFrame. This is useful here because the first column (index 0) only contains the station number, which is the same in every row, whereas the second column (index 1) contains the timestamps, which make a much more useful index.

parse_dates=True tells pandas to attempt to parse the dates in the CSV file and represent them as datetime objects in the resulting DataFrame.

.drop() is the command to drop/delete a specific column or row; axis=1 indicates that a column should be removed, while axis=0 tells pandas to remove a row.
Now let me do something else for you:
```python
dfmeta = pd.read_csv('ZEHNMIN Parameter-Metadaten.csv', index_col=0)
dfmeta.loc[df.columns]
```
|  | Kurzbeschreibung | Beschreibung | Einheit |
|---|---|---|---|
| DD | Windrichtung | Windrichtung, vektorielles Mittel über 10 Minuten | ° |
| FF | vektorielle Windgeschwindigkeit | Windgeschwindigkeit, vektorielles Mittel über ... | m/s |
| GSX | Globalstrahlung | Globalstrahlung, arithmetisches Mittel über 10... | W/m² |
| P | Luftdruck | Luftdruck, Basiswert zur Minute10 | hPa |
| RF | Relative Feuchte | Relative Luftfeuchte, Basiswert zur Minute10 | % |
| RR | Niederschlag | 10 Minuten Summe des Niederschlags, Summe der ... | mm |
| SO | Sonnenscheindauer | Sonnenscheindauer, Sekundensumme über 10 Minuten | s |
| TB1 | Erdbodentemperatur in 10cm Tiefe | Erdbodentemperatur in 10cm Tiefe, Basiswert zu... | °C |
| TB2 | Erdbodentemperatur in 20cm Tiefe | Erdbodentemperatur in 20cm Tiefe, Basiswert zu... | °C |
| TB3 | Erdbodentemperatur in 50cm Tiefe | Erdbodentemperatur in 50cm Tiefe, Basiswert zu... | °C |
| TL | Lufttemperatur in 2m | Lufttemperatur in 2m Höhe, Basiswert zur Minute10 | °C |
| TP | Taupunkt | Taupunktstemperatur, Basiswert zur Minute10 | °C |
02-02: again, explain in plain sentences what dfmeta.loc[df.columns] is doing, and why it works that way.
The code dfmeta.loc[df.columns] retrieves rows from the DataFrame dfmeta based on the index values provided by the columns of the DataFrame df.
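This works because .loc accepts a list-like of labels and returns the matching rows in that order; a toy sketch with made-up stand-ins for dfmeta and df:

```python
import pandas as pd

# Toy metadata table indexed by parameter name (stand-in for dfmeta)
meta = pd.DataFrame({'Einheit': ['°C', 'mm', '%']}, index=['TL', 'RR', 'RF'])

# Toy data table whose columns are parameter names (stand-in for df)
data = pd.DataFrame(columns=['RR', 'TL'])

# .loc with a list-like of labels returns those rows, in that order
subset = meta.loc[data.columns]
```

The result contains exactly the metadata rows for the parameters present in the data.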
Finally, let me do a last step for you before you start coding:
```python
dfh = df.resample('H').mean()
```
02-03: explore the dfh dataframe. Explain, in plain words, what the purpose of .resample('H') followed by .mean() is. Explain what .resample('H').max() and .resample('H').sum() would do.
.resample('H')
"Resampling" means that the frequency of the data is changed, in this case from a 10-minute frequency to an hourly frequency. By using .resample('H'), the data gets grouped into hourly time intervals.

.resample('H').mean()
The mean function then calculates the mean/average value of each hourly interval separately.

.resample('H').max()
The max function calculates the maximum value of each hour separately, i.e. it finds the greatest value recorded within every hourly interval.

.resample('H').sum()
Similarly, the sum function calculates the sum within each hour, i.e. it adds up all values of each hourly interval, giving the total recorded in every hour.
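The three aggregations can be illustrated with a toy series of six made-up 10-minute values spanning exactly one hour:

```python
import pandas as pd

# Six 10-min values covering one hour
idx = pd.date_range('2021-01-01 00:00', periods=6, freq='10min')
s = pd.Series([0.0, 0.1, 0.2, 0.0, 0.3, 0.0], index=idx)

hourly_mean = s.resample('H').mean()  # average of the six values
hourly_max = s.resample('H').max()    # largest of the six values
hourly_sum = s.resample('H').sum()    # total over the hour
```

Same grouping into hourly bins each time; only the reduction applied to each bin differs.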
02-04: Using np.allclose, make sure that the average of the first hour (that you'll compute yourself from df) is indeed equal to the first row of dfh. Now, two variables in the dataframe have units that aren't suitable for averaging. Please convert the following variables to the correct units:
- RR needs to be converted from the average of 10 min sums to mm/h
- SO needs to be converted from the average of 10 min sums to s/h
```python
dfhsum = df.resample('H').sum()

dfh['RR'] = dfhsum['RR']
dfh['SO'] = dfhsum['SO']
```
```python
np.allclose(df.iloc[:6].mean(), dfh.head(1))
```
True
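Since np.allclose does the work in this check, here is a toy sketch of its behaviour (made-up arrays): it tests element-wise closeness within a tolerance, unlike exact equality.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = a + 1e-9  # tiny numerical difference

close = np.allclose(a, b)     # within default tolerances
exact = np.array_equal(a, b)  # bit-for-bit comparison
```

This is why np.allclose is the right tool for comparing floating-point averages.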
From now on, we will use the hourly data only (and further aggregations when necessary). The 10 min data are great but require a bit more pandas kung fu (the Chinese term, not the sport) to be used efficiently.
Spend some time exploring the dfh dataframe we just created. What time period does it cover? What variables does it contain?
```python
time_range_min = dfh.index.min()
time_range_max = dfh.index.max()
print(time_range_min, time_range_max)

column_names = ', '.join(dfh.columns)
print("Variables: ", column_names)
```
2015-01-01 00:00:00 2021-12-31 23:00:00
Variables:  DD, FF, GSX, P, RF, RR, SO, TB1, TB2, TB3, TL, TP
Note on pandas: all the exercises below can be done with or without pandas. Each question can be answered with very few lines of code (often one or two) with pandas, and I recommend to use it as much as possible. If you want, you can always use numpy in case of doubt: you can access the data as a numpy array with: df[column_name].values.
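A toy illustration of that note (the column name is made up for the example):

```python
import numpy as np
import pandas as pd

# Hypothetical single-column DataFrame standing in for df
toy = pd.DataFrame({'TL': [1.5, 2.0, 2.5]})

# .values exposes the column as a plain numpy array
arr = toy['TL'].values
```

From there, any numpy function (np.mean, np.percentile, ...) applies directly.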
03 - Precipitation¶
In this section, we will focus on precipitation only.
03-01: Compute the average annual precipitation (m/year) over the 7-year period.
```python
annual_precipitation = dfh['RR'].resample('A').sum()

# Compute the average annual precipitation (mm -> m: divide by 1000)
average_precipitation = annual_precipitation.mean() / 1000

print("Average annual precipitation: {:.2f} m/year".format(average_precipitation))
```
Average annual precipitation: 0.92 m/year
03-02: What is the smallest non-zero precipitation measured at the station? What is the maximum hourly precipitation measured at the station? When did this occur?
```python
# compute the smallest non-zero precipitation
smallest_nonzero_precipitation = dfh['RR'].loc[dfh['RR'] > 0].min()
print("Smallest non-zero precipitation: {:.2f} mm".format(smallest_nonzero_precipitation))

# dfh is already hourly, so dfh['RR'] is the hourly precipitation
hourly_precipitation = dfh['RR']
max_hourly_precipitation = hourly_precipitation.max()
print("Maximum hourly precipitation: {:.2f} mm".format(max_hourly_precipitation))

# compute when the maximum hourly precipitation occurred
max_hourly_precipitation_time = hourly_precipitation.idxmax()
print("Time of maximum hourly precipitation: {}".format(max_hourly_precipitation_time))
```
Smallest non-zero precipitation: 0.10 mm
Maximum hourly precipitation: 22.20 mm
Time of maximum hourly precipitation: 2021-09-16 22:00:00
03-03: Plot a histogram of hourly precipitation, with bins of size 0.2 mm/h, starting at 0.1 mm/h and ending at 25 mm/h. Plot the same data, but this time with a logarithmic y-axis. Compute the 99th percentile (or quantile) of hourly precipitation.
```python
# Bins of size 0.2 mm/h, starting at 0.1 and ending at 25 mm/h
bins = np.arange(0.1, 25.1, 0.2)

plt.hist(hourly_precipitation, bins=bins)
plt.xlabel('hourly precipitation (mm/h)')
plt.ylabel('frequency in hours')
plt.title('histogram of hourly precipitation')
plt.show()
```
```python
# Same histogram, this time with a logarithmic y-axis
plt.hist(hourly_precipitation, bins=bins)
plt.xlabel('hourly precipitation (mm/h)')
plt.ylabel('frequency (logarithmic scale)')
plt.yscale('log')
plt.title('histogram of hourly precipitation with logarithmic y-axis')
plt.show()
```
```python
# compute the 99th percentile of hourly precipitation
percentile_99th = np.percentile(dfh['RR'], 99)
print("the 99th percentile of hourly precipitation is:", round(percentile_99th, 3), "mm/h")
```
the 99th percentile of hourly precipitation is: 2.333 mm/h
03-04: Compute daily sums (mm/d) of precipitation (tip: use .resample again). Compute the average number of rain days per year in Innsbruck (a "rain day" is a day with at least 0.1 mm/d of measured precipitation).
```python
daily_precipitation = dfh['RR'].resample('D').sum()

# first compute the total number of rain days per year
total_raindays = (daily_precipitation >= 0.1).groupby(daily_precipitation.index.year).sum()

print("the average number of rain days per year is approx.:", round(total_raindays.mean()), "days")
```
the average number of rain days per year is approx.: 170 days
03-05: Now select (subset) the daily dataframe to keep only the daily data in the months of December, January, February (DJF). To do this, note that dfh.index.month exists and can be used to subset the data efficiently. Compute the average precipitation in DJF (mm/d), and the average number of rainy days in DJF. Repeat with the months of June, July, August (JJA).
```python
# DJF
DJF_daily = daily_precipitation[(daily_precipitation.index.month >= 12) |
                                (daily_precipitation.index.month <= 2)]
print("the average precipitation in DJF:", DJF_daily.mean(), "mm/d")
DJF_raindays = (DJF_daily >= 0.1).sum() / 7
print("the average number of rain days in DJF:", DJF_raindays, "days")

# JJA
JJA_daily = daily_precipitation[(daily_precipitation.index.month >= 6) &
                                (daily_precipitation.index.month <= 8)]
print("the average precipitation in JJA:", JJA_daily.mean(), "mm/d")
JJA_raindays = (JJA_daily >= 0.1).sum() / 7
print("the average number of rain days in JJA:", JJA_raindays, "days")
```
the average precipitation in DJF: 1.7726265822784812 mm/d
the average number of rain days in DJF: 37.57142857142857 days
the average precipitation in JJA: 4.024844720496894 mm/d
the average number of rain days in JJA: 53.142857142857146 days
03-06: Repeat the DJF and JJA subsetting, but this time with hourly data. Count the total number of times that hourly precipitation in DJF is above the 99th percentile computed in exercise 03-03. Repeat with JJA.
```python
# DJF
DJF_hourly = hourly_precipitation[(hourly_precipitation.index.month >= 12) |
                                  (hourly_precipitation.index.month <= 2)]
DJF_over99 = (DJF_hourly > percentile_99th).sum()
print("total number of times that hourly precipitation in DJF is above the 99th percentile:", DJF_over99)

# JJA
JJA_hourly = hourly_precipitation[(hourly_precipitation.index.month >= 6) &
                                  (hourly_precipitation.index.month <= 8)]
JJA_over99 = (JJA_hourly > percentile_99th).sum()
print("total number of times that hourly precipitation in JJA is above the 99th percentile:", JJA_over99)
```
total number of times that hourly precipitation in DJF is above the 99th percentile: 68
total number of times that hourly precipitation in JJA is above the 99th percentile: 308
03-07: Compute and plot the average daily cycle of hourly precipitation in DJF and JJA. I expect a plot similar to this example. To compute the daily cycle, I recommend combining two very useful tools. First, start by noticing that ds.index.hour exists and can be used to categorize data. Then, note that df.groupby exists and can be used exactly for that (documentation).
```python
# DJF average cycle of hourly precipitation
avg_cycle_DJF = DJF_hourly.groupby(DJF_hourly.index.hour).mean()

# JJA average cycle of hourly precipitation
avg_cycle_JJA = JJA_hourly.groupby(JJA_hourly.index.hour).mean()

# plotting
fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(avg_cycle_DJF.index, avg_cycle_DJF, label='DJF')
ax.plot(avg_cycle_JJA.index, avg_cycle_JJA, label='JJA')
plt.xlabel('Hour of day (UTC)')
plt.ylabel('Hourly Precipitation (mm/h)')
plt.title('Precipitation daily cycle Innsbruck 2015-2021')
plt.legend()
plt.show()
```
04 - A few other variables¶
In this section, we will continue to analyze the weather station data.
04-01: Verify that the three soil temperatures have approximately the same average value over the entire period. Now plot the three soil temperature timeseries in hourly resolution over the course of the year 2020 (example). Repeat the plot with the month of May 2020.
```python
TB1_mean = dfh['TB1'].mean()
TB2_mean = dfh['TB2'].mean()
TB3_mean = dfh['TB3'].mean()
print(TB1_mean, TB2_mean, TB3_mean)
```
11.393769492976766 11.514406669059351 11.379132772780602
```python
# Filter data for the year 2020
df_2020 = dfh.loc['2020']

# Plot the three soil temperature time series for the year 2020
plt.figure(figsize=(12, 6))
plt.plot(df_2020.index, df_2020['TB1'], label='TB1')
plt.plot(df_2020.index, df_2020['TB2'], label='TB2')
plt.plot(df_2020.index, df_2020['TB3'], label='TB3')
plt.xlabel('Time')
plt.ylabel('Soil Temperature (°C)')
plt.title('Soil Temperature Time Series - Year 2020')
plt.legend()
plt.grid(True)
plt.show()
```
```python
# Filter the hourly data for May 2020
df_may_2020 = dfh.loc['2020-05']

# Plot the three soil temperature time series for the month of May 2020
plt.figure(figsize=(12, 6))
plt.plot(df_may_2020.index, df_may_2020['TB1'], label='TB1')
plt.plot(df_may_2020.index, df_may_2020['TB2'], label='TB2')
plt.plot(df_may_2020.index, df_may_2020['TB3'], label='TB3')
plt.xlabel('Time')
plt.ylabel('Soil Temperature (°C)')
plt.title('Soil Temperature Time Series - May 2020')
plt.legend()
plt.grid(True)
plt.show()
```
04-02: Plot the average daily cycle of all three soil temperatures.
```python
# dfh is already hourly: group by hour of the day and average
average_temperatures = dfh.groupby(dfh.index.hour)[['TB1', 'TB2', 'TB3']].mean()

# Plot the average daily cycle of all three soil temperatures
plt.figure(figsize=(12, 6))
plt.plot(average_temperatures.index, average_temperatures['TB1'], label='TB1')
plt.plot(average_temperatures.index, average_temperatures['TB2'], label='TB2')
plt.plot(average_temperatures.index, average_temperatures['TB3'], label='TB3')
plt.xlabel('Hour of the Day')
plt.ylabel('Average Soil Temperature (°C)')
plt.title('Average Daily Cycle of Soil Temperatures')
plt.legend()
plt.grid(True)
plt.show()
```
04-03: Compute the difference (in °C) between the air temperature and the dewpoint temperature. Now plot this difference on a scatter plot (x-axis: relative humidity, y-axis: temperature difference).
```python
# Compute the difference between air temperature and dewpoint temperature
dfh['TempDiff'] = dfh['TL'] - dfh['TP']

# Scatter plot: relative humidity vs. temperature difference
plt.figure(figsize=(10, 6))
plt.scatter(dfh['RF'], dfh['TempDiff'], s=10)
plt.xlabel('Relative Humidity (%)')
plt.ylabel('Temperature Difference (°C)')
plt.title('Temperature Difference vs. Relative Humidity')
plt.grid(True)
plt.show()
```
05 - Free coding project¶
The last part of this semester project is up to you! You are free to explore whatever interests you. I however add three requirements:
- This section should have at least 5 original plots in it. They are the output of your analysis.
- This section should also use additional data that you downloaded yourself. The easiest way would probably be to download another station(s) from the ZAMG database, or data from the same station but for another time period (e.g. for trend or change analysis). You can, however, decide to do something completely different if you prefer (as long as you download and read one more file).
- this section should contain at least one regression or correlation analysis between two parameters. Examples:
- between two different variables at the same station (like we did with the dewpoint above)
- between different stations (for example, average temperature as a function of station elevation)
- between average temperature and time (trends analysis)
- etc.
That's it! Here are a few ideas:
- detection of trends and changes at the station Innsbruck for 1993-2021
- comparison of 5-yr climatologies at various stations in Tirol, taking elevation or location into account
- compute the theoretical day length from the station's longitude and latitude (you can find solutions for this online, just let me know the source if you used a solution online), and use these computations to compare the measured sunshine duration to the maximum day length. This can be used to classify "sunshine days" for example.
- use the python "windrose" library to plot a windrose at different locations and time of day.
- etc.
If you have your own idea but are unsure about whether this is too much or not enough, come to see me in class! In general, the three requirements above should be enough.
My goal with this section is to let you formulate a programming goal and implement it.
For the fifth part of our coding project, we chose to compare the data from Innsbruck with data from a station at a much higher elevation. Our aim is to see if we can observe climate change at each station and determine if there is a difference in warming between a city and the mountains. To achieve this, we will conduct a trend analysis and also examine any changes in precipitation over time.
As our second station, we have selected Hoher Sonnblick, which is one of Austria's highest and oldest observatories.
```python
# generating the DataFrame for Sonnblick
df_SB = pd.read_csv('TAG Datensatz_SB.csv', skiprows=[1], index_col=0, parse_dates=True)
df_SB.index = pd.to_datetime(df_SB.index).date
df_SB = df_SB.iloc[:, 1:-1]
df_SB = df_SB.drop(df_SB.index[1])
df_SB.index = pd.to_datetime(df_SB.index)
```
```python
df_SB.head()
```
|  | nied | rel | t | tmax | tmin | schnee | vvmax |
|---|---|---|---|---|---|---|---|
| 1900-01-02 | 0.4 | 95.0 | -5.4 | -4.0 | -6.7 | NaN | NaN |
| 1900-01-04 | 10.4 | 96.0 | -5.0 | -4.2 | -5.8 | NaN | NaN |
| 1900-01-05 | 1.4 | 94.0 | -6.8 | -4.5 | -9.0 | NaN | NaN |
| 1900-01-06 | 12.1 | 89.0 | -8.8 | -7.5 | -10.0 | NaN | NaN |
| 1900-01-07 | 5.8 | 90.0 | -7.8 | -6.8 | -8.7 | NaN | NaN |
```python
# generating the DataFrame for Innsbruck
df_Ibk = pd.read_csv('TAG Datensatz_IBK.csv', skiprows=[1], index_col=0, parse_dates=True)
df_Ibk.index = pd.to_datetime(df_Ibk.index).date
df_Ibk = df_Ibk.iloc[:, 1:-1]
df_Ibk = df_Ibk.drop(df_Ibk.index[1])
df_Ibk.index = pd.to_datetime(df_Ibk.index)
```
```python
df_Ibk.head()
```
|  | nied | rel | t | tmax | tmin | schnee | vvmax |
|---|---|---|---|---|---|---|---|
| 1900-01-02 | -1.0 | 84.0 | 7.4 | 11.5 | 3.3 | NaN | NaN |
| 1900-01-04 | 7.7 | 91.0 | 8.1 | 12.7 | 3.4 | NaN | NaN |
| 1900-01-05 | -1.0 | 91.0 | 3.8 | 5.0 | 2.5 | NaN | NaN |
| 1900-01-06 | -1.0 | 89.0 | 2.1 | 3.3 | 0.8 | NaN | NaN |
| 1900-01-07 | 2.8 | 89.0 | 1.7 | 3.6 | -0.2 | NaN | NaN |
```python
df_Ibk.min()
```
nied      -1.0
rel       24.0
t        -21.1
tmax     -18.2
tmin     -26.9
schnee    -1.0
vvmax      0.5
dtype: float64
```python
# Remove the first 20 years of the dataframes, since they contain many incorrect
# values and outliers which could affect the data analysis later on,
# and 100 years should be more than enough to visualize changes in climate.

# Extract month and year for both dataframes
df_Ibk['Month'] = df_Ibk.index.month
df_Ibk['Year'] = df_Ibk.index.year
df_SB['Month'] = df_SB.index.month
df_SB['Year'] = df_SB.index.year

# Filter out rows with years between 1900 and 1920
df_SB = df_SB[(df_SB['Year'] < 1900) | (df_SB['Year'] > 1920)]
df_Ibk = df_Ibk[(df_Ibk['Year'] < 1900) | (df_Ibk['Year'] > 1920)]

# Filter out rows with year 2023 or later
df_SB = df_SB[df_SB['Year'] < 2023]
df_Ibk = df_Ibk[df_Ibk['Year'] < 2023]

# Set all -1 values (missing data flags) in precipitation and snow height to NaN
df_SB['schnee'] = df_SB['schnee'].replace(-1, np.nan)
df_SB['nied'] = df_SB['nied'].replace(-1, np.nan)
df_Ibk['nied'] = df_Ibk['nied'].replace(-1, np.nan)
df_Ibk['schnee'] = df_Ibk['schnee'].replace(-1, np.nan)
```
```python
# Calculate the average mean temperature for each month for both stations
mean_temp_Ibk = df_Ibk.groupby('Month')['t'].mean()
mean_temp_SB = df_SB.groupby('Month')['t'].mean()

# Plot the annual cycles for both stations
plt.figure(figsize=(10, 6))
plt.plot(mean_temp_Ibk.index, mean_temp_Ibk.values, label='Innsbruck')
plt.plot(mean_temp_SB.index, mean_temp_SB.values, label='Sonnblick')
plt.xlabel('Month')
plt.ylabel('Mean Temperature (°C)')
plt.title('Average Annual Cycle of Mean Temperature')
plt.legend()
plt.show()
```
```python
from scipy import stats

# Calculate the average annual mean temperature for each year
mean_temp_Ibk_yearly = df_Ibk.resample('Y')['t'].mean()
mean_temp_SB_yearly = df_SB.resample('Y')['t'].mean()

# Plot the trend of mean temperature over time for both stations
plt.figure(figsize=(10, 6))
plt.plot(mean_temp_Ibk_yearly.index.year, mean_temp_Ibk_yearly.values, label='Innsbruck')
plt.plot(mean_temp_SB_yearly.index.year, mean_temp_SB_yearly.values, label='Sonnblick')
plt.xlabel('Year')
plt.ylabel('Mean Temperature (°C)')
plt.title('Trend of Mean Temperature Over Time')

# Calculate trendlines using linregress
slope_Ibk, intercept_Ibk, _, _, _ = stats.linregress(mean_temp_Ibk_yearly.index.year,
                                                     mean_temp_Ibk_yearly.values)
trendline_Ibk = slope_Ibk * mean_temp_Ibk_yearly.index.year + intercept_Ibk

slope_SB, intercept_SB, _, _, _ = stats.linregress(mean_temp_SB_yearly.index.year,
                                                   mean_temp_SB_yearly.values)
trendline_SB = slope_SB * mean_temp_SB_yearly.index.year + intercept_SB

# Plot trendlines
plt.plot(mean_temp_Ibk_yearly.index.year, trendline_Ibk, 'r--', label='Innsbruck Trend')
plt.plot(mean_temp_SB_yearly.index.year, trendline_SB, 'r--', label='Sonnblick Trend')
plt.legend()
plt.show()
```
```python
print(slope_Ibk)
print(slope_SB)
```
0.02527845594224632
0.022048975098567843
The graph above, which displays the trend of the mean temperature in Innsbruck and on Sonnblick, shows that the temperature is increasing at both places. At first glance the two trendlines, marked in red, seem to have the same steepness, which would indicate that the climate changes at the same rate regardless of altitude. By printing the slopes of the trendlines, however, one can see that the increase of the mean temperature in Innsbruck is indeed a bit steeper. This could indicate that a city located at around 570 m a.s.l. is warming faster than a mountain at an altitude of 3100 m.

However, this is a linear regression model, which only represents climate change in a simplified manner.
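As a complement, scipy's linregress also reports the standard error of the slope and a p-value, which help judge how robust such a trend estimate is. A self-contained sketch with synthetic yearly temperatures (a known trend of 0.02 °C/yr plus noise; all numbers are illustrative, not the station data):

```python
import numpy as np
from scipy import stats

# Synthetic yearly mean temperatures: known trend of 0.02 degC/yr plus noise
rng = np.random.default_rng(0)
years = np.arange(1921, 2023)
temps = 8.0 + 0.02 * (years - years[0]) + rng.normal(0.0, 0.3, years.size)

res = stats.linregress(years, temps)

# res.stderr is the standard error of the slope;
# +/- 2 * stderr gives a rough 95% band around the estimated trend
band = (res.slope - 2 * res.stderr, res.slope + 2 * res.stderr)
```

Comparing such bands for the two stations would show whether the difference in slopes is larger than the estimation uncertainty.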
```python
# Calculate the average annual snow height for each year
snow_height_Ibk_yearly = df_Ibk.resample('Y')['schnee'].mean()
snow_height_SB_yearly = df_SB.resample('Y')['schnee'].mean()

# Plot the snow height over time for both stations
plt.figure(figsize=(10, 6))
plt.plot(snow_height_Ibk_yearly.index.year, snow_height_Ibk_yearly.values, label='Innsbruck')
plt.plot(snow_height_SB_yearly.index.year, snow_height_SB_yearly.values, label='Sonnblick')
plt.xlabel('Year')
plt.ylabel('Snow Height in cm')
plt.title('Change in Snow Cover Over Time')
plt.legend()
plt.show()
```
In this graph we can see that, over the recorded timespans, there is no really recognizable trend of decreasing snowfall, neither in the city nor on the mountain. However, in the next plot, which shows the maximum measured snow height on Sonnblick, one can see that since the 1990s the maximum measurable height of 500 cm is not reached as often as in earlier years.
```python
# Plot the maximum snow height over time for Sonnblick
max_snow_height_SB_yearly = df_SB.resample('Y')['schnee'].max()

plt.figure(figsize=(10, 6))
plt.plot(max_snow_height_SB_yearly.index.year, max_snow_height_SB_yearly.values,
         label='Sonnblick', color='lightblue')
plt.xlabel('Year')
plt.ylabel('Max Snow Height in cm')
plt.title('Maximum Snow Height on Sonnblick')
plt.legend()
plt.show()
```
```python
# Filter the data for the desired time periods
df_Ibk_1970_2022 = df_Ibk.loc['1970-01-01':'2022-12-31']
df_Ibk_1920_1969 = df_Ibk.loc['1920-01-01':'1969-12-31']
df_SB_1970_2022 = df_SB.loc['1970-01-01':'2022-12-31']
df_SB_1920_1969 = df_SB.loc['1920-01-01':'1969-12-31']

# Calculate the average monthly temperature for each period
mean_temp_1970_2022 = df_Ibk_1970_2022.groupby(df_Ibk_1970_2022.index.month)['t'].mean()
mean_temp_1920_1969 = df_Ibk_1920_1969.groupby(df_Ibk_1920_1969.index.month)['t'].mean()
mean_SB_1970_2022 = df_SB_1970_2022.groupby(df_SB_1970_2022.index.month)['t'].mean()
mean_SB_1920_1969 = df_SB_1920_1969.groupby(df_SB_1920_1969.index.month)['t'].mean()

# Plot the annual cycle of temperature
months = range(1, 13)
plt.figure(figsize=(10, 6))
plt.plot(months, mean_temp_1970_2022, label='1970-2022 Innsbruck', color='b')
plt.plot(months, mean_temp_1920_1969, label='1920-1969', color='b', linestyle='--')
plt.plot(months, mean_SB_1970_2022, label='1970-2022 Sonnblick', color='orange')
plt.plot(months, mean_SB_1920_1969, label='1920-1969', color='orange', linestyle='--')
plt.xlabel('Month')
plt.ylabel('Mean Temperature in °C')
plt.title('Mean Temperature')
plt.legend()
plt.xticks(months)
plt.show()
```
Since the 1970s, the Earth has been experiencing global warming. The graph shown compares the average monthly temperatures between 1920-1969 and 1970-2022 at two stations: Innsbruck and Sonnblick. The biggest differences are observed in the months with extremely cold and warm temperatures (January, February, June, July, August, December).
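The statement about the biggest differences occurring in the extreme months can be quantified by simply subtracting the two monthly climatologies. A small sketch with invented monthly means (the notebook would use `mean_temp_1920_1969` and `mean_temp_1970_2022` instead):

```python
import pandas as pd

# Hypothetical monthly mean temperatures in °C for the two periods (illustrative values only)
months = range(1, 13)
mean_1920_1969 = pd.Series([-3.0, -1.0, 4.0, 8.5, 13.0, 16.0,
                            17.5, 17.0, 13.5, 8.5, 2.5, -2.0], index=months)
mean_1970_2022 = pd.Series([-1.5, 0.5, 5.5, 9.5, 14.0, 17.5,
                            19.0, 18.5, 14.5, 9.5, 3.5, -0.5], index=months)

# Month-by-month warming between the two periods; sorting reveals where it is strongest
warming = mean_1970_2022 - mean_1920_1969
print(warming.sort_values(ascending=False))
```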
```python
# Filter the data for the desired time periods
df_Ibk_1970_2022 = df_Ibk.loc['1970-01-01':'2022-12-31']
df_Ibk_1920_1969 = df_Ibk.loc['1920-01-01':'1969-12-31']

# Calculate the average monthly temperature for the reference period
mean_temp_1920_1969 = df_Ibk_1920_1969.groupby(df_Ibk_1920_1969.index.month)['t'].mean()

# Plot the annual cycle of temperature
months = range(1, 13)
plt.figure(figsize=(10, 6))
plt.plot(months, mean_temp_1920_1969, label='1920-1969 (Reference)', color='black', linestyle='--')

# One curve per non-overlapping decade since 1973
for start_year in range(1973, 2013, 10):
    end_year = start_year + 9
    df_group = df_Ibk.loc[f'{start_year}-01-01':f'{end_year}-12-31']
    mean_temp_group = df_group.groupby(df_group.index.month)['t'].mean()
    plt.plot(months, mean_temp_group, label=f'{start_year}-{end_year}', alpha=0.5)

# Most recent decade
df_group = df_Ibk.loc['2013-01-01':'2022-12-31']
mean_temp_group = df_group.groupby(df_group.index.month)['t'].mean()
plt.plot(months, mean_temp_group, label='2013-2022', alpha=0.5)

plt.xlabel('Month')
plt.ylabel('Mean Temperature in °C')
plt.title('Annual Cycle of Temperature')
plt.legend()
plt.xticks(months)
plt.show()
```
The graph clearly demonstrates that global warming has become significantly more pronounced in recent years, particularly since 2000.
Now we want to examine the changes in precipitation over the 100-year period and compare the two stations. To make the changes clearly visible, we grouped the data into four time periods (25 years each) and compared the two stations side by side with a bar plot. This gives a direct comparison of the precipitation amounts at each station for the same time period.
```python
# Divide the time into four sections (each ~25 years)
time_sections = np.array_split(df_Ibk.index.year.unique(), 4)

# Calculate average daily precipitation for each section and station
avg_precip_Ibk = [df_Ibk.loc[(df_Ibk.index.year >= section.min()) &
                             (df_Ibk.index.year <= section.max()), 'nied'].mean()
                  for section in time_sections]
avg_precip_SB = [df_SB.loc[(df_SB.index.year >= section.min()) &
                           (df_SB.index.year <= section.max()), 'nied'].mean()
                 for section in time_sections]
```
```python
# Convert mean daily precipitation to a yearly total (assuming 365 days per year)
total_precip_Ibk = [avg * 365 for avg in avg_precip_Ibk]
total_precip_SB = [avg * 365 for avg in avg_precip_SB]

# Create bar plot
bar_width = 0.35
index = np.arange(len(time_sections))

fig, ax = plt.subplots(figsize=(10, 6))
rects1 = ax.bar(index, total_precip_Ibk, bar_width, label='Innsbruck')
rects2 = ax.bar(index + bar_width, total_precip_SB, bar_width, label='Hoher Sonnblick')

# Add labels, title, and legend
ax.set_xlabel('Time Sections')
ax.set_ylabel('Average Precipitation (mm/year)')
ax.set_title('Average Precipitation Comparison between Innsbruck and Hoher Sonnblick')
ax.set_xticks(index + bar_width / 2)
ax.set_xticklabels(['1921-1946', '1947-1972', '1973-1997', '1998-2022'])
ax.legend()
plt.show()
```
In the graph above we can see that the average precipitation in Innsbruck remained almost constant over the last 100 years, in every time section. In contrast, at Hoher Sonnblick the average precipitation increased significantly from one time period to the next.

To validate this interpretation further, we visualized the average precipitation once more with an area plot.
```python
# Calculate the mean daily precipitation per year for each station
average_precip_Ibk = df_Ibk['nied'].resample('Y').mean()
average_precip_SB = df_SB['nied'].resample('Y').mean()

# Create the area plot
plt.figure(figsize=(10, 6))
plt.fill_between(average_precip_Ibk.index, average_precip_Ibk, label='Innsbruck', color='blue', alpha=0.5)
plt.fill_between(average_precip_SB.index, average_precip_SB, label='Sonnblick', color='green', alpha=0.5)

# Set labels and title
plt.xlabel('Time')
plt.ylabel('Mean Daily Precipitation (mm/day)')
plt.title('Comparison of Average Annual Precipitation: Innsbruck vs Sonnblick')
plt.legend()
plt.show()
```
This plot shows once more that the average annual precipitation barely changed for Innsbruck but increased at Sonnblick over the last 100 years.
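"Barely changed" versus "increased" could also be quantified with a least-squares trend on the annual means. Here is a sketch with synthetic annual values (in the notebook, `average_precip_SB` would supply the data; the trend magnitude below is made up for illustration):

```python
import numpy as np

# Synthetic annual mean precipitation in mm/day with a built-in upward trend plus noise
years = np.arange(1922, 2023)
rs = np.random.default_rng(1)
precip = 4.0 + 0.01 * (years - 1922) + rs.normal(0, 0.3, years.size)

# Least-squares linear fit: slope is the trend in (mm/day) per year
slope, intercept = np.polyfit(years, precip, 1)
print(f"trend: {slope * 100:.2f} (mm/day) per century")
```

A slope indistinguishable from zero for Innsbruck and a clearly positive one for Sonnblick would back up the interpretation of the area plot.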
To get more detailed information about the change in precipitation, we decided to split the precipitation data into seasons. A plot including all four seasons of the year might get confusing, so we split the year into two seasons: summer (May to October) and winter (November to April). To avoid clutter, we also plotted the data separately for the two stations.
```python
# Create a copy of the DataFrame with the required time period
df_Ibk_seasons = df_Ibk['1922':'2022'].copy()
df_SB_seasons = df_SB['1922':'2022'].copy()

# Assign seasons based on the month (Nov-Apr: winter, May-Oct: summer)
month_to_season = {11: 'Winter', 12: 'Winter', 1: 'Winter', 2: 'Winter', 3: 'Winter', 4: 'Winter',
                   5: 'Summer', 6: 'Summer', 7: 'Summer', 8: 'Summer', 9: 'Summer', 10: 'Summer'}
df_Ibk_seasons['Season'] = df_Ibk_seasons.index.month.map(month_to_season)
df_SB_seasons['Season'] = df_SB_seasons.index.month.map(month_to_season)

# Calculate the average precipitation for each season and year
average_precip_Ibk_seasons = df_Ibk_seasons.groupby(['Season', df_Ibk_seasons.index.year])['nied'].mean()
average_precip_SB_seasons = df_SB_seasons.groupby(['Season', df_SB_seasons.index.year])['nied'].mean()

# Reshape the data for plotting
data_Ibk = average_precip_Ibk_seasons.unstack(level=0)
data_SB = average_precip_SB_seasons.unstack(level=0)

# Create the line plot for Innsbruck
plt.figure(figsize=(10, 6))
plt.plot(data_Ibk.index, data_Ibk['Winter'], label='Winter', color='blue')
plt.plot(data_Ibk.index, data_Ibk['Summer'], label='Summer', color='red')
plt.xlabel('Time')
plt.ylabel('Average Precipitation (mm/day)')
plt.title('Average Precipitation Change: Innsbruck (Winter vs Summer)')
plt.legend()
plt.show()

# Create the line plot for Sonnblick
plt.figure(figsize=(10, 6))
plt.plot(data_SB.index, data_SB['Winter'], label='Winter', color='blue')
plt.plot(data_SB.index, data_SB['Summer'], label='Summer', color='red')
plt.xlabel('Time')
plt.ylabel('Average Precipitation (mm/day)')
plt.title('Average Precipitation Change: Hoher Sonnblick (Winter vs Summer)')
plt.legend()
plt.show()
```
The graph for Innsbruck shows once more that the average precipitation changed very little over the last 100 years, but the average precipitation in the summer months is consistently and significantly higher than in the winter months. At Sonnblick, in contrast, the difference between the seasons shows no clear pattern and varies from year to year, while the average precipitation increases over time.
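The summer/winter contrast at Innsbruck can be reproduced in miniature: group daily precipitation by the same two-season split and compare the means. All numbers below are invented (the notebook would group `df_Ibk_seasons` instead); the synthetic data is deliberately wetter in summer to mimic the observed pattern:

```python
import numpy as np
import pandas as pd

# Synthetic daily precipitation in mm/day, deliberately wetter in summer (May-Oct)
idx = pd.date_range('2000-01-01', '2009-12-31', freq='D')
rs = np.random.default_rng(2)
nied = pd.Series(np.where(idx.month.isin(range(5, 11)),
                          rs.gamma(2.0, 2.0, idx.size),   # summer: mean ~4 mm/day
                          rs.gamma(2.0, 1.0, idx.size)),  # winter: mean ~2 mm/day
                 index=idx)

# Same two-season split as used for the line plots
season = np.where(idx.month.isin(range(5, 11)), 'Summer', 'Winter')
seasonal_mean = nied.groupby(season).mean()
print(seasonal_mean)
```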