Programming project¶
Preamble¶
- Names: David Übertsroider, Luis Wolf
- Matrikelnummer: 12216131, 12211064
01 - An energy balance model with hysteresis¶
First we import all the packages¶
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats
```
01-01:
The following code computes the parameters a and b so that the equation $
\alpha =
\begin{cases}
0.3,& \text{if } T \gt 280\\
0.7,& \text{if } T \lt 250\\
a T + b, & \text{otherwise}
\end{cases}
$
is continuous at T = 250 K and T = 280 K.
```python
# Set up the linear system from the two continuity conditions:
# a*280 + b = 0.3 and a*250 + b = 0.7
A = np.array([[280, 1], [250, 1]])
y = np.array([[0.3], [0.7]])
A_inv = np.linalg.inv(A)
a, b = A_inv @ y
```
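As a quick sanity check (a sketch of ours, not part of the assignment), the same system can be solved directly with `np.linalg.solve`, and we can verify that the fitted line meets both constant branches at the boundary temperatures:

```python
import numpy as np

# Solve the continuity conditions a*280 + b = 0.3 and a*250 + b = 0.7.
# The _check names are ours, to avoid clobbering the a and b above.
A_check = np.array([[280.0, 1.0], [250.0, 1.0]])
y_check = np.array([0.3, 0.7])
a_check, b_check = np.linalg.solve(A_check, y_check)

# the line meets the constant branches at both boundary temperatures
assert np.isclose(a_check * 280 + b_check, 0.3)
assert np.isclose(a_check * 250 + b_check, 0.7)
```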
01-02:
The function alpha_from_temperature returns the corresponding albedo
```python
def alpha_from_temperature(T):
    '''This function computes the albedo based on a given temperature.

    Parameters
    ----------
    T (float): Temperature value (K)

    Returns
    -------
    The computed albedo value (float)

    Examples
    --------
    >>> alpha_from_temperature(300)
    0.3
    >>> alpha_from_temperature(230)
    0.7
    >>> np.isclose(alpha_from_temperature(260), 0.5666666666)
    True
    '''
    if T > 280:
        return 0.3
    elif T < 250:
        return 0.7
    else:
        return float(a * T + b)

import doctest
doctest.testmod()
```
TestResults(failed=0, attempted=3)
01-03:
The function temperature_change_with_hysteresis returns the stabilization temperature dependent on the starting temperature, the albedo and tau
```python
def temperature_change_with_hysteresis(t0, n_years, tau=0.611):
    # Constants
    s0 = 1362          # solar constant (W/m^2)
    C = 4.0e+08        # heat capacity (J/(m^2 K))
    sigma = 5.67E-8    # Stefan-Boltzmann constant (W/(m^2 K^4))

    # Convert the time step to seconds and create data containers
    dt = 60 * 60 * 24 * 365
    years = np.arange(n_years + 1)
    temperature = np.zeros(n_years + 1)
    temperature[0] = t0

    for i in range(n_years):
        olr = tau * sigma * temperature[i]**4
        asr = (1 - alpha_from_temperature(temperature[i])) * s0 / 4
        temperature[i + 1] = temperature[i] + (dt / C) * (asr - olr)

    return temperature
```
```python
temperature_change_with_hysteresis(292, 100)[-1]
```
288.0034709342971
```python
temperature_change_with_hysteresis(265, 100)[-1]
```
233.02581869749855
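The two runs settle on different branches. As a sanity check (a sketch of ours, reusing the constants from the function above), absorbed solar radiation and outgoing longwave radiation should roughly balance at both equilibria:

```python
import numpy as np

s0, sigma, tau = 1362.0, 5.67e-8, 0.611

def olr(T):
    # outgoing longwave radiation (W/m^2)
    return tau * sigma * T**4

def asr(alpha):
    # absorbed solar radiation (W/m^2)
    return (1 - alpha) * s0 / 4

# warm equilibrium (~288 K, albedo 0.3) and cold equilibrium (~233 K, albedo 0.7)
assert np.isclose(olr(288.0), asr(0.3), rtol=0.01)
assert np.isclose(olr(233.0), asr(0.7), rtol=0.01)
```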
01-04:
Here we show the plot of a climate change scenario with hysteresis using our functions above
```python
N_simulations = 21
n_years = 50

# create start values
start_values = np.linspace(206, 318, N_simulations)

# create a storage for the results
df_results = pd.DataFrame(index=np.arange(n_years + 1))

# loop over the start values to get the results
for t0 in start_values:
    results = temperature_change_with_hysteresis(t0, n_years)
    df_results[t0] = results

# plot it
fig, ax = plt.subplots()
ax.plot(df_results)
ax.set_title('Climate change scenario with hysteresis')
ax.set_xlabel('Years')
ax.set_ylabel('Temperature (K)')
plt.show()
```
02 - Weather station data files¶
Open the data and display its content:
```python
df = pd.read_csv('INNSBRUCK-FLUGPLATZ_Datensatz_20150101_20211231.csv', index_col=1, parse_dates=True)
df = df.drop('station', axis=1)
```
02-01: after reading the documentation of the respective functions (and maybe trying a few things yourself), explain in plain sentences:
- what am I asking pandas to do with the `index_col=1, parse_dates=True` keyword arguments? Why am I doing this?
- what am I asking pandas to do with `.drop()`? Why `axis=1`?
A02-01:
With the argument `index_col=1` you are telling pandas to use the second column as the index (0 would be the first column).
With `parse_dates=True`, you are instructing pandas to try to interpret the index values as dates: it automatically converts the date strings into datetime objects.
One does this to work more easily with time-related data; date-based operations, e.g. plotting time series, become much simpler.
`.drop()` removes a row or a column from the DataFrame. With `axis=1` you remove a column; with `axis=0` you would remove a row.
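As a minimal sketch of these two arguments (on a synthetic two-row table of our own, not the real station file), the effect can be seen directly:

```python
import io
import pandas as pd

# a tiny made-up CSV in the same shape as the station file
csv = io.StringIO(
    "station,time,TL\n"
    "11121,2020-01-01 00:00,1.5\n"
    "11121,2020-01-01 00:10,1.6\n"
)
# index_col=1 -> use the second column ("time") as the index;
# parse_dates=True -> convert the index strings to datetime objects
toy = pd.read_csv(csv, index_col=1, parse_dates=True)
toy = toy.drop('station', axis=1)  # axis=1 removes a column, axis=0 a row

print(toy.index.dtype)  # a datetime index, ready for time-based operations
```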
Now let me do something else for you:
```python
dfmeta = pd.read_csv('ZEHNMIN Parameter-Metadaten.csv', index_col=0)
dfmeta.loc[df.columns]
```
| | Kurzbeschreibung | Beschreibung | Einheit |
|---|---|---|---|
| DD | Windrichtung | Windrichtung, vektorielles Mittel über 10 Minuten | ° |
| FF | vektorielle Windgeschwindigkeit | Windgeschwindigkeit, vektorielles Mittel über ... | m/s |
| GSX | Globalstrahlung | Globalstrahlung, arithmetisches Mittel über 10... | W/m² |
| P | Luftdruck | Luftdruck, Basiswert zur Minute10 | hPa |
| RF | Relative Feuchte | Relative Luftfeuchte, Basiswert zur Minute10 | % |
| RR | Niederschlag | 10 Minuten Summe des Niederschlags, Summe der ... | mm |
| SO | Sonnenscheindauer | Sonnenscheindauer, Sekundensumme über 10 Minuten | s |
| TB1 | Erdbodentemperatur in 10cm Tiefe | Erdbodentemperatur in 10cm Tiefe, Basiswert zu... | °C |
| TB2 | Erdbodentemperatur in 20cm Tiefe | Erdbodentemperatur in 20cm Tiefe, Basiswert zu... | °C |
| TB3 | Erdbodentemperatur in 50cm Tiefe | Erdbodentemperatur in 50cm Tiefe, Basiswert zu... | °C |
| TL | Lufttemperatur in 2m | Lufttemperatur in 2m Höhe, Basiswert zur Minute10 | °C |
| TP | Taupunkt | Taupunktstemperatur, Basiswert zur Minute10 | °C |
02-02:
again, explain in plain sentences what `dfmeta.loc[df.columns]` is doing, and why it works that way.
A02-02:
The `.loc` indexer allows you to select rows (or columns) by their labels.
`df.columns` returns the column labels of `df` as a list-like of names. `dfmeta.loc[df.columns]` takes this list of names and uses it to select rows from the DataFrame `dfmeta`: it matches the column names of `df` against the row labels of `dfmeta` and returns the corresponding rows.
Basically, it lets you subset the rows of one DataFrame based on the columns present in another.
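A small toy illustration of this label-based selection (the parameter names match the station data, but the tables here are made up by us):

```python
import pandas as pd

# synthetic metadata table indexed by parameter name
meta = pd.DataFrame(
    {'Einheit': ['°C', 'hPa', 'mm', 'm/s']},
    index=['TL', 'P', 'RR', 'FF'],
)
# synthetic data table with a subset of those parameters as columns
data = pd.DataFrame({'RR': [0.0], 'TL': [1.5]})

# .loc with a list of labels selects the matching rows,
# in the same order as the columns of `data`
subset = meta.loc[data.columns]
print(subset)
```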
Finally, let me do a last step for you before you start coding:
```python
dfh = df.resample('H').mean()
```
02-03:
explore the `dfh` dataframe. Explain, in plain words, what the purpose of `.resample('H')` followed by `mean()` is. Explain what `.resample('H').max()` and `.resample('H').sum()` would do.
A02-03:
Pandas resamples the data to a different frequency. The "H" stands for an hourly frequency, so the original data are grouped into hourly intervals.
The `mean()` function then computes the average of the values within each hourly interval.
`.resample('H').max()` would give you the maximum value within each interval, and `.resample('H').sum()` would give you the sum of all values within each interval (e.g. the total precipitation per hour).
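A toy sketch of the three aggregations on one hour of synthetic 10-minute data (the values are made up):

```python
import numpy as np
import pandas as pd

# one hour of synthetic 10-minute precipitation data
idx = pd.date_range('2020-01-01 00:00', periods=6, freq='10min')
rr = pd.Series([0.0, 0.1, 0.2, 0.0, 0.3, 0.0], index=idx)

hourly_mean = rr.resample('H').mean()  # average of the six values
hourly_max = rr.resample('H').max()    # largest 10-min value in the hour
hourly_sum = rr.resample('H').sum()    # total of the hour
```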
02-04:
Using `np.allclose`, make sure that the average of the first hour (that you'll compute yourself from `df`) is indeed equal to the first row of `dfh`. Now, two variables in the dataframe have units that aren't suitable for averaging. Please convert the following variables to the correct units:
- `RR` needs to be converted from the average of 10 min sums to mm/h
- `SO` needs to be converted from the average of 10 min sums to s/h
```python
# Extract the data of the first 60 minutes (six 10-min values)
first_hour_data = df.iloc[:6]

# Compute the average of the first hour
average_first_hour = first_hour_data.mean()

# Compare with the first row of dfh
is_close = np.allclose(average_first_hour, dfh.iloc[0])

if is_close:
    print("The average of the first hour matches the first row of dfh.")
else:
    print("The average of the first hour does not match the first row of dfh.")
```
The average of the first hour matches the first row of dfh.
```python
# multiplying a 10-min average by 6 gives the hourly sum
dfh["RR"] = dfh["RR"] / 10 * 60
dfh["SO"] = dfh["SO"] / 10 * 60
```
```python
dfh
```
| time | DD | FF | GSX | P | RF | RR | SO | TB1 | TB2 | TB3 | TL | TP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2015-01-01 00:00:00 | 257.333333 | 1.383333 | 0.0 | 965.666667 | 97.000000 | 0.0 | 0.0 | 3.200000 | 3.600000 | 5.2 | -2.533333 | -3.066667 |
| 2015-01-01 01:00:00 | 160.666667 | 0.766667 | 0.0 | 965.966667 | 97.666667 | 0.0 | 0.0 | 3.150000 | 3.600000 | 5.2 | -3.250000 | -3.716667 |
| 2015-01-01 02:00:00 | 227.000000 | 0.816667 | 0.0 | 966.316667 | 98.500000 | 0.0 | 0.0 | 3.100000 | 3.600000 | 5.2 | -3.583333 | -3.933333 |
| 2015-01-01 03:00:00 | 258.833333 | 1.933333 | 0.0 | 966.466667 | 97.000000 | 0.0 | 0.0 | 3.100000 | 3.600000 | 5.2 | -3.250000 | -3.766667 |
| 2015-01-01 04:00:00 | 241.500000 | 1.266667 | 0.0 | 966.416667 | 95.166667 | 0.0 | 0.0 | 3.100000 | 3.600000 | 5.2 | -3.116667 | -3.883333 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2021-12-31 19:00:00 | 283.500000 | 1.166667 | 0.0 | 957.766667 | 98.833333 | 0.0 | 0.0 | 3.316667 | 3.300000 | 3.4 | 0.900000 | 0.716667 |
| 2021-12-31 20:00:00 | 237.333333 | 0.716667 | 0.0 | 958.100000 | 99.500000 | 0.0 | 0.0 | 3.283333 | 3.300000 | 3.4 | -0.200000 | -0.350000 |
| 2021-12-31 21:00:00 | 149.833333 | 1.083333 | 0.0 | 958.783333 | 99.666667 | 0.0 | 0.0 | 3.200000 | 3.300000 | 3.4 | -0.800000 | -0.916667 |
| 2021-12-31 22:00:00 | 175.833333 | 0.516667 | 0.0 | 959.433333 | 99.166667 | 0.0 | 0.0 | 3.100000 | 3.383333 | 3.4 | -1.083333 | -1.200000 |
| 2021-12-31 23:00:00 | 198.833333 | 0.600000 | 0.0 | 960.016667 | 99.500000 | 0.0 | 0.0 | 3.033333 | 3.400000 | 3.4 | -1.683333 | -1.716667 |

61368 rows × 12 columns
Spend some time exploring the `dfh` dataframe we just created. What time period does it cover? What variables does it contain?
The DataFrame reaches from the first of January 2015 to the last day of December 2021 and contains the following variables: wind direction, vector-averaged wind speed, global radiation, air pressure, relative humidity, precipitation, sunshine duration, ground temperature at 10 cm, 20 cm and 50 cm depth, air temperature at 2 m height and, finally, the dew point temperature.
From now on, we will use the hourly data only (and further aggregations when necessary). The 10 min data are great but require a little bit more pandas kung fu (the Chinese term, not the sport) to be used efficiently.
03 - Precipitation¶
03-01:
Compute the average annual precipitation (m/year) over the 7-year period.
```python
# total precipitation over the 7-year period, divided by 7 (years)
# and by 1000 (mm -> m) to get the average annual precipitation
annual_rr_precipitation = dfh["RR"].sum() / 7 / 1000

print("Average Annual Precipitation (RR): ", round(annual_rr_precipitation, 2), " m/year")
```
Average Annual Precipitation (RR): 0.92 m/year
03-02:
What is the smallest non-zero precipitation measured at the station? What is the maximum hourly precipitation measured at the station? When did this occur?
```python
# find the minimum non-zero precipitation and the maximum hourly precipitation
smallest_non_zero_precipitation = dfh[dfh["RR"] > 0]["RR"].min()
max_hourly_precipitation = dfh["RR"].max()

# find the corresponding timestamps using idxmax and idxmin
time_max_hourly_precipitation = dfh["RR"].idxmax()
time_min_hourly_precipitation = dfh["RR"][dfh["RR"] > 0].idxmin()

print("Smallest non-zero precipitation: ", round(smallest_non_zero_precipitation, 3), "mm")
print("Maximum hourly precipitation: ", round(max_hourly_precipitation, 2), " mm")
print("Timestamp for minimum hourly precipitation:", time_min_hourly_precipitation)
print("Timestamp for maximum hourly precipitation:", time_max_hourly_precipitation)
```
Smallest non-zero precipitation: 0.1 mm
Maximum hourly precipitation: 22.2 mm
Timestamp for minimum hourly precipitation: 2015-01-02 16:00:00
Timestamp for maximum hourly precipitation: 2021-09-16 22:00:00
03-03:
Plot a histogram of hourly precipitation, with bins of size 0.2 mm/h, starting at 0.1 mm/h and ending at 25 mm/h. Plot the same data, but this time with a logarithmic y-axis. Compute the 99th percentile (or quantile) of hourly precipitation.
```python
# histogram with linear y-axis
plt.figure(figsize=(10, 5))
plt.hist(dfh["RR"], bins=125, range=(0.1, 25), edgecolor="k", color="mediumblue")
plt.xlabel("Hourly Precipitation (mm/h)")
plt.ylabel("Frequency")
plt.title("Hourly Precipitation")
plt.grid()
plt.show()

# histogram with logarithmic y-axis
plt.figure(figsize=(10, 5))
plt.hist(dfh["RR"], log=True, bins=125, range=(0.1, 25), edgecolor="k", color="mediumblue")
plt.xlabel("Hourly Precipitation (mm/h)")
plt.ylabel("Frequency (log scale)")
plt.title("Hourly Precipitation with Logarithmic Y-axis")
plt.grid()
plt.show()

# 99th percentile of hourly precipitation
percentile_99 = dfh["RR"].quantile(0.99)
print("99th percentile of hourly precipitation:", round(percentile_99, 2), "mm/h")
```
99th percentile of hourly precipitation: 2.33 mm/h
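The 99th percentile means that 99% of all hourly values lie at or below this threshold. A toy illustration of `quantile` (on synthetic values of ours, not the station data):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(101, dtype=float))  # values 0, 1, ..., 100

# with the default linear interpolation, the 0.99 quantile of 0..100
# sits 99% of the way through the sorted values
p99 = s.quantile(0.99)
assert np.isclose(p99, 99.0)

# numpy's quantile gives the same result on the raw array
assert np.isclose(np.quantile(s.values, 0.99), p99)
```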
03-04:
Compute daily sums (mm/d) of precipitation (tip: use `.resample` again). Compute the average number of rain days per year in Innsbruck (a "rain day" is a day with at least 0.1 mm/d of measured precipitation).
```python
# resample to daily sums
daily_precipitation = dfh["RR"].resample("D").sum()

# count the rain days and divide by the 7 years
average_rain_days_per_year = (daily_precipitation >= 0.1).sum() / 7

print("Average number of rain days per year in Innsbruck:", round(average_rain_days_per_year))
Average number of rain days per year in Innsbruck: 159
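The rain-day count works by summing a boolean mask (`True` counts as 1). A small sketch on synthetic data of ours (values made up):

```python
import pandas as pd

# six synthetic hourly precipitation values spanning two calendar days
idx = pd.date_range('2020-01-01 22:00', periods=6, freq='H')
rr = pd.Series([0.0, 0.05, 0.0, 0.2, 0.0, 0.0], index=idx)

daily = rr.resample('D').sum()    # mm per calendar day
rain_days = (daily >= 0.1).sum()  # boolean mask summed = count of rain days
# day 1 totals 0.05 mm (no rain day), day 2 totals 0.2 mm (rain day)
```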
03-05:
Now select (subset) the daily dataframe to keep only daily data in the months of December, January, February (DJF). To do this, note that `dfh.index.month` exists and can be used to subset the data efficiently. Compute the average precipitation in DJF (mm/d), and the average number of rainy days in DJF. Repeat with the months of June, July, August (JJA).
```python
# Select the DJF subset (December to February)
djf = daily_precipitation[(daily_precipitation.index.month >= 12) | (daily_precipitation.index.month <= 2)]

# compute the average precipitation of DJF
average_precipitation_djf = djf.mean()

# count the rainy days; divided by 7 because of the 7 years
average_rainy_days_djf = (djf >= 0.1).sum() / 7

print("Average precipitation in DJF:", round(average_precipitation_djf, 2), "mm/d")
print("Average number of rainy days in DJF:", round(average_rainy_days_djf))
```
Average precipitation in DJF: 1.77 mm/d
Average number of rainy days in DJF: 36
```python
# Select the JJA subset (June to August)
jja = daily_precipitation[(daily_precipitation.index.month >= 6) & (daily_precipitation.index.month <= 8)]

# compute the average precipitation of JJA
average_precipitation_jja = jja.mean()

# count the rainy days; divided by 7 because of the 7 years
average_rainy_days_jja = (jja >= 0.1).sum() / 7

print("Average precipitation in JJA:", round(average_precipitation_jja, 2), "mm/d")
print("Average number of rainy days in JJA:", round(average_rainy_days_jja))
```
Average precipitation in JJA: 4.02 mm/d
Average number of rainy days in JJA: 50
03-06:
Repeat the DJF and JJA subsetting, but this time with hourly data. Count the total number of times that hourly precipitation in DJF is above the 99th percentile computed in exercise 03-03. Repeat with JJA.
```python
# Subsets for DJF and JJA
djf = dfh[(dfh.index.month >= 12) | (dfh.index.month <= 2)]
jja = dfh[(dfh.index.month >= 6) & (dfh.index.month <= 8)]

# count the hours above the 99th percentile (a count, not a percentage)
print("DJF hourly over 99th Percentile:", (djf["RR"] > percentile_99).sum())
print("JJA hourly over 99th Percentile:", (jja["RR"] > percentile_99).sum())
```
DJF hourly over 99th Percentile: 68
JJA hourly over 99th Percentile: 308
03-07:
Compute and plot the average daily cycle of hourly precipitation in DJF and JJA. I expect a plot similar to this example. To compute the daily cycle, I recommend combining two very useful tools. First, start by noticing that `ds.index.hour` exists and can be used to categorize data. Then, note that `df.groupby` exists and can be used exactly for that (documentation).
```python
# Compute the average daily cycles for DJF and JJA
djf_daily_cycle = djf.groupby(djf.index.hour)["RR"].mean()
jja_daily_cycle = jja.groupby(jja.index.hour)["RR"].mean()

# Plotting
fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(djf_daily_cycle, label="DJF", color="limegreen")
ax.plot(jja_daily_cycle, label="JJA", color="tomato")
ax.set_title("Average Daily Cycle of Hourly Precipitation")
ax.set_ylabel("Average Hourly Precipitation (mm/h)")
ax.set_xlabel("Hour of the Day")
ax.legend()
ax.grid()
plt.show()
```
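Why this works: grouping by `index.hour` pools the same clock hour across all days of the season, so each of the 24 groups is averaged over every day at once. A toy sketch of ours (synthetic values):

```python
import pandas as pd

# two days of synthetic hourly values: day 1 all 1.0, day 2 all 3.0
idx = pd.date_range('2020-01-01', periods=48, freq='H')
s = pd.Series([1.0] * 24 + [3.0] * 24, index=idx)

# each of the 24 hour-groups contains one value from each day,
# so every group averages (1.0 + 3.0) / 2 = 2.0
cycle = s.groupby(s.index.hour).mean()
```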
04 - A few other variables¶
04-01:
In the first part of the code below, we show that the averages of the three soil temperatures are nearly the same and print the result.
Next we plot the three soil temperatures in hourly resolution for the year 2020.
At the end of the code we plot the three soil temperatures in hourly resolution for May 2020.
```python
df = pd.read_csv('INNSBRUCK-FLUGPLATZ_Datensatz_20150101_20211231.csv', index_col=1, parse_dates=True)
df = df.drop('station', axis=1)

mean_TB1 = df['TB1'].mean()
mean_TB2 = df['TB2'].mean()
mean_TB3 = df['TB3'].mean()

# compare the average values pairwise; averages within 1 °C
# of each other count as approximately equal
if np.isclose(mean_TB1, mean_TB2, atol=1.0) and np.isclose(mean_TB2, mean_TB3, atol=1.0):
    print('The three soil temperatures have approximately the same average value over the entire period')
else:
    print('The three soil temperatures do not have the same average value over the entire period')

# aggregate to hourly data
df_hourly = df.groupby(pd.Grouper(freq='H')).mean()

# filter for the data in 2020
df_hourly_2020 = df_hourly.loc[df_hourly.index.year == 2020]

# plot of the soil temperatures in hourly resolution for the year 2020
fig, ax = plt.subplots()
df_hourly_2020['TB1'].plot(label='TB1')
df_hourly_2020['TB2'].plot(label='TB2')
df_hourly_2020['TB3'].plot(label='TB3')
ax.set_title('Soil temperature 2020')
ax.set_xlabel('Time')
ax.set_ylabel('Temperature (°C)')
ax.legend()

# filter for the data of May 2020
df_may_2020 = df_hourly_2020.loc[df_hourly_2020.index.month == 5]

# plot of the soil temperatures in hourly resolution for May 2020
fig, ax = plt.subplots()
df_may_2020['TB1'].plot(label='TB1')
df_may_2020['TB2'].plot(label='TB2')
df_may_2020['TB3'].plot(label='TB3')
ax.set_title('Soil temperature May 2020')
ax.set_xlabel('Time')
ax.set_ylabel('Temperature (°C)')
ax.legend(loc='lower right')
plt.show()
```
The three soil temperatures have approximately the same average value over the entire period
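A pitfall worth noting here (an aside of ours, not part of the assignment): `np.isclose` compares exactly two arrays, and its third positional parameter is the relative tolerance `rtol`. Passing a third mean as a positional argument therefore does not compare three values; it silently becomes a huge tolerance:

```python
import numpy as np

a, b, c = 10.0, 10.1, 99.0

# np.isclose(a, c, 99.0) does NOT check a third value: the 99.0 lands
# in rtol, making the tolerance enormous, so the comparison "passes"
assert np.isclose(a, c, 99.0)

# comparing three values needs two explicit pairwise checks
all_equal = np.isclose(a, b, rtol=0.05) and np.isclose(b, c, rtol=0.05)
assert not all_equal  # c is nowhere near a and b
```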
04-02:
Here we plot the daily averages of the three soil temperatures (TB1, TB2 and TB3) over the whole dataframe.
```python
# aggregate to daily averages
df_daily = df.groupby(pd.Grouper(freq='D')).mean()

# plot the daily averages
fig, ax = plt.subplots()
df_daily['TB1'].plot(label='TB1')
df_daily['TB2'].plot(label='TB2')
df_daily['TB3'].plot(label='TB3')
ax.set_title('Daily average soil temperature')
ax.set_xlabel('Time')
ax.set_ylabel('Temperature (°C)')
ax.legend()
plt.show()
```
04-03:
Here we compute the difference between the air temperature and the dew point temperature and show it in a scatter plot against relative humidity.
```python
# Compute the difference between TL and TP (the dew point spread)
df['Temperature_Difference'] = df['TL'] - df['TP']

# Plot the temperature difference against relative humidity
fig, ax = plt.subplots()
ax.scatter(x=df['RF'], y=df['Temperature_Difference'], alpha=0.2, color='cadetblue')
ax.set_xlabel('Relative Humidity (%)')
ax.set_ylabel('Temperature Difference (°C)')
ax.set_title('Temperature Difference and Relative Humidity')
plt.show()
```
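The scatter shows the dew point spread shrinking toward zero as relative humidity approaches 100%. That relationship can be sketched with the Magnus approximation (the formula and its constants are a standard approximation of ours for illustration, not something used in the dataset):

```python
import numpy as np

def dewpoint_magnus(t, rh, a=17.625, b=243.04):
    """Approximate dew point (°C) from air temperature t (°C) and
    relative humidity rh (%) using the Magnus formula."""
    gamma = np.log(rh / 100.0) + a * t / (b + t)
    return b * gamma / (a - gamma)

# at 100% humidity the dew point equals the air temperature (spread = 0)
assert np.isclose(dewpoint_magnus(15.0, 100.0), 15.0)
# at lower humidity the dew point drops below the air temperature
assert dewpoint_magnus(15.0, 50.0) < 15.0
```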
05 - Free coding project¶
Comparing temperature data from all state capitals of Austria
For our project we decided to compare the temperatures of all state capitals of Austria from 1900 (or the earliest date since 1900 for which data exist) until 2022. We got the data from the ZAMG data hub in daily resolution and converted it into yearly resolution.
```python
df = pd.read_csv('TAG Datensatz_19000101_20230601.csv', index_col=1, parse_dates=True)
df = df['1900':'2022']

# split the dataframe into one dataframe per state capital
df_ibk = df[df['station'] == 39]
df_sbg = df[df['station'] == 6300]
df_brgz = df[df['station'] == 15]
df_linz = df[df['station'] == 56]
df_st_poelt = df[df['station'] == 93]
df_wien = df[df['station'] == 105]
df_eisstdt = df[df['station'] == 22]
df_graz = df[df['station'] == 30]
df_klgft = df[df['station'] == 48]

# define labels
labels = ['Innsbruck', 'Salzburg', 'Bregenz', 'Linz', 'St.Pölten', 'Wien',
          'Eisenstadt', 'Graz', 'Klagenfurt']

# aggregate to yearly data (because daily data makes no sense in this context)
dfs = [df_ibk, df_sbg, df_brgz, df_linz, df_st_poelt, df_wien, df_eisstdt, df_graz, df_klgft]
dfs_year = []
for i in dfs:
    # average temperature and precipitation sum of each year
    elem = i.groupby(pd.Grouper(freq='Y')).agg({'t': 'mean', 'nied': 'sum'})
    # and filter out invalid years
    dfs_year.append(elem[(elem['t'] > 6) & (elem['t'] < 13)])

# plot
fig, ax = plt.subplots(1)
for df_, lbls in zip(dfs_year, labels):
    ax.plot(df_.index, df_['t'], label=lbls)
ax.set_xlabel('Time\n from 1900 to 2022')
ax.set_ylabel('Temperature (°C)')
ax.legend()
plt.grid()
plt.show()
```
We can see very little in this plot, so we decided to separate Austria into three areas: the northern side of the alpine ridge, the southern side of the alpine ridge, and eastern Austria.
```python
# split Austria up
north = dfs_year[0:3]
east = dfs_year[3:7]
south = dfs_year[7:]
# into regions
regions = [north, east, south]

fig, axs = plt.subplots(3, 1, figsize=(15, 10))
labels_regions = [labels[0:3], labels[3:7], labels[7:]]

# create a list of titles for the subplots
titles = ['Northern alpine ridge', 'Eastern Austria', 'Southern alpine ridge']

# loop over the regions and plot
for ax, region, lbsrgn, tlt in zip(axs, regions, labels_regions, titles):
    ax.grid()
    ax.set_title(tlt)
    for city, lbscty in zip(region, lbsrgn):
        ax.plot(city.index, city['t'], label=lbscty)
    ax.legend()
    ax.set_ylabel('Temperature (°C)')
    ax.set_xlabel('Time\n from 1900 to 2022')
plt.show()
```
We think that the trend for each state capital would also be interesting, and easier to compare.
```python
# create lists to store the linear regression results
slopes = []
intercepts = []

# calculate the linear regression for every state capital
for df_, lbl in zip(dfs_year, labels):
    x = np.arange(0, len(df_.index.year))
    y = df_['t']
    slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(x, y)
    slopes.append(slope)
    intercepts.append(intercept)
```
```python
fig, axs = plt.subplots(3, 3, figsize=(25, 20))

for ax, lbl, slope, intercept in zip(axs.ravel(), labels, slopes, intercepts):
    df_city = dfs_year[labels.index(lbl)]
    x_city = np.arange(0, len(df_city.index.year))
    y_city = df_city['t']

    # Generate x values for the regression line
    x_regress = np.linspace(min(x_city), max(x_city), 100)
    # Calculate the corresponding y values from the regression line equation
    y_regress = slope * x_regress + intercept

    # Label every tenth year on the x-axis
    x_label = df_city.index.year[::10]
    x_tick_positions = np.interp(x_label, df_city.index.year, x_city)
    ax.set_xscale('linear')
    ax.set_xticks(x_tick_positions)
    # Convert tick values to integer labels and rotate them
    ax.set_xticklabels([str(int(label)) for label in x_label], fontsize=9, rotation=30)

    # Plot the data points and the linear regression
    ax.scatter(x_city, y_city, label='Yearly averaged temperature', alpha=0.3)
    ax.plot(x_regress, y_regress, color='red', label='Temperature trend')
    ax.text(0.2, 0.9, 'y = ' + str(round(slope, 3)) + ' * x + ' + str(round(intercept, 1)),
            fontsize=12, horizontalalignment='center', verticalalignment='center',
            transform=ax.transAxes)
    ax.set_title(lbl)
    ax.grid()
    ax.set_ylabel('Temperature in °C')
    ax.legend(loc='lower right')
```
In these plots we can easily compare the different trends of all state capitals. In the formula in the upper left we can read the yearly temperature rise from the gradient (first value) of the function. We can see that the temperature in Klagenfurt has risen the most. But this comparison is not quite accurate, because some records start in 1900 and some start later. So let's look at all the data starting from 1950 and compare.
```python
df = df['1950':'2022']

df_ibk = df[df['station'] == 39]
df_sbg = df[df['station'] == 6300]
df_brgz = df[df['station'] == 15]
df_linz = df[df['station'] == 56]
df_st_poelt = df[df['station'] == 93]
df_wien = df[df['station'] == 105]
df_eisstdt = df[df['station'] == 22]
df_graz = df[df['station'] == 30]
df_klgft = df[df['station'] == 48]

labels = ['Innsbruck', 'Salzburg', 'Bregenz', 'Linz', 'St.Pölten', 'Wien',
          'Eisenstadt', 'Graz', 'Klagenfurt']

# aggregate to yearly data (because daily data makes no sense in this context)
dfs = [df_ibk, df_sbg, df_brgz, df_linz, df_st_poelt, df_wien, df_eisstdt, df_graz, df_klgft]
dfs_year = []
for i in dfs:
    elem = i.groupby(pd.Grouper(freq='Y')).agg({'t': 'mean', 'nied': 'sum'})
    dfs_year.append(elem[(elem['t'] > 6) & (elem['t'] < 13)])  # filter out invalid years

slopes = []
intercepts = []
for df_, lbl in zip(dfs_year, labels):
    x = np.arange(0, len(df_.index.year))
    y = df_['t']
    slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(x, y)
    slopes.append(slope)
    intercepts.append(intercept)

fig, axs = plt.subplots(3, 3, figsize=(25, 20))
for ax, lbl, slope, intercept in zip(axs.ravel(), labels, slopes, intercepts):
    df_city = dfs_year[labels.index(lbl)]
    x_city = np.arange(0, len(df_city.index.year))
    y_city = df_city['t']

    # Generate x values for the regression line
    x_regress = np.linspace(min(x_city), max(x_city), 100)
    # Calculate the corresponding y values from the regression line equation
    y_regress = slope * x_regress + intercept

    # Label every tenth year on the x-axis
    x_label = df_city.index.year[::10]
    x_tick_positions = np.interp(x_label, df_city.index.year, x_city)
    ax.set_xscale('linear')
    ax.set_xticks(x_tick_positions)
    ax.set_xticklabels([str(int(label)) for label in x_label], fontsize=9, rotation=30)

    # Plot the data points and the linear regression
    ax.scatter(x_city, y_city, label='Yearly averaged temperature', alpha=0.3)
    ax.plot(x_regress, y_regress, color='red', label='Temperature trend')
    ax.text(0.2, 0.9, 'y = ' + str(round(slope, 3)) + ' * x + ' + str(round(intercept, 1)),
            fontsize=12, horizontalalignment='center', verticalalignment='center',
            transform=ax.transAxes)
    ax.set_title(lbl)
    ax.grid()
    ax.set_ylabel('Temperature in °C')
    ax.legend(loc='lower right')
```
Looking at all the data starting from 1950, we see that in this timespan (1950-2022) the temperature has risen the most in Innsbruck, not in Klagenfurt.
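Since x counts whole years in the regressions above, the fitted slope is in °C per year. A quick conversion sketch (the slope value here is illustrative, not one of our fitted results):

```python
# the regression slope is in °C per year because x counts years;
# 0.03 °C/year is a made-up illustrative value
slope = 0.03
per_decade = slope * 10     # °C per decade
per_century = slope * 100   # °C per century
```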