Programming project¶

Preamble¶

  • Names: David Übertsroider, Luis Wolf
  • Matrikelnummer: 12216131, 12211064

01 - An energy balance model with hysteresis¶

At first we import all the packages¶

1
2
3
4
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats

01-01:

This following code computes the parameters a and b so the equation $ \alpha = \begin{cases} 0.3,& \text{if } T \gt 280\\ 0.7,& \text{if } T \lt 250\\ a T + b, & \text{otherwise} \end{cases} $
is continuous at T = 250K and T = 280K

1
2
3
4
a = np.array ([[280,1],[250,1]])
y = np.array ([[0.3],[0.7]])
a_inv = np.linalg.inv(a)
a,b = a_inv @ y

01-02:

The function alpha_from_temperature returns the corresponding albedo

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
def alpha_from_temperature(T):
    
    '''This function computes the albedo based on a given temperature
    Parameters
    ----------
    T (float): Temperature value
    
    Returns
    -------
    The computet albedo value (float)
    
    Examples
    --------
    >>> alpha_from_temperature (300)
    0.3
    >>> alpha_from_temperature (230)
    0.7
    >>> np.isclose(alpha_from_temperature(260),0.5666666666)
    True
    '''
    if T > 280:
        return 0.3
    elif T < 250:
        return 0.7
    else:
        return float(a*T+b)
    
import doctest
doctest.testmod()
TestResults(failed=0, attempted=3)

01-03:

The function temperature_change_with_hysteresis returns the stabilization temperature dependent on the starting temperature, the albedo and tau

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
def temperature_change_with_hysteresis(t0, n_years, tau=0.611):

# These are the constants
    s0 = 1362
    C = 4.0e+08
    sigma = 5.67E-8
    
# Convert the time and create datacontainers      
    dt = 60 * 60 * 24 * 365
    years = np.arange(n_years + 1)
    temperature = np.zeros(n_years + 1)
    temperature[0] = t0
    
    for i in range(n_years):
        olr = tau * sigma * temperature[i]**4  
        asr = (1 - alpha_from_temperature(temperature[i])) * s0 / 4  

        temperature[i + 1] = temperature[i] + (dt / C) * (asr - olr)
       
    return  temperature 
1
temperature_change_with_hysteresis(292,100)[-1]
288.0034709342971
1
temperature_change_with_hysteresis(265,100)[-1]
233.02581869749855

01-04:

Here we show the plot of a climate change scenario with hysteresis using our functions above

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
N_simulations = 21
n_years = 50

# create start values
start_values = np.linspace(206, 318, N_simulations)

# create an storage for the results
df_results = pd.DataFrame (index = np.arange(n_years+1), columns = ['Start Values'])

# loop the values in the function to get the results 
for t0 in (start_values):
    results = temperature_change_with_hysteresis(t0, n_years) 
    df_results[t0] = results   

# plot it
fig, ax = plt.subplots()
ax.plot(df_results)
ax.set_title('Climate change scenario with hysteresis')
ax.set_xlabel('Years')
ax.set_ylabel('Temperature (K)')

plt.show()
No description has been provided for this image

02 - Weather station data files¶

Open the data and display its content:

1
2
df = pd.read_csv('INNSBRUCK-FLUGPLATZ_Datensatz_20150101_20211231.csv', index_col=1, parse_dates=True)
df = df.drop('station', axis=1)

02-01: after reading the documentation of the respective functions (and maybe try a few things yourself), explain in plain sentences:

  • what am I asking pandas to do with the index_col=1, parse_dates=True keyword arguments? Why am I doing this?
  • what am I asking pandas to do with .drop()? Why axis=1?

A02-01:

You are telling pandas to use the second collumn with the arguemtn "index_col=1" (0 would be the first collumn).

With "parse_dates=True", you are instructing pandas to try to read the data from the collumns as dares. Pandas will now automatically try to convert the date strings into datetime objects.

One does this to work more easliy with time related Data. One can now perfrom date-based operations more easily, e.g. plotting time series.

".drop()" lets you remove a row or a collumn from your DataFrame. With "axis=1" you can remove a collumn. If you write "axis=0", you can remove a row.

Now let me do something else from you:

1
2
dfmeta = pd.read_csv('ZEHNMIN Parameter-Metadaten.csv', index_col=0)
dfmeta.loc[df.columns]
Kurzbeschreibung Beschreibung Einheit
DD Windrichtung Windrichtung, vektorielles Mittel über 10 Minuten °
FF vektorielle Windgeschwindigkeit Windgeschwindigkeit, vektorielles Mittel über ... m/s
GSX Globalstrahlung Globalstrahlung, arithmetisches Mittel über 10... W/m²
P Luftdruck Luftdruck, Basiswert zur Minute10 hPa
RF Relative Feuchte Relative Luftfeuchte, Basiswert zur Minute10 %
RR Niederschlag 10 Minuten Summe des Niederschlags, Summe der ... mm
SO Sonnenscheindauer Sonnenscheindauer, Sekundensumme über 10 Minuten s
TB1 Erdbodentemperatur in 10cm Tiefe Erdbodentemperatur in 10cm Tiefe, Basiswert zu... °C
TB2 Erdbodentemperatur in 20cm Tiefe Erdbodentemperatur in 20cm Tiefe, Basiswert zu... °C
TB3 Erdbodentemperatur in 50cm Tiefe Erdbodentemperatur in 50cm Tiefe, Basiswert zu... °C
TL Lufttemperatur in 2m Lufttemperatur in 2m Höhe, Basiswert zur Minute10 °C
TP Taupunkt Taupunktstemperatur, Basiswert zur Minute10 °C

02-02:

again, explain in plain sentences what the dfmeta.loc[df.columns] is doing, and why it works that way.

A02-02:

It retrives the labels pf the collumn in the DataFrame and retuns them as a list of names.

The "loc" index allows you to select rows or columns with their label.

"df.columns" reads the labels of the DataFrame and returns a list of names from the columns. "dfmeta.loc[df.columns]" takes this list of names from the column to select the rows from the DataFrame dfmeta. It tries to match the column names in dfmeta with the row labels of dfmeta and selects the corresponding rows.

Baisically, it enables you to subset rows of your DataFrame basend on the columns which are in your DataFrame.

Finally, let me do a last step for you before you start coding:

1
dfh = df.resample('H').mean()

02-03:

explore the dfh dataframe. Explain, in plain words, what the purpose of .resample('H') followed by mean() is. Explain what .resample('H').max() and .resample('H').sum() would do.

A02-03:

Python resamples the data in a different frequenzy. The "H" stands for a hourly frequenz, so that the initial data is grouped into hourly intervals.

The "mean()" function calculates the average of the values within the hourly intervals.

".resample("H").max()" would give you the maximum value within this intervall and ".resample("H").min()" would give you the minimum value within this interval.

02-04:

Using np.allclose, make sure that the average of the first hour (that you'll compute yourself from df) is indeed equal to the first row of dfh. Now, two variables in the dataframe have units that aren't suitable for averaging. Please convert the following variables to the correct units: ##

  • RR needs to be converted from the average of 10 min sums to mm/h
  • SO needs to be converted from the average of 10 min sums to s/h
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
# Extract data for the first 60 minuntes
first_hour_data = df.iloc[:6]

# Compute the average of the first hour
average_first_hour = first_hour_data.mean()

# Comparing with the first hour of dfh
is_close = np.allclose(average_first_hour, dfh.iloc[0])

if is_close:
    print("The average of the first hour matches the first row of dfh.")
else:
    print("The average of the first hour does not match the first row of dfh.")
The average of the first hour matches the first row of dfh.
1
2
dfh["RR"] = dfh["RR"] / 10 * 60
dfh["SO"] = dfh["SO"] / 10 * 60
1
dfh
DD FF GSX P RF RR SO TB1 TB2 TB3 TL TP
time
2015-01-01 00:00:00 257.333333 1.383333 0.0 965.666667 97.000000 0.0 0.0 3.200000 3.600000 5.2 -2.533333 -3.066667
2015-01-01 01:00:00 160.666667 0.766667 0.0 965.966667 97.666667 0.0 0.0 3.150000 3.600000 5.2 -3.250000 -3.716667
2015-01-01 02:00:00 227.000000 0.816667 0.0 966.316667 98.500000 0.0 0.0 3.100000 3.600000 5.2 -3.583333 -3.933333
2015-01-01 03:00:00 258.833333 1.933333 0.0 966.466667 97.000000 0.0 0.0 3.100000 3.600000 5.2 -3.250000 -3.766667
2015-01-01 04:00:00 241.500000 1.266667 0.0 966.416667 95.166667 0.0 0.0 3.100000 3.600000 5.2 -3.116667 -3.883333
... ... ... ... ... ... ... ... ... ... ... ... ...
2021-12-31 19:00:00 283.500000 1.166667 0.0 957.766667 98.833333 0.0 0.0 3.316667 3.300000 3.4 0.900000 0.716667
2021-12-31 20:00:00 237.333333 0.716667 0.0 958.100000 99.500000 0.0 0.0 3.283333 3.300000 3.4 -0.200000 -0.350000
2021-12-31 21:00:00 149.833333 1.083333 0.0 958.783333 99.666667 0.0 0.0 3.200000 3.300000 3.4 -0.800000 -0.916667
2021-12-31 22:00:00 175.833333 0.516667 0.0 959.433333 99.166667 0.0 0.0 3.100000 3.383333 3.4 -1.083333 -1.200000
2021-12-31 23:00:00 198.833333 0.600000 0.0 960.016667 99.500000 0.0 0.0 3.033333 3.400000 3.4 -1.683333 -1.716667

61368 rows × 12 columns

Spend some time exploring the dfh dataframe we just created. What time period does it cover? What variables does it contain?

The DataFrame reaches from the first of January 2015 till the last day of December 2021 and contains the following variables: Wind direction, wind speed in vectors, global radiation, air pressure, relative humidity, precipitation, sunshine duration, Ground temperature at 10cm 20cm and 50cm depth, air temperature at 2cm height and finally the dew point temperature.

From now on, we will use the hourly data only (and further aggregations when necessary). The 10 mins data are great but require a little bit more of pandas kung fu (the chinese term, not the sport) to be used efficiently.

03 - Precipitation¶

03-01:

Compute the average annual precipitation (m/year) over the 7-year period.

1
2
3
4
# calculate the sum and mean of RR 
annual_rr_precipitation = dfh["RR"].sum().mean()/100

print("Average Annual Precipitation (RR): ",round(annual_rr_precipitation,2)," m/year")
Average Annual Precipitation (RR):  64.44  m/year

03-02:

What is the smallest non-zero precipitation measured at the station? What is the maximum hourly precipitation measured at the station? When did this occur?

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
# find the minimum of non zero precipitation and max hourly precipitation
smallest_non_zero_precipitation =  dfh[dfh["RR"] > 0]["RR"].min()
max_hourly_precipitation = dfh["RR"].max()

# find the time using idxmax and idxmin
time_max_hourly_precipitation = dfh["RR"].idxmax()
time_min_hourly_precipitation = dfh["RR"][dfh["RR"]>0].idxmin()

print("Smallest non-zero precipitation: ",round(smallest_non_zero_precipitation, 3),"mm")
print("Maximum hourly precipitation: ",round(max_hourly_precipitation,2)," mm")
print("Timestamp for minimum hourly precipitation:", time_min_hourly_precipitation)
print("Timestamp for maximum hourly precipitation:", time_max_hourly_precipitation)
Smallest non-zero precipitation:  0.1 mm
Maximum hourly precipitation:  22.2  mm
Timestamp for minimum hourly precipitation: 2015-01-02 16:00:00
Timestamp for maximum hourly precipitation: 2021-09-16 22:00:00

03-03:

Plot a histogram of hourly precipitation, with bins of size 0.2 mm/h, starting at 0.1 mm/h and ending at 25 mm/h. Plot the same data, but this time with a logarithmic y-axis. Compute the 99th percentile (or quantile) of hourly precipitation.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
# histogram linear y-axis
plt.figure(figsize=(10,5))
plt.hist(dfh["RR"],bins=(125),range=(0.1, 25), edgecolor="k", color="mediumblue")
plt.xlabel("Hourly Precipitation (mm/h)")
plt.ylabel("Frequency")
plt.title("Hourly Precipitation")
plt.grid()
plt.show()

#histogram logarithmic y-axis
plt.figure(figsize=(10,5))
plt.hist(dfh["RR"], log=True, bins=125, range=(0.1, 25), edgecolor="k", color="mediumblue")
plt.xlabel("Hourly Precipitation (mm/h)")
plt.ylabel("Frequency (log scale)")
plt.title("Hourly Precipitation with Logarithmic Y-axis")
plt.grid()
plt.show()

# 99th percentile of hourly precipitation
percentile_99 = dfh["RR"].quantile(0.99)
print("99th percentile of hourly precipitation:",round(percentile_99,2),"mm/h")
No description has been provided for this image
No description has been provided for this image
99th percentile of hourly precipitation: 2.33 mm/h

03-04:

Compute daily sums (mm/d) of precipitation (tip: use .resample again). Compute the average number or rain days per year in Innsbruck (a "rain day" is a day with at least 0.1 mm / d of measured precipitation).

1
2
3
4
5
6
7
# resample to daily sums
daily_precipitation = dfh["RR"].resample("D").sum()

# sum and average of raindays
average_rain_days_per_year = (daily_precipitation >= 0.1).sum()/7

print("Average number of rain days per year in Innsbruck:", round(average_rain_days_per_year))
Average number of rain days per year in Innsbruck: 159

03-05:

Now select (subset) the daily dataframe to keep only only daily data in the months of December, January, February (DFJ). To do this, note that dfh.index.month exists and can be used to subset the data efficiently. Compute the average precipitation in DJF (mm / d), and the average number of rainy days in DJF. Repeat with the months of June, July, August (JJA).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
# Select the DJF subset. Dezember till February 12-2
djf = daily_precipitation[(daily_precipitation.index.month >= 12)|(daily_precipitation.index.month <= 2)]

# computing the average precipitation of DJF
average_precipitation_djf = djf.mean()

# counting the days on which it rained. Divided by 7 because of 7 years
average_rainy_days_djf = (djf >= 0.1).sum()/7

print("Average precipitation in DJF:",round(average_precipitation_djf,2),"mm/d")
print("Average number of rainy days in DJF:",round(average_rainy_days_djf))
Average precipitation in DJF: 1.77 mm/d
Average number of rainy days in DJF: 36
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
# Select the JJA subset. June till August 12-2
jja = daily_precipitation[(daily_precipitation.index.month >= 6)&(daily_precipitation.index.month <= 8)]

# computing the average precipitation of JJA
average_precipitation_jja = jja.mean()

# counting the days on which it rained. Divided by 7 because of 7 years
average_rainy_days_jja = (jja >= 0.1).sum()/7

print("Average precipitation in JJA:",round(average_precipitation_jja,2),"mm/d")
print("Average number of rainy days in JJA:",round(average_rainy_days_jja))
Average precipitation in JJA: 4.02 mm/d
Average number of rainy days in JJA: 50

03-06:

Repeat the DJF and JJA subsetting, but this time with hourly data. Count the total number of times that hourly precipitation in DJF is above the 99th percentile computed in exercise 03-03. Repeat with JJA.

1
2
3
4
5
6
# Subsets for djf and jja
djf = dfh[(dfh.index.month >= 12) | (dfh.index.month <= 2)]
jja = dfh[(dfh.index.month >= 6) & (dfh.index.month <= 8)]
# prints and counts
print("DJF hourly over 99th Percentile:",len(djf["RR"][djf["RR"]>percentile_99]),"%")
print("JJA hourly over 99th Percentile:",len(jja["RR"][jja["RR"]>percentile_99]),"%")
DJF hourly over 99th Percentile: 68 %
JJA hourly over 99th Percentile: 308 %

03-07:

Compute and plot the average daily cycle of hourly precipitation in DFJ and JJA. I expect a plot similar to this example. To compute the daily cycle, I recommend to combine two very useful tools. First, start by noticing that ds.index.hour exists and can be used to categorize data. Then, note that df.groupby exists and can be used exactly for that (documentation).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
# Compute the average daily cycles for djf and jja
djf_daily_cycle = djf.groupby(djf.index.hour)["RR"].mean()
jja_daily_cycle = jja.groupby(jja.index.hour)["RR"].mean()

# Plotting
fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(djf_daily_cycle, label="DJF", color="limegreen")
ax.plot(jja_daily_cycle, label="JJA", color="tomato")
ax.set_title("Average Daily Cycle of Hourly Precipitation")
ax.set_ylabel("Average Hourly Precipitation (mm/h)")
ax.set_xlabel("Hour of the Day")
ax.legend()
ax.grid()
plt.show()
No description has been provided for this image

04 - A few other variables¶

04-01:

In the first part of the code here, we show that the average temperature of all three soil temperatures are nearly the same and print it out.

Next we plot the three soil temperatures in an hourly resolution for the year 2020.

At the end of the code we plot the hree soil temperatures in an hourly resolution for may 2020

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
df = pd.read_csv('INNSBRUCK-FLUGPLATZ_Datensatz_20150101_20211231.csv', index_col=1, parse_dates=True)
df = df.drop('station', axis=1)

mean_TB1 = df['TB1'].mean()
mean_TB2 = df['TB2'].mean()
mean_TB3 = df['TB3'].mean()

# compare the average values
if np.isclose(mean_TB1,mean_TB2,mean_TB3):
    print('The three soil temperatures have approximately the same average value over the entire period')
    
else:
    print('The three soil temperatures have not the same average value over the entire period')

# filter for the full hour data 
df_hourly =  df.groupby(pd.Grouper(freq='H')).mean()

# filter for the data in 2020
df_hourly_2020 = df_hourly.loc[df_hourly.index.year==2020]

# plot of the soil temperature in hourly resolution for the year 2020
fig, ax = plt.subplots()
df_hourly_2020['TB1'].plot(label='TB1')
df_hourly_2020['TB2'].plot(label='TB2')
df_hourly_2020['TB3'].plot(label='TB3')
ax.set_title('Soil temperature 2020')
ax.set_xlabel('Time')
ax.set_ylabel('Temperature (°C)')
ax.legend()

# filter for the data for may 2020
df_may_2020 = df_hourly_2020.loc[df_hourly_2020.index.month==5]

# plot of the soil temperature in hourly resolution for may 2020
fig, ax = plt.subplots()
df_may_2020['TB1'].plot(label='TB1')
df_may_2020['TB2'].plot(label='TB2')
df_may_2020['TB3'].plot(label='TB3')
ax.set_title('Soil temperature may 2020')
ax.set_xlabel('Time')
ax.set_ylabel('Temperature (°C)')
ax.legend(loc='lower right')

plt.show()
The three soil temperatures have approximately the same average value over the entire period
No description has been provided for this image
No description has been provided for this image

04-02:

Here we plot the daily average temperature of the three soil temperatures (TB1, TB2 and TB3) over our whole dataframe

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
# filter for the daily average
df_daily =  df.groupby(pd.Grouper(freq='D')).mean()

# plot the daily average 
fig, ax = plt.subplots()
df_daily['TB1'].plot(label='TB1')
df_daily['TB2'].plot(label='TB2')
df_daily['TB3'].plot(label='TB3')
ax.set_title('Average daily cycle of soil temperature')
ax.set_xlabel('Time')
ax.set_ylabel('Temperature (°C)')
ax.legend()

plt.show()
No description has been provided for this image

04-03:

Here we compute the difference between the air temperature and the dewpoint temperature and show it on a scatterplot

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
# Compute the difference between TL and TP 
df['Temperature_Difference'] = df['TL'] - df['TP']

# Plotting the temperature difference on a scatter plot
fig, ax = plt.subplots()
ax.scatter(x=df['RF'], y=df['Temperature_Difference'], alpha=0.2, color='cadetblue')
ax.set_xlabel('Relative Humidity')
ax.set_ylabel('Temperature Difference (°C)')
ax.set_title('Temperature Difference and Relative Humidity')

plt.show()
No description has been provided for this image

05 - Free coding project¶

Compareing temperature datas from all state capitals of austria

For our project we decided to compare the temperature of all state capitals of austria since 1900 (or the earliest date since 1900 where data exists) until 2022. We got the data from the ZAMG datahub in hourly resolution and convertet it into yearly resolution.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
df = pd.read_csv('TAG Datensatz_19000101_20230601.csv', index_col=1, parse_dates=True)
df = df['1900':'2022']

# split the dataframe into state capital dataframes

df_ibk = df[df['station'] == 39]
df_sbg = df[df['station'] == 6300]
df_brgz = df[df['station'] == 15]
df_linz = df[df['station'] == 56]
df_st_poelt = df[df['station'] == 93]
df_wien = df[df['station'] == 105]
df_eisstdt = df[df['station'] == 22]
df_graz = df[df['station'] == 30]
df_klgft = df[df['station'] == 48]

# define labels
labels = ['Inssbruck', 'Salzburg', 'Bregenz', 'Linz', 'St.Pölten', 'Wien', 'Eisenstadt', 'Graz', 'Klagenfurt']

# filter the yearly data (because daily data makes no sence in this context)
dfs = [df_ibk, df_sbg, df_brgz, df_linz, df_st_poelt, df_wien, df_eisstdt, df_graz, df_klgft]
dfs_year = []
for i in dfs:
    elem = i.groupby(pd.Grouper(freq='Y')).agg({'t':'mean', 'nied':'sum'}) # make the average temperature of each year
    dfs_year.append(elem[(elem['t']>6) & (elem['t']<13)]) # and filter the invalid datas

    
    
# plot
fig, ax = plt.subplots(1)
for df_, lbls in zip(dfs_year, labels):
    ax.plot(df_.index,df_['t'],label = lbls)

    
ax.set_xlabel('Time\n from 1900 to 2022')
ax.set_ylabel('Temperature(°C)')
ax.legend()    
plt.grid()
plt.show()
No description has been provided for this image

Here we can se nothing in this plot so we decided to separate austria in three areas into northern side of the alpine ridge, southern side of the alpine ridge and eastern austria

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
# split up austria
north = dfs_year[0:3]
east = dfs_year[3:7]
south = dfs_year[7:]

# into regions
regions = [north, east, south]

fig, axs = plt.subplots(3,1,figsize = (15,10))

labels_regions = [labels[0:3], labels[3:7], labels[7:]]

# create a list of titles for subplots
titles = ['Northern alpine ridge', 'Eastern austria', 'Southern alpine ridge']

# loop over variables and plot 
for ax, region, lbsrgn, tlt in zip(axs, regions,labels_regions, titles):
    ax.grid()
    ax.set_title(tlt)
    for city, lbscty in zip(region, lbsrgn):
        ax.plot(city.index, city['t'], label = lbscty)
        ax.legend()
        ax.set_ylabel('Temperature(°C)')

ax.set_xlabel('Time\n from 1900 to 2022')        
plt.show()
No description has been provided for this image

We think that the trend for each state capital whould also be interesting and also easier to compare.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
# create lists for the linear regression function to stick values in 
slopes = []
intercepts = []

# calculate the linear regression for every state capital
for df_, lbl in zip(dfs_year, labels):
    x = np.arange(0, len(df_.index.year))
    y = df_['t']
    slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(x, y)
    slopes.append(slope)
    intercepts.append(intercept)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
fig, axs = plt.subplots(3,3, figsize = (25,20))


for ax,lbl, slope, intercept  in zip(axs.ravel(),labels, slopes, intercepts):
    
    
    df_city = dfs_year[labels.index(lbl)]

    x_city = np.arange(0, len(df_city.index.year))
    y_city = df_city['t']

    # Generate x values for the regression line
    x_regress = np.linspace(min(x_city), max(x_city), 100)
    
    # Calculate corresponding y values using the regression line equation
    y_regress = slope * x_regress + intercept
    
    x_label = df_city.index.year[::10]
    
    x_tick_positions = np.interp(x_label, df_city.index.year, x_city)

    ax.set_xscale('linear')  
    ax.set_xticks(x_tick_positions)
    ax.set_xticklabels([str(int(label)) for label in x_label], fontsize = 9, rotation = 30)  # Convert tick values to integer labels and rotate
    
    # Plot the data points and linear regression
    ax.scatter(x_city, y_city, label='Yearly averaged temperature',alpha = 0.3)
    ax.plot(x_regress, y_regress, color='red', label='Temperature trend')
    ax.text(0.2, 0.9, 'y = ' + str(round(slope, 3)) + ' * x + ' + str(round(intercept, 1)),fontsize=12, horizontalalignment='center',
    verticalalignment='center', transform=ax.transAxes)
    ax.set_title(lbl)
    ax.grid()
    
    ax.set_ylabel('Teperature in °C')
    ax.legend(loc = 'lower right')
    
No description has been provided for this image

In this plots we can easily compare the diffrent trends off all state capitals. In the formula in the upper left we can read the yearly temperature rise from the gradient (first value) of the function. So we can see that the temperature in Klagenfurt has rised the most. But this comparison is not quite accurate, because some data start from 1900 and some start later. So let's look at all the data starting from 1950 and compare.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
df = df ['1950':'2022']


df_ibk = df[df['station'] == 39]
df_sbg = df[df['station'] == 6300]
df_brgz = df[df['station'] == 15]
df_linz = df[df['station'] == 56]
df_st_poelt = df[df['station'] == 93]
df_wien = df[df['station'] == 105]
df_eisstdt = df[df['station'] == 22]
df_graz = df[df['station'] == 30]
df_klgft = df[df['station'] == 48]

labels = ['Inssbruck', 'Salzburg', 'Bregenz', 'Linz', 'St.Pölten', 'Wien', 'Eisenstadt', 'Graz', 'Klagenfurt']

# filter the yearly data (because daily data makes no sence in this context)
dfs = [df_ibk, df_sbg, df_brgz, df_linz, df_st_poelt, df_wien, df_eisstdt, df_graz, df_klgft]
dfs_year = []
for i in dfs:
    elem = i.groupby(pd.Grouper(freq='Y')).agg({'t':'mean', 'nied':'sum'})
    dfs_year.append(elem[(elem['t']>6) & (elem['t']<13)]) # and filter the invalid datas

    
    
slopes = []
intercepts = []
for df_, lbl in zip(dfs_year, labels):
    x = np.arange(0, len(df_.index.year))
    y = df_['t']
    slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(x, y)
    slopes.append(slope)
    intercepts.append(intercept)
    
fig, axs = plt.subplots(3,3, figsize = (25,20))


for ax,lbl, slope, intercept  in zip(axs.ravel(),labels, slopes, intercepts):
    
    
    df_city = dfs_year[labels.index(lbl)]

    x_city = np.arange(0, len(df_city.index.year))
    y_city = df_city['t']

    # Generate x values for the regression line
    x_regress = np.linspace(min(x_city), max(x_city), 100)
    
    # Calculate corresponding y values using the regression line equation
    y_regress = slope * x_regress + intercept
    
    x_label = df_city.index.year[::10]
    
    x_tick_positions = np.interp(x_label, df_city.index.year, x_city)

    ax.set_xscale('linear')  
    ax.set_xticks(x_tick_positions)
    ax.set_xticklabels([str(int(label)) for label in x_label], fontsize = 9, rotation = 30)  # Convert tick values to integer labels and rotate
    
    # Plot the data points and linear regression
    ax.scatter(x_city, y_city, label='Yearly averaged temperature',alpha = 0.3)
    ax.plot(x_regress, y_regress, color='red', label='Temperature trend')
    ax.text(0.2, 0.9, 'y = ' + str(round(slope, 3)) + ' * x + ' + str(round(intercept, 1)),fontsize=12, horizontalalignment='center',
    verticalalignment='center', transform=ax.transAxes)
    ax.set_title(lbl)
    ax.grid()
    
    ax.set_ylabel('Teperature in °C')
    ax.legend(loc = 'lower right')
    
No description has been provided for this image

Here we look at all the data starting from 1950 and see that the temperature has rised the most in Innsbruck not in Klagenfurt. (in the timespan of 1950-2022)