Programming project¶
Preamble¶
- Names: Marian Ebenbichler, Kilian Trummer
- Matrikelnummer: 12221149, 12117412
01 - An energy balance model with hysteresis¶
Based on the code we wrote in week 07, code the following extension. For this exercise it's OK to copy-paste my solutions and start from there. Only copy the necessary (no useless code)!
The planetary albedo $\alpha$ is in fact changing with climate change. As the temperature drops, sea ice and ice sheets expand (increasing the albedo). Inversely, the albedo decreases as the temperature rises. The planetary albedo of our simple energy balance model follows the equation:
$$ \alpha = \begin{cases} 0.3,& \text{if } T \gt 280\\ 0.7,& \text{if } T \lt 250\\ a T + b, & \text{otherwise} \end{cases} $$
01-01: Compute the parameters $a$ and $b$ so that the equation is continuous at T=250K and T=280K.
```python
import numpy as np
```
```python
a = np.array([[250, 1], [280, 1]])
a
```
array([[250,   1],
       [280,   1]])
```python
a_inv = np.linalg.inv(a)

# Check that the inverse worked
np.testing.assert_allclose(a @ a_inv, np.identity(2), atol=1e-7)
np.round(a @ a_inv, decimals=7)
```
array([[1., 0.],
       [0., 1.]])
```python
solution = a_inv @ [0.7, 0.3]
a, b = solution
print(a, b)
```
-0.013333333333333338 4.033333333333335
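As a cross-check, the same parameters can be obtained without forming the inverse explicitly, using np.linalg.solve (generally preferred numerically; this sketch just repeats the two continuity conditions above with illustrative variable names):

```python
import numpy as np

# Continuity conditions: a*250 + b = 0.7 and a*280 + b = 0.3
A = np.array([[250.0, 1.0], [280.0, 1.0]])
rhs = np.array([0.7, 0.3])

# Solve the 2x2 linear system directly
a_param, b_param = np.linalg.solve(A, rhs)
```

Both routes give the same a and b; solve avoids the extra rounding error of an explicit inversion.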
01-02: Now write a function called alpha_from_temperature which accepts a single positional parameter T as input (a scalar) and returns the corresponding albedo. Test your function using doctests to make sure that it complies with the instructions above.
```python
def alpha_from_temperature(T):
    """Compute the planetary albedo alpha from the temperature T.

    alpha = 0.3 if T > 280, 0.7 if T < 250, and a * T + b otherwise,
    with a and b chosen so that alpha is continuous at T=250K and T=280K.

    Parameters
    ----------
    T : float
        Temperature (K).

    Returns
    -------
    float
        The planetary albedo.

    Examples
    --------
    >>> alpha_from_temperature(290)
    0.3
    >>> alpha_from_temperature(240)
    0.7
    >>> alpha_from_temperature(249)
    0.7
    >>> alpha_from_temperature(281)
    0.3
    >>> abs(alpha_from_temperature(265) - 0.5) <= 1e-5
    True
    >>> abs(alpha_from_temperature(275) - 0.366666) <= 1e-5
    True
    """
    if T > 280:
        return 0.3
    elif T < 250:
        return 0.7
    else:
        # Parameters so that alpha is continuous at 250K and 280K
        a = (0.7 - 0.3) / (250 - 280)
        b = 0.7 - a * 250
        return a * T + b
```
01-03: Adapt the existing code from week 07 to write a function called temperature_change_with_hysteresis which accepts t0 (the starting temperature in K) and n_years (the number of simulation years) as positional arguments, and tau (the atmosphere transmissivity) as keyword argument (default value 0.611). Verify that:
- the stabilization temperature with t0 = 292 and default tau is approximately 288K
- the stabilization temperature with t0 = 265 and default tau is approximately 233K
```python
def asr(alpha=0.3):
    """Absorbed Solar Radiation (W m-2).

    Parameters
    ----------
    alpha : float, optional
        the planetary albedo

    Returns
    -------
    the asr (float)

    Examples
    --------
    >>> print(f'{asr():.2f}')
    238.35
    >>> print(f'{asr(alpha=0.32):.2f}')
    231.54
    """
    S = 1362  # solar constant (W m-2)
    return (1 - alpha) * S / 4
```
```python
def olr(t, tau=0.611):
    """Outgoing Longwave Radiation (W m-2).

    Parameters
    ----------
    t : float
        the atmosphere temperature (K)
    tau : float, optional
        the atmosphere transmissivity (-)

    Returns
    -------
    the olr (float)

    Examples
    --------
    >>> print(f'{olr(288):.2f}')
    238.34
    >>> print(f'{olr(288, tau=0.63):.2f}')
    245.75
    """
    sbc = 5.67e-8  # Stefan-Boltzmann constant (W m-2 K-4)
    return tau * sbc * t**4
```
```python
def temperature_change_with_hysteresis(t0, n_years, tau=0.611):
    """Simulate the temperature change with hysteresis over a number of years.

    Parameters
    ----------
    t0 : float
        Starting temperature in Kelvin.
    n_years : int
        Number of simulation years.
    tau : float, optional
        Atmosphere transmissivity. Default is 0.611.

    Returns
    -------
    time, T : the simulation years and the corresponding temperatures (K)

    Examples
    --------
    >>> round(float(temperature_change_with_hysteresis(292, 100)[1][-1]), 1)
    288.0
    >>> round(float(temperature_change_with_hysteresis(265, 100)[1][-1]), 1)
    233.0
    """
    dt = 60 * 60 * 24 * 365  # one year in seconds
    C = 4.0e+08              # heat capacity (J m-2 K-1)
    time = np.arange(n_years + 1)
    T = np.zeros(n_years + 1)
    T[0] = t0
    # compute the new albedo and the new temperature for every year
    for i in range(n_years):
        alpha = alpha_from_temperature(T[i])
        T[i + 1] = T[i] + (dt / C) * (asr(alpha=alpha) - olr(T[i], tau=tau))
    return time, T
```
01-04: Realize a total of N simulations with starting temperatures regularly spaced between t0=206K and t0=318K, and plot them on a single plot for n_years=50. The plot should look somewhat similar to this example for N=21.
Bonus: only if you want (and if time permits), you can try to increase N and add colors to your plot to create a graph similar to this one.
```python
import matplotlib.pyplot as plt

N = 21        # Number of simulations
n_years = 50  # Number of simulation years

# Generate regularly spaced starting temperatures
t0_values = np.linspace(206, 318, N)

# Generate time array
t = np.linspace(0, n_years, n_years + 1)

for t0 in t0_values:
    plt.plot(t, temperature_change_with_hysteresis(t0, n_years, tau=0.6)[1],
             color='black', label=str(t0) + 'K')

plt.xlabel('Time (years)')
plt.ylabel('Temperature (K)')
plt.title('Temperature Evolution with Hysteresis')
plt.grid(True)
plt.show()
```
```python
import matplotlib

N = 100       # number of simulations
n_years = 50  # number of simulated years

# Generate regularly spaced starting temperatures
t0_values = np.linspace(206, 318, N)

# Generate time array
t = np.linspace(0, n_years, n_years + 1)

# Create a color fade from blue to red
# (matplotlib.colormaps replaces the deprecated plt.cm.get_cmap)
color_map = matplotlib.colormaps['Spectral_r']
colors = color_map(np.linspace(0, 1, N))

# plot the simulations with the color fade
for i, t0 in enumerate(t0_values):
    temperature = temperature_change_with_hysteresis(t0, n_years, tau=0.6)
    plt.plot(t, temperature[1], color=colors[i], label=str(t0) + 'K')

plt.xlabel('Time (years)')
plt.ylabel('Temperature (K)')
plt.title('Temperature Evolution with Hysteresis')
plt.grid(True)
plt.show()
```
02 - Weather station data files¶
I downloaded 10 min data from the recently launched ZAMG data hub. The data file contains selected parameters from the "INNSBRUCK-FLUGPLATZ (ID: 11804)" weather station.
You can download the data from the following links (right-click + "Save as..."):
- station data
- parameter metadata
- station list from the ZAMG (in a better format than last time)
Let me open the data for you and display its content:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```
```python
df = pd.read_csv('INNSBRUCK-FLUGPLATZ_Datensatz_20150101_20211231.csv',
                 index_col=1, parse_dates=True)
df = df.drop('station', axis=1)
```
02-01: after reading the documentation of the respective functions (and maybe trying a few things yourself), explain in plain sentences:
- what am I asking pandas to do with the index_col=1, parse_dates=True keyword arguments? Why am I doing this?
- what am I asking pandas to do with .drop()? Why axis=1?
What am I asking pandas to do with the index_col=1, parse_dates=True keyword arguments? Why am I doing this?

index_col=1 sets the second column of the CSV file as the index of the DataFrame. This is useful here because the first column (index 0) only contains the station number, which is the same in every row, whereas the second column (index 1) contains the timestamps, which make a much more useful index.

parse_dates=True tells pandas to attempt to parse the dates in the CSV file and represent them as datetime objects in the resulting DataFrame.

.drop() is the command to drop/delete a specific column or row; axis=1 indicates that a column should be removed, while axis=0 tells pandas to remove a row.
Now let me do something else for you:
```python
dfmeta = pd.read_csv('ZEHNMIN Parameter-Metadaten.csv', index_col=0)
dfmeta.loc[df.columns]
```
|  | Kurzbeschreibung | Beschreibung | Einheit |
|---|---|---|---|
| DD | Windrichtung | Windrichtung, vektorielles Mittel über 10 Minuten | ° |
| FF | vektorielle Windgeschwindigkeit | Windgeschwindigkeit, vektorielles Mittel über ... | m/s |
| GSX | Globalstrahlung | Globalstrahlung, arithmetisches Mittel über 10... | W/m² |
| P | Luftdruck | Luftdruck, Basiswert zur Minute10 | hPa |
| RF | Relative Feuchte | Relative Luftfeuchte, Basiswert zur Minute10 | % |
| RR | Niederschlag | 10 Minuten Summe des Niederschlags, Summe der ... | mm |
| SO | Sonnenscheindauer | Sonnenscheindauer, Sekundensumme über 10 Minuten | s |
| TB1 | Erdbodentemperatur in 10cm Tiefe | Erdbodentemperatur in 10cm Tiefe, Basiswert zu... | °C |
| TB2 | Erdbodentemperatur in 20cm Tiefe | Erdbodentemperatur in 20cm Tiefe, Basiswert zu... | °C |
| TB3 | Erdbodentemperatur in 50cm Tiefe | Erdbodentemperatur in 50cm Tiefe, Basiswert zu... | °C |
| TL | Lufttemperatur in 2m | Lufttemperatur in 2m Höhe, Basiswert zur Minute10 | °C |
| TP | Taupunkt | Taupunktstemperatur, Basiswert zur Minute10 | °C |
02-02: again, explain in plain sentences what dfmeta.loc[df.columns] is doing, and why it works that way.
The code dfmeta.loc[df.columns] retrieves rows from the DataFrame dfmeta based on the index values provided by the columns of the DataFrame df.
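This works because .loc accepts a list-like of labels and returns the matching rows in that order; a toy sketch with made-up stand-ins for dfmeta and df:

```python
import pandas as pd

# Toy metadata table indexed by parameter name (stand-in for dfmeta)
meta = pd.DataFrame({'Einheit': ['°C', 'mm', '%']}, index=['TL', 'RR', 'RF'])

# Toy data table whose columns are parameter names (stand-in for df)
data = pd.DataFrame(columns=['RR', 'TL'])

# .loc with a list-like of labels returns those rows, in that order
subset = meta.loc[data.columns]
```

The result contains exactly the metadata rows for the parameters present in the data.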
Finally, let me do a last step for you before you start coding:
```python
dfh = df.resample('H').mean()
```
02-03: explore the dfh dataframe. Explain, in plain words, what the purpose of .resample('H') followed by .mean() is. Explain what .resample('H').max() and .resample('H').sum() would do.
.resample('H')
"Resampling" means that the frequency of the data is changed, in this case from a 10-minute frequency to an hourly frequency. By using .resample('H'), the data gets grouped into hourly time intervals.

.resample('H').mean()
The mean function then calculates the mean/average value of each hourly interval separately.

.resample('H').max()
The max function calculates the maximum value of each hour separately, i.e. it finds the greatest value recorded within every hourly interval.

.resample('H').sum()
Similarly, the sum function calculates the sum within each hour, i.e. it adds up all values of each hourly interval, giving the total recorded in every hour.
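The three aggregations can be illustrated with a toy series of six made-up 10-minute values spanning exactly one hour:

```python
import pandas as pd

# Six 10-min values covering one hour
idx = pd.date_range('2021-01-01 00:00', periods=6, freq='10min')
s = pd.Series([0.0, 0.1, 0.2, 0.0, 0.3, 0.0], index=idx)

hourly_mean = s.resample('H').mean()  # average of the six values
hourly_max = s.resample('H').max()    # largest of the six values
hourly_sum = s.resample('H').sum()    # total over the hour
```

Same grouping into hourly bins each time; only the reduction applied to each bin differs.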
02-04: Using np.allclose, make sure that the average of the first hour (that you'll compute yourself from df) is indeed equal to the first row of dfh. Now, two variables in the dataframe have units that aren't suitable for averaging. Please convert the following variables to the correct units:
- RR needs to be converted from the average of 10 min sums to mm/h
- SO needs to be converted from the average of 10 min sums to s/h
```python
dfhsum = df.resample('H').sum()

dfh['RR'] = dfhsum['RR']
dfh['SO'] = dfhsum['SO']
```
```python
np.allclose(df.iloc[:6].mean(), dfh.head(1))
```
True
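Since np.allclose does the work in this check, here is a toy sketch of its behaviour (made-up arrays): it tests element-wise closeness within a tolerance, unlike exact equality.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = a + 1e-9  # tiny numerical difference

close = np.allclose(a, b)     # within default tolerances
exact = np.array_equal(a, b)  # bit-for-bit comparison
```

This is why np.allclose is the right tool for comparing floating-point averages.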
From now on, we will use the hourly data only (and further aggregations when necessary). The 10 min data are great but require a bit more pandas kung fu (the Chinese term, not the sport) to be used efficiently.
Spend some time exploring the dfh dataframe we just created. What time period does it cover? What variables does it contain?
```python
time_range_min = dfh.index.min()
time_range_max = dfh.index.max()
print(time_range_min, time_range_max)

column_names = ', '.join(dfh.columns)
print("Variables: ", column_names)
```
2015-01-01 00:00:00 2021-12-31 23:00:00
Variables:  DD, FF, GSX, P, RF, RR, SO, TB1, TB2, TB3, TL, TP
Note on pandas: all the exercises below can be done with or without pandas. Each question can be answered with very few lines of code (often one or two) with pandas, and I recommend to use it as much as possible. If you want, you can always use numpy in case of doubt: you can access the data as a numpy array with: df[column_name].values.
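A toy illustration of that note (the column name is made up for the example):

```python
import numpy as np
import pandas as pd

# Hypothetical single-column DataFrame standing in for df
toy = pd.DataFrame({'TL': [1.5, 2.0, 2.5]})

# .values exposes the column as a plain numpy array
arr = toy['TL'].values
```

From there, any numpy function (np.mean, np.percentile, ...) applies directly.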
03 - Precipitation¶
In this section, we will focus on precipitation only.
03-01: Compute the average annual precipitation (m/year) over the 7-year period.
```python
annual_precipitation = dfh['RR'].resample('A').sum()

# Compute the average annual precipitation (mm -> m: divide by 1000)
average_precipitation = annual_precipitation.mean() / 1000

print("Average annual precipitation: {:.2f} m/year".format(average_precipitation))
```
Average annual precipitation: 0.92 m/year
03-02: What is the smallest non-zero precipitation measured at the station? What is the maximum hourly precipitation measured at the station? When did this occur?
```python
# compute the smallest non-zero precipitation
smallest_nonzero_precipitation = dfh['RR'].loc[dfh['RR'] > 0].min()
print("Smallest non-zero precipitation: {:.2f} mm".format(smallest_nonzero_precipitation))

# dfh is already hourly, so dfh['RR'] is the hourly precipitation
hourly_precipitation = dfh['RR']
max_hourly_precipitation = hourly_precipitation.max()
print("Maximum hourly precipitation: {:.2f} mm".format(max_hourly_precipitation))

# compute when the maximum hourly precipitation occurred
max_hourly_precipitation_time = hourly_precipitation.idxmax()
print("Time of maximum hourly precipitation: {}".format(max_hourly_precipitation_time))
```
Smallest non-zero precipitation: 0.10 mm
Maximum hourly precipitation: 22.20 mm
Time of maximum hourly precipitation: 2021-09-16 22:00:00
03-03: Plot a histogram of hourly precipitation, with bins of size 0.2 mm/h, starting at 0.1 mm/h and ending at 25 mm/h. Plot the same data, but this time with a logarithmic y-axis. Compute the 99th percentile (or quantile) of hourly precipitation.
```python
# Bins of size 0.2 mm/h, starting at 0.1 and ending at 25 mm/h
bins = np.arange(0.1, 25.1, 0.2)

plt.hist(hourly_precipitation, bins=bins)
plt.xlabel('hourly precipitation (mm/h)')
plt.ylabel('frequency in hours')
plt.title('histogram of hourly precipitation')
plt.show()
```
```python
# Same histogram, this time with a logarithmic y-axis
plt.hist(hourly_precipitation, bins=bins)
plt.xlabel('hourly precipitation (mm/h)')
plt.ylabel('frequency (logarithmic scale)')
plt.yscale('log')
plt.title('histogram of hourly precipitation with logarithmic y-axis')
plt.show()
```
```python
# compute the 99th percentile of hourly precipitation
percentile_99th = np.percentile(dfh['RR'], 99)
print("the 99th percentile of hourly precipitation is:", round(percentile_99th, 3), "mm/h")
```
the 99th percentile of hourly precipitation is: 2.333 mm/h
03-04: Compute daily sums (mm/d) of precipitation (tip: use .resample again). Compute the average number of rain days per year in Innsbruck (a "rain day" is a day with at least 0.1 mm/d of measured precipitation).
```python
daily_precipitation = dfh['RR'].resample('D').sum()

# first compute the total number of rain days per year
total_raindays = (daily_precipitation >= 0.1).groupby(daily_precipitation.index.year).sum()

print("the average number of rain days per year is approx.:", round(total_raindays.mean()), "days")
```
the average number of rain days per year is approx.: 170 days
03-05: Now select (subset) the daily dataframe to keep only the daily data in the months of December, January, February (DJF). To do this, note that dfh.index.month exists and can be used to subset the data efficiently. Compute the average precipitation in DJF (mm/d), and the average number of rainy days in DJF. Repeat with the months of June, July, August (JJA).
```python
# DJF
DJF_daily = daily_precipitation[(daily_precipitation.index.month >= 12) |
                                (daily_precipitation.index.month <= 2)]
print("the average precipitation in DJF:", DJF_daily.mean(), "mm/d")
DJF_raindays = (DJF_daily >= 0.1).sum() / 7
print("the average number of rain days in DJF:", DJF_raindays, "days")

# JJA
JJA_daily = daily_precipitation[(daily_precipitation.index.month >= 6) &
                                (daily_precipitation.index.month <= 8)]
print("the average precipitation in JJA:", JJA_daily.mean(), "mm/d")
JJA_raindays = (JJA_daily >= 0.1).sum() / 7
print("the average number of rain days in JJA:", JJA_raindays, "days")
```
the average precipitation in DJF: 1.7726265822784812 mm/d
the average number of rain days in DJF: 37.57142857142857 days
the average precipitation in JJA: 4.024844720496894 mm/d
the average number of rain days in JJA: 53.142857142857146 days
03-06: Repeat the DJF and JJA subsetting, but this time with hourly data. Count the total number of times that hourly precipitation in DJF is above the 99th percentile computed in exercise 03-03. Repeat with JJA.
```python
# DJF
DJF_hourly = hourly_precipitation[(hourly_precipitation.index.month >= 12) |
                                  (hourly_precipitation.index.month <= 2)]
DJF_over99 = (DJF_hourly > percentile_99th).sum()
print("total number of times that hourly precipitation in DJF is above the 99th percentile:", DJF_over99)

# JJA
JJA_hourly = hourly_precipitation[(hourly_precipitation.index.month >= 6) &
                                  (hourly_precipitation.index.month <= 8)]
JJA_over99 = (JJA_hourly > percentile_99th).sum()
print("total number of times that hourly precipitation in JJA is above the 99th percentile:", JJA_over99)
```
total number of times that hourly precipitation in DJF is above the 99th percentile: 68
total number of times that hourly precipitation in JJA is above the 99th percentile: 308
03-07: Compute and plot the average daily cycle of hourly precipitation in DJF and JJA. I expect a plot similar to this example. To compute the daily cycle, I recommend combining two very useful tools. First, start by noticing that ds.index.hour exists and can be used to categorize data. Then, note that df.groupby exists and can be used exactly for that (documentation).
```python
# DJF average cycle of hourly precipitation
avg_cycle_DJF = DJF_hourly.groupby(DJF_hourly.index.hour).mean()

# JJA average cycle of hourly precipitation
avg_cycle_JJA = JJA_hourly.groupby(JJA_hourly.index.hour).mean()

# plotting
fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(avg_cycle_DJF.index, avg_cycle_DJF, label='DJF')
ax.plot(avg_cycle_JJA.index, avg_cycle_JJA, label='JJA')
plt.xlabel('Hour of day (UTC)')
plt.ylabel('Hourly Precipitation (mm/h)')
plt.title('Precipitation daily cycle Innsbruck 2015-2021')
plt.legend()
plt.show()
```
04 - A few other variables¶
In this section, we will continue to analyze the weather station data.
04-01: Verify that the three soil temperatures have approximately the same average value over the entire period. Now plot the three soil temperature timeseries in hourly resolution over the course of the year 2020 (example). Repeat the plot with the month of May 2020.
```python
TB1_mean = dfh['TB1'].mean()
TB2_mean = dfh['TB2'].mean()
TB3_mean = dfh['TB3'].mean()
print(TB1_mean, TB2_mean, TB3_mean)
```
11.393769492976766 11.514406669059351 11.379132772780602
```python
# Filter data for the year 2020
df_2020 = dfh.loc['2020']

# Plot the three soil temperature time series for the year 2020
plt.figure(figsize=(12, 6))
plt.plot(df_2020.index, df_2020['TB1'], label='TB1')
plt.plot(df_2020.index, df_2020['TB2'], label='TB2')
plt.plot(df_2020.index, df_2020['TB3'], label='TB3')
plt.xlabel('Time')
plt.ylabel('Soil Temperature (°C)')
plt.title('Soil Temperature Time Series - Year 2020')
plt.legend()
plt.grid(True)
plt.show()
```
```python
# Filter the hourly data for May 2020
df_may_2020 = dfh.loc['2020-05']

# Plot the three soil temperature time series for the month of May 2020
plt.figure(figsize=(12, 6))
plt.plot(df_may_2020.index, df_may_2020['TB1'], label='TB1')
plt.plot(df_may_2020.index, df_may_2020['TB2'], label='TB2')
plt.plot(df_may_2020.index, df_may_2020['TB3'], label='TB3')
plt.xlabel('Time')
plt.ylabel('Soil Temperature (°C)')
plt.title('Soil Temperature Time Series - May 2020')
plt.legend()
plt.grid(True)
plt.show()
```
04-02: Plot the average daily cycle of all three soil temperatures.
```python
# dfh is already hourly: group by hour of the day and average
average_temperatures = dfh.groupby(dfh.index.hour)[['TB1', 'TB2', 'TB3']].mean()

# Plot the average daily cycle of all three soil temperatures
plt.figure(figsize=(12, 6))
plt.plot(average_temperatures.index, average_temperatures['TB1'], label='TB1')
plt.plot(average_temperatures.index, average_temperatures['TB2'], label='TB2')
plt.plot(average_temperatures.index, average_temperatures['TB3'], label='TB3')
plt.xlabel('Hour of the Day')
plt.ylabel('Average Soil Temperature (°C)')
plt.title('Average Daily Cycle of Soil Temperatures')
plt.legend()
plt.grid(True)
plt.show()
```
04-03: Compute the difference (in °C) between the air temperature and the dewpoint temperature. Now plot this difference on a scatter plot (x-axis: relative humidity, y-axis: temperature difference).
```python
# Compute the difference between air temperature and dewpoint temperature
dfh['TempDiff'] = dfh['TL'] - dfh['TP']

# Scatter plot: relative humidity vs. temperature difference
plt.figure(figsize=(10, 6))
plt.scatter(dfh['RF'], dfh['TempDiff'], s=10)
plt.xlabel('Relative Humidity (%)')
plt.ylabel('Temperature Difference (°C)')
plt.title('Temperature Difference vs. Relative Humidity')
plt.grid(True)
plt.show()
```
05 - Free coding project¶
The last part of this semester project is up to you! You are free to explore whatever interests you. I however add three requirements:
- This section should have at least 5 original plots in it. They are the output of your analysis.
- This section should also use additional data that you downloaded yourself. The easiest way would probably be to download another station(s) from the ZAMG database, or data from the same station but for another time period (e.g. for trend or change analysis). You can, however, decide to do something completely different if you prefer (as long as you download and read one more file).
- this section should contain at least one regression or correlation analysis between two parameters. Examples:
- between two different variables at the same station (like we did with the dewpoint above)
- between different stations (for example, average temperature as a function of station elevation)
- between average temperature and time (trends analysis)
- etc.
That's it! Here are a few ideas:
- detection of trends and changes at the station Innsbruck for 1993-2021
- comparison of 5-yr climatologies at various stations in Tirol, taking elevation or location into account
- compute the theoretical day length from the station's longitude and latitude (you can find solutions for this online, just let me know the source if you used a solution online), and use these computations to compare the measured sunshine duration to the maximum day length. This can be used to classify "sunshine days" for example.
- use the python "windrose" library to plot a windrose at different locations and time of day.
- etc.
If you have your own idea but are unsure about whether this is too much or not enough, come to see me in class! In general, the three requirements above should be enough.
My goal with this section is to let you formulate a programming goal and implement it.
For the fifth part of our coding project, we chose to compare the data from Innsbruck with data from a station at a much higher elevation. Our aim is to see if we can observe climate change at each station and determine if there is a difference in warming between a city and the mountains. To achieve this, we will conduct a trend analysis and also examine any changes in precipitation over time.
As our second station, we have selected Hoher Sonnblick, which is one of Austria's highest and oldest observatories.
```python
# generating the DataFrame for Sonnblick
df_SB = pd.read_csv('TAG Datensatz_SB.csv', skiprows=[1], index_col=0, parse_dates=True)
df_SB.index = pd.to_datetime(df_SB.index).date
df_SB = df_SB.iloc[:, 1:-1]
df_SB = df_SB.drop(df_SB.index[1])
df_SB.index = pd.to_datetime(df_SB.index)
```
```python
df_SB.head()
```
|  | nied | rel | t | tmax | tmin | schnee | vvmax |
|---|---|---|---|---|---|---|---|
| 1900-01-02 | 0.4 | 95.0 | -5.4 | -4.0 | -6.7 | NaN | NaN |
| 1900-01-04 | 10.4 | 96.0 | -5.0 | -4.2 | -5.8 | NaN | NaN |
| 1900-01-05 | 1.4 | 94.0 | -6.8 | -4.5 | -9.0 | NaN | NaN |
| 1900-01-06 | 12.1 | 89.0 | -8.8 | -7.5 | -10.0 | NaN | NaN |
| 1900-01-07 | 5.8 | 90.0 | -7.8 | -6.8 | -8.7 | NaN | NaN |
```python
# generating the DataFrame for Innsbruck
df_Ibk = pd.read_csv('TAG Datensatz_IBK.csv', skiprows=[1], index_col=0, parse_dates=True)
df_Ibk.index = pd.to_datetime(df_Ibk.index).date
df_Ibk = df_Ibk.iloc[:, 1:-1]
df_Ibk = df_Ibk.drop(df_Ibk.index[1])
df_Ibk.index = pd.to_datetime(df_Ibk.index)
```
```python
df_Ibk.head()
```
|  | nied | rel | t | tmax | tmin | schnee | vvmax |
|---|---|---|---|---|---|---|---|
| 1900-01-02 | -1.0 | 84.0 | 7.4 | 11.5 | 3.3 | NaN | NaN |
| 1900-01-04 | 7.7 | 91.0 | 8.1 | 12.7 | 3.4 | NaN | NaN |
| 1900-01-05 | -1.0 | 91.0 | 3.8 | 5.0 | 2.5 | NaN | NaN |
| 1900-01-06 | -1.0 | 89.0 | 2.1 | 3.3 | 0.8 | NaN | NaN |
| 1900-01-07 | 2.8 | 89.0 | 1.7 | 3.6 | -0.2 | NaN | NaN |
```python
df_Ibk.min()
```
nied      -1.0
rel       24.0
t        -21.1
tmax     -18.2
tmin     -26.9
schnee    -1.0
vvmax      0.5
dtype: float64
```python
# Remove the first 20 years of the dataframes, since they contain many incorrect
# values and outliers which could affect the data analysis later on,
# and 100 years should be more than enough to visualize changes in climate.

# Extract month and year for both dataframes
df_Ibk['Month'] = df_Ibk.index.month
df_Ibk['Year'] = df_Ibk.index.year
df_SB['Month'] = df_SB.index.month
df_SB['Year'] = df_SB.index.year

# Filter out rows with years between 1900 and 1920
df_SB = df_SB[(df_SB['Year'] < 1900) | (df_SB['Year'] > 1920)]
df_Ibk = df_Ibk[(df_Ibk['Year'] < 1900) | (df_Ibk['Year'] > 1920)]

# Filter out rows with year 2023 or later
df_SB = df_SB[df_SB['Year'] < 2023]
df_Ibk = df_Ibk[df_Ibk['Year'] < 2023]

# Set all -1 values (missing data flags) in precipitation and snow height to NaN
df_SB['schnee'] = df_SB['schnee'].replace(-1, np.nan)
df_SB['nied'] = df_SB['nied'].replace(-1, np.nan)
df_Ibk['nied'] = df_Ibk['nied'].replace(-1, np.nan)
df_Ibk['schnee'] = df_Ibk['schnee'].replace(-1, np.nan)
```
```python
# Calculate the average mean temperature for each month for both stations
mean_temp_Ibk = df_Ibk.groupby('Month')['t'].mean()
mean_temp_SB = df_SB.groupby('Month')['t'].mean()

# Plot the annual cycles for both stations
plt.figure(figsize=(10, 6))
plt.plot(mean_temp_Ibk.index, mean_temp_Ibk.values, label='Innsbruck')
plt.plot(mean_temp_SB.index, mean_temp_SB.values, label='Sonnblick')
plt.xlabel('Month')
plt.ylabel('Mean Temperature (°C)')
plt.title('Average Annual Cycle of Mean Temperature')
plt.legend()
plt.show()
```
```python
from scipy import stats

# Calculate the average annual mean temperature for each year
mean_temp_Ibk_yearly = df_Ibk.resample('Y')['t'].mean()
mean_temp_SB_yearly = df_SB.resample('Y')['t'].mean()

# Plot the trend of mean temperature over time for both stations
plt.figure(figsize=(10, 6))
plt.plot(mean_temp_Ibk_yearly.index.year, mean_temp_Ibk_yearly.values, label='Innsbruck')
plt.plot(mean_temp_SB_yearly.index.year, mean_temp_SB_yearly.values, label='Sonnblick')
plt.xlabel('Year')
plt.ylabel('Mean Temperature (°C)')
plt.title('Trend of Mean Temperature Over Time')

# Calculate trendlines using linregress
slope_Ibk, intercept_Ibk, _, _, _ = stats.linregress(mean_temp_Ibk_yearly.index.year,
                                                     mean_temp_Ibk_yearly.values)
trendline_Ibk = slope_Ibk * mean_temp_Ibk_yearly.index.year + intercept_Ibk

slope_SB, intercept_SB, _, _, _ = stats.linregress(mean_temp_SB_yearly.index.year,
                                                   mean_temp_SB_yearly.values)
trendline_SB = slope_SB * mean_temp_SB_yearly.index.year + intercept_SB

# Plot trendlines
plt.plot(mean_temp_Ibk_yearly.index.year, trendline_Ibk, 'r--', label='Innsbruck Trend')
plt.plot(mean_temp_SB_yearly.index.year, trendline_SB, 'r--', label='Sonnblick Trend')
plt.legend()
plt.show()
```
```python
print(slope_Ibk)
print(slope_SB)
```
0.02527845594224632
0.022048975098567843
The graph above, which displays the trend of the mean temperature in Innsbruck and on Sonnblick, shows that the temperature is increasing at both places. At first glance the two trendlines, marked in red, seem to have the same steepness, which would indicate that the climate changes at the same rate regardless of altitude. By printing the slopes of the trendlines, however, one can see that the increase of the mean temperature in Innsbruck is indeed a bit steeper. This could indicate that a city located at around 570 m a.s.l. is warming faster than a mountain at an altitude of 3100 m.

However, this is a linear regression model, which only represents climate change in a simplified manner.
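As a complement, scipy's linregress also reports the standard error of the slope and a p-value, which help judge how robust such a trend estimate is. A self-contained sketch with synthetic yearly temperatures (a known trend of 0.02 °C/yr plus noise; all numbers are illustrative, not the station data):

```python
import numpy as np
from scipy import stats

# Synthetic yearly mean temperatures: known trend of 0.02 degC/yr plus noise
rng = np.random.default_rng(0)
years = np.arange(1921, 2023)
temps = 8.0 + 0.02 * (years - years[0]) + rng.normal(0.0, 0.3, years.size)

res = stats.linregress(years, temps)

# res.stderr is the standard error of the slope;
# +/- 2 * stderr gives a rough 95% band around the estimated trend
band = (res.slope - 2 * res.stderr, res.slope + 2 * res.stderr)
```

Comparing such bands for the two stations would show whether the difference in slopes is larger than the estimation uncertainty.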
```python
# Calculate the average annual snow height for each year
snow_height_Ibk_yearly = df_Ibk.resample('Y')['schnee'].mean()
snow_height_SB_yearly = df_SB.resample('Y')['schnee'].mean()

# Plot the snow height over time for both stations
plt.figure(figsize=(10, 6))
plt.plot(snow_height_Ibk_yearly.index.year, snow_height_Ibk_yearly.values, label='Innsbruck')
plt.plot(snow_height_SB_yearly.index.year, snow_height_SB_yearly.values, label='Sonnblick')
plt.xlabel('Year')
plt.ylabel('Snow Height in cm')
plt.title('Change in Snow Cover Over Time')
plt.legend()
plt.show()
```
In this graph we can see that, over the recorded timespans, there is no really recognizable trend of decreasing snowfall, neither in the city nor on the mountain. However, in the next plot, which shows the maximum measured snow height on Sonnblick, one can see that since the 1990s the maximum measurable height of 500 cm is not reached as often as in earlier years.
```python
# Plot the maximum snow height over time for Sonnblick
max_snow_height_SB_yearly = df_SB.resample('Y')['schnee'].max()

plt.figure(figsize=(10, 6))
plt.plot(max_snow_height_SB_yearly.index.year, max_snow_height_SB_yearly.values,
         label='Sonnblick', color='lightblue')
plt.xlabel('Year')
plt.ylabel('Max Snow Height in cm')
plt.title('Maximum Snow Height on Sonnblick')
plt.legend()
plt.show()
```
```python
# Filter the data for the desired time periods
df_Ibk_1970_2022 = df_Ibk.loc['1970-01-01':'2022-12-31']
df_Ibk_1920_1969 = df_Ibk.loc['1920-01-01':'1969-12-31']
df_SB_1970_2022 = df_SB.loc['1970-01-01':'2022-12-31']
df_SB_1920_1969 = df_SB.loc['1920-01-01':'1969-12-31']

# Calculate the average monthly temperature for each period
mean_temp_1970_2022 = df_Ibk_1970_2022.groupby(df_Ibk_1970_2022.index.month)['t'].mean()
mean_temp_1920_1969 = df_Ibk_1920_1969.groupby(df_Ibk_1920_1969.index.month)['t'].mean()
mean_SB_1970_2022 = df_SB_1970_2022.groupby(df_SB_1970_2022.index.month)['t'].mean()
mean_SB_1920_1969 = df_SB_1920_1969.groupby(df_SB_1920_1969.index.month)['t'].mean()

# Plot the annual cycle of temperature
months = range(1, 13)
plt.figure(figsize=(10, 6))
plt.plot(months, mean_temp_1970_2022, label='1970-2022 Innsbruck', color='b')
plt.plot(months, mean_temp_1920_1969, label='1920-1969', color='b', linestyle='--')
plt.plot(months, mean_SB_1970_2022, label='1970-2022 Sonnblick', color='orange')
plt.plot(months, mean_SB_1920_1969, label='1920-1969', color='orange', linestyle='--')
plt.xlabel('Month')
plt.ylabel('Mean Temperature in °C')
plt.title('Mean Temperature')
plt.legend()
plt.xticks(months)
plt.show()
```
Since the 1970s, the Earth has been experiencing global warming. The graph shown compares the average monthly temperatures between 1920-1969 and 1970-2022 at two stations: Innsbruck and Sonnblick. The biggest differences are observed in the months with extremely cold and warm temperatures (January, February, June, July, August, December).
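The statement about the biggest differences occurring in the extreme months can be quantified by simply subtracting the two monthly climatologies. A small sketch with invented monthly means (the notebook would use `mean_temp_1920_1969` and `mean_temp_1970_2022` instead):

```python
import pandas as pd

# Hypothetical monthly mean temperatures in °C for the two periods (illustrative values only)
months = range(1, 13)
mean_1920_1969 = pd.Series([-3.0, -1.0, 4.0, 8.5, 13.0, 16.0,
                            17.5, 17.0, 13.5, 8.5, 2.5, -2.0], index=months)
mean_1970_2022 = pd.Series([-1.5, 0.5, 5.5, 9.5, 14.0, 17.5,
                            19.0, 18.5, 14.5, 9.5, 3.5, -0.5], index=months)

# Month-by-month warming between the two periods; sorting reveals where it is strongest
warming = mean_1970_2022 - mean_1920_1969
print(warming.sort_values(ascending=False))
```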
```python
# Filter the data for the desired time periods
df_Ibk_1970_2022 = df_Ibk.loc['1970-01-01':'2022-12-31']
df_Ibk_1920_1969 = df_Ibk.loc['1920-01-01':'1969-12-31']

# Calculate the average monthly temperature for the reference period
mean_temp_1920_1969 = df_Ibk_1920_1969.groupby(df_Ibk_1920_1969.index.month)['t'].mean()

# Plot the annual cycle of temperature
months = range(1, 13)
plt.figure(figsize=(10, 6))
plt.plot(months, mean_temp_1920_1969, label='1920-1969 (Reference)', color='black', linestyle='--')

# One curve per non-overlapping decade since 1973
for start_year in range(1973, 2013, 10):
    end_year = start_year + 9
    df_group = df_Ibk.loc[f'{start_year}-01-01':f'{end_year}-12-31']
    mean_temp_group = df_group.groupby(df_group.index.month)['t'].mean()
    plt.plot(months, mean_temp_group, label=f'{start_year}-{end_year}', alpha=0.5)

# Most recent decade
df_group = df_Ibk.loc['2013-01-01':'2022-12-31']
mean_temp_group = df_group.groupby(df_group.index.month)['t'].mean()
plt.plot(months, mean_temp_group, label='2013-2022', alpha=0.5)

plt.xlabel('Month')
plt.ylabel('Mean Temperature in °C')
plt.title('Annual Cycle of Temperature')
plt.legend()
plt.xticks(months)
plt.show()
```
The graph clearly demonstrates that global warming has become significantly more pronounced in recent years, particularly since 2000.
Now we want to examine the changes in precipitation over the 100-year period and compare the two stations. To make the changes clearly visible, we grouped the data into four time periods (25 years each) and compared the two stations side by side with a bar plot. This gives a direct comparison of the precipitation amounts at each station for the same time period.
```python
# Divide the time into four sections (each ~25 years)
time_sections = np.array_split(df_Ibk.index.year.unique(), 4)

# Calculate average daily precipitation for each section and station
avg_precip_Ibk = [df_Ibk.loc[(df_Ibk.index.year >= section.min()) &
                             (df_Ibk.index.year <= section.max()), 'nied'].mean()
                  for section in time_sections]
avg_precip_SB = [df_SB.loc[(df_SB.index.year >= section.min()) &
                           (df_SB.index.year <= section.max()), 'nied'].mean()
                 for section in time_sections]
```
```python
# Convert mean daily precipitation to a yearly total (assuming 365 days per year)
total_precip_Ibk = [avg * 365 for avg in avg_precip_Ibk]
total_precip_SB = [avg * 365 for avg in avg_precip_SB]

# Create bar plot
bar_width = 0.35
index = np.arange(len(time_sections))

fig, ax = plt.subplots(figsize=(10, 6))
rects1 = ax.bar(index, total_precip_Ibk, bar_width, label='Innsbruck')
rects2 = ax.bar(index + bar_width, total_precip_SB, bar_width, label='Hoher Sonnblick')

# Add labels, title, and legend
ax.set_xlabel('Time Sections')
ax.set_ylabel('Average Precipitation (mm/year)')
ax.set_title('Average Precipitation Comparison between Innsbruck and Hoher Sonnblick')
ax.set_xticks(index + bar_width / 2)
ax.set_xticklabels(['1921-1946', '1947-1972', '1973-1997', '1998-2022'])
ax.legend()
plt.show()
```
In the graph above we can see that the average precipitation in Innsbruck remained almost constant over the last 100 years, in every time section. In contrast, at Hoher Sonnblick the average precipitation increased significantly from one time period to the next.

To validate this interpretation further, we visualized the average precipitation once more with an area plot.
```python
# Calculate the mean daily precipitation per year for each station
average_precip_Ibk = df_Ibk['nied'].resample('Y').mean()
average_precip_SB = df_SB['nied'].resample('Y').mean()

# Create the area plot
plt.figure(figsize=(10, 6))
plt.fill_between(average_precip_Ibk.index, average_precip_Ibk, label='Innsbruck', color='blue', alpha=0.5)
plt.fill_between(average_precip_SB.index, average_precip_SB, label='Sonnblick', color='green', alpha=0.5)

# Set labels and title
plt.xlabel('Time')
plt.ylabel('Mean Daily Precipitation (mm/day)')
plt.title('Comparison of Average Annual Precipitation: Innsbruck vs Sonnblick')
plt.legend()
plt.show()
```
This plot shows once more that the average annual precipitation barely changed for Innsbruck but increased at Sonnblick over the last 100 years.
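"Barely changed" versus "increased" could also be quantified with a least-squares trend on the annual means. Here is a sketch with synthetic annual values (in the notebook, `average_precip_SB` would supply the data; the trend magnitude below is made up for illustration):

```python
import numpy as np

# Synthetic annual mean precipitation in mm/day with a built-in upward trend plus noise
years = np.arange(1922, 2023)
rs = np.random.default_rng(1)
precip = 4.0 + 0.01 * (years - 1922) + rs.normal(0, 0.3, years.size)

# Least-squares linear fit: slope is the trend in (mm/day) per year
slope, intercept = np.polyfit(years, precip, 1)
print(f"trend: {slope * 100:.2f} (mm/day) per century")
```

A slope indistinguishable from zero for Innsbruck and a clearly positive one for Sonnblick would back up the interpretation of the area plot.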
To get more detailed information about the change in precipitation, we decided to split the precipitation data into seasons. A plot including all four seasons of the year might get confusing, so we split the year into two seasons: summer (May to October) and winter (November to April). To avoid clutter, we also plotted the data separately for the two stations.
```python
# Create a copy of the DataFrame with the required time period
df_Ibk_seasons = df_Ibk['1922':'2022'].copy()
df_SB_seasons = df_SB['1922':'2022'].copy()

# Assign seasons based on the month (Nov-Apr: winter, May-Oct: summer)
month_to_season = {11: 'Winter', 12: 'Winter', 1: 'Winter', 2: 'Winter', 3: 'Winter', 4: 'Winter',
                   5: 'Summer', 6: 'Summer', 7: 'Summer', 8: 'Summer', 9: 'Summer', 10: 'Summer'}
df_Ibk_seasons['Season'] = df_Ibk_seasons.index.month.map(month_to_season)
df_SB_seasons['Season'] = df_SB_seasons.index.month.map(month_to_season)

# Calculate the average precipitation for each season and year
average_precip_Ibk_seasons = df_Ibk_seasons.groupby(['Season', df_Ibk_seasons.index.year])['nied'].mean()
average_precip_SB_seasons = df_SB_seasons.groupby(['Season', df_SB_seasons.index.year])['nied'].mean()

# Reshape the data for plotting
data_Ibk = average_precip_Ibk_seasons.unstack(level=0)
data_SB = average_precip_SB_seasons.unstack(level=0)

# Create the line plot for Innsbruck
plt.figure(figsize=(10, 6))
plt.plot(data_Ibk.index, data_Ibk['Winter'], label='Winter', color='blue')
plt.plot(data_Ibk.index, data_Ibk['Summer'], label='Summer', color='red')
plt.xlabel('Time')
plt.ylabel('Average Precipitation (mm/day)')
plt.title('Average Precipitation Change: Innsbruck (Winter vs Summer)')
plt.legend()
plt.show()

# Create the line plot for Sonnblick
plt.figure(figsize=(10, 6))
plt.plot(data_SB.index, data_SB['Winter'], label='Winter', color='blue')
plt.plot(data_SB.index, data_SB['Summer'], label='Summer', color='red')
plt.xlabel('Time')
plt.ylabel('Average Precipitation (mm/day)')
plt.title('Average Precipitation Change: Hoher Sonnblick (Winter vs Summer)')
plt.legend()
plt.show()
```
The graph for Innsbruck shows once more that the average precipitation changed very little over the last 100 years, but the average precipitation in the summer months is consistently and significantly higher than in the winter months. At Sonnblick, in contrast, the difference between the seasons shows no clear pattern and varies from year to year, while the average precipitation increases over time.
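The summer/winter contrast at Innsbruck can be reproduced in miniature: group daily precipitation by the same two-season split and compare the means. All numbers below are invented (the notebook would group `df_Ibk_seasons` instead); the synthetic data is deliberately wetter in summer to mimic the observed pattern:

```python
import numpy as np
import pandas as pd

# Synthetic daily precipitation in mm/day, deliberately wetter in summer (May-Oct)
idx = pd.date_range('2000-01-01', '2009-12-31', freq='D')
rs = np.random.default_rng(2)
nied = pd.Series(np.where(idx.month.isin(range(5, 11)),
                          rs.gamma(2.0, 2.0, idx.size),   # summer: mean ~4 mm/day
                          rs.gamma(2.0, 1.0, idx.size)),  # winter: mean ~2 mm/day
                 index=idx)

# Same two-season split as used for the line plots
season = np.where(idx.month.isin(range(5, 11)), 'Summer', 'Winter')
seasonal_mean = nied.groupby(season).mean()
print(seasonal_mean)
```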