Programming project¶

Preamble¶

  • Names: Constantin Nitsche, Rainer Michelak
  • Matrikelnummer:

01 - An energy balance model with hysteresis¶

Based on the code we wrote in week 07, code the following extension. For this exercise it's OK to copy-paste my solutions and start from there. Only copy the necessary (no useless code)!

The planetary albedo $\alpha$ is in fact changing with climate change. As the temperature drops, sea-ice and ice sheets are extending (increasing the albedo). Inversely, the albedo decreases as temperature rises. The planetary albedo of our simple energy balance model follows the following equation:

$$ \alpha = \begin{cases} 0.3,& \text{if } T \gt 280\\ 0.7,& \text{if } T \lt 250\\ a T + b, & \text{otherwise} \end{cases} $$

01-01: Compute the parameters $a$ and $b$ so that the equation is continuous at T=250K and T=280K.

1
2
3
import numpy as np
import matplotlib.pyplot as plt
import scipy as sci
1
2
3
4
5
#0.3=a*280+b
#0.7=a*250+b
a=-0.4/30
b=0.3-280*a
print("a=",a,"b=",b)
a= -0.013333333333333334 b= 4.033333333333333

01-02: Now write a function called alpha_from_temperature which accepts a single positional parameter T as input (a scalar) and returns the corresponding albedo. Test your function using doctests to make sure that it complies to the instructions above.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
def alpha_from_temperature(t):
    """albedo function labeled alpha returns a temperature for a chosen temperature
    Parameters
    ----------
    temperature (T) : float

    Returns
    -------
    alpha as a float

    Examples
    --------
    >>> print(f'{alpha_from_temperature(249)}')
    0.7
    >>> print(f'{alpha_from_temperature(281)}')
    0.3
    >>> print(f'{alpha_from_temperature(0)}')
    0.7
    >>> print(f'{alpha_from_temperature(1000)}')
    0.3
    
    """
    if t>280:
        alpha=0.3
    elif t<250:
        alpha=0.7
    else:
        alpha=a*t+b
    return alpha
1
2
import doctest 
doctest.testmod()
TestResults(failed=0, attempted=4)

01-03: Adapt the existing code from week 07 to write a function called temperature_change_with_hysteresis which accepts t0 (the starting temperature in K), n_years (the number of simulation years) as positional arguments and tau (the atmosphere transmissivity) as keyword argument (default value 0.611). Verify that:

  • the stabilization temperature with t0 = 292 and default tau is approximately 288K
  • the stabilization temperature with t0 = 265 and default tau is approximately 233K
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
def asr(t):
    s0 = 1362
    return (1 - alpha_from_temperature(t)) * s0 / 4

def olr(t, tau=0.611):
    sigma = 5.67E-8
    return sigma * tau * t**4

def temperature_change_with_hysteresis(t0, n_years, tau=0.611):
    """Temperature change scenario after change of transmissivity. and considering change of albedo
    

    Parameters
    ----------
    t0 : float
        the starting temperature (K)
    n_years : int
        the number of simulation years
    tau : float, optional
        the atmosphere transmissivity (-)

    Returns
    -------
    (time, temperature) : ndarrays of size n_years + 1

    Examples
    --------
    >>> y, t = temperature_change_with_hysteresis(292, 1000)
    >>> print(f'{int(t[1000])}')
    288
    >>> y, t = temperature_change_with_hysteresis(265, 1000)
    >>> print(f'{int(t[1000])}')
    233
    """
    
    years = np.arange(n_years + 1)
    temperature = np.zeros(n_years + 1)
    temperature[0] = t0
    C = 4.0e+08
    dt = 60 * 60 * 24 * 365
    for i in range(n_years):
        temperature[i + 1] = temperature[i] + dt / C * ((asr(temperature[i])) - olr(temperature[i], tau=tau))
    return years, temperature
import doctest
doctest.testmod()
TestResults(failed=0, attempted=8)

01-04: Realize a total of N simulations with starting temperatures regularly spaced between t0=206K, and t0=318K and plot them on a single plot for n_years=50. The plot should look somewhat similar to this example for N=21.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
import pandas as pd
N=1000
t0=np.linspace(206,318, num=N)
n_years=50
data=[]

for i in t0:
    y,t=temperature_change_with_hysteresis(i, n_years, tau=0.611)
    plt.plot(y,t)
    plt.xlabel("time/year")
    plt.ylabel("Temperature in K")
    plt.title("Climate change scenario")
No description has been provided for this image

Bonus: only if you want (and if time permits), you can try to increase N and add colors to your plot to create a graph similar to this one.

02 - Weather station data files¶

I downloaded 10 min data from the recently launched ZAMG data hub. The data file contains selected parameters from the "INNSBRUCK-FLUGPLATZ (ID: 11804)" weather station.

You can download the data from the following links (right-click + "Save as..."):

  • station data
  • parameter metadata
  • station list from the ZAMG (in a better format than last time)

Let me open the data for you and display its content:

1
2
3
4
5
6
7
8
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv('INNSBRUCK-FLUGPLATZ_Datensatz_20150101_20211231.csv', index_col=1, parse_dates=True)
df.head()
df = df.drop("station", axis=1)
df.head()
DD FF GSX P RF RR SO TB1 TB2 TB3 TL TP
time
2015-01-01 00:00:00 232.0 1.0 0.0 965.7 96.0 0.0 0.0 3.2 3.6 5.2 -2.7 -3.4
2015-01-01 00:10:00 248.0 1.6 0.0 965.6 97.0 0.0 0.0 3.2 3.6 5.2 -2.4 -2.9
2015-01-01 00:20:00 252.0 1.6 0.0 965.6 97.0 0.0 0.0 3.2 3.6 5.2 -2.4 -2.9
2015-01-01 00:30:00 267.0 1.3 0.0 965.7 97.0 0.0 0.0 3.2 3.6 5.2 -2.7 -3.2
2015-01-01 00:40:00 279.0 1.4 0.0 965.7 97.0 0.0 0.0 3.2 3.6 5.2 -2.5 -3.0

02-01: after reading the documentation of the respective functions (and maybe try a few things yourself), explain in plain sentences:

  • what am I asking pandas to do with the index_col=1, parse_dates=True keyword arguments? Why am I doing this?
  • what am I asking pandas to do with .drop()? Why axis=1?

By looking at df.head() we can see a few things index_col=1 means which column to take as the index of the dataframe, i.e. what is the furthest left i.e. what is determining the index of the data. by choosing parse_dates=true we are asking pd to go through our index and parse through the values to convert them into dates. by using.drop() we are asking pd to remove the station row/column and by selecting axis=1 we are removing the station column NOT the row.

Now let me do something else from you:

1
2
dfmeta = pd.read_csv('ZEHNMIN Parameter-Metadaten.csv', index_col=0)
dfmeta.head()
Kurzbeschreibung Beschreibung Einheit
DD Windrichtung Windrichtung, vektorielles Mittel über 10 Minuten °
DDX Windrichtung zum Böenspitzenwert Windrichtung der maximalen Windgeschwindigkeit... °
DDX_FLAG Qualitätsflag der Windrichtung der maximalen W... Qualitätsflag für die Windrichtung der maximal... code
DD_FLAG Qualitätsflag der Windrichtung Qualitätsflag für die Windrichtung - Qualitäts... code
FF vektorielle Windgeschwindigkeit Windgeschwindigkeit, vektorielles Mittel über ... m/s

02-02: again, explain in plain sentences what the dfmeta.loc[df.columns] is doing, and why it works that way.

1
2
#testing and comparing the dfs
dfmeta.loc[df.columns]
Kurzbeschreibung Beschreibung Einheit
DD Windrichtung Windrichtung, vektorielles Mittel über 10 Minuten °
FF vektorielle Windgeschwindigkeit Windgeschwindigkeit, vektorielles Mittel über ... m/s
GSX Globalstrahlung Globalstrahlung, arithmetisches Mittel über 10... W/m²
P Luftdruck Luftdruck, Basiswert zur Minute10 hPa
RF Relative Feuchte Relative Luftfeuchte, Basiswert zur Minute10 %
RR Niederschlag 10 Minuten Summe des Niederschlags, Summe der ... mm
SO Sonnenscheindauer Sonnenscheindauer, Sekundensumme über 10 Minuten s
TB1 Erdbodentemperatur in 10cm Tiefe Erdbodentemperatur in 10cm Tiefe, Basiswert zu... °C
TB2 Erdbodentemperatur in 20cm Tiefe Erdbodentemperatur in 20cm Tiefe, Basiswert zu... °C
TB3 Erdbodentemperatur in 50cm Tiefe Erdbodentemperatur in 50cm Tiefe, Basiswert zu... °C
TL Lufttemperatur in 2m Lufttemperatur in 2m Höhe, Basiswert zur Minute10 °C
TP Taupunkt Taupunktstemperatur, Basiswert zur Minute10 °C

***it's finding the columns that are in our dataframe " df " above and matching the corresponding index values so in other words its finding which columns are relevant for us and then showing us the remaining data at that index dfmeta.loc[df.columns] is finding the indexes that are corresponding/matching to the columns in df and then showing us the dataframe at those indexes. similar how when you specify an index in a list, e.g. list[1] it gives us only the values at that index****

Finally, let me do a last step for you before you start coding:

1
dfh = df.resample('H').mean()
1
dfh.head()
DD FF GSX P RF RR SO TB1 TB2 TB3 TL TP
time
2015-01-01 00:00:00 257.333333 1.383333 0.0 965.666667 97.000000 0.0 0.0 3.20 3.6 5.2 -2.533333 -3.066667
2015-01-01 01:00:00 160.666667 0.766667 0.0 965.966667 97.666667 0.0 0.0 3.15 3.6 5.2 -3.250000 -3.716667
2015-01-01 02:00:00 227.000000 0.816667 0.0 966.316667 98.500000 0.0 0.0 3.10 3.6 5.2 -3.583333 -3.933333
2015-01-01 03:00:00 258.833333 1.933333 0.0 966.466667 97.000000 0.0 0.0 3.10 3.6 5.2 -3.250000 -3.766667
2015-01-01 04:00:00 241.500000 1.266667 0.0 966.416667 95.166667 0.0 0.0 3.10 3.6 5.2 -3.116667 -3.883333

02-03: explore the dfh dataframe. Explain, in plain words, what the purpose of .resample('H') followed by mean() is. Explain what .resample('H').max() and .resample('H').sum() would do.

with .resample("H") it downsamples our data into hourly bins so it takes the values which are every 10 minutes and puts them into bigger bins which are separated by the hour, so from 0-1 hr is one bin, 1-2hrs is another bin and so on, then by taking taking .mean of the new resample its taking the bin for each of the bins and so finding an hourly value. .max would find the highest value for each resample and .sum would add all the values in each bin, the sum of the values between 0-1hr ,1-2 hrs etc.

02-04: Using np.allclose, make sure that the average of the first hour (that you'll compute yourself from df) is indeed equal to the first row of dfh. Now, two variables in the dataframe have units that aren't suitable for averaging. Please convert the following variables to the correct units:

  • RR needs to be converted from the average of 10 min sums to mm/h
  • SO needs to be converted from the average of 10 min sums to s/h
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
df_firsthour=df.iloc[0:6].mean()
print(df_firsthour)

np.allclose(df_firsthour,dfh.iloc[0])

#testing to see what rr and S0 mean
print(dfmeta.loc["RR"]["Beschreibung"],
dfmeta.loc["SO"]["Beschreibung"])

#changing the average value calculated per hour, which corresponds to an average amount of rainfall in that hour per 10 min
#and the average of the seconds of sunshine per 10 minutes into secs per hour
def to_hourly(x):
    return x*6
dfh["RR"]=dfh["RR"].apply(to_hourly)
dfh["SO"]=dfh["SO"].apply(to_hourly)

#or
#dfh["RR"]=dfh["RR"]*6
#dfh["SO"]=dfh["SO"]*6
#be careful not to re-run the code as otherwise you could get a recursive error, as you are alway multiplying the RR and SO column by 6
# this could mean that you can quickly get much too high numbers
dfh.head(100)
num_days=dfh.shape[0]/24
print(num_days)
DD     257.333333
FF       1.383333
GSX      0.000000
P      965.666667
RF      97.000000
RR       0.000000
SO       0.000000
TB1      3.200000
TB2      3.600000
TB3      5.200000
TL      -2.533333
TP      -3.066667
dtype: float64
10 Minuten Summe des Niederschlags, Summe der Basiswerte über 10 Minuten Sonnenscheindauer, Sekundensumme über 10 Minuten
2557.0

From now on, we will use the hourly data only (and further aggregations when necessary). The 10 mins data are great but require a little bit more of pandas kung fu (the chinese term, not the sport) to be used efficiently.

Spend some time exploring the dfh dataframe we just created. What time period does it cover? What variables does it contain?

2557 days , weather variables from windspeed to global radiation and others

Note on pandas: all the exercises below can be done with or without pandas. Each question can be answered with very few lines of code (often one or two) with pandas, and I recommend to use it as much as possible. If you want, you can always use numpy in case of doubt: you can access the data as a numpy array with: df[column_name].values.

03 - Precipitation¶

In this section, we will focus on precipitation only.

03-01: Compute the average annual precipitation (m/year) over the 7-year period.

1
2
3
4
df_RR_A=pd.DataFrame(df["RR"].resample("A").sum()) ## Ey Consti, warum divided by 1000??
average = np.mean(df_RR_A.values)
print(f'Average annual precipitation: {np.round(average,0)} mm/year over the the 7 year-period')
df_RR_A.head(7)
Average annual precipitation: 921.0 mm/year over the the 7 year-period
RR
time
2015-12-31 852.4
2016-12-31 911.8
2017-12-31 1026.5
2018-12-31 783.0
2019-12-31 1037.3
2020-12-31 920.7
2021-12-31 912.2

03-02: What is the smallest non-zero precipitation measured at the station? What is the maximum hourly precipitation measured at the station? When did this occur?

1
2
3
4
RR=pd.DataFrame(df["RR"])
NONZERO=RR[RR["RR"]>0]
max_hourly=RR.max()*6
print("the lowest precipitation that wasn zero is: ", NONZERO.min(), "mm/10 minutes", "\nthe maximum hourly precipitation is: ", RR.max()*6,"mm/h" )
the lowest precipitation that wasn zero is:  RR    0.1
dtype: float64 mm/10 minutes 
the maximum hourly precipitation is:  RR    87.0
dtype: float64 mm/h

03-03: Plot a histogram of hourly precipitation, with bins of size 0.2 mm/h, starting at 0.1 mm/h and ending at 25 mm/h. Plot the same data, but this time with a logarithmic y-axis. Compute the 99th percentile (or quantile) of hourly precipitation.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
# Plot histogram with linear y-axis
dfh['RR'][dfh['RR']>0].plot.hist(bins=np.arange(0.1, 25.2, 0.2))
plt.xlabel('Hourly Precipitation (mm/h)')
plt.ylabel('Frequency')
plt.title('Histogram of hourly precipitation')
plt.show()

# Plot histogram with logarithmic y-axis
dfh['RR'][dfh['RR']>0].plot.hist(bins=np.arange(0.1, 25.2, 0.2),logy=True)
plt.xlabel('Hourly Precipitation (mm/h)')
plt.ylabel('Frequency')
plt.title('Histogram of hourly precipitation')
plt.show()

# Compute the 99th percentile of hourly precipitation
percentile_99=np.percentile(dfh['RR'], 99)
print("99th Percentile of Hourly Precipitation: {:.2f} mm/h".format(percentile_99))
No description has been provided for this image
No description has been provided for this image
99th Percentile of Hourly Precipitation: 2.33 mm/h

03-04: Compute daily sums (mm/d) of precipitation (tip: use .resample again). Compute the average number or rain days per year in Innsbruck (a "rain day" is a day with at least 0.1 mm / d of measured precipitation).

1
2
3
daily_precipitation = df['RR'].resample('D').sum()
rain_days_per_year = (daily_precipitation >= 0.1).groupby(pd.Grouper(freq='Y')).sum().mean()
print(f'Average number of rain days per year in Innsbruck: {round(rain_days_per_year)}')
Average number of rain days per year in Innsbruck: 170

03-05: Now select (subset) the daily dataframe to keep only only daily data in the months of December, January, February (DFJ). To do this, note that dfh.index.month exists and can be used to subset the data efficiently. Compute the average precipitation in DJF (mm / d), and the average number of rainy days in DJF. Repeat with the months of June, July, August (JJA).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
dfd = pd.DataFrame()
dfd['RR']= df['RR'].resample('D',label='left').sum()
dfd['rainday'] = dfd['RR']>0

# Filtering the data for the specified time periods
dfd_DJF = dfd[(dfd.index.month >= 12) | (dfd.index.month <= 2)]
dfd_JJA = dfd[(dfd.index.month >= 6) & (dfd.index.month <= 8)]

# Calculating average precipitation and rainy days for DJF
avg_prec_DJF = dfd_DJF['RR'].mean()
avg_rain_days_DJF = dfd_DJF['rainday'].resample('Y').sum().mean()

# Calculating average precipitation and rainy days for JJA
avg_prec_JJA = dfd_JJA['RR'].mean()
avg_rain_days_JJA = dfd_JJA['rainday'].resample('Y').sum().mean()

# Printing the results
print(f'Average precipitation in December/January/February: {avg_prec_DJF} mm/d, average rainy days in December/January/February: {avg_rain_days_DJF}')
print(f'Average precipitation in June/July/August: {avg_prec_JJA} mm/d, average rainy days in Junge/July/August: {avg_rain_days_JJA}')
print('All values are for the timeperiod 2015 to 2021')
Average precipitation in December/January/February: 1.7726265822784812 mm/d, average rainy days in December/January/February: 37.57142857142857
Average precipitation in June/July/August: 4.024844720496894 mm/d, average rainy days in Junge/July/August: 53.142857142857146
All values are for the timeperiod 2015 to 2021

03-06: Repeat the DJF and JJA subsetting, but this time with hourly data. Count the total number of times that hourly precipitation in DJF is above the 99th percentile computed in exercise 03-03. Repeat with JJA.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
# Filtering the data for the specified time periods
dfh_DJF = dfh[(dfh.index.month >= 12) | (dfh.index.month <= 2)]
dfh_JJA = dfh[(dfh.index.month >= 6) & (dfh.index.month <= 8)]

# Counting the occurrences above the 99th percentile for DJF and JJA
above99_DJF = (dfh_DJF['RR'] > percentile_99).sum()
above99_JJA = (dfh_JJA['RR'] > percentile_99).sum()

# Printing the results
print(f'From 2015 to 2021, the hourly precipitation was {above99_DJF} times above the 99th percentile (over the whole 7 years) in December/January/February, and {above99_JJA} times in June/July/August')
From 2015 to 2021, the hourly precipitation was 68 times above the 99th percentile (over the whole 7 years) in December/January/February, and 308 times in June/July/August

03-07: Compute and plot the average daily cycle of hourly precipitation in DFJ and JJA. I expect a plot similar to this example. To compute the daily cycle, I recommend to combine two very useful tools. First, start by noticing that ds.index.hour exists and can be used to categorize data. Then, note that df.groupby exists and can be used exactly for that (documentation).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
# Compute average daily cycle for DFJ
daily_cycle_DFJ = dfh_DJF.groupby(dfh_DJF.index.hour)['RR'].mean()

# Compute average daily cycle for JJA
daily_cycle_JJA = dfh_JJA.groupby(dfh_JJA.index.hour)['RR'].mean()

# Plotting the average daily cycle for DFJ and JJA
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(daily_cycle_DFJ.index, daily_cycle_DFJ.values, label='DFJ')
ax.plot(daily_cycle_JJA.index, daily_cycle_JJA.values, label='JJA')
ax.set_xlabel('Hour of the day')
ax.set_ylabel('Average hourly precipitation')
ax.set_title('Average daily cycle of hourly precipitation')
plt.show()
No description has been provided for this image

04 - A few other variables¶

In this section, we will continue to analyze the weather station data.

04-01: Verify that the three soil temperatures have approximately the same average value over the entire period. Now plot the three soil temperature timeseries in hourly resolution over the course of the year of 2020 (example). Repeat the plot with the month of may 2020.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
df_daily=df.resample("D").sum()
TB1 = df_daily['TB1']
TB2 = df_daily['TB2']
TB3 = df_daily['TB3']

if np.isclose(TB1.mean(),TB2.mean(), rtol=40).all() and np.isclose(TB1.mean(),TB3.mean(),rtol=40).all()==True:
    print("average temperature is close enough.")
else:
    print("average temperature is not close enough.")
print("Average Temps for TB1, 2,3:", np.round(TB1.mean()), np.round(TB2.mean()), np.round(TB3.mean()))
average temperature is close enough.
Average Temps for TB1, 2,3: 1623.0 1656.0 1638.0
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
df_2020_hourly=dfh.loc["2020"]
TB1H20=df_2020_hourly["TB1"]
TB2H20=df_2020_hourly["TB2"]
TB3H20=df_2020_hourly["TB3"]
fig, axis =plt.subplots()
TB1H20.plot(ax=axis,color="limegreen", label="TB1")
TB2H20.plot(ax=axis,color="navy", label="TB2")
TB3H20.plot(ax=axis, color="red", label="TB3")
axis.set_ylabel("Temp/°C")
axis.set_title("Soil Temperature over 2020")
Text(0.5, 1.0, 'Soil Temperature over 2020')
No description has been provided for this image
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
df_hmay=dfh.loc["2020-05"]
TB1MAY=df_hmay["TB1"]
TB2MAY=df_hmay["TB2"]
TB3MAY=df_hmay["TB3"]
fig, axis = plt.subplots()
TB1MAY.plot(ax=axis,style='-', color="limegreen", label="TB1")
TB2MAY.plot(ax=axis,style='.',color="navy", label="TB2")
TB3MAY.plot(ax=axis,style='-', color="red", label="TB3")
axis.set_ylabel("Temp/°C")
axis.set_title("Soil Temperature May")
Text(0.5, 1.0, 'Soil Temperature May')
No description has been provided for this image

04-02: Plot the average daily cycle of all three soil temperatures.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
day_hour= np.arange(0,24)
TB1avg = dfh.groupby(dfh.index.hour)['TB1'].mean()
TB2avg = dfh.groupby(dfh.index.hour)['TB2'].mean()
TB3avg =dfh.groupby(dfh.index.hour)['TB3'].mean()
fig, ax = plt.subplots()
ax.plot(day_hour, TB1avg, label='TB1')
ax.plot(day_hour, TB2avg, label='TB2')
ax.plot(day_hour, TB3avg, label='TB3')
ax.set_xlabel('hour')
ax.set_ylabel('soil temperature/°C')
ax.set_title('average daily cycle of soil temp')
Text(0.5, 1.0, 'average daily cycle of soil temp')
No description has been provided for this image

04-03: Compute the difference (in °C) between the air temperature and the dewpoint temperature. Now plot this difference on a scatter plot (x-axis: relative humidity, y-axis: temperature difference).

1
2
3
4
5
6
7
df["diff"]=pd.DataFrame(df["TL"]-df["TP"])

fig, ax= plt.subplots()
ax.scatter(df["RF"], df["diff"])
ax.set_xlabel("RF in %")
ax.set_ylabel("Difference")
ax.set_title("graph")
Text(0.5, 1.0, 'graph')
No description has been provided for this image

05 - Free coding project¶

The last part of this semester project is up to you! You are free to explore whatever interests you. I however add three requirements:

  1. This section should have at least 5 original plots in it. They are the output of your analysis.
  2. This section should also use additional data that you downloaded yourself. The easiest way would probably be to download another station(s) from the ZAMG database, or data from the same station but for another time period (e.g. for trend or change analysis). You can, however, decide to do something completely different if you prefer (as long as you download and read one more file).
  3. this section should contain at least one regression or correlation analysis between two parameters. Examples:
    • between two different variables at the same station (like we did with the dewpoint above)
    • between different stations (for example, average temperature as a function of station elevation)
    • between average temperature and time (trends analysis)
    • etc.

That's it! Here are a few ideas:

  • detection of trends and changes at the station Innsbruck for 1993-2021
  • comparison of 5-yr climatologies at various stations in Tirol, taking elevation or location into account
  • compute the theoretical day length from the station's longitude and latitude (you can find solutions for this online, just let me know the source if you used a solution online), and use these computations to compare the measured sunshine duration to the maximum day length. This can be used to classify "sunshine days" for example.
  • use the python "windrose" library to plot a windrose at different locations and time of day.
  • etc.

If you have your own idea but are unsure about whether this is too much or not enough, come to see me in class! In general, the three requirements above should be enough.

My goal with this section is to let you formulate a programming goal and implement it.

1
2
3
4
5
import pandas as pd
rd=pd.read_csv("RAINIDATA.csv", index_col=0, parse_dates=True)
rd_meta=pd.read_csv("RAINIDATA Parameter-Metadaten.csv")
rd_meta_station=pd.read_csv("RAINI Stations-Metadaten.csv")
rd.head(20)
station N Pg T Td VV dd ff p rel w1boe
time
2018-01-01 00:00:00+00:00 11318 NaN 662.1 -5.1 -19.6 NaN NaN 4.0 1.7 NaN 25.0
2018-01-01 00:00:00+00:00 11120 8.0 942.1 -0.2 -0.2 2.0 NaN 2.0 0.1 NaN NaN
2018-01-01 00:00:00+00:00 11127 NaN 797.6 -2.5 -9.2 NaN NaN 1.0 0.9 NaN NaN
2018-01-01 00:00:00+00:00 11316 NaN 712.5 -0.5 -18.6 NaN NaN 1.0 1.8 NaN NaN
2018-01-01 00:00:00+00:00 11343 NaN 689.7 -4.3 -13.2 NaN NaN 10.0 0.8 NaN 22.0
2018-01-01 00:00:00+00:00 11368 NaN 964.5 3.2 1.4 NaN NaN 2.0 2.1 NaN NaN
2018-01-01 00:00:00+00:00 11035 3.0 984.4 3.3 1.5 65.0 NaN 2.0 1.6 NaN NaN
2018-01-01 01:00:00+00:00 11318 NaN 661.4 -6.4 -19.5 NaN NaN 2.0 1.9 NaN NaN
2018-01-01 01:00:00+00:00 11120 8.0 941.9 -0.2 -0.2 4.0 NaN 2.0 0.1 NaN NaN
2018-01-01 01:00:00+00:00 11127 NaN 797.8 -3.4 -8.8 NaN NaN 1.0 0.4 NaN NaN
2018-01-01 01:00:00+00:00 11316 NaN 712.0 -2.5 -18.0 NaN NaN 2.0 1.7 NaN NaN
2018-01-01 01:00:00+00:00 11343 NaN 689.2 -4.4 -13.3 NaN NaN 5.0 0.9 NaN NaN
2018-01-01 01:00:00+00:00 11368 NaN 964.9 1.8 0.7 NaN NaN 1.0 0.8 NaN NaN
2018-01-01 01:00:00+00:00 11035 NaN 983.8 2.5 1.1 NaN NaN 2.0 1.8 NaN NaN
2018-01-01 02:00:00+00:00 11318 NaN 660.7 -7.5 -17.7 NaN NaN 6.0 1.9 NaN NaN
2018-01-01 02:00:00+00:00 11120 9.0 942.4 -0.4 -0.4 4.0 NaN 2.0 0.6 NaN NaN
2018-01-01 02:00:00+00:00 11127 NaN 797.8 -4.6 -8.9 NaN NaN 1.0 0.1 NaN NaN
2018-01-01 02:00:00+00:00 11316 NaN 711.4 -5.1 -17.2 NaN NaN 1.0 1.7 NaN NaN
2018-01-01 02:00:00+00:00 11343 NaN 688.8 -5.6 -12.7 NaN 90.0 5.0 1.1 NaN NaN
2018-01-01 02:00:00+00:00 11368 NaN 965.4 1.3 0.3 NaN 30.0 1.0 0.1 NaN NaN
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
#filter for each station
wien=rd[rd["station"]==11035]
innsbru=rd[rd["station"]==11120]
obergurgl=rd[rd["station"]==11127]
pitztal=rd[rd["station"]==11316]
oetztal=rd[rd["station"]==11318]
sonn=rd[rd["station"]==11343]
waidhofen=rd[rd["station"]==11368]


fig, ax=plt.subplots(figsize=(16, 9))
wien["T"].loc["2021"].resample("D").mean().plot(label="wien, 198m",linewidth=1)
waidhofen["T"].loc["2021"].resample("D").mean().plot(label="waidhofen 365m",linewidth=1)
oetztal["T"].loc["2021"].resample("D").mean().plot(label="oetztal 3437m",linewidth=1)
sonn["T"].loc["2021"].resample("D").mean().plot(label="sonn 3109m",linewidth=1)
pitztal["T"].loc["2021"].resample("D").mean().plot(label="pitztal 2864m",linewidth=1)
obergurgl["T"].loc["2021"].resample("D").mean().plot(label="obergurgl 1941m",linewidth=1)
ax.set_xlabel("time")
ax.set_ylabel("temperature/ °C")
ax.set_title("Temperature over the past year (2021) for different altitudes stations")
ax.legend()
<matplotlib.legend.Legend at 0x15fb2a050>
No description has been provided for this image

here we can see that vienna and waidhofen is the highest temperature consistently throughout the year. we cannot see waidhofen because it is so similar on a large scale to vienna. both are quite low in altitude. for a comparison our coldest measurement is consistently our highest station. obergurgl on the other hand is much higher than expected as it is quite close to vienna relatively speaking

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
# now we calculate the mean difference from vienna temperature for each of the altitudes/stations
#throughout 
#the past years
waidhofen_dif=waidhofen["T"]-wien["T"]
innsbru_dif=innsbru["T"]-wien["T"]
obergurgl_dif=obergurgl["T"]-wien["T"]
oetztal_dif=oetztal["T"]-wien["T"]
pitztal_dif=pitztal["T"]-wien["T"]
sonn_dif=sonn["T"]-wien["T"]
diff=pd.DataFrame([[365,waidhofen_dif.mean()],[581,innsbru_dif.mean()],[1941,obergurgl_dif.mean()], [3437,oetztal_dif.mean()],[2800,pitztal_dif.mean()],[3109,sonn_dif.mean()]], columns=["Station", "Mean Temperature Difference"])
diff.plot.scatter(x="Station",y="Mean Temperature Difference")
plt.title("Temperature difference (Mean) from Vienna with Height")
Text(0.5, 1.0, 'Temperature difference (Mean) from Vienna with Height')
No description has been provided for this image
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
import numpy as np


def linear_regression(x, y):
    """Calculate a linear least-squares regression for two sets of measurements.

    Parameters
    ----------
    x, y: ndarray-like
        Two sets of measurements. Both arrays should be 1-dimensional 
        and have the same length. They should not contain any missing data!

    Returns
    -------
    (a, b): floats
       Parameters a and b of the linear approximation y^ = a + b x

    Examples
    --------
    >>> x = np.array([1, 2, 3, 4, 5])
    >>> y = np.array([-5.3, -2.6,  0.1,  2.8,  5.5])
    >>> a, b = linear_regression(x, y)
    >>> np.isclose(a, -8)
    True
    >>> np.isclose(b, 2.7)
    True
    """
    
    # Make sure we manipulate ndarrays
    x = np.asarray(x)
    y = np.asarray(y)

    # Least squares equations
    n = x.size
    num = n * (x * y).sum() - x.sum() * y.sum()
    den = n * (x**2).sum() - x.sum()**2
    b = num / den
    a = y.mean() - b * x.mean()
    return a, b
1
2
a,b=linear_regression(diff["Station"],diff["Mean Temperature Difference"])
a,b
(1.013528955124709, -0.005309476273798035)
1
2
3
4
5
6
7
height=np.linspace(0,4000,6)
fig, ax3=plt.subplots()
ax3.scatter(height, diff["Mean Temperature Difference"], color="#25d8e8")
ax3.plot(height, 1.013528955124709-0.005309476273798035*height, color="limegreen")
ax3.set_title("Linear Regression for Altitude vs Temperature difference from Vienna")
ax3.set_ylabel("Mena Temp diff / K")
ax3.set_xlabel("altitude /m")
Text(0.5, 0, 'altitude /m')
No description has been provided for this image

Here we can see that our coeefficient / gradient for the linear regression relating altitude and mean temperature difference between the station and Vienna is -0.0053094... K(°C)/m which relates to around 0.5K/100m or around 5.39 K/km which is pretty close to the actual Wet Adiabatic of between 0.5 to 0.6 K/m so pretty spot on!! of course we could do a larger graph with only the meta_data, but I'm afraid it may be too much datanto process quickly for the computer

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
#pressure differential vs windspeed
wien_ff = wien["ff"].resample("D").mean()
innsbru_ff = innsbru["ff"].resample("D").mean()
obergurgl_ff = obergurgl["ff"].resample("D").mean()
pitztal_ff = pitztal["ff"].resample("D").mean()
oetztal_ff = oetztal["ff"].resample("D").mean()
sonn_ff = sonn["ff"].resample("D").mean()
waidhofen_ff = waidhofen["ff"].resample("D").mean()

wien_p = wien["p"]
innsbru_p = innsbru["p"]
obergurgl_p = obergurgl["p"]
pitztal_p = pitztal["p"]
oetztal_p = oetztal["p"]
sonn_p = sonn["p"]
waidhofen_p = waidhofen["p"]



wien_p = wien_p.replace(-99.8, np.nan)
innsbru_p = innsbru_p.replace(-99.8, np.nan)
obergurgl_p = obergurgl_p.replace(-99.8, np.nan)
pitztal_p = pitztal_p.replace(-99.8, np.nan)
oetztal_p = oetztal_p.replace(-99.8, np.nan)
sonn_p = sonn_p.replace(-99.8, np.nan)
waidhofen_p = waidhofen_p.replace(-99.8, np.nan)


wien_p = wien["p"].abs().resample("D").mean()
innsbru_p = innsbru["p"].abs().resample("D").mean()
obergurgl_p = obergurgl["p"].abs().resample("D").mean()
pitztal_p = pitztal["p"].abs().resample("D").mean()
oetztal_p = oetztal["p"].abs().resample("D").mean()
sonn_p = sonn["p"].abs().resample("D").mean()
waidhofen_p = waidhofen["p"].abs().resample("D").mean()

#remove the NaNs
wien_p = wien_p.replace(-99.8, np.nan)
innsbru_p = innsbru_p.replace(-99.8, np.nan)
obergurgl_p = obergurgl_p.replace(-99.8, np.nan)
pitztal_p = pitztal_p.replace(-99.8, np.nan)
oetztal_p = oetztal_p.replace(-99.8, np.nan)
sonn_p = sonn_p.replace(-99.8, np.nan)
waidhofen_p = waidhofen_p.replace(-99.8, np.nan)

# Plotting the scatter plot
plt.scatter(wien_p, wien_ff, label="Wien")
plt.scatter(innsbru_p, innsbru_ff, label="Innsbruck")
plt.scatter(obergurgl_p, obergurgl_ff, label="Obergurgl")
plt.scatter(pitztal_p, pitztal_ff, label="Pitztal")
plt.scatter(oetztal_p, oetztal_ff, label="Oetztal")
plt.scatter(sonn_p, sonn_ff, label="Sonn")
plt.scatter(waidhofen_p, waidhofen_ff, label="Waidhofen")

plt.xlabel("Pressure Differential / hpa/3 hr")
plt.ylabel("Windspeed ms^-1")
plt.title("Windspeed vs. Pressure Differential for Each Station")

plt.legend()


plt.show()
No description has been provided for this image
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
#calculate the average daily temperature increase per day in june
wien_T= wien["T"].loc["2021-05"].resample("D").mean()
innsbru_T = innsbru["T"].loc["2021-05"].resample("D").mean()
obergurgl_T = obergurgl["T"].loc["2021-05"].resample("D").mean()
pitztal_T = pitztal["T"].loc["2021-05"].resample("D").mean()
oetztal_T = oetztal["T"].loc["2021-05"].resample("D").mean()
sonn_T = sonn["T"].loc["2021-05"].resample("D").mean()
waidhofen_T = waidhofen["T"].loc["2021-05"].resample("D").mean()

fig, ax= plt.subplots()
ax.plot(wien_T.index, wien_T, label="Wien")
ax.plot(innsbru_T.index, innsbru_T, label="Innsbru")
ax.plot(obergurgl_T.index, obergurgl_T, label="Obergurgl")
ax.plot(pitztal_T.index, pitztal_T, label="Pitztal")
ax.plot(oetztal_T.index, oetztal_T, label="Oetztal")
ax.plot(sonn_T.index, sonn_T, label="Sonn")
ax.plot(waidhofen_T.index, waidhofen_T, label="Waidhofen")

import scipy.stats as stats
wien_slope, wien_intercept, _, _, _ = stats.linregress(range(len(wien_T)), wien_T)
innsbru_slope, innsbru_intercept, _, _, _ = stats.linregress(range(len(innsbru_T)), innsbru_T)
obergurgl_slope, obergurgl_intercept, _, _, _ = stats.linregress(range(len(obergurgl_T)), obergurgl_T)
pitztal_slope, pitztal_intercept, _, _, _ = stats.linregress(range(len(pitztal_T)), pitztal_T)
oetztal_slope, oetztal_intercept, _, _, _ = stats.linregress(range(len(oetztal_T)), oetztal_T)
sonn_slope, sonn_intercept, _, _, _ = stats.linregress(range(len(sonn_T)), sonn_T)
waidhofen_slope, waidhofen_intercept, _, _, _ = stats.linregress(range(len(waidhofen_T)), waidhofen_T)

print("wien_slope",wien_slope,"innsbru_slope",innsbru_slope,"obergurgl_slope",obergurgl_slope,"pitztal_slope",pitztal_slope,"oetztal_slope",oetztal_slope,"sonn_slope",sonn_slope,"waidhofen_slope",waidhofen_slope)
wien_slope 0.0046185425432445065 innsbru_slope 0.013279569892473128 obergurgl_slope 0.0814072726741468 pitztal_slope 0.05137345137914913 oetztal_slope 0.055578906615240765 sonn_slope 0.0265818577606358 waidhofen_slope nan
No description has been provided for this image
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
#calculate the average daily temperature increase per day in june
wien_T= wien["T"].loc["2021-05"].resample("D").mean()
innsbru_T = innsbru["T"].loc["2021-05"].resample("D").mean()
obergurgl_T = obergurgl["T"].loc["2021-05"].resample("D").mean()
pitztal_T = pitztal["T"].loc["2021-05"].resample("D").mean()
oetztal_T = oetztal["T"].loc["2021-05"].resample("D").mean()
sonn_T = sonn["T"].loc["2021-05"].resample("D").mean()
waidhofen_T = waidhofen["T"].loc["2021-05"].resample("D").mean()

fig, ax= plt.subplots()
ax.plot(wien_T.index, wien_T, label="Wien")
ax.plot(innsbru_T.index, innsbru_T, label="Innsbru")
ax.plot(obergurgl_T.index, obergurgl_T, label="Obergurgl")
ax.plot(pitztal_T.index, pitztal_T, label="Pitztal")
ax.plot(oetztal_T.index, oetztal_T, label="Oetztal")
ax.plot(sonn_T.index, sonn_T, label="Sonn")
ax.plot(waidhofen_T.index, waidhofen_T, label="Waidhofen")

import scipy.stats as stats
wien_slope, wien_intercept, _, _, _ = stats.linregress(range(len(wien_T)), wien_T)
innsbru_slope, innsbru_intercept, _, _, _ = stats.linregress(range(len(innsbru_T)), innsbru_T)
obergurgl_slope, obergurgl_intercept, _, _, _ = stats.linregress(range(len(obergurgl_T)), obergurgl_T)
pitztal_slope, pitztal_intercept, _, _, _ = stats.linregress(range(len(pitztal_T)), pitztal_T)
oetztal_slope, oetztal_intercept, _, _, _ = stats.linregress(range(len(oetztal_T)), oetztal_T)
sonn_slope, sonn_intercept, _, _, _ = stats.linregress(range(len(sonn_T)), sonn_T)
waidhofen_slope, waidhofen_intercept, _, _, _ = stats.linregress(range(len(waidhofen_T)), waidhofen_T)

print("wien_slope",wien_slope,"innsbru_slope",innsbru_slope,"obergurgl_slope",obergurgl_slope,"pitztal_slope",pitztal_slope,"oetztal_slope",oetztal_slope,"sonn_slope",sonn_slope,"waidhofen_slope",waidhofen_slope)
wien_slope 0.0046185425432445065 innsbru_slope 0.013279569892473128 obergurgl_slope 0.0814072726741468 pitztal_slope 0.05137345137914913 oetztal_slope 0.055578906615240765 sonn_slope 0.0265818577606358 waidhofen_slope nan
No description has been provided for this image
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
from windrose import WindroseAxes


rd = pd.read_csv('RAINIDATA.csv', index_col = 0)

wien=rd[rd["station"]==11035]
innsbru=rd[rd["station"]==11120]
obergurgl=rd[rd["station"]==11127]
pitztal=rd[rd["station"]==11316]
oetztal=rd[rd["station"]==11318]
sonn=rd[rd["station"]==11343]
waidhofen=rd[rd["station"]==11368]

#Windrose of windspeed and wind-direction measured at Vienna
wind_dir = wien['dd']
wind_speed = wien['ff']
ax = WindroseAxes.from_ax()
ax.bar(wind_dir, wind_speed, bins=[0,1,2.5,4,6,8,11,14])
ax.set_legend()
ax.set_title('Wind speed and wind direction in Vienna from 2018-2021')
plt.show()
No description has been provided for this image
1
2
3
4
5
6
7
8
#Windrose of boe-windspeed and wind-direction measured at Vienna
wind_dir = wien['dd']
w1boe=wien['w1boe']
ax = WindroseAxes.from_ax()
ax.bar(wind_dir, w1boe, bins=[5,10,15,20,25,30,35])
ax.set_legend()
ax.set_title('Boe wind speed and wind direction in Vienna from 2018-2021')
plt.show()
No description has been provided for this image
1
2
3
4
5
6
7
8
#Windrose of windspeed and wind-direction measured at Innsbruck
wind_dir = innsbru['dd']
wind_speed = innsbru['ff']
ax = WindroseAxes.from_ax()
ax.bar(wind_dir, wind_speed, bins=[0,1,2.5,4,6,8,11,14])
ax.set_legend()
ax.set_title('Wind speed and wind direction in Innsbruck from 2018-2021')
plt.show()
No description has been provided for this image
1
2
3
4
5
6
7
8
#Windrose of boe-windspeed and wind-direction measured at Innsbruck
wind_dir = innsbru['dd']
w1boe=innsbru['w1boe']
ax = WindroseAxes.from_ax()
ax.bar(wind_dir, w1boe, bins=[5,10,15,20,25,30,35])
ax.set_legend()
ax.set_title('Boe wind speed and wind direction in Innsbruck from 2018-2021')
plt.show()
No description has been provided for this image
1
2
3
4
5
6
7
8
#Windrose of windspeed and wind-direction measured at Sonnblick Obsvervatorium
wind_dir = sonn['dd']
wind_speed = sonn['ff']
ax = WindroseAxes.from_ax()
ax.bar(wind_dir, wind_speed, bins=[4,8,12,16,20,24])
ax.set_legend()
ax.set_title('Wind speed and wind direction at Sonnblick Observatorium from 2018-2021')
plt.show()
No description has been provided for this image
1
2
3
4
5
6
7
8
#Windrose of boe-windspeed and wind-direction measured at Sonnblick Observatorium
wind_dir = innsbru['dd']
w1boe=innsbru['w1boe']
ax = WindroseAxes.from_ax()
ax.bar(wind_dir, w1boe, bins=[5,10,15,20,25,30])
ax.set_legend()
ax.set_title('Boe wind speed and wind direction at Sonnblick Observatorium from 2018-2021')
plt.show()
No description has been provided for this image
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
rd_meta_station = pd.read_csv('RAINI Stations-Metadaten.csv', index_col = 0)

# Calculate average wind speed and wind direction for each station
wien_avg_ff = wien['ff'].mean()
wien_avg_dd = wien['dd'].mean()
#diff=pd.DataFrame([[365,waidhofen_avg_ff],[581,innsbru_dif.mean()],[1941,obergurgl_dif.mean()], [3437,oetztal_dif.mean()],[2800,pitztal_dif.mean()],[3109,sonn_dif.mean()]], columns=["Station", "Mean Temperature Difference"])
innsbru_avg_ff = innsbru['ff'].mean()
innsbru_avg_dd = innsbru['dd'].mean()

obergurgl_avg_ff = obergurgl['ff'].mean()
obergurgl_avg_dd = obergurgl['dd'].mean()

pitztal_avg_ff = pitztal['ff'].mean()
pitztal_avg_dd = pitztal['dd'].mean()

oetztal_avg_ff = oetztal['ff'].mean()
oetztal_avg_dd = oetztal['dd'].mean()

sonn_avg_ff = sonn['ff'].mean()
sonn_avg_dd = sonn['dd'].mean()

waidhofen_avg_ff = waidhofen['ff'].mean()
waidhofen_avg_dd = waidhofen['dd'].mean()

# Create a plot
fig, ax = plt.subplots()

# Plot average wind speed and wind direction for each station
ax.scatter(wien_avg_ff, wien_avg_dd, label='Wien')
ax.scatter(innsbru_avg_ff, innsbru_avg_dd, label='Innsbruck')
ax.scatter(obergurgl_avg_ff, obergurgl_avg_dd, label='Obergurgl')
ax.scatter(pitztal_avg_ff, pitztal_avg_dd, label='Pitztal')
ax.scatter(oetztal_avg_ff, oetztal_avg_dd, label='Oetztal')
ax.scatter(sonn_avg_ff, sonn_avg_dd, label='Sonnwend-Observatorium')
ax.scatter(waidhofen_avg_ff, waidhofen_avg_dd, label='Waidhofen')



# Set axis labels and plot title
ax.set_xlabel('Average Wind Speed (m/s)')
ax.set_ylabel('Average Wind Direction (degrees)')
ax.set_title('Average Wind Speed vs. Average Wind Direction')

# Add a legend
ax.legend()

# Show the plot
plt.show()
No description has been provided for this image

In the upper plots are shown the windspeed & wind-direction in the graphic form of a "windrose". We can directly see, that the highest average wind speed was measured at the Sonnwend-Observatorium. Of course, because this is the highest altitude measurement-placemant of our shown datas.