Programming project¶
Preamble¶
- Names: Wallner Juliana & Huber Lena
- Matrikelnummer: 12127587 & 12239503
01 - An energy balance model with hysteresis¶
Based on the code we wrote in week 07, code the following extension. For this exercise it's OK to copy-paste my solutions and start from there. Only copy the necessary (no useless code)!
The planetary albedo $\alpha$ is in fact changing with climate change. As the temperature drops, sea-ice and ice sheets are extending (increasing the albedo). Inversely, the albedo decreases as temperature rises. The planetary albedo of our simple energy balance model follows the following equation:
$$ \alpha = \begin{cases} 0.3,& \text{if } T \gt 280\\ 0.7,& \text{if } T \lt 250\\ a T + b, & \text{otherwise} \end{cases} $$
01-01: Compute the parameters $a$ and $b$ so that the equation is continuous at T=250K and T=280K.
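For continuity, the linear branch must meet the constant branches at the two threshold temperatures, which gives two equations for the two unknowns:

$$ a \cdot 250 + b = 0.7, \qquad a \cdot 280 + b = 0.3 $$

Subtracting the first from the second yields $30\,a = -0.4$, hence $a = -0.4/30 \approx -0.0133\ \mathrm{K^{-1}}$ and $b = 0.7 - 250\,a \approx 4.0333$ (the albedo decreases with rising temperature, so $a$ must be negative).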
01-02: Now write a function called alpha_from_temperature
which accepts a single positional parameter T
as input (a scalar) and returns the corresponding albedo. Test your function using doctests to make sure that it complies to the instructions above.
01-03: Adapt the existing code from week 07 to write a function called temperature_change_with_hysteresis
which accepts t0
(the starting temperature in K), n_years
(the number of simulation years) as positional arguments and tau
(the atmosphere transmissivity) as keyword argument (default value 0.611). Verify that:
- the stabilization temperature with `t0 = 292` and default tau is approximately 288K
- the stabilization temperature with `t0 = 265` and default tau is approximately 233K
01-04: Realize a total of N simulations with starting temperatures regularly spaced between `t0 = 206` K and `t0 = 318` K and plot them on a single plot for `n_years = 50`. The plot should look somewhat similar to this example for N=21.
Bonus: only if you want (and if time permits), you can try to increase N and add colors to your plot to create a graph similar to this one.
```python
import numpy as np
import matplotlib.pyplot as plt

# Task 01-01: continuity at T1 = 250 K (alpha = 0.7) and T2 = 280 K (alpha = 0.3)
T1 = 250
T2 = 280
alpha1 = 0.7
alpha2 = 0.3

a = (alpha2 - alpha1) / (T2 - T1)
b = alpha1 - a * T1
print(round(a, 5), round(b, 5))
```
-0.01333 4.03333
```python
# Task 01-02
def alpha_from_temperature(T):
    """Calculate the planetary albedo based on temperature.

    Parameters:
    T (float): Temperature in Kelvin.

    Returns:
    float: Planetary albedo.

    Examples
    --------
    >>> alpha_from_temperature(290)
    0.3
    >>> alpha_from_temperature(240)
    0.7
    >>> round(alpha_from_temperature(265), 2)
    0.5
    """
    if T > 280:
        return 0.3
    elif T < 250:
        return 0.7
    else:
        return a * T + b
```
```python
import doctest
doctest.testmod()
```
TestResults(failed=0, attempted=3)
```python
# Task 01-03: Function temperature_change_with_hysteresis
S = 1362         # solar constant (W m-2)
SIGMA = 5.67e-8  # Stefan-Boltzmann constant (W m-2 K-4)

def temperature_change_with_hysteresis(t0, n_years, tau=0.611):
    """Simulate temperature change with hysteresis.

    Parameters:
    t0 (float): Starting temperature in Kelvin.
    n_years (int): Number of simulation years.
    tau (float): Atmosphere transmissivity (default 0.611).

    Returns:
    np.ndarray: Array of n_years + 1 simulated temperatures.
    """
    temperatures = np.zeros(n_years + 1)
    temperatures[0] = t0
    for i in range(n_years):
        t = temperatures[i]
        asr = (1 - alpha_from_temperature(t)) * S / 4  # absorbed solar radiation
        olr = tau * SIGMA * t ** 4                     # outgoing longwave radiation
        # yearly update: radiative imbalance scaled by an assumed bulk
        # heat-capacity factor; only the equilibria matter for this task
        temperatures[i + 1] = t + (asr - olr) * 0.15
    return temperatures

# Task 01-03: Verify stabilization temperatures
stabilization_temp1 = temperature_change_with_hysteresis(292, n_years=100)[-1]
stabilization_temp2 = temperature_change_with_hysteresis(265, n_years=100)[-1]
print("Stabilization temperature with t0 = 292 K:", round(stabilization_temp1, 1))
print("Stabilization temperature with t0 = 265 K:", round(stabilization_temp2, 1))
```
Stabilization temperature with t0 = 292 K: 288.0 Stabilization temperature with t0 = 265 K: 233.0
```python
# Task 01-04
N = 21
starting_temps = np.linspace(206, 318, N)
n_years = 50

plt.figure(figsize=(10, 6))
for t0 in starting_temps:
    temperatures = temperature_change_with_hysteresis(t0, n_years)
    plt.plot(range(n_years + 1), temperatures, alpha=0.5)
plt.xlabel("Years")
plt.ylabel("Temperature (K)")
plt.title(f"Temperature Change with Hysteresis ({n_years} years)")
plt.grid(True)
plt.show()
```
02 - Weather station data files¶
I downloaded 10 min data from the recently launched ZAMG data hub. The data file contains selected parameters from the "INNSBRUCK-FLUGPLATZ (ID: 11804)" weather station.
You can download the data from the following links (right-click + "Save as..."):
- station data
- parameter metadata
- station list from the ZAMG (in a better format than last time)
Let me open the data for you and display its content:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```
```python
df = pd.read_csv('INNSBRUCK-FLUGPLATZ_Datensatz_20150101_20211231.csv', index_col=1, parse_dates=True)
df = df.drop('station', axis=1)
df
```
DD | FF | GSX | P | RF | RR | SO | TB1 | TB2 | TB3 | TL | TP | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
time | ||||||||||||
2015-01-01 00:00:00 | 232.0 | 1.0 | 0.0 | 965.7 | 96.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.7 | -3.4 |
2015-01-01 00:10:00 | 248.0 | 1.6 | 0.0 | 965.6 | 97.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.4 | -2.9 |
2015-01-01 00:20:00 | 252.0 | 1.6 | 0.0 | 965.6 | 97.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.4 | -2.9 |
2015-01-01 00:30:00 | 267.0 | 1.3 | 0.0 | 965.7 | 97.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.7 | -3.2 |
2015-01-01 00:40:00 | 279.0 | 1.4 | 0.0 | 965.7 | 97.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.5 | -3.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021-12-31 23:10:00 | 257.0 | 1.1 | 0.0 | 959.8 | 98.0 | 0.0 | 0.0 | 3.1 | 3.4 | 3.4 | -1.7 | -1.7 |
2021-12-31 23:20:00 | 106.0 | 0.3 | 0.0 | 960.0 | 100.0 | 0.0 | 0.0 | 3.0 | 3.4 | 3.4 | -2.0 | -2.0 |
2021-12-31 23:30:00 | 141.0 | 0.5 | 0.0 | 960.2 | 100.0 | 0.0 | 0.0 | 3.0 | 3.4 | 3.4 | -1.7 | -1.7 |
2021-12-31 23:40:00 | 167.0 | 0.5 | 0.0 | 960.2 | 100.0 | 0.0 | 0.0 | 3.0 | 3.4 | 3.4 | -1.6 | -1.6 |
2021-12-31 23:50:00 | 275.0 | 0.3 | 0.0 | 960.1 | 100.0 | 0.0 | 0.0 | 3.0 | 3.4 | 3.4 | -1.6 | -1.7 |
368208 rows × 12 columns
02-01: after reading the documentation of the respective functions (and maybe try a few things yourself), explain in plain sentences:
- what am I asking pandas to do with the `index_col=1, parse_dates=True` keyword arguments? Why am I doing this?
- what am I asking pandas to do with `.drop()`? Why `axis=1`?
-- `index_col=1` tells pandas to use the column at position 1 of the file (the second column, here `time`) as the index of the dataframe, instead of the default integer index. This is done so that every row is labelled by its timestamp.
-- `parse_dates=True` asks pandas to infer the format of the datetime strings in that index and convert them to proper datetime objects. This gives us a DatetimeIndex, which enables time-based selection and resampling (and, when the format can be inferred, also speeds up parsing).
-- `.drop('station', axis=1)` removes the `station` column from the dataframe. `axis=1` tells pandas to look for the label `'station'` among the column names (axis 0 would search the row index), so a whole column can be removed by name. The column is dropped because it repeats the same station ID in every row and carries no information.
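As a minimal illustration of these two points, here is a sketch on a tiny made-up CSV in the same layout as the station file (the values are invented, not real measurements):

```python
import io
import pandas as pd

# a tiny CSV in the same layout as the station file: station id first, time second
csv = io.StringIO(
    "station,time,TL\n"
    "11804,2015-01-01 00:00,-2.7\n"
    "11804,2015-01-01 00:10,-2.4\n"
)

# the column at position 1 ('time') becomes the index, parsed into datetimes
df = pd.read_csv(csv, index_col=1, parse_dates=True)
print(type(df.index))       # a DatetimeIndex, not plain strings

# axis=1: 'station' is looked up among the column labels and removed
df = df.drop('station', axis=1)
print(df.columns.tolist())  # ['TL']
```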
Now let me do something else for you:
```python
dfmeta = pd.read_csv('ZEHNMIN Parameter-Metadaten.csv', index_col=0)
dfmeta.loc[df.columns]
```
Kurzbeschreibung | Beschreibung | Einheit | |
---|---|---|---|
DD | Windrichtung | Windrichtung, vektorielles Mittel über 10 Minuten | ° |
FF | vektorielle Windgeschwindigkeit | Windgeschwindigkeit, vektorielles Mittel über ... | m/s |
GSX | Globalstrahlung | Globalstrahlung, arithmetisches Mittel über 10... | W/m² |
P | Luftdruck | Luftdruck, Basiswert zur Minute10 | hPa |
RF | Relative Feuchte | Relative Luftfeuchte, Basiswert zur Minute10 | % |
RR | Niederschlag | 10 Minuten Summe des Niederschlags, Summe der ... | mm |
SO | Sonnenscheindauer | Sonnenscheindauer, Sekundensumme über 10 Minuten | s |
TB1 | Erdbodentemperatur in 10cm Tiefe | Erdbodentemperatur in 10cm Tiefe, Basiswert zu... | °C |
TB2 | Erdbodentemperatur in 20cm Tiefe | Erdbodentemperatur in 20cm Tiefe, Basiswert zu... | °C |
TB3 | Erdbodentemperatur in 50cm Tiefe | Erdbodentemperatur in 50cm Tiefe, Basiswert zu... | °C |
TL | Lufttemperatur in 2m | Lufttemperatur in 2m Höhe, Basiswert zur Minute10 | °C |
TP | Taupunkt | Taupunktstemperatur, Basiswert zur Minute10 | °C |
02-02: again, explain in plain sentences what `dfmeta.loc[df.columns]` is doing, and why it works that way.
-- `.loc` accesses rows (and optionally columns) by label. `df.columns` holds the column names of the station dataframe; passing this list of labels to `dfmeta.loc[...]` selects, from the metadata table, exactly the rows whose index labels match those column names, in that order. Because a list-like of labels is used (not a single label), the result is a DataFrame with one metadata row per variable of `df`.
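A small sketch (with an invented mini metadata table) of why label-based selection with a list of labels returns a whole DataFrame:

```python
import pandas as pd

# metadata-style table indexed by parameter name (invented values)
meta = pd.DataFrame({'Einheit': ['°C', 'hPa', 'mm']}, index=['TL', 'P', 'RR'])

# a single label returns a Series (one row)...
single = meta.loc['TL']
print(type(single))

# ...but a list of labels returns a DataFrame, one row per label, in that order
sub = meta.loc[['RR', 'TL']]
print(sub.index.tolist())  # ['RR', 'TL']
```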
Finally, let me do a last step for you before you start coding:
```python
dfh = df.resample('H').mean()
```
02-03: explore the `dfh` dataframe. Explain, in plain words, what the purpose of `.resample('H')` followed by `.mean()` is. Explain what `.resample('H').max()` and `.resample('H').sum()` would do.
`.resample()` works on time-series (datetime-indexed) data: `.resample('H')` groups the 10-minute observations into hourly bins. The method chained after it defines how each bin is aggregated: `.mean()` averages the six 10-minute values of each hour for every column, returning a new dataframe with one row per hour. `.resample('H').max()` would instead return the maximum value of each hourly bin per column, and `.resample('H').sum()` the sum of each hourly bin.
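The difference between the three aggregations can be checked on a toy 10-minute series (synthetic numbers, not station data):

```python
import pandas as pd

# one hour of synthetic 10-minute values
idx = pd.date_range('2015-01-01 00:00', periods=6, freq='10min')
s = pd.Series([0.0, 0.1, 0.2, 0.3, 0.4, 0.5], index=idx)

hourly_mean = s.resample('H').mean()  # average of the six values -> 0.25
hourly_max = s.resample('H').max()    # largest value in the hour  -> 0.5
hourly_sum = s.resample('H').sum()    # total over the hour        -> 1.5
print(hourly_mean.iloc[0], hourly_max.iloc[0], hourly_sum.iloc[0])
```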
02-04: Using `np.allclose`, make sure that the average of the first hour (that you'll compute yourself from `df`) is indeed equal to the first row of `dfh`. Now, two variables in the dataframe have units that aren't suitable for averaging. Please convert the following variables to the correct units:
- `RR` needs to be converted from the average of 10 min sums to mm/h
- `SO` needs to be converted from the average of 10 min sums to s/h
```python
# 02-04
# hourly means for all variables ("RR" and "SO" are handled separately below)
dfh = df.resample('H').mean()

# first hour as a Series, without "RR" and "SO"
dfh_1 = dfh.iloc[:1].drop(["RR", "SO"], axis=1).squeeze()
dfh_1
```
DD 257.333333 FF 1.383333 GSX 0.000000 P 965.666667 RF 97.000000 TB1 3.200000 TB2 3.600000 TB3 5.200000 TL -2.533333 TP -3.066667 Name: 2015-01-01 00:00:00, dtype: float64
```python
# hourly sums for "RR" and "SO"
dfhs = df.resample('H').sum()
dfhs = dfhs.drop(["DD", "FF", "GSX", "P", "RF", "TB1", "TB2", "TB3", "TL", "TP"], axis=1)

# first hour as a Series
dfhs1 = dfhs.iloc[:1].squeeze()
dfhs1
```
RR 0.0 SO 0.0 Name: 2015-01-01 00:00:00, dtype: float64
```python
# mean of the first hour (six 10-min values) computed directly from df
p1 = df.iloc[:6, [0, 1, 2, 3, 4, 7, 8, 9, 10, 11]].mean()
p1
```
DD 257.333333 FF 1.383333 GSX 0.000000 P 965.666667 RF 97.000000 TB1 3.200000 TB2 3.600000 TB3 5.200000 TL -2.533333 TP -3.066667 dtype: float64
```python
# sum of RR and SO over the first hour, computed directly from df
p2 = df.iloc[:6, [5, 6]].sum()
p2
```
RR 0.0 SO 0.0 dtype: float64
```python
# check that resampling and the direct mean give the same values
same_val = np.allclose(p1, dfh_1, rtol=1e-05, atol=1e-08)
print('resampling- and dataframe mean calculation give the same value:', same_val)
```
resampling- and dataframe mean calculation give the same value: True
```python
# check that resampling and the direct sums give the same values
same_val_sum = np.allclose(p2, dfhs1, rtol=1e-05, atol=1e-08)
print('resampling- and dataframe sum calculation give the same value:', same_val_sum)
```
resampling- and dataframe sum calculation give the same value: True
From now on, we will use the hourly data only (and further aggregations when necessary). The 10 mins data are great but require a little bit more of pandas kung fu (the chinese term, not the sport) to be used efficiently.
Spend some time exploring the dfh
dataframe we just created. What time period does it cover? What variables does it contain?
Note on pandas: all the exercises below can be done with or without pandas. Each question can be answered with very few lines of code (often one or two) with pandas, and I recommend to use it as much as possible. If you want, you can always use numpy in case of doubt: you can access the data as a numpy array with: `df[column_name].values`.
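For example, on a throwaway dataframe, `.values` hands the underlying numpy array to code that prefers plain numpy:

```python
import numpy as np
import pandas as pd

# a throwaway dataframe with invented precipitation values
df_demo = pd.DataFrame({'RR': [0.0, 0.2, 1.4]})

arr = df_demo['RR'].values  # a plain numpy array, with no index attached
print(type(arr), arr.mean())
```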
03 - Precipitation¶
In this section, we will focus on precipitation only.
03-01: Compute the average annual precipitation (m/year) over the 7-year period.
```python
# 03-01: average annual precipitation (m/year) over the 7-year period
total_rr = df['RR'].sum()                   # total precipitation 2015-2021 (mm)
annual_precipitation = total_rr / 7 / 1000  # mm -> m, divided by 7 years
print(f'Average annual precipitation: {annual_precipitation:.2f} m/year')
```
03-02: What is the smallest non-zero precipitation measured at the station? What is the maximum hourly precipitation measured at the station? When did this occur?
```python
# minimum non-zero precipitation and maximum hourly precipitation
smallest_non_zero_precipitation = dfh["RR"][dfh["RR"] > 0].min()
max_hourly_precipitation = dfh["RR"].max()

# times of occurrence
time_max_hourly_precipitation = dfh["RR"].idxmax()
time_min_hourly_precipitation = dfh["RR"][dfh["RR"] > 0].idxmin()

print("Smallest non zero precipitation: ", round(smallest_non_zero_precipitation, 3), "mm")
print("Maximal hourly precipitation: ", round(max_hourly_precipitation, 2), "mm")
```
Smallest non zero precipitation: 0.017 mm Maximal hourly precipitation: 3.7 mm
03-03: Plot a histogram of hourly precipitation, with bins of size 0.2 mm/h, starting at 0.1 mm/h and ending at 25 mm/h. Plot the same data, but this time with a logarithmic y-axis. Compute the 99th percentile (or quantile) of hourly precipitation.
```python
# histogram, linear y-axis
bins = np.arange(0.1, 25.1, 0.2)  # 0.2 mm/h wide bins starting at 0.1 mm/h
plt.figure(figsize=(10, 5))
plt.hist(dfh["RR"], bins=bins)
plt.xlabel("hourly precipitation (mm/h)")
plt.ylabel("Frequency")
plt.title("precipitation/hour")
plt.grid(True)
plt.show()

# histogram, logarithmic y-axis
plt.figure(figsize=(10, 5))
plt.hist(dfh["RR"], bins=bins, log=True)
plt.xlabel("hourly precipitation (mm/h)")
plt.ylabel("Frequency")
plt.title("precipitation/hour")
plt.grid(True)
plt.show()

# compute the 99th percentile of hourly precipitation
percentile_99 = dfh["RR"].quantile(0.99)
print("99th percentile of hourly precipitation: ", round(percentile_99, 2), "mm/h")
```
99th percentile of hourly precipitation: 0.39 mm/h
03-04: Compute daily sums (mm/d) of precipitation (tip: use `.resample` again). Compute the average number of rain days per year in Innsbruck (a "rain day" is a day with at least 0.1 mm/d of measured precipitation).
```python
# daily sums of precipitation
dfd_r = df.resample('D').sum()
dfd_r = dfd_r.drop(["DD", "FF", "GSX", "P", "RF", "SO", "TB1", "TB2", "TB3", "TL", "TP"], axis=1)
print(dfd_r, "mm/d")

# average number of rain days per year (a rain day has at least 0.1 mm/d)
avg_dfd_r = (dfd_r >= 0.1).sum() / 7
print(avg_dfd_r, "rainy days per year")
```
RR time 2015-01-01 0.0 2015-01-02 0.6 2015-01-03 12.2 2015-01-04 7.7 2015-01-05 0.0 ... ... 2021-12-27 0.0 2021-12-28 0.0 2021-12-29 15.1 2021-12-30 2.1 2021-12-31 0.0 [2557 rows x 1 columns] mm/d RR 158.857143 dtype: float64 rainy days per year
03-05: Now select (subset) the daily dataframe to keep only daily data in the months of December, January, February (DJF). To do this, note that `dfh.index.month` exists and can be used to subset the data efficiently. Compute the average precipitation in DJF (mm/d), and the average number of rainy days in DJF. Repeat with the months of June, July, August (JJA).
```python
# select DJF: December to February (months 12, 1, 2)
djf = dfd_r[(dfd_r.index.month == 12) | (dfd_r.index.month <= 2)]

# average precipitation in DJF
average_precipitation_djf = djf.mean()

# count rain days (>= 0.1 mm/d), divided by 7 (years)
aver_rainy_days_djf = (djf >= 0.1).sum() / 7

print("Average precipitation in DJF: ", round(average_precipitation_djf, 2), "mm/d")
print("Average number of days where it rained in DJF: ", round(aver_rainy_days_djf))
```
Average precipitation in DJF: RR 1.77 dtype: float64 mm/d Average number of days where it rained in DJF: RR 38.0 dtype: float64
```python
# select JJA: June to August (months 6-8)
jja = dfd_r[(dfd_r.index.month >= 6) & (dfd_r.index.month <= 8)]

# average precipitation in JJA
average_precipitation_jja = jja.mean()

# count rain days (>= 0.1 mm/d), divided by 7 (years)
aver_rainy_days_jja = (jja >= 0.1).sum() / 7

print("Average precipitation in JJA: ", round(average_precipitation_jja, 2), "mm/d")
print("Average number of days where it rained in JJA: ", round(aver_rainy_days_jja))
```
Average precipitation in JJA: RR 4.02 dtype: float64 mm/d Average number of days where it rained in JJA: RR 53.0 dtype: float64
03-06: Repeat the DJF and JJA subsetting, but this time with hourly data. Count the total number of times that hourly precipitation in DJF is above the 99th percentile computed in exercise 03-03. Repeat with JJA.
```python
# subset the hourly data for DJF & JJA
djf = dfh[(dfh.index.month <= 2) | (dfh.index.month == 12)]
jja = dfh[(dfh.index.month >= 6) & (dfh.index.month <= 8)]

print("DJF hourly over 99th percentile: ", (djf["RR"] > percentile_99).sum())
print("JJA hourly over 99th percentile: ", (jja["RR"] > percentile_99).sum())
```
DJF hourly over 99th percentile: 68
03-07: Compute and plot the average daily cycle of hourly precipitation in DJF and JJA. I expect a plot similar to this example. To compute the daily cycle, I recommend to combine two very useful tools. First, start by noticing that `dfh.index.hour` exists and can be used to categorize data. Then, note that `df.groupby` exists and can be used exactly for that (documentation).
```python
# compute the average daily cycles of DJF & JJA
djf_daily_cycle = djf.groupby(djf.index.hour)["RR"].mean()
jja_daily_cycle = jja.groupby(jja.index.hour)["RR"].mean()

fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(djf_daily_cycle, label="DJF")
ax.plot(jja_daily_cycle, label="JJA")
ax.set_title("Average daily cycle of hourly precipitation")
ax.set_xlabel("time of the day in h")
ax.set_ylabel("Average hourly precipitation in mm/h")
ax.legend()
ax.grid()
plt.show()
```
04 - A few other variables¶
In this section, we will continue to analyze the weather station data.
04-01: Verify that the three soil temperatures have approximately the same average value over the entire period. Now plot the three soil temperature timeseries in hourly resolution over the course of the year 2020 (example). Repeat the plot with the month of May 2020.
```python
# 04-01
# mean soil temperatures over the whole time period
TB1_avg = dfh['TB1'].mean()
print('Soil 1 has an average mean temperature of:', TB1_avg, '°C')
TB2_avg = dfh['TB2'].mean()
print('Soil 2 has an average mean temperature of:', TB2_avg, '°C')
TB3_avg = dfh['TB3'].mean()
print('Soil 3 has an average mean temperature of:', TB3_avg, '°C\n')

# use allclose to show that the means agree within an absolute tolerance of 0.35 °C
sim1 = np.allclose(TB1_avg, TB2_avg, rtol=0, atol=0.35)
print('Soil 1 and 2 are approximately the same:', sim1)
sim2 = np.allclose(TB2_avg, TB3_avg, rtol=0, atol=0.35)
print('Soil 2 and 3 are approximately the same:', sim2)
sim3 = np.allclose(TB1_avg, TB3_avg, rtol=0, atol=0.35)
print('Soil 1 and 3 are approximately the same:', sim3)
```
Soil 1 has an average mean temperature of: 11.393769492976766 °C Soil 2 has an average mean temperature of: 11.514406669059351 °C Soil 3 has an average mean temperature of: 11.379132772780602 °C Soil 1 and 2 are approximately the same: True Soil 2 and 3 are approximately the same: True Soil 1 and 3 are approximately the same: True
```python
# hourly soil temperatures for the year 2020
dfh_2020_soil = dfh.loc['2020-01-01':'2020-12-31', ['TB1', 'TB2', 'TB3']]

# plot for 2020
ax = dfh_2020_soil.plot(figsize=(8, 5))
ax.set_xlabel('Time')
ax.set_ylabel('Temperature °C')
```
Text(0, 0.5, 'Temperature °C')
```python
# hourly soil temperatures for May 2020
dfh_2020_may_soil = dfh.loc['2020-05-01':'2020-05-31', ['TB1', 'TB2', 'TB3']]

# plot for May 2020
ax = dfh_2020_may_soil.plot(figsize=(8, 5))
ax.set_xlabel('Time')
ax.set_ylabel('Temperature °C')
```
Text(0, 0.5, 'Temperature °C')
04-02: Plot the average daily cycle of all three soil temperatures.
```python
# 04-02
# average daily cycle: group the hourly data by hour of day and average
soil_cycle = dfh[['TB1', 'TB2', 'TB3']].groupby(dfh.index.hour).mean()

# plot the average daily cycle of the three soil temperatures
ax = soil_cycle.plot(figsize=(8, 5))
ax.set_xlabel('Hour of day')
ax.set_ylabel('Temperature °C')
```
Text(0, 0.5, 'Temperature °C')
04-03: Compute the difference (in °C) between the air temperature and the dewpoint temperature. Now plot this difference on a scatter plot (x-axis: relative humidity, y-axis: temperature difference).
```python
# 04-03
# difference between air temperature and dewpoint temperature, as a new column "TL-TP"
dfh_2 = dfh.copy()  # copy so that dfh itself is not modified
dfh_2["TL-TP"] = dfh_2["TL"] - dfh_2["TP"]

# scatter plot: relative humidity on x, temperature difference on y
fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(dfh_2["RF"], dfh_2["TL-TP"], s=20, alpha=0.5)
ax.set_xlabel('relative humidity %')
ax.set_ylabel('temperature difference °C')
```
Text(0, 0.5, 'temperature difference °C')
05 - Free coding project¶
- This section should have at least 5 original plots in it. They are the output of your analysis.
- This section should also use additional data that you downloaded yourself. The easiest way would probably be to download another station(s) from the ZAMG database, or data from the same station but for another time period (e.g. for trend or change analysis). You can, however, decide to do something completely different if you prefer (as long as you download and read one more file).
- this section should contain at least one regression or correlation analysis between two parameters. Examples:
- between two different variables at the same station (like we did with the dewpoint above)
- between different stations (for example, average temperature as a function of station elevation)
- between average temperature and time (trends analysis)
- etc.
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```
We have chosen to resample the following data to hourly means and sums, since this was the most concise form to use for our first graph.
```python
# read the St Anton data
kf = pd.read_csv('ZEHNMIN St anton.csv', index_col=0, parse_dates=True)
# drop the station column
kf = kf.drop('station', axis=1)
# hourly means of pressure and relative humidity
kfh_m = kf.resample('H').mean()
kfh_m = kfh_m.drop(['RR', 'SO', 'TL', 'TP'], axis=1)
kfh_m
```
P | RF | |
---|---|---|
time | ||
2015-01-01 00:00:00+00:00 | 882.316667 | 85.333333 |
2015-01-01 01:00:00+00:00 | 882.600000 | 84.333333 |
2015-01-01 02:00:00+00:00 | 882.716667 | 87.833333 |
2015-01-01 03:00:00+00:00 | 882.800000 | 88.833333 |
2015-01-01 04:00:00+00:00 | 882.750000 | 89.166667 |
... | ... | ... |
2023-06-08 19:00:00+00:00 | 870.500000 | 88.833333 |
2023-06-08 20:00:00+00:00 | 870.966667 | 89.166667 |
2023-06-08 21:00:00+00:00 | 871.116667 | 90.166667 |
2023-06-08 22:00:00+00:00 | 871.133333 | 91.166667 |
2023-06-08 23:00:00+00:00 | 871.050000 | 91.000000 |
73944 rows × 2 columns
Above you can see the hourly means of pressure and relative humidity in St Anton.
```python
# read the Galzig data
ef = pd.read_csv('ZEHNMIN Galzig.csv', index_col=0, parse_dates=True)
# drop the station column
ef = ef.drop('station', axis=1)
# hourly means of pressure and relative humidity
efh_m = ef.resample('H').mean()
efh_m = efh_m.drop(['RR', 'SO', 'TL', 'TP'], axis=1)
efh_m
```
P | RF | |
---|---|---|
time | ||
2015-01-01 00:00:00+00:00 | 797.133333 | 77.833333 |
2015-01-01 01:00:00+00:00 | 797.616667 | 76.666667 |
2015-01-01 02:00:00+00:00 | 797.750000 | 78.666667 |
2015-01-01 03:00:00+00:00 | 797.766667 | 81.166667 |
2015-01-01 04:00:00+00:00 | 797.800000 | 82.500000 |
... | ... | ... |
2023-06-08 19:00:00+00:00 | 791.716667 | 67.000000 |
2023-06-08 20:00:00+00:00 | 792.083333 | 63.833333 |
2023-06-08 21:00:00+00:00 | 792.050000 | 67.000000 |
2023-06-08 22:00:00+00:00 | 792.000000 | 70.166667 |
2023-06-08 23:00:00+00:00 | 791.683333 | 72.166667 |
73944 rows × 2 columns
Above you can see the hourly means of pressure and relative humidity in Galzig.
```python
# hourly precipitation sums for St Anton
kfh_s = kf.resample('H').sum()
kfh_s = kfh_s.drop(['P', 'RF', 'TL', 'TP', 'SO'], axis=1)
kfh_s
```
RR | |
---|---|
time | |
2015-01-01 00:00:00+00:00 | 0.0 |
2015-01-01 01:00:00+00:00 | 0.0 |
2015-01-01 02:00:00+00:00 | 0.0 |
2015-01-01 03:00:00+00:00 | 0.0 |
2015-01-01 04:00:00+00:00 | 0.0 |
... | ... |
2023-06-08 19:00:00+00:00 | 0.0 |
2023-06-08 20:00:00+00:00 | 0.0 |
2023-06-08 21:00:00+00:00 | 0.0 |
2023-06-08 22:00:00+00:00 | 0.0 |
2023-06-08 23:00:00+00:00 | 0.0 |
73944 rows × 1 columns
Above is the sum of precipitation per hour in St Anton
```python
# hourly precipitation sums for Galzig
efh_s = ef.resample('H').sum()
efh_s = efh_s.drop(['P', 'RF', 'TL', 'TP', 'SO'], axis=1)
efh_s
```
RR | |
---|---|
time | |
2015-01-01 00:00:00+00:00 | 0.0 |
2015-01-01 01:00:00+00:00 | 0.0 |
2015-01-01 02:00:00+00:00 | 0.0 |
2015-01-01 03:00:00+00:00 | 0.0 |
2015-01-01 04:00:00+00:00 | 0.0 |
... | ... |
2023-06-08 19:00:00+00:00 | 0.0 |
2023-06-08 20:00:00+00:00 | 0.0 |
2023-06-08 21:00:00+00:00 | 0.0 |
2023-06-08 22:00:00+00:00 | 0.0 |
2023-06-08 23:00:00+00:00 | 0.0 |
73944 rows × 1 columns
Above is the sum of precipitation per hour in Galzig
First we would like to compare the precipitation between the two stations; we will do this graphically.
```python
fig, ax = plt.subplots(figsize=(16, 9))
ax.plot(kfh_s, label='st anton')
ax.plot(efh_s, label='galzig')
ax.legend()
ax.set_xlabel('time')
ax.set_ylabel('precipitation mm/H')
ax.set_title('comparison of hourly precipitation between st anton and galzig')
```
Text(0.5, 1.0, 'comparison of hourly precipitation between st anton and galzig')
Here we can see a very clear maximum in the St Anton precipitation, which we would like to investigate more closely, mainly because in Galzig, which is only 1.23 km away, the precipitation was far lower. We will calculate the exact values as follows:
```python
# hour with the maximum precipitation in St Anton
kfh_s.idxmax()
```
RR 2018-08-01 16:00:00+00:00 dtype: datetime64[ns, UTC]
```python
# how much precipitation fell in that hour
kfh_s.loc['2018-08-01 16:00']
```
RR 27.4 Name: 2018-08-01 16:00:00+00:00, dtype: float64
```python
# the precipitation in Galzig during the St Anton maximum (mm/H)
print('Precipitation in Galzig when St Anton had its maximum:', efh_s.loc['2018-08-01 16:00'])
```
Precipitation in Galzig when St Anton had its maximum: RR 10.5 Name: 2018-08-01 16:00:00+00:00, dtype: float64
```python
# difference in precipitation (mm/H)
kfh_s.loc['2018-08-01 16:00'] - efh_s.loc['2018-08-01 16:00']
```
RR 16.9 Name: 2018-08-01 16:00:00+00:00, dtype: float64
We will now investigate the other climate variables at both stations to see what may have caused St Anton to receive 16.9 mm/H more rainfall than Galzig.
Precipitation in the hour¶
We would like a more detailed view of the precipitation during this hour, to see exactly how the amount of precipitation changed over time; for this we will use the raw 10-minute data.
```python
kf_08_01 = kf.loc['2018-08-01 16:00':'2018-08-01 16:50']
kf_08_01 = kf_08_01.drop(['SO', 'TL', 'TP'], axis=1)
kf_08_01
```
P | RF | RR | |
---|---|---|---|
time | |||
2018-08-01 16:00:00+00:00 | 875.7 | 82.0 | 0.9 |
2018-08-01 16:10:00+00:00 | 876.8 | 82.0 | 4.0 |
2018-08-01 16:20:00+00:00 | 877.7 | 91.0 | 10.1 |
2018-08-01 16:30:00+00:00 | 877.4 | 90.0 | 7.8 |
2018-08-01 16:40:00+00:00 | 877.4 | 92.0 | 3.7 |
2018-08-01 16:50:00+00:00 | 877.7 | 91.0 | 0.9 |
```python
ef_08_01 = ef.loc['2018-08-01 16:00':'2018-08-01 16:50']
ef_08_01 = ef_08_01.drop(['SO', 'TL', 'TP'], axis=1)
ef_08_01
```
P | RF | RR | |
---|---|---|---|
time | |||
2018-08-01 16:00:00+00:00 | 798.9 | 83.0 | 1.8 |
2018-08-01 16:10:00+00:00 | 799.5 | 93.0 | 1.0 |
2018-08-01 16:20:00+00:00 | 799.7 | 88.0 | 5.6 |
2018-08-01 16:30:00+00:00 | 799.6 | 86.0 | 0.9 |
2018-08-01 16:40:00+00:00 | 799.3 | 82.0 | 0.8 |
2018-08-01 16:50:00+00:00 | 799.6 | 85.0 | 0.4 |
```python
# plot the 10-minute precipitation during 16:00-17:00 on 2018-08-01
efrr_08_01 = ef_08_01.drop(['RF', 'P'], axis=1)
kfrr_08_01 = kf_08_01.drop(['RF', 'P'], axis=1)

fig, rx = plt.subplots()
rx.plot(kfrr_08_01, label='St anton')
rx.plot(efrr_08_01, label='Galzig')
rx.set_title('comparison of precipitation in mm every 10 minutes in the hour 16:00-17:00 on 2018-08-01')
rx.legend()
rx.set_xlabel('time')
rx.set_ylabel('Precipitation (mm/10 minutes)')
```
Text(0, 0.5, 'Precipitation (mm/10 minutes)')
```python
kf_08_01['RR'].loc['2018-08-01 16:20'] - ef_08_01['RR'].loc['2018-08-01 16:20']
```
4.5
This graph gives a much clearer view of how the precipitation evolved at both stations, with a clear maximum at 16:20 in each, but with a difference of 4.5 mm/10 minutes between them.
To see why this may have occurred, we will continue with our other variables, relative humidity and pressure.
Pressure¶
Next we will investigate the pressure graphically, using the 10-minute data over the same period (16:00-17:00 on 2018-08-01).
```python
# pressure during the hour 16:00-17:00 on 2018-08-01 in both St Anton and Galzig
kfp_08_01 = kf_08_01.drop(['RF', 'RR'], axis=1)
efp_08_01 = ef_08_01.drop(['RF', 'RR'], axis=1)

fig, px = plt.subplots()
px.plot(kfp_08_01, label='St Anton')
px.plot(efp_08_01, label='Galzig')
px.legend()
px.set_xlabel('time')
px.set_ylabel('Pressure (hPa)')
```
Text(0, 0.5, 'Pressure (hPa)')
Checking the pressure change here has two limitations: (1) one hour is too short a window to see a major change in pressure, and (2) since the altitude of each station is fixed, the pressure difference between the two stations mainly reflects their elevation difference, so pressure alone cannot explain the large difference in precipitation.
To have a closer look anyway, we will compare precipitation over time and pressure over time in St Anton only, and calculate the correlation:
```python
# plot precipitation and pressure against time during 16:00-17:00 on 2018-08-01
fig, lx = plt.subplots(figsize=(16, 9))
lx.plot(kfrr_08_01, label='St anton precipitation')
lx1 = lx.twinx()
lx1.plot(kfp_08_01, label='St anton pressure', color='green')
lx.legend(loc='upper left')
lx1.legend(loc='upper right')
lx.set_title('How precipitation changes with pressure')
lx.set_xlabel('time')
lx1.set_ylabel('pressure (hPa)')
lx.set_ylabel('Precipitation (mm/10 minutes)')
```
Text(0, 0.5, 'Precipitation (mm/10 minutes)')
In the graph we see a simultaneous increase in both variables up to the maximum at 16:20, after which precipitation drops sharply while pressure stays relatively constant, with only minor changes over the last 30 minutes of the hour.
To check this relationship we will calculate the correlation:
```python
# calculate the correlation between pressure and precipitation
corr_1 = kf_08_01['P'].corr(kf_08_01['RR'])
print(f'{corr_1:.5f}')
```
0.50001
With r ≈ 0.50 the correlation is positive and moderate to strong. So although the pressure change looked small in the first graph, over this hour it co-varies noticeably with precipitation. Note, however, that six 10-minute samples are very few, and correlation alone does not tell us that one variable drives the other.
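As a cross-check, the same Pearson coefficient can be recomputed with plain numpy from the six 10-minute values shown in the St Anton table above:

```python
import numpy as np

# the six 10-min values from the St Anton table (16:00-16:50 on 2018-08-01)
p = np.array([875.7, 876.8, 877.7, 877.4, 877.4, 877.7])  # pressure (hPa)
rr = np.array([0.9, 4.0, 10.1, 7.8, 3.7, 0.9])            # precipitation (mm/10 min)

# Pearson correlation coefficient from the correlation matrix
r = np.corrcoef(p, rr)[0, 1]
print(round(r, 5))  # matches the pandas .corr() result above
```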
Humidity¶
Next we would like to investigate the relative humidity in the same period (16:00-17:00 on 2018-08-01), again graphically for a better visual overview.
```python
# plot relative humidity against time at both stations during 16:00-17:00 on 2018-08-01
efh_08_01 = ef_08_01.drop(['P', 'RR'], axis=1)
kfh_08_01 = kf_08_01.drop(['P', 'RR'], axis=1)

fig, hx = plt.subplots()
hx.plot(kfh_08_01, label='St anton')
hx.plot(efh_08_01, label='Galzig')
hx.legend()
hx.set_xlabel('Time')
hx.set_ylabel('relative humidity (%)')
```
Text(0, 0.5, 'relative humidity (%)')
Here we can see a sharp rise in relative humidity between 16:10 and 16:20 in St Anton, and a large drop in Galzig between 16:10 and 16:40.
We would like to investigate the relation between relative humidity and precipitation in St Anton more closely:
```python
# plot precipitation and relative humidity against time during 16:00-17:00 on 2018-08-01
fig, jx = plt.subplots(figsize=(16, 9))
jx.plot(kfrr_08_01, label='St anton precipitation')
jx1 = jx.twinx()
jx1.plot(kfh_08_01, label='St anton humidity', color='green')
jx.legend(loc='upper left')
jx1.legend()
jx.set_title('How precipitation changes with relative humidity')
jx.set_xlabel('time')
jx1.set_ylabel('Relative humidity (%)')
jx.set_ylabel('Precipitation (mm/10 minutes)')
```
Text(0, 0.5, 'Precipitation (mm/10 minutes)')
In this direct comparison we can see that between 16:00 and 16:20 humidity and precipitation increase at a similar rate, but after 16:20 the precipitation decreases back to a low of 0.9 mm/10 minutes, whereas the humidity rises slightly again.
```python
# calculate the correlation between relative humidity and precipitation
corr = kf_08_01['RF'].corr(kf_08_01['RR'])
print(f'{corr:.5f}')
```
0.39017
With r ≈ 0.39 there is a moderate correlation between the two variables, so relative humidity and precipitation are clearly related over this hour. Again, with only six samples this is indicative rather than conclusive, and correlation does not establish that one variable causes the other.
Overall¶
Through this investigation we found that relative humidity and pressure are both related to the precipitation peak, with a moderate and a moderate-to-strong correlation with precipitation respectively, although with only one hour of 10-minute data these relationships are suggestive rather than conclusive.