Programming project¶
Preamble¶
- Names: Wallner Juliana & Huber Lena
- Matrikelnummer: 12127587 & 12239503
01 - An energy balance model with hysteresis¶
Based on the code we wrote in week 07, code the following extension. For this exercise it's OK to copy-paste my solutions and start from there. Only copy the necessary (no useless code)!
The planetary albedo $\alpha$ is in fact changing with climate change. As the temperature drops, sea-ice and ice sheets are extending (increasing the albedo). Inversely, the albedo decreases as temperature rises. The planetary albedo of our simple energy balance model follows the following equation:
$$ \alpha = \begin{cases} 0.3,& \text{if } T \gt 280\\ 0.7,& \text{if } T \lt 250\\ a T + b, & \text{otherwise} \end{cases} $$
01-01: Compute the parameters $a$ and $b$ so that the equation is continuous at T=250K and T=280K.
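For continuity, the linear branch must meet the constant branches at the two threshold temperatures, which gives two equations for the two unknowns:

$$ a \cdot 250 + b = 0.7, \qquad a \cdot 280 + b = 0.3 $$

Subtracting the first from the second yields $30\,a = -0.4$, hence $a = -0.4/30 \approx -0.0133\ \mathrm{K^{-1}}$ and $b = 0.7 - 250\,a \approx 4.0333$ (the albedo decreases with rising temperature, so $a$ must be negative).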
01-02: Now write a function called alpha_from_temperature
which accepts a single positional parameter T
as input (a scalar) and returns the corresponding albedo. Test your function using doctests to make sure that it complies to the instructions above.
01-03: Adapt the existing code from week 07 to write a function called temperature_change_with_hysteresis
which accepts t0
(the starting temperature in K), n_years
(the number of simulation years) as positional arguments and tau
(the atmosphere transmissivity) as keyword argument (default value 0.611). Verify that:
- the stabilization temperature with `t0 = 292` and default tau is approximately 288K
- the stabilization temperature with `t0 = 265` and default tau is approximately 233K
01-04: Realize a total of N simulations with starting temperatures regularly spaced between `t0 = 206` K and `t0 = 318` K and plot them on a single plot for `n_years = 50`. The plot should look somewhat similar to this example for N=21.
Bonus: only if you want (and if time permits), you can try to increase N and add colors to your plot to create a graph similar to this one.
```python
import numpy as np
import matplotlib.pyplot as plt

# Task 01-01: continuity at T1 = 250 K (alpha = 0.7) and T2 = 280 K (alpha = 0.3)
T1 = 250
T2 = 280
alpha1 = 0.7
alpha2 = 0.3

a = (alpha2 - alpha1) / (T2 - T1)
b = alpha1 - a * T1
print(round(a, 5), round(b, 5))
```
-0.01333 4.03333
```python
# Task 01-02
def alpha_from_temperature(T):
    """Calculate the planetary albedo based on temperature.

    Parameters:
    T (float): Temperature in Kelvin.

    Returns:
    float: Planetary albedo.

    Examples
    --------
    >>> alpha_from_temperature(290)
    0.3
    >>> alpha_from_temperature(240)
    0.7
    >>> round(alpha_from_temperature(265), 2)
    0.5
    """
    if T > 280:
        return 0.3
    elif T < 250:
        return 0.7
    else:
        return a * T + b
```
```python
import doctest
doctest.testmod()
```
TestResults(failed=0, attempted=3)
```python
# Task 01-03: Function temperature_change_with_hysteresis
S = 1362         # solar constant (W m-2)
SIGMA = 5.67e-8  # Stefan-Boltzmann constant (W m-2 K-4)

def temperature_change_with_hysteresis(t0, n_years, tau=0.611):
    """Simulate temperature change with hysteresis.

    Parameters:
    t0 (float): Starting temperature in Kelvin.
    n_years (int): Number of simulation years.
    tau (float): Atmosphere transmissivity (default 0.611).

    Returns:
    np.ndarray: Array of n_years + 1 simulated temperatures.
    """
    temperatures = np.zeros(n_years + 1)
    temperatures[0] = t0
    for i in range(n_years):
        t = temperatures[i]
        asr = (1 - alpha_from_temperature(t)) * S / 4  # absorbed solar radiation
        olr = tau * SIGMA * t ** 4                     # outgoing longwave radiation
        # yearly update: radiative imbalance scaled by an assumed bulk
        # heat-capacity factor; only the equilibria matter for this task
        temperatures[i + 1] = t + (asr - olr) * 0.15
    return temperatures

# Task 01-03: Verify stabilization temperatures
stabilization_temp1 = temperature_change_with_hysteresis(292, n_years=100)[-1]
stabilization_temp2 = temperature_change_with_hysteresis(265, n_years=100)[-1]
print("Stabilization temperature with t0 = 292 K:", round(stabilization_temp1, 1))
print("Stabilization temperature with t0 = 265 K:", round(stabilization_temp2, 1))
```
Stabilization temperature with t0 = 292 K: 288.0 Stabilization temperature with t0 = 265 K: 233.0
```python
# Task 01-04
N = 21
starting_temps = np.linspace(206, 318, N)
n_years = 50

plt.figure(figsize=(10, 6))
for t0 in starting_temps:
    temperatures = temperature_change_with_hysteresis(t0, n_years)
    plt.plot(range(n_years + 1), temperatures, alpha=0.5)
plt.xlabel("Years")
plt.ylabel("Temperature (K)")
plt.title(f"Temperature Change with Hysteresis ({n_years} years)")
plt.grid(True)
plt.show()
```
02 - Weather station data files¶
I downloaded 10 min data from the recently launched ZAMG data hub. The data file contains selected parameters from the "INNSBRUCK-FLUGPLATZ (ID: 11804)" weather station.
You can download the data from the following links (right-click + "Save as..."):
- station data
- parameter metadata
- station list from the ZAMG (in a better format than last time)
Let me open the data for you and display its content:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```
```python
df = pd.read_csv('INNSBRUCK-FLUGPLATZ_Datensatz_20150101_20211231.csv', index_col=1, parse_dates=True)
df = df.drop('station', axis=1)
df
```
DD | FF | GSX | P | RF | RR | SO | TB1 | TB2 | TB3 | TL | TP | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
time | ||||||||||||
2015-01-01 00:00:00 | 232.0 | 1.0 | 0.0 | 965.7 | 96.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.7 | -3.4 |
2015-01-01 00:10:00 | 248.0 | 1.6 | 0.0 | 965.6 | 97.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.4 | -2.9 |
2015-01-01 00:20:00 | 252.0 | 1.6 | 0.0 | 965.6 | 97.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.4 | -2.9 |
2015-01-01 00:30:00 | 267.0 | 1.3 | 0.0 | 965.7 | 97.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.7 | -3.2 |
2015-01-01 00:40:00 | 279.0 | 1.4 | 0.0 | 965.7 | 97.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.5 | -3.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021-12-31 23:10:00 | 257.0 | 1.1 | 0.0 | 959.8 | 98.0 | 0.0 | 0.0 | 3.1 | 3.4 | 3.4 | -1.7 | -1.7 |
2021-12-31 23:20:00 | 106.0 | 0.3 | 0.0 | 960.0 | 100.0 | 0.0 | 0.0 | 3.0 | 3.4 | 3.4 | -2.0 | -2.0 |
2021-12-31 23:30:00 | 141.0 | 0.5 | 0.0 | 960.2 | 100.0 | 0.0 | 0.0 | 3.0 | 3.4 | 3.4 | -1.7 | -1.7 |
2021-12-31 23:40:00 | 167.0 | 0.5 | 0.0 | 960.2 | 100.0 | 0.0 | 0.0 | 3.0 | 3.4 | 3.4 | -1.6 | -1.6 |
2021-12-31 23:50:00 | 275.0 | 0.3 | 0.0 | 960.1 | 100.0 | 0.0 | 0.0 | 3.0 | 3.4 | 3.4 | -1.6 | -1.7 |
368208 rows × 12 columns
02-01: after reading the documentation of the respective functions (and maybe try a few things yourself), explain in plain sentences:
- what am I asking pandas to do with the `index_col=1, parse_dates=True` keyword arguments? Why am I doing this?
- what am I asking pandas to do with `.drop()`? Why `axis=1`?
-- `index_col=1` tells pandas to use the column at position 1 of the file (the second column, here `time`) as the index of the dataframe, instead of the default integer index. This is done so that every row is labelled by its timestamp.
-- `parse_dates=True` asks pandas to infer the format of the datetime strings in that index and convert them to proper datetime objects. This gives us a DatetimeIndex, which enables time-based selection and resampling (and, when the format can be inferred, also speeds up parsing).
-- `.drop('station', axis=1)` removes the `station` column from the dataframe. `axis=1` tells pandas to look for the label `'station'` among the column names (axis 0 would search the row index), so a whole column can be removed by name. The column is dropped because it repeats the same station ID in every row and carries no information.
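As a minimal illustration of these two points, here is a sketch on a tiny made-up CSV in the same layout as the station file (the values are invented, not real measurements):

```python
import io
import pandas as pd

# a tiny CSV in the same layout as the station file: station id first, time second
csv = io.StringIO(
    "station,time,TL\n"
    "11804,2015-01-01 00:00,-2.7\n"
    "11804,2015-01-01 00:10,-2.4\n"
)

# the column at position 1 ('time') becomes the index, parsed into datetimes
df = pd.read_csv(csv, index_col=1, parse_dates=True)
print(type(df.index))       # a DatetimeIndex, not plain strings

# axis=1: 'station' is looked up among the column labels and removed
df = df.drop('station', axis=1)
print(df.columns.tolist())  # ['TL']
```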
Now let me do something else for you:
```python
dfmeta = pd.read_csv('ZEHNMIN Parameter-Metadaten.csv', index_col=0)
dfmeta.loc[df.columns]
```
Kurzbeschreibung | Beschreibung | Einheit | |
---|---|---|---|
DD | Windrichtung | Windrichtung, vektorielles Mittel über 10 Minuten | ° |
FF | vektorielle Windgeschwindigkeit | Windgeschwindigkeit, vektorielles Mittel über ... | m/s |
GSX | Globalstrahlung | Globalstrahlung, arithmetisches Mittel über 10... | W/m² |
P | Luftdruck | Luftdruck, Basiswert zur Minute10 | hPa |
RF | Relative Feuchte | Relative Luftfeuchte, Basiswert zur Minute10 | % |
RR | Niederschlag | 10 Minuten Summe des Niederschlags, Summe der ... | mm |
SO | Sonnenscheindauer | Sonnenscheindauer, Sekundensumme über 10 Minuten | s |
TB1 | Erdbodentemperatur in 10cm Tiefe | Erdbodentemperatur in 10cm Tiefe, Basiswert zu... | °C |
TB2 | Erdbodentemperatur in 20cm Tiefe | Erdbodentemperatur in 20cm Tiefe, Basiswert zu... | °C |
TB3 | Erdbodentemperatur in 50cm Tiefe | Erdbodentemperatur in 50cm Tiefe, Basiswert zu... | °C |
TL | Lufttemperatur in 2m | Lufttemperatur in 2m Höhe, Basiswert zur Minute10 | °C |
TP | Taupunkt | Taupunktstemperatur, Basiswert zur Minute10 | °C |
02-02: again, explain in plain sentences what `dfmeta.loc[df.columns]` is doing, and why it works that way.
-- `.loc` accesses rows (and optionally columns) by label. `df.columns` holds the column names of the station dataframe; passing this list of labels to `dfmeta.loc[...]` selects, from the metadata table, exactly the rows whose index labels match those column names, in that order. Because a list-like of labels is used (not a single label), the result is a DataFrame with one metadata row per variable of `df`.
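A small sketch (with an invented mini metadata table) of why label-based selection with a list of labels returns a whole DataFrame:

```python
import pandas as pd

# metadata-style table indexed by parameter name (invented values)
meta = pd.DataFrame({'Einheit': ['°C', 'hPa', 'mm']}, index=['TL', 'P', 'RR'])

# a single label returns a Series (one row)...
single = meta.loc['TL']
print(type(single))

# ...but a list of labels returns a DataFrame, one row per label, in that order
sub = meta.loc[['RR', 'TL']]
print(sub.index.tolist())  # ['RR', 'TL']
```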
Finally, let me do a last step for you before you start coding:
```python
dfh = df.resample('H').mean()
```
02-03: explore the `dfh` dataframe. Explain, in plain words, what the purpose of `.resample('H')` followed by `.mean()` is. Explain what `.resample('H').max()` and `.resample('H').sum()` would do.
`.resample()` works on time-series (datetime-indexed) data: `.resample('H')` groups the 10-minute observations into hourly bins. The method chained after it defines how each bin is aggregated: `.mean()` averages the six 10-minute values of each hour for every column, returning a new dataframe with one row per hour. `.resample('H').max()` would instead return the maximum value of each hourly bin per column, and `.resample('H').sum()` the sum of each hourly bin.
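The difference between the three aggregations can be checked on a toy 10-minute series (synthetic numbers, not station data):

```python
import pandas as pd

# one hour of synthetic 10-minute values
idx = pd.date_range('2015-01-01 00:00', periods=6, freq='10min')
s = pd.Series([0.0, 0.1, 0.2, 0.3, 0.4, 0.5], index=idx)

hourly_mean = s.resample('H').mean()  # average of the six values -> 0.25
hourly_max = s.resample('H').max()    # largest value in the hour  -> 0.5
hourly_sum = s.resample('H').sum()    # total over the hour        -> 1.5
print(hourly_mean.iloc[0], hourly_max.iloc[0], hourly_sum.iloc[0])
```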
02-04: Using `np.allclose`, make sure that the average of the first hour (that you'll compute yourself from `df`) is indeed equal to the first row of `dfh`. Now, two variables in the dataframe have units that aren't suitable for averaging. Please convert the following variables to the correct units:
- `RR` needs to be converted from the average of 10 min sums to mm/h
- `SO` needs to be converted from the average of 10 min sums to s/h
```python
# 02-04
# hourly means for all variables ("RR" and "SO" are handled separately below)
dfh = df.resample('H').mean()

# first hour as a Series, without "RR" and "SO"
dfh_1 = dfh.iloc[:1].drop(["RR", "SO"], axis=1).squeeze()
dfh_1
```
DD 257.333333 FF 1.383333 GSX 0.000000 P 965.666667 RF 97.000000 TB1 3.200000 TB2 3.600000 TB3 5.200000 TL -2.533333 TP -3.066667 Name: 2015-01-01 00:00:00, dtype: float64
```python
# hourly sums for "RR" and "SO"
dfhs = df.resample('H').sum()
dfhs = dfhs.drop(["DD", "FF", "GSX", "P", "RF", "TB1", "TB2", "TB3", "TL", "TP"], axis=1)

# first hour as a Series
dfhs1 = dfhs.iloc[:1].squeeze()
dfhs1
```
RR 0.0 SO 0.0 Name: 2015-01-01 00:00:00, dtype: float64
```python
# mean of the first hour (six 10-min values) computed directly from df
p1 = df.iloc[:6, [0, 1, 2, 3, 4, 7, 8, 9, 10, 11]].mean()
p1
```
DD 257.333333 FF 1.383333 GSX 0.000000 P 965.666667 RF 97.000000 TB1 3.200000 TB2 3.600000 TB3 5.200000 TL -2.533333 TP -3.066667 dtype: float64
```python
# sum of RR and SO over the first hour, computed directly from df
p2 = df.iloc[:6, [5, 6]].sum()
p2
```
RR 0.0 SO 0.0 dtype: float64
```python
# check that resampling and the direct mean give the same values
same_val = np.allclose(p1, dfh_1, rtol=1e-05, atol=1e-08)
print('resampling- and dataframe mean calculation give the same value:', same_val)
```
resampling- and dataframe mean calculation give the same value: True
```python
# check that resampling and the direct sums give the same values
same_val_sum = np.allclose(p2, dfhs1, rtol=1e-05, atol=1e-08)
print('resampling- and dataframe sum calculation give the same value:', same_val_sum)
```
resampling- and dataframe sum calculation give the same value: True
From now on, we will use the hourly data only (and further aggregations when necessary). The 10 mins data are great but require a little bit more of pandas kung fu (the chinese term, not the sport) to be used efficiently.
Spend some time exploring the dfh
dataframe we just created. What time period does it cover? What variables does it contain?
Note on pandas: all the exercises below can be done with or without pandas. Each question can be answered with very few lines of code (often one or two) with pandas, and I recommend to use it as much as possible. If you want, you can always use numpy in case of doubt: you can access the data as a numpy array with: `df[column_name].values`.
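For example, on a throwaway dataframe, `.values` hands the underlying numpy array to code that prefers plain numpy:

```python
import numpy as np
import pandas as pd

# a throwaway dataframe with invented precipitation values
df_demo = pd.DataFrame({'RR': [0.0, 0.2, 1.4]})

arr = df_demo['RR'].values  # a plain numpy array, with no index attached
print(type(arr), arr.mean())
```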
03 - Precipitation¶
In this section, we will focus on precipitation only.
03-01: Compute the average annual precipitation (m/year) over the 7-year period.
```python
# 03-01: average annual precipitation (m/year) over the 7-year period
total_rr = df['RR'].sum()                   # total precipitation 2015-2021 (mm)
annual_precipitation = total_rr / 7 / 1000  # mm -> m, divided by 7 years
print(f'Average annual precipitation: {annual_precipitation:.2f} m/year')
```
03-02: What is the smallest non-zero precipitation measured at the station? What is the maximum hourly precipitation measured at the station? When did this occur?
```python
# minimum non-zero precipitation and maximum hourly precipitation
smallest_non_zero_precipitation = dfh["RR"][dfh["RR"] > 0].min()
max_hourly_precipitation = dfh["RR"].max()

# times of occurrence
time_max_hourly_precipitation = dfh["RR"].idxmax()
time_min_hourly_precipitation = dfh["RR"][dfh["RR"] > 0].idxmin()

print("Smallest non zero precipitation: ", round(smallest_non_zero_precipitation, 3), "mm")
print("Maximal hourly precipitation: ", round(max_hourly_precipitation, 2), "mm")
```
Smallest non zero precipitation: 0.017 mm Maximal hourly precipitation: 3.7 mm
03-03: Plot a histogram of hourly precipitation, with bins of size 0.2 mm/h, starting at 0.1 mm/h and ending at 25 mm/h. Plot the same data, but this time with a logarithmic y-axis. Compute the 99th percentile (or quantile) of hourly precipitation.
```python
# histogram, linear y-axis
bins = np.arange(0.1, 25.1, 0.2)  # 0.2 mm/h wide bins starting at 0.1 mm/h
plt.figure(figsize=(10, 5))
plt.hist(dfh["RR"], bins=bins)
plt.xlabel("hourly precipitation (mm/h)")
plt.ylabel("Frequency")
plt.title("precipitation/hour")
plt.grid(True)
plt.show()

# histogram, logarithmic y-axis
plt.figure(figsize=(10, 5))
plt.hist(dfh["RR"], bins=bins, log=True)
plt.xlabel("hourly precipitation (mm/h)")
plt.ylabel("Frequency")
plt.title("precipitation/hour")
plt.grid(True)
plt.show()

# compute the 99th percentile of hourly precipitation
percentile_99 = dfh["RR"].quantile(0.99)
print("99th percentile of hourly precipitation: ", round(percentile_99, 2), "mm/h")
```
99th percentile of hourly precipitation: 0.39 mm/h
03-04: Compute daily sums (mm/d) of precipitation (tip: use `.resample` again). Compute the average number of rain days per year in Innsbruck (a "rain day" is a day with at least 0.1 mm/d of measured precipitation).
```python
# daily sums of precipitation
dfd_r = df.resample('D').sum()
dfd_r = dfd_r.drop(["DD", "FF", "GSX", "P", "RF", "SO", "TB1", "TB2", "TB3", "TL", "TP"], axis=1)
print(dfd_r, "mm/d")

# average number of rain days per year (a rain day has at least 0.1 mm/d)
avg_dfd_r = (dfd_r >= 0.1).sum() / 7
print(avg_dfd_r, "rainy days per year")
```
RR time 2015-01-01 0.0 2015-01-02 0.6 2015-01-03 12.2 2015-01-04 7.7 2015-01-05 0.0 ... ... 2021-12-27 0.0 2021-12-28 0.0 2021-12-29 15.1 2021-12-30 2.1 2021-12-31 0.0 [2557 rows x 1 columns] mm/d RR 158.857143 dtype: float64 rainy days per year
03-05: Now select (subset) the daily dataframe to keep only daily data in the months of December, January, February (DJF). To do this, note that `dfh.index.month` exists and can be used to subset the data efficiently. Compute the average precipitation in DJF (mm/d), and the average number of rainy days in DJF. Repeat with the months of June, July, August (JJA).
```python
# select DJF: December to February (months 12, 1, 2)
djf = dfd_r[(dfd_r.index.month == 12) | (dfd_r.index.month <= 2)]

# average precipitation in DJF
average_precipitation_djf = djf.mean()

# count rain days (>= 0.1 mm/d), divided by 7 (years)
aver_rainy_days_djf = (djf >= 0.1).sum() / 7

print("Average precipitation in DJF: ", round(average_precipitation_djf, 2), "mm/d")
print("Average number of days where it rained in DJF: ", round(aver_rainy_days_djf))
```
Average precipitation in DJF: RR 1.77 dtype: float64 mm/d Average number of days where it rained in DJF: RR 38.0 dtype: float64
```python
# select JJA: June to August (months 6-8)
jja = dfd_r[(dfd_r.index.month >= 6) & (dfd_r.index.month <= 8)]

# average precipitation in JJA
average_precipitation_jja = jja.mean()

# count rain days (>= 0.1 mm/d), divided by 7 (years)
aver_rainy_days_jja = (jja >= 0.1).sum() / 7

print("Average precipitation in JJA: ", round(average_precipitation_jja, 2), "mm/d")
print("Average number of days where it rained in JJA: ", round(aver_rainy_days_jja))
```
Average precipitation in JJA: RR 4.02 dtype: float64 mm/d Average number of days where it rained in JJA: RR 53.0 dtype: float64
03-06: Repeat the DJF and JJA subsetting, but this time with hourly data. Count the total number of times that hourly precipitation in DJF is above the 99th percentile computed in exercise 03-03. Repeat with JJA.
```python
# subset the hourly data for DJF & JJA
djf = dfh[(dfh.index.month <= 2) | (dfh.index.month == 12)]
jja = dfh[(dfh.index.month >= 6) & (dfh.index.month <= 8)]

print("DJF hourly over 99th percentile: ", (djf["RR"] > percentile_99).sum())
print("JJA hourly over 99th percentile: ", (jja["RR"] > percentile_99).sum())
```
DJF hourly over 99th percentile: 68
03-07: Compute and plot the average daily cycle of hourly precipitation in DJF and JJA. I expect a plot similar to this example. To compute the daily cycle, I recommend to combine two very useful tools. First, start by noticing that `dfh.index.hour` exists and can be used to categorize data. Then, note that `df.groupby` exists and can be used exactly for that (documentation).
```python
# compute the average daily cycles of DJF & JJA
djf_daily_cycle = djf.groupby(djf.index.hour)["RR"].mean()
jja_daily_cycle = jja.groupby(jja.index.hour)["RR"].mean()

fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(djf_daily_cycle, label="DJF")
ax.plot(jja_daily_cycle, label="JJA")
ax.set_title("Average daily cycle of hourly precipitation")
ax.set_xlabel("time of the day in h")
ax.set_ylabel("Average hourly precipitation in mm/h")
ax.legend()
ax.grid()
plt.show()
```
04 - A few other variables¶
In this section, we will continue to analyze the weather station data.
04-01: Verify that the three soil temperatures have approximately the same average value over the entire period. Now plot the three soil temperature timeseries in hourly resolution over the course of the year 2020 (example). Repeat the plot with the month of May 2020.
```python
# 04-01
# mean soil temperatures over the whole time period
TB1_avg = dfh['TB1'].mean()
print('Soil 1 has an average mean temperature of:', TB1_avg, '°C')
TB2_avg = dfh['TB2'].mean()
print('Soil 2 has an average mean temperature of:', TB2_avg, '°C')
TB3_avg = dfh['TB3'].mean()
print('Soil 3 has an average mean temperature of:', TB3_avg, '°C\n')

# use allclose to show that the means agree within an absolute tolerance of 0.35 °C
sim1 = np.allclose(TB1_avg, TB2_avg, rtol=0, atol=0.35)
print('Soil 1 and 2 are approximately the same:', sim1)
sim2 = np.allclose(TB2_avg, TB3_avg, rtol=0, atol=0.35)
print('Soil 2 and 3 are approximately the same:', sim2)
sim3 = np.allclose(TB1_avg, TB3_avg, rtol=0, atol=0.35)
print('Soil 1 and 3 are approximately the same:', sim3)
```
Soil 1 has an average mean temperature of: 11.393769492976766 °C Soil 2 has an average mean temperature of: 11.514406669059351 °C Soil 3 has an average mean temperature of: 11.379132772780602 °C Soil 1 and 2 are approximately the same: True Soil 2 and 3 are approximately the same: True Soil 1 and 3 are approximately the same: True
```python
# hourly soil temperatures for the year 2020
dfh_2020_soil = dfh.loc['2020-01-01':'2020-12-31', ['TB1', 'TB2', 'TB3']]

# plot for 2020
ax = dfh_2020_soil.plot(figsize=(8, 5))
ax.set_xlabel('Time')
ax.set_ylabel('Temperature °C')
```
Text(0, 0.5, 'Temperature °C')
```python
# hourly soil temperatures for May 2020
dfh_2020_may_soil = dfh.loc['2020-05-01':'2020-05-31', ['TB1', 'TB2', 'TB3']]

# plot for May 2020
ax = dfh_2020_may_soil.plot(figsize=(8, 5))
ax.set_xlabel('Time')
ax.set_ylabel('Temperature °C')
```
Text(0, 0.5, 'Temperature °C')
04-02: Plot the average daily cycle of all three soil temperatures.
```python
# 04-02
# average daily cycle: group the hourly data by hour of day and average
soil_cycle = dfh[['TB1', 'TB2', 'TB3']].groupby(dfh.index.hour).mean()

# plot the average daily cycle of the three soil temperatures
ax = soil_cycle.plot(figsize=(8, 5))
ax.set_xlabel('Hour of day')
ax.set_ylabel('Temperature °C')
```
Text(0, 0.5, 'Temperature °C')
04-03: Compute the difference (in °C) between the air temperature and the dewpoint temperature. Now plot this difference on a scatter plot (x-axis: relative humidity, y-axis: temperature difference).
```python
# 04-03
# difference between air temperature and dewpoint temperature, as a new column "TL-TP"
dfh_2 = dfh.copy()  # copy so that dfh itself is not modified
dfh_2["TL-TP"] = dfh_2["TL"] - dfh_2["TP"]

# scatter plot: relative humidity on x, temperature difference on y
fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(dfh_2["RF"], dfh_2["TL-TP"], s=20, alpha=0.5)
ax.set_xlabel('relative humidity %')
ax.set_ylabel('temperature difference °C')
```
Text(0, 0.5, 'temperature difference °C')
05 - Free coding project¶
- This section should have at least 5 original plots in it. They are the output of your analysis.
- This section should also use additional data that you downloaded yourself. The easiest way would probably be to download another station(s) from the ZAMG database, or data from the same station but for another time period (e.g. for trend or change analysis). You can, however, decide to do something completely different if you prefer (as long as you download and read one more file).
- this section should contain at least one regression or correlation analysis between two parameters. Examples:
- between two different variables at the same station (like we did with the dewpoint above)
- between different stations (for example, average temperature as a function of station elevation)
- between average temperature and time (trends analysis)
- etc.
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```
We have chosen to resample the following data to hourly means and sums, since this was the most concise form to use for our first graph.
```python
# read the St Anton data
kf = pd.read_csv('ZEHNMIN St anton.csv', index_col=0, parse_dates=True)
# drop the station column
kf = kf.drop('station', axis=1)
# hourly means of pressure and relative humidity
kfh_m = kf.resample('H').mean()
kfh_m = kfh_m.drop(['RR', 'SO', 'TL', 'TP'], axis=1)
kfh_m
```
P | RF | |
---|---|---|
time | ||
2015-01-01 00:00:00+00:00 | 882.316667 | 85.333333 |
2015-01-01 01:00:00+00:00 | 882.600000 | 84.333333 |
2015-01-01 02:00:00+00:00 | 882.716667 | 87.833333 |
2015-01-01 03:00:00+00:00 | 882.800000 | 88.833333 |
2015-01-01 04:00:00+00:00 | 882.750000 | 89.166667 |
... | ... | ... |
2023-06-08 19:00:00+00:00 | 870.500000 | 88.833333 |
2023-06-08 20:00:00+00:00 | 870.966667 | 89.166667 |
2023-06-08 21:00:00+00:00 | 871.116667 | 90.166667 |
2023-06-08 22:00:00+00:00 | 871.133333 | 91.166667 |
2023-06-08 23:00:00+00:00 | 871.050000 | 91.000000 |
73944 rows × 2 columns
Above you can see the hourly means of pressure and relative humidity in St Anton.
```python
# read the Galzig data
ef = pd.read_csv('ZEHNMIN Galzig.csv', index_col=0, parse_dates=True)
# drop the station column
ef = ef.drop('station', axis=1)
# hourly means of pressure and relative humidity
efh_m = ef.resample('H').mean()
efh_m = efh_m.drop(['RR', 'SO', 'TL', 'TP'], axis=1)
efh_m
```
P | RF | |
---|---|---|
time | ||
2015-01-01 00:00:00+00:00 | 797.133333 | 77.833333 |
2015-01-01 01:00:00+00:00 | 797.616667 | 76.666667 |
2015-01-01 02:00:00+00:00 | 797.750000 | 78.666667 |
2015-01-01 03:00:00+00:00 | 797.766667 | 81.166667 |
2015-01-01 04:00:00+00:00 | 797.800000 | 82.500000 |
... | ... | ... |
2023-06-08 19:00:00+00:00 | 791.716667 | 67.000000 |
2023-06-08 20:00:00+00:00 | 792.083333 | 63.833333 |
2023-06-08 21:00:00+00:00 | 792.050000 | 67.000000 |
2023-06-08 22:00:00+00:00 | 792.000000 | 70.166667 |
2023-06-08 23:00:00+00:00 | 791.683333 | 72.166667 |
73944 rows × 2 columns
Above you can see the hourly means of pressure and relative humidity in Galzig.
```python
# hourly precipitation sums for St Anton
kfh_s = kf.resample('H').sum()
kfh_s = kfh_s.drop(['P', 'RF', 'TL', 'TP', 'SO'], axis=1)
kfh_s
```
RR | |
---|---|
time | |
2015-01-01 00:00:00+00:00 | 0.0 |
2015-01-01 01:00:00+00:00 | 0.0 |
2015-01-01 02:00:00+00:00 | 0.0 |
2015-01-01 03:00:00+00:00 | 0.0 |
2015-01-01 04:00:00+00:00 | 0.0 |
... | ... |
2023-06-08 19:00:00+00:00 | 0.0 |
2023-06-08 20:00:00+00:00 | 0.0 |
2023-06-08 21:00:00+00:00 | 0.0 |
2023-06-08 22:00:00+00:00 | 0.0 |
2023-06-08 23:00:00+00:00 | 0.0 |
73944 rows × 1 columns
Above is the sum of precipitation per hour in St Anton
```python
# hourly precipitation sums for Galzig
efh_s = ef.resample('H').sum()
efh_s = efh_s.drop(['P', 'RF', 'TL', 'TP', 'SO'], axis=1)
efh_s
```
RR | |
---|---|
time | |
2015-01-01 00:00:00+00:00 | 0.0 |
2015-01-01 01:00:00+00:00 | 0.0 |
2015-01-01 02:00:00+00:00 | 0.0 |
2015-01-01 03:00:00+00:00 | 0.0 |
2015-01-01 04:00:00+00:00 | 0.0 |
... | ... |
2023-06-08 19:00:00+00:00 | 0.0 |
2023-06-08 20:00:00+00:00 | 0.0 |
2023-06-08 21:00:00+00:00 | 0.0 |
2023-06-08 22:00:00+00:00 | 0.0 |
2023-06-08 23:00:00+00:00 | 0.0 |
73944 rows × 1 columns
Above is the sum of precipitation per hour in Galzig
First we would like to compare the precipitation between the two stations; we will do this graphically.
```python
fig, ax = plt.subplots(figsize=(16, 9))
ax.plot(kfh_s, label='st anton')
ax.plot(efh_s, label='galzig')
ax.legend()
ax.set_xlabel('time')
ax.set_ylabel('precipitation mm/H')
ax.set_title('comparison of hourly precipitation between st anton and galzig')
```
Text(0.5, 1.0, 'comparison of hourly precipitation between st anton and galzig')
Here we can see a very clear maximum in the St Anton precipitation, which we would like to investigate more closely, mainly because in Galzig, which is only 1.23 km away, the precipitation was far lower. We will calculate the exact values as follows:
```python
# hour with the maximum precipitation in St Anton
kfh_s.idxmax()
```
RR 2018-08-01 16:00:00+00:00 dtype: datetime64[ns, UTC]
```python
# how much precipitation fell in that hour
kfh_s.loc['2018-08-01 16:00']
```
RR 27.4 Name: 2018-08-01 16:00:00+00:00, dtype: float64
```python
# the precipitation in Galzig during the St Anton maximum (mm/H)
print('Precipitation in Galzig when St Anton had its maximum:', efh_s.loc['2018-08-01 16:00'])
```
Precipitation in Galzig when St Anton had its maximum: RR 10.5 Name: 2018-08-01 16:00:00+00:00, dtype: float64
```python
# difference in precipitation (mm/H)
kfh_s.loc['2018-08-01 16:00'] - efh_s.loc['2018-08-01 16:00']
```
RR 16.9 Name: 2018-08-01 16:00:00+00:00, dtype: float64
We will now investigate the other climate variables at both stations to see what may have caused St Anton to receive 16.9 mm/H more rainfall than Galzig.
Precipitation in the hour¶
We would like a more detailed view of the precipitation during this hour, to see exactly how the amount of precipitation changed over time; for this we will use the raw 10-minute data.
```python
kf_08_01 = kf.loc['2018-08-01 16:00':'2018-08-01 16:50']
kf_08_01 = kf_08_01.drop(['SO', 'TL', 'TP'], axis=1)
kf_08_01
```
P | RF | RR | |
---|---|---|---|
time | |||
2018-08-01 16:00:00+00:00 | 875.7 | 82.0 | 0.9 |
2018-08-01 16:10:00+00:00 | 876.8 | 82.0 | 4.0 |
2018-08-01 16:20:00+00:00 | 877.7 | 91.0 | 10.1 |
2018-08-01 16:30:00+00:00 | 877.4 | 90.0 | 7.8 |
2018-08-01 16:40:00+00:00 | 877.4 | 92.0 | 3.7 |
2018-08-01 16:50:00+00:00 | 877.7 | 91.0 | 0.9 |
```python
ef_08_01 = ef.loc['2018-08-01 16:00':'2018-08-01 16:50']
ef_08_01 = ef_08_01.drop(['SO', 'TL', 'TP'], axis=1)
ef_08_01
```
P | RF | RR | |
---|---|---|---|
time | |||
2018-08-01 16:00:00+00:00 | 798.9 | 83.0 | 1.8 |
2018-08-01 16:10:00+00:00 | 799.5 | 93.0 | 1.0 |
2018-08-01 16:20:00+00:00 | 799.7 | 88.0 | 5.6 |
2018-08-01 16:30:00+00:00 | 799.6 | 86.0 | 0.9 |
2018-08-01 16:40:00+00:00 | 799.3 | 82.0 | 0.8 |
2018-08-01 16:50:00+00:00 | 799.6 | 85.0 | 0.4 |
```python
# plot the 10-minute precipitation during 16:00-17:00 on 2018-08-01
efrr_08_01 = ef_08_01.drop(['RF', 'P'], axis=1)
kfrr_08_01 = kf_08_01.drop(['RF', 'P'], axis=1)

fig, rx = plt.subplots()
rx.plot(kfrr_08_01, label='St anton')
rx.plot(efrr_08_01, label='Galzig')
rx.set_title('comparison of precipitation in mm every 10 minutes in the hour 16:00-17:00 on 2018-08-01')
rx.legend()
rx.set_xlabel('time')
rx.set_ylabel('Precipitation (mm/10 minutes)')
```
Text(0, 0.5, 'Precipitation (mm/10 minutes)')
```python
kf_08_01['RR'].loc['2018-08-01 16:20'] - ef_08_01['RR'].loc['2018-08-01 16:20']
```
4.5
This graph gives a much clearer view of how the precipitation evolved at both stations, with a clear maximum at 16:20 in each, but with a difference of 4.5 mm/10 minutes between them.
To see why this may have occurred, we will continue with our other variables, relative humidity and pressure.
Pressure¶
Next we will investigate the pressure graphically, using the 10-minute data over the same period (16:00-17:00 on 2018-08-01).
```python
# pressure during the hour 16:00-17:00 on 2018-08-01 in both St Anton and Galzig
kfp_08_01 = kf_08_01.drop(['RF', 'RR'], axis=1)
efp_08_01 = ef_08_01.drop(['RF', 'RR'], axis=1)

fig, px = plt.subplots()
px.plot(kfp_08_01, label='St Anton')
px.plot(efp_08_01, label='Galzig')
px.legend()
px.set_xlabel('time')
px.set_ylabel('Pressure (hPa)')
```
Text(0, 0.5, 'Pressure (hPa)')
Checking the pressure change here has two limitations: (1) one hour is too short a window to see a major change in pressure, and (2) since the altitude of each station is fixed, the pressure difference between the two stations mainly reflects their elevation difference, so pressure alone cannot explain the large difference in precipitation.
To have a closer look anyway, we will compare precipitation over time and pressure over time in St Anton only, and calculate the correlation:
```python
# plot precipitation and pressure against time during 16:00-17:00 on 2018-08-01
fig, lx = plt.subplots(figsize=(16, 9))
lx.plot(kfrr_08_01, label='St anton precipitation')
lx1 = lx.twinx()
lx1.plot(kfp_08_01, label='St anton pressure', color='green')
lx.legend(loc='upper left')
lx1.legend(loc='upper right')
lx.set_title('How precipitation changes with pressure')
lx.set_xlabel('time')
lx1.set_ylabel('pressure (hPa)')
lx.set_ylabel('Precipitation (mm/10 minutes)')
```
Text(0, 0.5, 'Precipitation (mm/10 minutes)')
In the graph we see a simultaneous increase in both variables up to the maximum at 16:20, after which precipitation drops sharply while pressure stays relatively constant, with only minor changes over the last 30 minutes of the hour.
To check this relationship we will calculate the correlation:
```python
# calculate the correlation between pressure and precipitation
corr_1 = kf_08_01['P'].corr(kf_08_01['RR'])
print(f'{corr_1:.5f}')
```
0.50001
With r ≈ 0.50 the correlation is positive and moderate to strong. So although the pressure change looked small in the first graph, over this hour it co-varies noticeably with precipitation. Note, however, that six 10-minute samples are very few, and correlation alone does not tell us that one variable drives the other.
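As a cross-check, the same Pearson coefficient can be recomputed with plain numpy from the six 10-minute values shown in the St Anton table above:

```python
import numpy as np

# the six 10-min values from the St Anton table (16:00-16:50 on 2018-08-01)
p = np.array([875.7, 876.8, 877.7, 877.4, 877.4, 877.7])  # pressure (hPa)
rr = np.array([0.9, 4.0, 10.1, 7.8, 3.7, 0.9])            # precipitation (mm/10 min)

# Pearson correlation coefficient from the correlation matrix
r = np.corrcoef(p, rr)[0, 1]
print(round(r, 5))  # matches the pandas .corr() result above
```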
Humidity¶
Next we would like to investigate the relative humidity in the same period (16:00-17:00 on 2018-08-01), again graphically for a better visual overview.
```python
# plot relative humidity against time at both stations during 16:00-17:00 on 2018-08-01
efh_08_01 = ef_08_01.drop(['P', 'RR'], axis=1)
kfh_08_01 = kf_08_01.drop(['P', 'RR'], axis=1)

fig, hx = plt.subplots()
hx.plot(kfh_08_01, label='St anton')
hx.plot(efh_08_01, label='Galzig')
hx.legend()
hx.set_xlabel('Time')
hx.set_ylabel('relative humidity (%)')
```
Text(0, 0.5, 'relative humidity (%)')
Here we can see a sharp rise in relative humidity between 16:10 and 16:20 in St Anton, and a large drop in Galzig between 16:10 and 16:40.
We would like to investigate the relation between relative humidity and precipitation in St Anton more closely:
```python
# plot precipitation and relative humidity against time during 16:00-17:00 on 2018-08-01
fig, jx = plt.subplots(figsize=(16, 9))
jx.plot(kfrr_08_01, label='St anton precipitation')
jx1 = jx.twinx()
jx1.plot(kfh_08_01, label='St anton humidity', color='green')
jx.legend(loc='upper left')
jx1.legend()
jx.set_title('How precipitation changes with relative humidity')
jx.set_xlabel('time')
jx1.set_ylabel('Relative humidity (%)')
jx.set_ylabel('Precipitation (mm/10 minutes)')
```
Text(0, 0.5, 'Precipitation (mm/10 minutes)')
In this direct comparison we can see that between 16:00 and 16:20 humidity and precipitation increase at a similar rate, but after 16:20 the precipitation decreases back to a low of 0.9 mm/10 minutes, whereas the humidity rises slightly again.
```python
# calculate the correlation between relative humidity and precipitation
corr = kf_08_01['RF'].corr(kf_08_01['RR'])
print(f'{corr:.5f}')
```
0.39017
With r ≈ 0.39 there is a moderate correlation between the two variables, so relative humidity and precipitation are clearly related over this hour. Again, with only six samples this is indicative rather than conclusive, and correlation does not establish that one variable causes the other.
Overall¶
Through this investigation we found that relative humidity and pressure are both related to the precipitation peak, with a moderate and a moderate-to-strong correlation with precipitation respectively, although with only one hour of 10-minute data these relationships are suggestive rather than conclusive.