Programming project¶
Preamble¶
- Names: Daniel Steinle
- Matrikelnummer: 12206327
DISCLAIMER:
I initially wanted to do this as a group project with Felix Gruber, and did the full 01 to 04 part (which was not a good idea). After reading in the data for him, we agreed that he would start examining the data. We wanted to meet on Tuesday but he told me he couldn't come. When I asked him some days later, on last Friday, how far he had come, I received no answer until Saturday evening, when he said he hadn't done anything so far.
Since by that time I had already started to do the whole project part myself as well, I decided to upload my programming project on my own, since I have already done everything in it and find it unfair to gift it to someone who did not show any initiative to contribute to the project.
I'm unsure whether he will now upload my solution from parts 01-04 as his own, though. The files I read in for him contained 500 days of data from the weather stations Augsburg and Kempten.
I know that this is supposed to be a group project and my uploading alone may not satisfy that criterion. Still, I cannot upload this for someone who did not participate in it at all.
01 - An energy balance model with hysteresis¶
The planetary albedo $\alpha$ is in fact changing with climate change. As the temperature drops, sea ice and ice sheets extend (increasing the albedo); conversely, the albedo decreases as temperature rises. The planetary albedo of our simple energy balance model follows the following equation:
$$ \alpha = \begin{cases} 0.3,& \text{if } T \gt 280\\ 0.7,& \text{if } T \lt 250\\ a T + b, & \text{otherwise} \end{cases} $$
***01-01**: Compute the parameters $a$ and $b$ so that the equation is continuous at T=250K and T=280K.*¶
To evaluate $a$ and $b$ one has to solve a system consisting of two linear equations.
$$ 250\,a + b = 0.7 \\ 280\,a + b = 0.3 $$
After simple calculations a and b can be determined as
$$ a = - {1 \over 75} \\ b = {121 \over 30} $$
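As a quick cross-check, the same 2×2 linear system can also be solved numerically (a small sketch using `numpy.linalg.solve`):

```python
import numpy as np

# Solve the linear system:  250*a + b = 0.7,  280*a + b = 0.3
A = np.array([[250., 1.], [280., 1.]])
rhs = np.array([0.7, 0.3])
a_num, b_num = np.linalg.solve(A, rhs)
print(a_num, -1 / 75)    # both are -0.01333...
print(b_num, 121 / 30)   # both are 4.03333...
```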
This parametrisation results in the following straight line:
```python
import numpy as np
import matplotlib.pyplot as plt

# Configuring the data:
a = - 1 / 75
b = 121 / 30
x = np.linspace(250, 280, 20)
y = a * x + b

# Creating the plot:
fig, ax = plt.subplots()
ax.vlines(250, .27, .72, color="lightgrey")
ax.vlines(280, .27, .72, color="lightgrey")
ax.plot([230, 250, 265, 280, 300], [.7, .7, np.nan, .3, .3], label="constant albedo")
ax.plot(x, a * x + b, color="red", label="linear connection")

# Some plot beautifying
ax.set_ylim(.27, .72)
ax.set_xlabel("Temperature (K)")
ax.set_ylabel("Albedo")
ax.set_title("Temperature dependence of earth's albedo")
ax.legend();
```
***01-02**: Now write a function called `alpha_from_temperature` which accepts a single positional parameter `T` as input (a scalar) and returns the corresponding albedo. Test your function using doctests to make sure that it complies to the instructions above.*¶
```python
def alpha_from_temperature(t):
    """Approximates earth's albedo for a given temperature in K.

    Parameters
    ----------
    t : array-like
        temperature in Kelvin

    Returns
    -------
    albedo : ndarray
        earth's albedo (unitless)

    Examples
    --------
    >>> alpha_from_temperature(250)
    array(0.7)
    >>> alpha_from_temperature(300)
    array(0.3)
    >>> from numpy.testing import assert_allclose
    >>> albedo = alpha_from_temperature(np.arange(250, 290, 10))
    >>> assert_allclose(albedo, np.array([0.7, 0.56666667, 0.43333333, 0.3]), rtol=0, atol=1e-7)
    """
    a = - 1 / 75
    b = 121 / 30
    albedo = np.where(t < 250, .7, np.where(t > 280, .3, a * t + b))
    return albedo
```
```python
import doctest
doctest.testmod()
```
TestResults(failed=0, attempted=5)
***01-03**: Adapt the existing code from week 07 to write a function called `temperature_change_with_hysteresis` which accepts `t0` (the starting temperature in K) and `n_years` (the number of simulation years) as positional arguments and `tau` (the atmosphere transmissivity) as keyword argument (default value 0.611).* Verify that:¶
- the stabilization temperature with `t0 = 292` and default `tau` is approximately 288K
- the stabilization temperature with `t0 = 265` and default `tau` is approximately 233K
```python
# From the seventh assignment - the doctests were cut so that the code stays lucid
def asr(alpha=.3):
    """Absorbed Solar Radiation (W m-2).

    Parameters
    ----------
    alpha : float, optional
        the planetary albedo

    Returns
    -------
    the asr (float)
    """
    s0 = 1362
    return (1 - alpha) * s0 / 4


def olr(t, tau=.611):
    """Outgoing Longwave Radiation (W m-2).

    Parameters
    ----------
    t : float
        the atmosphere temperature (K)
    tau : float, optional
        the atmosphere transmissivity (-)

    Returns
    -------
    the olr (float)
    """
    sigma = 5.67e-8
    return sigma * tau * t ** 4


# From here on, this was adapted from the original temperature_change function
def temperature_change_with_hysteresis(t0, n_years, tau=.611):
    """Temperature change scenario after change of transmissivity.

    Parameters
    ----------
    t0 : float
        the starting temperature (K)
    n_years : int
        the number of simulation years
    tau : float, optional
        the atmosphere transmissivity (-)

    Returns
    -------
    temperature : ndarray of size n_years + 1
    """
    # Standard variables
    c = 4.0e+08
    dt = 60 * 60 * 24 * 365

    # Series computation
    temperature = np.zeros(n_years + 1)
    temperature[0] = t0
    for i in range(n_years):
        temperature[i + 1] = temperature[i] + dt / c * (asr(alpha=alpha_from_temperature(temperature[i]))
                                                        - olr(temperature[i], tau=tau))
    return temperature
```
```python
print("For t0 = 292K, does it converge to 288K? ",
      np.isclose(temperature_change_with_hysteresis(292, 20)[20], 288, rtol=0, atol=.2))
print("For t0 = 265K, does it converge to 233K? ",
      np.isclose(temperature_change_with_hysteresis(265, 60)[60], 233, rtol=0, atol=.2))
```
For t0 = 292K, does it converge to 288K?  True
For t0 = 265K, does it converge to 233K?  True
***01-04**: Realize a total of N simulations with starting temperatures regularly spaced between `t0`=206K and `t0`=318K and plot them on a single plot for `n_years`=50. The plot should look somewhat similar to this example for N=21.*¶
```python
import pandas as pd


def temp_change_sim(t0=206, t1=318, n_sim=21, n_years=50, coloured=False):
    """Plots the temperature change scenarios for a given number of simulations
    between a certain starting temperature range. Does not return anything!

    Parameters
    ----------
    t0 : float (optional)
        Start of the starting temperature range (K)
    t1 : float (optional)
        End of the starting temperature range (K)
    n_sim : int (optional)
        the number of simulations to be run
    n_years : int (optional)
        number of years the simulation should run
    coloured : bool (optional)
        Whether to use a colormap
    """
    t_start = np.linspace(t0, t1, n_sim)
    years = np.arange(0, n_years + 1, 1)

    df = pd.DataFrame(index=years)
    for t in t_start:
        df[t] = temperature_change_with_hysteresis(t, n_years)

    # Plotting (black by default, with a colormap if coloured=True)
    if coloured:
        df.plot(figsize=(8, 5), cmap=plt.get_cmap("coolwarm"), grid=True, xlim=(0, n_years + 2),
                legend=None, xlabel="Years", ylabel="Temperature (K)",
                title="Climate change scenario with hysteresis")
    else:
        df.plot(figsize=(8, 5), color="black", grid=True, xlim=(0, n_years + 2),
                legend=None, xlabel="Years", ylabel="Temperature (K)",
                title="Climate change scenario with hysteresis")
    return None
```
```python
temp_change_sim()
```
***Bonus**: only if you want (and if time permits), you can try to increase N and add colors to your plot to create a graph similar to this one.*
```python
# Looks similar (this colourmap was just too daring for a temperature scenario...)
temp_change_sim(n_sim=100, coloured=True)
```
02 - Weather station data files¶
I downloaded 10 min data from the recently launched ZAMG data hub. The data file contains selected parameters from the "INNSBRUCK-FLUGPLATZ (ID: 11804)" weather station.
You can download the data from the following links (right-click + "Save as..."):
- station data
- parameter metadata
- station list from the ZAMG (in a better format than last time)
Let me open the data for you and display its content:
```python
df = pd.read_csv('INNSBRUCK-FLUGPLATZ_Datensatz_20150101_20211231.csv', index_col=1, parse_dates=True)
df = df.drop('station', axis=1)
df
```
DD | FF | GSX | P | RF | RR | SO | TB1 | TB2 | TB3 | TL | TP | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
time | ||||||||||||
2015-01-01 00:00:00 | 232.0 | 1.0 | 0.0 | 965.7 | 96.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.7 | -3.4 |
2015-01-01 00:10:00 | 248.0 | 1.6 | 0.0 | 965.6 | 97.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.4 | -2.9 |
2015-01-01 00:20:00 | 252.0 | 1.6 | 0.0 | 965.6 | 97.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.4 | -2.9 |
2015-01-01 00:30:00 | 267.0 | 1.3 | 0.0 | 965.7 | 97.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.7 | -3.2 |
2015-01-01 00:40:00 | 279.0 | 1.4 | 0.0 | 965.7 | 97.0 | 0.0 | 0.0 | 3.2 | 3.6 | 5.2 | -2.5 | -3.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021-12-31 23:10:00 | 257.0 | 1.1 | 0.0 | 959.8 | 98.0 | 0.0 | 0.0 | 3.1 | 3.4 | 3.4 | -1.7 | -1.7 |
2021-12-31 23:20:00 | 106.0 | 0.3 | 0.0 | 960.0 | 100.0 | 0.0 | 0.0 | 3.0 | 3.4 | 3.4 | -2.0 | -2.0 |
2021-12-31 23:30:00 | 141.0 | 0.5 | 0.0 | 960.2 | 100.0 | 0.0 | 0.0 | 3.0 | 3.4 | 3.4 | -1.7 | -1.7 |
2021-12-31 23:40:00 | 167.0 | 0.5 | 0.0 | 960.2 | 100.0 | 0.0 | 0.0 | 3.0 | 3.4 | 3.4 | -1.6 | -1.6 |
2021-12-31 23:50:00 | 275.0 | 0.3 | 0.0 | 960.1 | 100.0 | 0.0 | 0.0 | 3.0 | 3.4 | 3.4 | -1.6 | -1.7 |
368208 rows × 12 columns
***02-01**: after reading the documentation of the respective functions (and maybe try a few things yourself), explain in plain sentences:*¶
- what am I asking pandas to do with the `index_col=1, parse_dates=True` keyword arguments? Why am I doing this?
- what am I asking pandas to do with `.drop()`? Why `axis=1`?
`index_col=1`: pandas uses the column at position 1 as the index of the DataFrame (here this is the time column). Without this argument (the default is `index_col=None`), pandas would create a plain integer index.

`parse_dates=True`: tells pandas to parse the index values and convert them into datetime objects (use `dayfirst=True` for day-first date formats).

`df.drop()`: removes rows or columns by specifying label names and the corresponding axis, or by specifying index or column names directly. `axis=0` drops the specified index rows, `axis=1` drops the specified columns - here the constant 'station' column.
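As a minimal illustration of both keyword arguments (a sketch on a small made-up CSV, not on the station file):

```python
import io
import pandas as pd

# A tiny made-up CSV with the columns: station, time, TL
csv = io.StringIO("station,time,TL\nIBK,2015-01-01 00:00,-2.7\nIBK,2015-01-01 00:10,-2.4\n")

demo = pd.read_csv(csv, index_col=1, parse_dates=True)  # column at position 1 ('time') becomes a DatetimeIndex
demo = demo.drop('station', axis=1)                     # axis=1 drops a column (axis=0 would drop index rows)
print(demo.index)   # DatetimeIndex with two timestamps
print(demo)
```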
Now let me do something else for you:
```python
dfmeta = pd.read_csv('ZEHNMIN Parameter-Metadaten.csv', index_col=0)
dfmeta.loc[df.columns]
```
Kurzbeschreibung | Beschreibung | Einheit | |
---|---|---|---|
DD | Windrichtung | Windrichtung, vektorielles Mittel über 10 Minuten | ° |
FF | vektorielle Windgeschwindigkeit | Windgeschwindigkeit, vektorielles Mittel über ... | m/s |
GSX | Globalstrahlung | Globalstrahlung, arithmetisches Mittel über 10... | W/m² |
P | Luftdruck | Luftdruck, Basiswert zur Minute10 | hPa |
RF | Relative Feuchte | Relative Luftfeuchte, Basiswert zur Minute10 | % |
RR | Niederschlag | 10 Minuten Summe des Niederschlags, Summe der ... | mm |
SO | Sonnenscheindauer | Sonnenscheindauer, Sekundensumme über 10 Minuten | s |
TB1 | Erdbodentemperatur in 10cm Tiefe | Erdbodentemperatur in 10cm Tiefe, Basiswert zu... | °C |
TB2 | Erdbodentemperatur in 20cm Tiefe | Erdbodentemperatur in 20cm Tiefe, Basiswert zu... | °C |
TB3 | Erdbodentemperatur in 50cm Tiefe | Erdbodentemperatur in 50cm Tiefe, Basiswert zu... | °C |
TL | Lufttemperatur in 2m | Lufttemperatur in 2m Höhe, Basiswert zur Minute10 | °C |
TP | Taupunkt | Taupunktstemperatur, Basiswert zur Minute10 | °C |
***02-02**: again, explain in plain sentences what the `dfmeta.loc[df.columns]` is doing, and why it works that way.*¶
`dfmeta.loc`: accesses a group of rows (and columns) by label(s).

`df.columns`: the column labels of the DataFrame `df`, i.e. exactly the variables that are "used" in `df`.

Altogether, `dfmeta.loc[df.columns]` selects from `dfmeta` only those rows whose labels match the columns used in `df` and displays them.
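A minimal sketch of the same label-based selection on a toy metadata table (made-up labels, not the actual files):

```python
import pandas as pd

meta = pd.DataFrame({"Kurzbeschreibung": ["wind direction", "wind speed", "air temperature"]},
                    index=["DD", "FF", "TL"])
data_columns = ["TL", "DD"]       # stand-in for df.columns
print(meta.loc[data_columns])     # .loc with a list of labels returns exactly those rows, in that order
```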
Finally, let me do a last step for you before you start coding:
```python
dfh = df.resample('H').mean()
# dfh
```
***02-03**: explore the `dfh` dataframe. Explain, in plain words, what the purpose of `.resample('H')` followed by `mean()` is. Explain what `.resample('H').max()` and `.resample('H').sum()` would do.*¶
`.resample("H")`: groups the DataFrame into hourly bins. `df.resample("H").mean()` then takes, for every column, the mean of all values within each hour and returns one value per hour. (In pandas' datetime frequency aliases, "H" stands for hourly.)

`.resample()` only makes sense in combination with an aggregation such as `.mean()`, `.sum()` or similar, which states how the 10-minute data should be merged into the hourly value - e.g. the sum over one hour, the average within one hour, etc.

`.resample('H').max()`: would take the maximum of the 10-minute data within every hour, while

`.resample('H').sum()`: would take the sum of the 10-minute data within one hour and use that as the hourly value.
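A small self-contained example of the three aggregations on dummy 10-minute data (arbitrary values):

```python
import pandas as pd

s = pd.Series([0.1, 0.0, 0.2, 0.0, 0.3, 0.1],
              index=pd.date_range("2015-01-01 00:00", periods=6, freq="10min"))

print(s.resample("H").mean())   # one value for 00:00-00:59 -> the hourly average (~0.117)
print(s.resample("H").max())    # -> the largest 10-minute value (0.3)
print(s.resample("H").sum())    # -> the hourly sum (0.7)
```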
***02-04**: Using `np.allclose`, make sure that the average of the first hour (that you'll compute yourself from `df`) is indeed equal to the first row of `dfh`.*¶
```python
# df.iloc[:6] selects the six 10-minute records of the first hour;
# dfh.iloc[0].to_numpy() converts the first hourly row into a numpy array.
if np.allclose(dfh.iloc[0].to_numpy(), df.iloc[:6].mean()):
    print("The dataframe 'dfh' is indeed identical to the manually computed average.")
```
The dataframe 'dfh' is indeed identical to the manually computed average.
Now, two variables in the dataframe have units that aren't suitable for averaging. Please convert the following variables to the correct units:
- `RR` needs to be converted from the average of 10 min sums to mm/h
- `SO` needs to be converted from the average of 10 min sums to s/h
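A quick way to sanity-check the `RR` conversion (a sketch using the 10-minute `df` from above): for hours in which all six 10-minute values are present, the hourly sum must equal six times the hourly mean.

```python
# For 10-minute sums, the hourly sum equals 6 * the hourly mean (when no values are missing)
counts = df['RR'].resample('H').count()
rr_sum = df['RR'].resample('H').sum()
rr_from_mean = df['RR'].resample('H').mean() * 6

full_hours = counts == 6   # only hours with all six 10-minute values present
print(np.allclose(rr_sum[full_hours], rr_from_mean[full_hours]))
```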
```python
# Split the DataFrame into three DataFrames using either sum or mean and merge them back together.
# This is a design choice so that the columns stay in their current order.
dfh = pd.concat([df.resample('H').mean()[['DD', 'FF', 'GSX', 'P', 'RF']],
                 df[['RR', 'SO']].resample('H').sum(),
                 df.resample('H').mean()[['TB1', 'TB2', 'TB3', 'TL', 'TP']]], axis=1)
dfh
```
DD | FF | GSX | P | RF | RR | SO | TB1 | TB2 | TB3 | TL | TP | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
time | ||||||||||||
2015-01-01 00:00:00 | 257.333333 | 1.383333 | 0.0 | 965.666667 | 97.000000 | 0.0 | 0.0 | 3.200000 | 3.600000 | 5.2 | -2.533333 | -3.066667 |
2015-01-01 01:00:00 | 160.666667 | 0.766667 | 0.0 | 965.966667 | 97.666667 | 0.0 | 0.0 | 3.150000 | 3.600000 | 5.2 | -3.250000 | -3.716667 |
2015-01-01 02:00:00 | 227.000000 | 0.816667 | 0.0 | 966.316667 | 98.500000 | 0.0 | 0.0 | 3.100000 | 3.600000 | 5.2 | -3.583333 | -3.933333 |
2015-01-01 03:00:00 | 258.833333 | 1.933333 | 0.0 | 966.466667 | 97.000000 | 0.0 | 0.0 | 3.100000 | 3.600000 | 5.2 | -3.250000 | -3.766667 |
2015-01-01 04:00:00 | 241.500000 | 1.266667 | 0.0 | 966.416667 | 95.166667 | 0.0 | 0.0 | 3.100000 | 3.600000 | 5.2 | -3.116667 | -3.883333 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021-12-31 19:00:00 | 283.500000 | 1.166667 | 0.0 | 957.766667 | 98.833333 | 0.0 | 0.0 | 3.316667 | 3.300000 | 3.4 | 0.900000 | 0.716667 |
2021-12-31 20:00:00 | 237.333333 | 0.716667 | 0.0 | 958.100000 | 99.500000 | 0.0 | 0.0 | 3.283333 | 3.300000 | 3.4 | -0.200000 | -0.350000 |
2021-12-31 21:00:00 | 149.833333 | 1.083333 | 0.0 | 958.783333 | 99.666667 | 0.0 | 0.0 | 3.200000 | 3.300000 | 3.4 | -0.800000 | -0.916667 |
2021-12-31 22:00:00 | 175.833333 | 0.516667 | 0.0 | 959.433333 | 99.166667 | 0.0 | 0.0 | 3.100000 | 3.383333 | 3.4 | -1.083333 | -1.200000 |
2021-12-31 23:00:00 | 198.833333 | 0.600000 | 0.0 | 960.016667 | 99.500000 | 0.0 | 0.0 | 3.033333 | 3.400000 | 3.4 | -1.683333 | -1.716667 |
61368 rows × 12 columns
***From now on, we will use the hourly data only** (and further aggregations when necessary). The 10 mins data are great but require a little bit more of pandas kung fu (the chinese term, not the sport) to be used efficiently.*
Spend some time exploring the `dfh` dataframe we just created. What time period does it cover? What variables does it contain?
```python
print(f"The data stored ranges from {dfh.index[0]} UTC until {dfh.index[-1]} UTC "
      f"which is a time span of {dfh.index[-1] - dfh.index[0]} hours.")
```
The data stored ranges from 2015-01-01 00:00:00 UTC until 2021-12-31 23:00:00 UTC which is a time span of 2556 days 23:00:00 hours.
Please note that the longer part of the metadata description was cut out below, because it no longer reflects the stored data (hourly data instead of 10-minute data).
The DataFrame contains the following variables:
```python
dfmeta.loc[dfh.columns].iloc[:, :1]
```
Kurzbeschreibung | |
---|---|
DD | Windrichtung |
FF | vektorielle Windgeschwindigkeit |
GSX | Globalstrahlung |
P | Luftdruck |
RF | Relative Feuchte |
RR | Niederschlag |
SO | Sonnenscheindauer |
TB1 | Erdbodentemperatur in 10cm Tiefe |
TB2 | Erdbodentemperatur in 20cm Tiefe |
TB3 | Erdbodentemperatur in 50cm Tiefe |
TL | Lufttemperatur in 2m |
TP | Taupunkt |
03 - Precipitation¶
In this section, we will focus on precipitation only.
***03-01**: Compute the average annual precipitation (m/year) over the 7-year period.*¶
```python
annual_precip = pd.DataFrame()
annual_precip["Average Precipitation"] = df["RR"].resample("7D").sum().resample("Y").mean()

annual_precip.set_index(annual_precip.index.year)  # As DataFrame for the looks!
```
Average Precipitation | |
---|---|
time | |
2015 | 16.237736 |
2016 | 17.407692 |
2017 | 19.890385 |
2018 | 14.926923 |
2019 | 19.898077 |
2020 | 17.381132 |
2021 | 17.532692 |
***03-02**: What is the smallest non-zero precipitation measured at the station? What is the maximum hourly precipitation measured at the station? When did this occur?*¶
```python
# Some general data sorting:
precip = pd.DataFrame()
precip["RR"] = dfh["RR"]
rr = dfh[precip > 0]["RR"]

# Minimum precipitation:
min_rr = precip[precip == rr.min()]
print(f"The smallest non-zero precipitation measured was: {rr.min()} mm/h, which happened {min_rr.size} times, "
      f"first on {min_rr.index[0]} UTC and last on {min_rr.index[-1]} UTC")

# Maximum precipitation:
max_rr = precip[precip == rr.max()]
print(f"The greatest precipitation measured was: {rr.max()} mm/h, which happened on {max_rr.index[0]}.")
```
The smallest non-zero precipitation measured was: 0.1 mm/h, which happened 61368 times, first on 2015-01-01 00:00:00 UTC and last on 2021-12-31 23:00:00 UTC
The greatest precipitation measured was: 22.2 mm/h, which happened on 2015-01-01 00:00:00.
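Note that the boolean masks above keep the full hourly index (values failing the condition become NaN), so `min_rr.size` and `max_rr.index[0]` refer to the whole dataframe rather than only to the matching rows. A minimal alternative sketch (assuming `dfh` as defined above) that selects the matching timestamps directly:

```python
# Alternative sketch: select only the matching timestamps
rr_nonzero = dfh.loc[dfh["RR"] > 0, "RR"]

smallest = rr_nonzero.min()
smallest_times = rr_nonzero[rr_nonzero == smallest].index
print(f"Smallest non-zero precipitation: {smallest} mm/h, "
      f"measured {len(smallest_times)} times, first on {smallest_times[0]}")

# idxmax() returns the timestamp of the (first) maximum value
print(f"Maximum hourly precipitation: {rr_nonzero.max()} mm/h on {rr_nonzero.idxmax()}")
```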
```python
ax = rr.plot.hist(bins=125, range=(0, 25), grid=True, xlim=(0, 25), xlabel="Precipitation (mm/h)",
                  ylabel="Number of occurrences", title="Hourly precipitation in Innsbruck from 2015 to 2021");

# To check whether the 'bins' value is set correctly (the vertical line and the first bin width should match):
# ax.vlines(0.2, 0, 1600, color="black");
```
Plot the same data, but this time with a logarithmic y-axis.
```python
ax = rr.plot.hist(bins=125, range=(0, 25), grid=True, xlim=(0, 25), log=True, xlabel="Precipitation (mm/h)",
                  ylabel="Number of occurrences", title="Hourly precipitation in Innsbruck from 2015 to 2021")
```
Compute the 99th percentile (or quantile) of hourly precipitation.
```python
quantile = precip["RR"].quantile(q=.99)  # stored for use in exercise 03-06 below
print(f"99% of the precipitation was less than {quantile:.2f} mm/h")
```
99% of the precipitation was less than 2.33 mm/h
***03-04**: Compute daily sums (mm/d) of precipitation (tip: use `.resample` again). Compute the average number of rain days per year in Innsbruck (a "rain day" is a day with at least 0.1 mm/d of measured precipitation).*¶
```python
# Daily sums (mm/day)
rday = precip["RR"].resample("d").sum()

# Computing the number of rain days:
print(f"From 2015 until the end of 2021, Innsbruck had an average of "
      f"{np.ceil(rday[rday > 0].resample('y').size().mean()):.0f} rain days per year.")
```
From 2015 until the end of 2021, Innsbruck had an average of 171 rain days per year.
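For completeness, a sketch that applies the ≥ 0.1 mm/d definition literally; it should give the same count here, since the gauge resolution is 0.1 mm:

```python
# Rain days defined strictly as days with at least 0.1 mm of precipitation
rain_days_per_year = (rday >= 0.1).resample("y").sum()
print(rain_days_per_year)
print(f"Average: {rain_days_per_year.mean():.0f} rain days per year")
```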
***03-05**: Now select (subset) the daily dataframe to keep only daily data in the months of December, January, February (DJF). To do this, note that `dfh.index.month` exists and can be used to subset the data efficiently. Compute the average precipitation in DJF (mm/d), and the average number of rainy days in DJF. Repeat with the months of June, July, August (JJA).*¶
```python
# The two data subsets to analyse:
djf, jja = (12, 1, 2), (6, 7, 8)

# A translation dictionary to convert the month numbers into proper column names
translate = {
    12: "December", 1: "January", 2: "February",
    6: "June", 7: "July", 8: "August"
}


def season_subset(subset, hourly=False, adv=False):
    """Subsets the precipitation data to the given months.

    Parameters
    ----------
    subset : array-like
        the month indices to keep (e.g. (12, 1, 2) for DJF)
    hourly : bool (optional)
        Whether to subset the hourly values instead of the daily sums
    adv : bool (optional)
        Whether to return the data merged into a single column

    Returns
    -------
    dfDJF : pd.DataFrame
        The sliced data as a pandas DataFrame
    """
    dfDJF = pd.DataFrame()
    if hourly is False:
        dfr = rday
    else:
        dfr = precip["RR"].resample("H").sum()

    if adv is False:
        for m in subset:
            # This ensures that columns of any length can be added to the DataFrame
            dfDJF = pd.concat([dfDJF, pd.DataFrame({translate[m]: dfr[dfr.index.month == m].values})], axis=1)
        return dfDJF
    else:
        # for a merged DataFrame (1 column)
        for m in subset:
            dfDJF = pd.concat([dfDJF, pd.DataFrame(dfr[dfr.index.month == m])], axis=0)
        return dfDJF


def subset_stats(subset):
    """Prints the mean daily precipitation and rain days in a given subset.
    Does not return anything.

    Parameters
    ----------
    subset : array-like
        The month indices for the data slicing
    """
    df = season_subset(subset)
    print(f"The average daily precipitation was {df.mean().mean():.2f} mm/d and there were "
          f"an average of {np.ceil((df[df > 0].count() / 7).mean()):.0f} rain days per month.")
    return None
```
Now, the subset statistics of daily data for the months of December, January and February (DJF):
```python
subset_stats(djf)
```
The average daily precipitation was 1.75 mm/d and there were an average of 13 rain days per month.
And, separately, the subset statistics of daily data for the months of June, July and August (JJA):
```python
subset_stats(jja)
```
The average daily precipitation was 4.02 mm/d and there were an average of 18 rain days per month.
***03-06**: Repeat the DJF and JJA subsetting, but this time with hourly data. Count the total number of times that hourly precipitation in DJF is above the 99th percentile computed in exercise 03-03. Repeat with JJA.*¶
```python
# The two seasons to analyse:
season = (djf, jja)

# Another translation dictionary, again for proper naming
tlsea = {
    (12, 1, 2): "winter",
    (6, 7, 8): "summer"
}

for s in season:
    subset_h = season_subset(s, hourly=True)
    print(f"In the {tlsea[s]} season, the 99% quantile was hit "
          f"{subset_h[subset_h > quantile].count().sum()} times in the 7 years range.")
```
In the winter season, the 99% quantile was hit 68 times in the 7 years range.
In the summer season, the 99% quantile was hit 308 times in the 7 years range.
***03-07**: Compute and plot the average daily cycle of hourly precipitation in DJF and JJA.*¶
I expect a plot similar to this example. To compute the daily cycle, I recommend combining two very useful tools. First, start by noticing that `ds.index.hour` exists and can be used to categorize data. Then, note that `df.groupby` exists and can be used exactly for that (documentation).
```python
# Seasonal data:
daily_avg = pd.DataFrame()
for s in season:
    season_hourly = season_subset(s, hourly=True, adv=True)
    daily_avg[tlsea[s]] = season_hourly.groupby(season_hourly.index.hour).mean()

# Plotting:
daily_avg.plot(grid=True, title="Precipitation daily cycle Innsbruck 2015-2021",
               xlabel="Hour of day (UTC)", ylabel="Hourly precipitation (mm/h)");
```
04 - A few other variables¶
In this section, we will continue to analyze the weather station data.
***04-01**: Verify that the three soil temperatures have approximately the same average value over the entire period.*¶
```python
# Get all soil temperatures for easier use later on
soil = dfh[["TB1", "TB2", "TB3"]]

# Prints a confirmation if the three averages are nearly the same
if np.allclose(soil.mean(), 11.4, rtol=0, atol=.12):
    print("The average of the three soil temperatures in the whole time range is approximately the same with 11.4°C ±0.12°C")
```
The average of the three soil temperatures in the whole time range is approximately the same with 11.4°C ±0.12°C
Now plot the three soil temperature timeseries in hourly resolution over the course of the year 2020 (example). Repeat the plot with the month of May 2020.
```python
soil_hourly = soil.resample("h").mean()
soil_hourly.loc["2020"].plot(grid=True, xlabel="", ylabel="Temperature (°C)", title="Soil temperature in 2020");
```
Now, only for May 2020:
```python
soil_hourly.loc["2020-05"].plot(grid=True, xlabel="", ylabel="Temperature (°C)", title="Soil temperature in May 2020");
```
***04-02**: Plot the average daily cycle of all three soil temperatures.*¶
```python
soil_hourly.groupby(soil_hourly.index.hour).mean().plot(grid=True, xlabel="Hour of day (UTC)",
                                                        ylabel="Temperature (°C)",
                                                        title="Average daily soil temperature in 2015-2021");
```
***04-03**: Compute the difference (in °C) between the air temperature and the dewpoint temperature. Now plot this difference on a scatter plot (x-axis: relative humidity, y-axis: temperature difference).*¶
```python
# The temperature difference
diff = dfh["TL"] - dfh["TP"]

# The plot:
fig, ax = plt.subplots(figsize=(9, 5))
ax.scatter(dfh["RF"], diff, s=2, alpha=.2, color="darkcyan")
ax.set_xlabel("Relative humidity (%)")
ax.set_ylabel("Temperature difference (K)")
ax.set_title("Temperature difference between air temperature and dew point temperature");
05 - Free coding project¶
05-00: Introduction¶
For the coding project I chose the three stations Augsburg, Hohenpeißenberg and Seefeld in Tirol. They lie in Bavaria and Tyrol and are up to 125 km apart from each other (Augsburg-Seefeld).
The three stations form an imaginary line from the alpine foreland at Augsburg towards the mountainous Seefeld in Tyrol. My goal with this data analysis is to find possible correlations among the three. Because they are relatively close to each other, they are often affected by the same general weather situation, as will be shown later on.
All map images for this project are from OpenStreetMap. All files and images can be found in their respective folders!
Since the three stations share roughly the same longitude, (most) major synoptic events affect them at about the same time, driven by the general weather situation.
|  | Augsburg | Hohenpeißenberg | Seefeld |
|---|---|---|---|
| Station location (marked orange) |  |  |  |
| Scale |  |  |  |
| Latitude (N) | 48° 25' | 47° 48' | 47.324722 |
| Longitude (E) | 10° 56' | 11° 00' | 11.175555 |
| Height above sea level | 462 m | 977 m | 1182 m |
| In operation since | 1947 | 1781 | 1997 |
| Operator | DWD | DWD | ZAMG - Geosphere Austria |

Data from the DWD weather stations overview and the ZAMG station data.
05-01: Reading and preparing the data¶
```python
# Imports (repeated here so that this part also runs on its own)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```
Reading the data files, which are split up by temperature, precipitation, wind and sunshine hours. The files contain data from the last 500 days, counted backwards from June 8th, 2023.
Since the station data come from different service providers, there are small differences in the data format, which are corrected during the read-in.
The Seefeld in Tirol data come from the Geosphere Austria Data Hub, the Augsburg and Hohenpeißenberg data from the DWD's Climate Data Center.
```python
# Preparing to read each station's data
data = {
    "tu": ["MESS_DATUM", "PP_10", "TT_10", "RF_10", "TD_10"],
    "ff": ["MESS_DATUM", "FF_10", "DD_10"],
    "rr": ["MESS_DATUM", "RWS_DAU_10", "RWS_10"],
    "sd": ["MESS_DATUM", "GS_10", "SD_10"]
}
a, hpb, sf = pd.DataFrame(), pd.DataFrame(), pd.DataFrame()

# Reading Augsburg's data (DWD)
for d in data:
    values = pd.read_csv(f"station_data/produkt_zehn_min_{d}_20211206_20230608_00232.txt", sep=";",
                         usecols=data[d], index_col="MESS_DATUM", parse_dates=True, na_values=-999)
    a = pd.concat([a, values], axis=1)

# Reading Hohenpeißenberg's data (DWD)
for d in data:
    values = pd.read_csv(f"station_data/produkt_zehn_min_{d}_20211206_20230608_02290.txt", sep=";",
                         usecols=data[d], index_col="MESS_DATUM", parse_dates=True, na_values=-999)
    hpb = pd.concat([hpb, values], axis=1)

del(values, data)  # free some memory that is no longer needed

# Convert J/cm² to W/m² for the DWD stations
for s in (a, hpb):
    s["GS_10"] = s["GS_10"] * 50 / 3

# Reading Seefeld's data (ZAMG)
sf = pd.read_csv("station_data/ZEHNMIN Datensatz_20211206T0000_20230608T2350.csv",
                 index_col=0, parse_dates=True).drop("station", axis=1)
sf.index = pd.to_datetime(sf.index.strftime("%Y-%m-%d %H:%M:%S"), format="%Y-%m-%d %H:%M:%S")  # For a matching time format
```
Because the station data came from different service providers, the data for incoming short-wave radiation and sunshine duration differ in format.

Quick overview of the read data:

|  | Wind speed | Wind direction | Pressure | Relative humidity | Precipitation | Minutes of precip. per 10 min | Sunshine duration per 10 min | Incoming short-wave radiation | Air temperature (2 m) | Dew point temperature |
|---|---|---|---|---|---|---|---|---|---|---|
| DWD stations | FF_10 | DD_10 in (0, 360] | PP_10 | RF_10 | RWS_10 | RWS_DAU_10 | SD_10 in hours/10min | s.a. | TT_10 | TD_10 |
| ZAMG station | FF | DD in [0, 360] | P | RF | RR | RRM | SO in sec/10min | s.a. | TL | TP |

*s.a.: see above*
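One option (not used here, just a sketch) would be to rename the ZAMG columns to the DWD naming right after reading, so that later code does not have to distinguish the two conventions; note that the units would still differ for some variables:

```python
# Optional sketch: unify the column names (ZAMG -> DWD convention)
zamg_to_dwd = {"FF": "FF_10", "DD": "DD_10", "P": "PP_10", "RF": "RF_10",
               "RR": "RWS_10", "RRM": "RWS_DAU_10", "SO": "SD_10",
               "GSX": "GS_10", "TL": "TT_10", "TP": "TD_10"}
sf_renamed = sf.rename(columns=zamg_to_dwd)
# Caution: units still differ, e.g. SO is in seconds per 10 min while SD_10 is in hours per 10 min.
```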
05-02: Temperature comparison of the last 500 days¶
```python
# Preparing the plotting - temperature 7-day mean
a_mon = a["TT_10"].resample("7d").mean()
hpb_mon = hpb["TT_10"].resample("7d").mean()
sf_mon = sf["TL"].resample("7d").mean()
```
```python
# Plotting
fig, ax = plt.subplots(figsize=(10, 5))
ax = a_mon.plot(color="green", ylabel="Temperature (°C)", title="7 days average temperature over last 500 days",
                label="Augsburg (462m)")
hpb_mon.plot(ax=ax, color="blue", label="Hohenpeißenberg (977m)")
sf_mon.plot(ax=ax, color="red", label="Seefeld (1182m)")

# Gridding
ax.grid(which="major", color="gray")
ax.grid(which="minor", axis="both")
ax.legend();
```
The weekly averages of all three stations followed nearly the same course over the last year, with Seefeld staying a few degrees below the other two and Hohenpeißenberg about a degree below Augsburg's curve. To confirm this, the mean temperature over the year 2022 is calculated. The edges of the record (December 2021 and the months of 2023) are excluded, so that the 500-day average is not mixed up with the annual mean temperature.
```python
print(f"In 2022, the measured mean temperature in Augsburg was {a['TT_10'].loc['2022'].mean():.2f}°C, in Hohenpeißenberg it was "
      f"{hpb['TT_10'].loc['2022'].mean():.2f}°C while in Seefeld it was {sf['TL'].loc['2022'].mean():.2f}°C")
print(f"For comparison, the mean temperature over the 500 days span was {a['TT_10'].mean():.2f}°C in Augsburg, "
      f"{hpb['TT_10'].mean():.2f}°C in Hohenpeißenberg and {sf['TL'].mean():.2f}°C in Seefeld.")
```
In 2022, the measured mean temperature in Augsburg was 10.32°C, in Hohenpeißenberg it was 9.32°C while in Seefeld it was 6.53°C
For comparison, the mean temperature over the 500 days span was 8.94°C in Augsburg, 7.69°C in Hohenpeißenberg and 4.97°C in Seefeld.
Besides, the graph suggests only minor weekly differences in the warmer months, while in winter the temperatures tend to drift further apart. For a more precise analysis, however, a climate analysis over a few decades would be preferable; this exercise limits itself to the 500-day span.
Altogether, since all curves follow roughly the same course, the local properties around the measurement sites affect the weekly temperature only on a minor scale (they matter more on daily and shorter time scales). The weekly mean temperature is therefore dominated by the general weather situation, which, due to the sites' close proximity, affects all three stations at around the same time.
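To put a number on how similar the weekly curves are, the pairwise correlation of the weekly means can be computed (a minimal sketch, assuming `a_mon`, `hpb_mon` and `sf_mon` from above):

```python
# Pairwise Pearson correlation of the weekly mean temperatures
weekly = pd.DataFrame({"Augsburg": a_mon, "Hohenpeißenberg": hpb_mon, "Seefeld": sf_mon})
print(weekly.corr())
```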
Next up, a more detailed analysis of the temperature follows.
05-03: 10 min temperature data and effect of local differences on it¶
As announced, here is a plot of the full 10-minute data over the whole date range (coloured), together with the weekly trend line (black). This trend line is the same as in the plot above.
```python
fig, (t1, t2, t3) = plt.subplots(1, 3, figsize=(16, 3))
a["TT_10"].plot(ax=t1, color="green", linewidth=1, ylabel="Temperature (°C)",
                title="Temperature over last 500 days in Augsburg")
hpb["TT_10"].plot(ax=t2, color="blue", linewidth=1, ylabel="Temperature (°C)",
                  title="Temperature over last 500 days in Hohenpeißenberg")
sf["TL"].plot(ax=t3, color="red", linewidth=1, ylabel="Temperature (°C)",
              title="Temperature over last 500 days in Seefeld")

# Temperature trend line
a_mon.plot(ax=t1, color="black")
hpb_mon.plot(ax=t2, color="black")
sf_mon.plot(ax=t3, color="black");

# Grid:
for axis in (t1, t2, t3):
    axis.set_ylim(-18, 37)
    axis.grid(which="major", color="gray")
    axis.grid(which="minor", axis="both")
```
Following the curves, a clear temperature minimum around mid-December stands out, followed by a relatively strong positive temperature gradient. To investigate it further, the date range is sliced around December 2022.
```python
# Unfortunately it is quite hard to simplify this with for-loops - issues with specifying the axes.
fig, ((t1, p1), (t2, p2), (t3, p3)) = plt.subplots(3, 2, figsize=(16, 9), width_ratios=[4, 3])

# Left column - whole Dec 2022
a["TT_10"].loc["2022-12"].plot(ax=t1, color="green", linewidth=1, ylabel="Temperature (°C)",
                               title="Temperature & dewpoint in Dec 2022 in Augsburg (462m)")
hpb["TT_10"].loc["2022-12"].plot(ax=t2, color="blue", linewidth=1, ylabel="Temperature (°C)",
                                 title="Temperature & dewpoint in Dec 2022 in Hohenpeißenberg (977m)")
sf["TL"].loc["2022-12"].plot(ax=t3, color="red", linewidth=1, ylabel="Temperature (°C)",
                             title="Temperature & dewpoint in Dec 2022 in Seefeld (1182m)")
a["TD_10"].loc["2022-12"].plot(ax=t1, color="green", linewidth=1, style=":", alpha=.6)
hpb["TD_10"].loc["2022-12"].plot(ax=t2, color="blue", linewidth=1, style=":", alpha=.6)
sf["TP"].loc["2022-12"].plot(ax=t3, color="red", linewidth=1, style=":", alpha=.6)

# Right column - Dec 17th until Dec 19th
a["TT_10"].loc["2022-12-17":"2022-12-19"].plot(ax=p1, color="green", linewidth=1, ylabel="Temperature (°C)",
                                               title="Temperature & dewpoint around 18th Dec in Augsburg (462m)")
hpb["TT_10"].loc["2022-12-17":"2022-12-19"].plot(ax=p2, color="blue", linewidth=1, ylabel="Temperature (°C)",
                                                 title="Temperature & dewpoint around 18th Dec in Hohenpeißenberg (977m)")
sf["TL"].loc["2022-12-17":"2022-12-19"].plot(ax=p3, color="red", linewidth=1, ylabel="Temperature (°C)",
                                             title="Temperature & dewpoint around 18th Dec in Seefeld (1182m)")
a["TD_10"].loc["2022-12-17":"2022-12-19"].plot(ax=p1, color="green", linewidth=1, style=":", alpha=.6)
hpb["TD_10"].loc["2022-12-17":"2022-12-19"].plot(ax=p2, color="blue", linewidth=1, style=":", alpha=.6)
sf["TP"].loc["2022-12-17":"2022-12-19"].plot(ax=p3, color="red", linewidth=1, style=":", alpha=.6)

# Make the plots share the same y-range
for axis in (t1, t2, t3):
    axis.set_ylim(-20, 20)
for axis in (p1, p2, p3):
    axis.set_ylim(-17, 12)

# Gridding for all plots
for axis in (t1, t2, t3, p1, p2, p3):
    axis.grid(which="major")
    axis.grid(which="minor", axis="both", color="lightgray")
```
The data shows some discrepancies between the stations on Dec 18th. In Augsburg, the temperature barely changes during this period, staying around -7°C. This suggests a continuous cloud cover over the city, inhibiting any significant warming of the ground.
At Hohenpeißenberg, the temperature increases during the whole day. Webcam images near the Hohenpeißenberg weather station show a retreating inversion, with the station lying above the cloud layer after the early morning. The cloud layer continues to dissipate during the night towards the 19th.
In Seefeld, however, there is a large temperature range over the course of the day, suggesting a rather sunny day on which the temperature peaks at midday due to solar radiation and drops again at night, when cloud-free conditions allow long-wave radiation to escape to the atmosphere.
To further examine the local weather situation, the incoming short-wave radiation as well as the sunshine duration is consulted below.
```python
# Plotting SW radiation data
fig, ((ax1, ax2, ax3), (s1, s2, s3)) = plt.subplots(2, 3, figsize=(16, 6), height_ratios=[2, 1])
a["GS_10"].loc["2022-12-18"].plot(ax=ax1, color="gold", title="Augsburg on Dec 18th", ylabel="short-wave radiation (W/m²)")
hpb["GS_10"].loc["2022-12-18"].plot(ax=ax2, color="gold", title="Hohenpeißenberg on Dec 18th")
sf["GSX"].loc["2022-12-18"].plot(ax=ax3, color="gold", title="Seefeld on Dec 18th")

# Plotting sunshine duration data
(a["SD_10"].loc["2022-12-18"] * 60).plot(ax=s1, ylabel="Sunshine duration (min/10min)")
(hpb["SD_10"].loc["2022-12-18"] * 60).plot(ax=s2)
(sf["SO"].loc["2022-12-18"] / 60).plot(ax=s3)
s1.set_ylim(s2.get_ylim());

# Deco
for axis in (ax1, ax2, ax3):
    axis.set_ylim(ax2.get_ylim())
    axis.set_xlabel("")
    axis.grid(which="both")
```
As seen in the plots, in Augsburg there is little incoming short-wave radiation that could warm the air at ground level. The data further suggests that this is due to missing sunshine (the sunshine duration is zero during the whole day), which would be caused by extensive cloud coverage. A quick look at webcam images from the Rathausplatz, 9 km away, supports this.
For both Hohenpeißenberg and Seefeld, the short-wave radiation and the sunshine duration are incompatible with an overcast sky; they rather indicate a day with minimal cloud coverage over the stations. Interestingly, the sunshine in Seefeld starts later than in Hohenpeißenberg, which would be caused by a mountain obstructing the sunlight. The relatively sharp rise at sunrise, compared to Hohenpeißenberg, supports this.
The temperature increase in Seefeld was thus due to direct sunshine (from 9-14 h), and the sunshine duration data suggests clear-sky conditions in Seefeld, which also explains the subsequent temperature drop during the night hours.
Because webcam images are available for the sites, the sky and cloud conditions can be visualized once again. They support the weather conditions described above.
|  | Augsburg-Rathausplatz | Hohenpeißenberg-Station | Seefeld Sports Arena |
|---|---|---|---|
| Webcam image |  |  |  |
| Source | Bergfex.at | foto-webcam.eu | seefeld.com |
| Additional information | Webcam is located 9 km away from the station; the webcam at the airport only shows up-to-date images | Weather station and webcam are on a hill | Weather station and webcam are near the sports arena |
05-04: Solar cycle and pressure¶
```python
# Sunshine hours per day
a_sol = a["SD_10"].resample("d").sum().resample("7d")          # initial data: sunshine hours per 10 min
hpb_sol = hpb["SD_10"].resample("d").sum().resample("7d")      # initial data: sunshine hours per 10 min
sf_sol = (sf["SO"].resample("d").sum() / 3600).resample("7d")  # initial data: sunshine seconds per 10 min
```
```python
translate = {
    "Augsburg": a_sol,
    "Hohenpeißenberg": hpb_sol,
    "Seefeld": sf_sol
}

# daily sunshine hours - weekly min/max
sol = pd.DataFrame()
for s in ("Augsburg", "Hohenpeißenberg", "Seefeld"):
    sol[s + " max"] = translate[s].max()
    sol[s + " min"] = translate[s].min()
```
```python
# The plot with min/max sunshine hours
ax = sol.plot(figsize=(16, 5), color=["green", "green", "blue", "blue", "red", "red"],
              style=["-", ":", "-", ":", "-", ":"], grid=True, ylim=(0, 16.1), alpha=.7,
              ylabel="Sunshine duration (hours)",
              title="Average daily sunshine duration in hours per week in the last 500 days")
ax.legend(loc="lower center", bbox_to_anchor=(0.5, -0.3), ncols=6);
```
Each station's maximum sunshine hours follow the length of the day. For Seefeld, the day length is about two hours shorter than at the other stations. Again, this is due to mountains blocking the direct sunlight at the station's location; the effective day length can therefore differ quite a lot depending on the exact location within a valley.
Pressure
```python
# As a function for later re-use
def plotpressure500(aug, hpberg, sfeld, info="Pressure"):
    """Plots the pressure for the stations 'Augsburg', 'Hohenpeißenberg' and 'Seefeld'.
    Does not return anything.

    Parameters
    ----------
    aug : array-like
        Pressure data in Augsburg
    hpberg : array-like
        Pressure data in Hohenpeißenberg
    sfeld : array-like
        Pressure data in Seefeld
    info : str (optional)
        Information about what is plotted
    """
    # Actual plot, data axis
    fig, pp = plt.subplots(figsize=(16, 4))
    aug.plot(ax=pp, color="green", linewidth=1, ylabel="Pressure (hPa)", label="Augsburg (462m)",
             title=f"{info} over the last 500 days")
    hpberg.plot(ax=pp, color="blue", linewidth=1, ylabel="Pressure (hPa)", label="Hohenpeißenberg (977m)")
    sfeld.plot(ax=pp, color="red", linewidth=1, ylabel="Pressure (hPa)", label="Seefeld (1182m)");

    # Gridding
    pp.grid(which="major")
    pp.grid(which="minor", axis="both", color="lightgray")
    pp.legend(loc="lower center", bbox_to_anchor=(0.5, -0.3), ncols=3)
    return None
```
```python
plotpressure500(a["PP_10"], hpb["PP_10"], sf["P"])
```
Looking at the pressure curves of the three stations, it is clear that they hardly differ in shape - they mainly differ in their mean pressure. The mean pressure at each station is:
```python
p = {
    "Augsburg": [a['PP_10'].mean(), a['PP_10']],
    "Hohenpeißenberg": [hpb['PP_10'].mean(), hpb['PP_10']],
    "Seefeld": [sf['P'].mean(), sf['P']]
}

print(f"Augsburg: {p['Augsburg'][0]:.2f} hPa\nHohenpeißenberg: {p['Hohenpeißenberg'][0]:.2f} hPa\nSeefeld: {p['Seefeld'][0]:.2f} hPa")
```
Augsburg: 962.64 hPa
Hohenpeißenberg: 904.91 hPa
Seefeld: 883.90 hPa
When looking at the whole data range, it is reasonable to assume an exponential decrease of pressure with height with a scale height of $H \approx 8.65\,km$. Based on this assumption, the atmospheric pressure at each station height $z$ can be calculated from the barometric formula
$$ p(z) = p_0 \, e^{-z/H}, \qquad p_0 = 1013.25\ \mathrm{hPa}. $$
```python
height = {
    "Augsburg": [.462, 9],
    "Hohenpeißenberg": [.977, 8.65],
    "Seefeld": [1.182, 8.65]
}

for z in height:
    print(f"Expected Pressure in {z}: {1013.25 * np.exp(- height[z][0] / 8.65):.2f} hPa")
```
Expected Pressure in Augsburg: 960.55 hPa
Expected Pressure in Hohenpeißenberg: 905.03 hPa
Expected Pressure in Seefeld: 883.84 hPa
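Where does $H \approx 8.65\,km$ come from? One way to check it is to invert the barometric formula with the measured mean pressures, $H = -z / \ln(p/p_0)$ (a small sketch using the `p` and `height` dictionaries defined above):

```python
# Estimate the scale height H from each station's mean pressure: H = -z / ln(p / p0)
p0 = 1013.25
for z in height:
    h_est = - height[z][0] / np.log(p[z][0] / p0)
    print(f"Estimated scale height from {z}: {h_est:.2f} km")
```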
For Augsburg, however, $H \approx 9\,km$ fits a bit better. When the measured station pressures are reduced to sea level using $H = 8.65\,km$ (and $H_{Augsburg} = 9\,km$) and the plot is made again, all three curves collapse onto the same pressure level.
```python
z = "Augsburg"
p[z][0] * np.exp(height[z][0] / height[z][1])
```
1013.3497568495574
```python
# Reduce pressure to sea level
pressure = pd.DataFrame()
for z in ("Augsburg", "Hohenpeißenberg", "Seefeld"):
    pressure[z] = p[z][1] * np.exp(height[z][0] / height[z][1])

# Plotting
plotpressure500(pressure["Augsburg"], pressure["Hohenpeißenberg"], pressure["Seefeld"], info="Sea-level-pressure")
```
The graph above shows that there is little difference between the sea-level pressures of the three stations. A scatter matrix, however, highlights the remaining local differences more clearly.
```python
pd.plotting.scatter_matrix(pressure, figsize=(7, 7), color="orange", alpha=.2, s=2);
```
The largest discrepancies are between Augsburg and Seefeld (roughly $\pm 10\,hPa$), while the differences to Hohenpeißenberg are smaller (about $\pm 5\,hPa$). This emphasises the local "uniformity" of pressure: nearby stations measure around the same sea-level pressure, and the difference grows with distance (Augsburg-Seefeld).
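To quantify this, the pairwise correlation and the mean absolute differences of the sea-level pressure series can be computed (a sketch, using the `pressure` DataFrame from above):

```python
# Pairwise Pearson correlation of the reduced (sea-level) pressure series
print(pressure.corr())

# Mean absolute pairwise differences (hPa)
print((pressure["Augsburg"] - pressure["Seefeld"]).abs().mean())
print((pressure["Augsburg"] - pressure["Hohenpeißenberg"]).abs().mean())
```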
05-05: Precipitation and wind¶
```python
# Precipitation sum over 72 hours
a_rr = a["RWS_10"].resample("3d").sum()
hpb_rr = hpb["RWS_10"].resample("3d").sum()
sf_rr = sf["RR"].resample("3d").sum()
```
```python
from matplotlib.dates import DateFormatter, DayLocator

# Precipitation sum plot
ax = hpb_rr.plot.bar(figsize=(16, 4), color="blue", alpha=.6, ylabel="Precipitation sum (mm)", label="Hohenpeißenberg",
                     title="Precipitation sum over 72 hours (mm)")
a_rr.plot.bar(ax=ax, color="green", alpha=.6, label="Augsburg")
sf_rr.plot.bar(ax=ax, color="purple", alpha=.6, label="Seefeld")
ax.grid()
ax.legend()

# Tick locator
ax.xaxis.set_major_locator(DayLocator(interval=18))
```
For precipitation there is no longer a clear connection beyond the general weather situation, which determines the timing and rough quantity of precipitation. However, due to the nearby Alps, the precipitation sums for Seefeld and Hohenpeißenberg are relatively high compared to Augsburg. The annual sums for 2022 were:
```python
print(f'Augsburg: {a["RWS_10"].loc["2022"].sum()}mm\n'
      f'Hohenpeißenberg: {hpb["RWS_10"].loc["2022"].sum()}mm\n'
      f'Seefeld: {sf["RR"].loc["2022"].sum():.2f}mm')
```
Augsburg: 572.07mm
Hohenpeißenberg: 1099.02mm
Seefeld: 1067.60mm
Wind
Note that the DWD wind direction data only contains values in (0, 360], while the ZAMG data ranges over [0, 360].
```python
# Reformat direction into (0, 360]:
sf["DD"] = np.where(sf["DD"] == 0, 360, sf["DD"])
```
```python
# from A08:
def sd_to_uv(speed, direction):
    """Converts wind speed and direction to (u, v) vector components.

    Parameters
    ----------
    speed : ndarray-like
        wind speed in m/s
    direction : ndarray-like
        wind direction in degrees, meteorological convention (0° = North)

    Returns
    -------
    (u, v) : wind u and v vector components (unit: m/s)
    """
    math_dir = np.deg2rad(270 - np.fmod(np.asarray(direction) + 360, 360))
    u = speed * np.cos(math_dir)
    v = speed * np.sin(math_dir)
    return u, v


def uv_to_sd(u, v):
    """Converts (u, v) wind vector components to wind speed and direction.

    Parameters
    ----------
    u : ndarray-like
        u-component of the wind speed in m/s
    v : ndarray-like
        v-component of the wind speed in m/s

    Returns
    -------
    (speed, direction) : wind speed (unit: m/s) and direction (° in the meteorological convention)
    """
    u = np.asarray(u)
    v = np.asarray(v)
    speed = np.sqrt(u**2 + v**2)
    direction = 270 - np.rad2deg(np.arctan2(v, u))
    return speed, np.fmod(direction + 360, 360)
```
```python
# Calculate u_wind & v_wind
a["u_wind"], a["v_wind"] = sd_to_uv(a["FF_10"], a["DD_10"])
hpb["u_wind"], hpb["v_wind"] = sd_to_uv(hpb["FF_10"], hpb["DD_10"])
sf["u_wind"], sf["v_wind"] = sd_to_uv(sf["FF"], sf["DD"])

# Print the average wind direction
print(f'In Augsburg, the wind came from {uv_to_sd(a["u_wind"].loc["2022"].mean(), a["v_wind"].loc["2022"].mean())[1]:.0f}°\n'
      f'In Hohenpeißenberg, the wind came from {uv_to_sd(hpb["u_wind"].loc["2022"].mean(), hpb["v_wind"].loc["2022"].mean())[1]:.0f}°\n'
      f'In Seefeld, the wind came from {uv_to_sd(sf["u_wind"].loc["2022"].mean(), sf["v_wind"].loc["2022"].mean())[1]:.0f}°')
```
In Augsburg, the wind came from 246°
In Hohenpeißenberg, the wind came from 243°
In Seefeld, the wind came from 89°
The windroses plotted below show the wind characteristics in 2022. The mountainous terrain around Seefeld forces the wind at station level into a SE-NW direction. At Hohenpeißenberg this channelling is less pronounced, as the measurement site is on a flatter hillside, and in Augsburg the effect is not visible at all due to the almost flat terrain at the location.
```python
from windrose import WindroseAxes

wind = {
    "Augsburg": [a["DD_10"].loc["2022"], a["FF_10"].loc["2022"]],
    "Hohenpeißenberg": [hpb["DD_10"].loc["2022"], hpb["FF_10"].loc["2022"]],
    "Seefeld": [sf["DD"].loc["2022"], sf["FF"].loc["2022"]]
}

for s in wind:
    ax = WindroseAxes.from_ax()
    ax.contourf(wind[s][0], wind[s][1], bins=np.arange(0, 8, 1), normed=True)
    ax.set_title(f"Windrose for {s} in 2022")
    ax.legend(title="Wind Speed (m/s)", loc="upper left")
```