import pandas as pd
import matplotlib.pyplot as plt  # pyplot interface, used as plt throughout
from matplotlib.ticker import FuncFormatter
import numpy as np
import plotly
import bokeh
🌸Cherry Blossom in Kyoto¶
Jupyter notebook(s) that tell stories about climate-related datasets:¶
The first dataset was compiled by Yasuyuki Aono, Associate Professor, on cherry blossom phenology and temperature reconstructions at Kyoto, as part of the Ecological Meteorology Research Group at Osaka Prefecture University. The second dataset is from the Japan Meteorological Agency. It is taken from the phenological observation information under Global environment/climate and covers the sakura flowering dates (2011-2020).
Where are the numbers coming from? Who produced them? When?¶
Our dataset combines data from two separate sources. The first spans 812-2015 and is from the Ecological Meteorology Research Group at Osaka Prefecture University. The second is from the Japan Meteorological Agency and spans 2015-2024.
Which information do these datasets contain? In which format?¶
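The Ecological Meteorology Research Group (EMRG) dataset is an Excel workbook (KyotoFullFlower7.xls) with six columns: year (A.D.), full-flowering date as day of year, full-flowering date as calendar date, source code, data type code, and the name of the old document. In the Japan Meteorological Agency (JMA) dataset, the data is displayed on a web page, so reading it requires an HTML parser.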
Is there any appropriate license?¶
Have to check
What are the interesting rows and columns within datasets? Why?¶
Three columns are of interest. The first is the full-flowering date as a day of the year. The second is the same date as a calendar date encoded as a number (e.g. 402 means April 2); the dates fall mostly in April, so this column shows in which week of the month the bloom tends to peak. The third is the twenty-year average day of the year with peak cherry blossom.
Are there any missing data points? If yes, how are they coded?¶
The missing data points are mainly in the full-flowering date columns, where they appear as NaN once the file is loaded (for example, the rows for 801-803). How to handle them is still to be discussed.
What is the appropriate numerical transformation or normalization?¶
One option is to decompose the series into trend, seasonal, and residual components. To compare the flowering dates, or to use them in predictive models alongside other numeric variables, normalization (such as Min-Max scaling or Z-score normalization) can be applied.
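The two normalizations named above can be sketched as follows. This is a minimal illustration, not the notebook's final method: it uses the five day-of-year values shown in the `final_df.head()` output, and the variable names `doy`, `z_score`, and `min_max` are chosen only for this example.

```python
import pandas as pd

# The five day-of-year values shown in final_df.head(); a tiny sample for illustration.
doy = pd.Series([92, 105, 96, 108, 104], name="Full-flowering date (DOY)")

# Z-score: centre on the mean and divide by the standard deviation.
# ddof=0 selects the population standard deviation (an assumption for this sketch).
z_score = (doy - doy.mean()) / doy.std(ddof=0)

# Min-max scaling: map the smallest value to 0 and the largest to 1.
min_max = (doy - doy.min()) / (doy.max() - doy.min())

print(z_score.round(2).tolist())
print(min_max.round(2).tolist())
```

Z-scores are convenient when comparing against variables on other scales, while min-max scaling keeps the values in a fixed 0 to 1 range.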
What is the best way to visualize these data? Why?¶
Time series plots are a natural fit, since they show how the flowering date shifts over the years. A heat map, histogram, or density plot can show the distribution of the dates and which days of the year occur most often.
What do these data tell us about climate change?¶
Earlier Flowering Dates: If the data shows a trend of increasingly earlier flowering dates, it could indicate warming temperatures. Year-to-Year Variability: Increased variability in flowering dates might suggest more erratic weather patterns, potentially linked to climate change. Long-Term Trends: Comparing century-long periods can provide insights into how climate patterns have shifted over longer timescales.
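A trend toward earlier flowering can be quantified with a least-squares line through the day-of-year values. The sketch below is one possible approach, written in plain Python; the function name `linear_trend` and the sample values are hypothetical and are not taken from the dataset.

```python
def linear_trend(years, values):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical values for illustration only, not taken from the dataset.
years = [1980, 1990, 2000, 2010, 2020]
doy = [100, 99, 97, 96, 94]

slope, intercept = linear_trend(years, doy)
# A negative slope means the flowering day of year is getting earlier.
print(f"trend: {slope * 10:+.2f} days per decade")
```

Applied to the real series, the same function could be run on separate centuries to compare long-term trends.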
# Load the full-flowering workbook and the twenty-year-average CSV
d1 = pd.read_excel('/content/KyotoFullFlower7.xls')
d2 = pd.read_csv('/content/date-of-the-peak-cherry-tree-blossom-in-kyoto.csv')
d1.head(25)
| | Full-flowering dates of Japanese cherry (Prunus jamasakura) at Kyoto, Japan. (Latest version, Jun. 12, 2012) | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 |
|---|---|---|---|---|---|---|
| 0 | This phenological data was acquired by followi... | NaN | NaN | NaN | NaN | NaN |
| 1 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | Column A; A.D. | NaN | NaN | NaN | NaN | NaN |
| 3 | Column B; Full-flowering date (day of year). | NaN | NaN | NaN | NaN | NaN |
| 4 | Column C; Full-flowering date (calender date, ... | NaN | NaN | NaN | NaN | NaN |
| 5 | Column D; Source code | NaN | NaN | NaN | NaN | NaN |
| 6 | 1; Reported by Taguchi (1939), J. Marine Mete... | NaN | NaN | NaN | NaN | NaN |
| 7 | 2; Added by Sekiguchi (1969), Tokyo Geography... | NaN | NaN | NaN | NaN | NaN |
| 8 | 3; Added by Aono and Omoto (1994), J. Agric. ... | NaN | NaN | NaN | NaN | NaN |
| 9 | 4; Added by Aono and Kazui (2008), Int. J. Cl... | NaN | NaN | NaN | NaN | NaN |
| 10 | 5: Cherry phenological data, Added by Aono an... | NaN | NaN | NaN | NaN | NaN |
| 11 | 6: Added by Aono (2011), Time Studies, 4, 17-... | NaN | NaN | NaN | NaN | NaN |
| 12 | 7: Added by Aono (2012), Chikyu Kankyo, 17, 2... | NaN | NaN | NaN | NaN | NaN |
| 13 | 8: Found after the last publication of articles. | NaN | NaN | NaN | NaN | NaN |
| 14 | Column E; Data type code | NaN | NaN | NaN | NaN | NaN |
| 15 | 0 : data from modern times (full-bloom date s... | NaN | NaN | NaN | NaN | NaN |
| 16 | 1 : from diary description about full-bloom | NaN | NaN | NaN | NaN | NaN |
| 17 | 2 : from diary description about cherry bloss... | NaN | NaN | NaN | NaN | NaN |
| 18 | 3 : from diary description about presents of ... | NaN | NaN | NaN | NaN | NaN |
| 19 | 4 : title in Japanese poety | NaN | NaN | NaN | NaN | NaN |
| 20 | 8 : Deduced from wisteria phenology, using th... | NaN | NaN | NaN | NaN | NaN |
| 21 | 9 : Deduced from Japanese kerria phenology, u... | NaN | NaN | NaN | NaN | NaN |
| 22 | Column F; Names of old documents | NaN | NaN | NaN | NaN | NaN |
| 23 | NaN | NaN | NaN | NaN | NaN | NaN |
| 24 | AD | Full-flowering date (DOY) | Full-flowering date | Source code | Data type code | Reference Name |
# Drop the 25 description and header rows (index 0-24) shown above
d1 = d1.iloc[25:]
d1.head(3)
| | Full-flowering dates of Japanese cherry (Prunus jamasakura) at Kyoto, Japan. (Latest version, Jun. 12, 2012) | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 |
|---|---|---|---|---|---|---|
| 25 | 801 | NaN | NaN | NaN | NaN | - |
| 26 | 802 | NaN | NaN | NaN | NaN | - |
| 27 | 803 | NaN | NaN | NaN | NaN | - |
d1 = d1.rename(columns={
    'Full-flowering dates of Japanese cherry (Prunus jamasakura) at Kyoto, Japan. (Latest version, Jun. 12, 2012)': 'Year',
    'Unnamed: 1': 'Full-flowering date (DOY)',
    'Unnamed: 2': 'Full-flowering date',
    'Unnamed: 3': 'Source Code',
    'Unnamed: 4': 'Data type code',
    'Unnamed: 5': 'Reference Name',
})
d1.columns
Index(['Year', 'Full-flowering date (DOY)', 'Full-flowering date',
'Source Code', 'Data type code', 'Reference Name'],
dtype='object')
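Because the sheet mixes description text and numbers in the same columns, the columns kept after the slice are likely to carry the generic `object` dtype. A minimal sketch of converting them to numbers follows; it runs on a toy frame (`toy`, a stand-in for `d1` built only for this example) rather than on the real file.

```python
import pandas as pd

# Toy frame standing in for d1 after the slice and rename.
toy = pd.DataFrame(
    {
        "Year": [801, 812, 815],
        "Full-flowering date (DOY)": [None, 92, 105],
        "Full-flowering date": [None, 401, 415],
    },
    dtype="object",
)

numeric_cols = ["Year", "Full-flowering date (DOY)", "Full-flowering date"]
for col in numeric_cols:
    # errors="coerce" turns anything unparseable into NaN instead of raising.
    toy[col] = pd.to_numeric(toy[col], errors="coerce")

print(toy.dtypes)
```

With numeric dtypes in place, missing years show up as NaN and the columns can be used directly in arithmetic and plots.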
d1 = d1[['Year', 'Full-flowering date (DOY)', 'Full-flowering date']]
d2 = d2[['Year', 'Twenty-year average day of the year with peak cherry blossom']]

merged_df = pd.merge(d1, d2, on='Year', how='inner')

merged_df['Name of country'] = 'Japan'

final_df = merged_df[['Name of country', 'Year', 'Full-flowering date (DOY)', 'Full-flowering date', 'Twenty-year average day of the year with peak cherry blossom']]

final_df.head()
| | Name of country | Year | Full-flowering date (DOY) | Full-flowering date | Twenty-year average day of the year with peak cherry blossom |
|---|---|---|---|---|---|
| 0 | Japan | 812 | 92 | 401 | NaN |
| 1 | Japan | 815 | 105 | 415 | NaN |
| 2 | Japan | 831 | 96 | 406 | NaN |
| 3 | Japan | 851 | 108 | 418 | NaN |
| 4 | Japan | 853 | 104 | 414 | NaN |
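An inner join keeps only the years present in both frames, so years found in just one source are dropped silently. One way to see what is lost is an outer join with `indicator=True`, sketched here on two toy frames (`left` and `right` are hypothetical stand-ins for `d1` and `d2`, and their values are illustrative).

```python
import pandas as pd

# Toy stand-ins for d1 and d2; the values are illustrative only.
left = pd.DataFrame({"Year": [812, 815, 2015], "Full-flowering date (DOY)": [92, 105, 93]})
right = pd.DataFrame({"Year": [815, 2015, 2020], "Twenty-year average": [None, 95.0, 94.0]})

# indicator=True adds a _merge column: left_only, right_only, or both.
check = pd.merge(left, right, on="Year", how="outer", indicator=True)

print(check["_merge"].value_counts())
```

Rows marked `left_only` or `right_only` are the ones an inner join would discard.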
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

year_column = 'Year'
flowering_doy_column = 'Full-flowering date (DOY)'

def format_year(x, _):
    return f'{int(x)}'

plt.style.use('dark_background')

plt.figure(figsize=(12, 8))

# Plotting 'Full-flowering date (DOY)' over 'Year'
plt.plot(final_df[year_column], final_df[flowering_doy_column], linestyle='-', markersize=3, color='red')
plt.xlabel('Year')
plt.ylabel('Full-flowering date (Day of Year)')
plt.suptitle('Cherry Blossom Full-Flowering Dates in Kyoto')
plt.title('Historical Full-Flowering Dates over the Years')
plt.gca().xaxis.set_major_formatter(FuncFormatter(format_year))
plt.grid(True, axis='y', linestyle='--', alpha=0.7)
plt.show()

final_df.head(3)
| | Name of country | Year | Full-flowering date (DOY) | Full-flowering date | Twenty-year average day of the year with peak cherry blossom |
|---|---|---|---|---|---|
| 0 | Japan | 812 | 92 | 401 | NaN |
| 1 | Japan | 815 | 105 | 415 | NaN |
| 2 | Japan | 831 | 96 | 406 | NaN |
Full-flowering date (DOY): the day of the year on which full flowering occurred.
Full-flowering date: the calendar date encoded as a number (e.g. 402 --> April 2).
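The two encodings can be converted as sketched below. The helper names `doy_to_date` and `decode_mmdd` are made up for this example, and Python's `date` uses the proleptic Gregorian calendar, which is an assumption about how the dataset counts days. The sample rows are the first three shown in the `final_df` output.

```python
from datetime import date, timedelta


def doy_to_date(year, doy):
    """Convert a day-of-year count (1 = January 1) to a calendar date."""
    return date(year, 1, 1) + timedelta(days=doy - 1)


def decode_mmdd(code):
    """Split an integer such as 402 into (month, day) = (4, 2)."""
    return divmod(int(code), 100)


# (year, day of year, encoded calendar date) from the first rows of final_df.
rows = [(812, 92, 401), (815, 105, 415), (831, 96, 406)]

for year, doy, code in rows:
    converted = doy_to_date(year, doy)
    month, day = decode_mmdd(code)
    # Both columns should describe the same day if the encodings agree.
    print(year, converted.isoformat(), (converted.month, converted.day) == (month, day))
```

Comparing the two columns this way is a cheap consistency check before relying on either one.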
import pandas as pd
!pip install lxml html5lib beautifulsoup4
Requirement already satisfied: lxml in /usr/local/lib/python3.10/dist-packages (4.9.4)
Requirement already satisfied: html5lib in /usr/local/lib/python3.10/dist-packages (1.1)
Requirement already satisfied: beautifulsoup4 in /usr/local/lib/python3.10/dist-packages (4.12.3)
Requirement already satisfied: six>=1.9 in /usr/local/lib/python3.10/dist-packages (from html5lib) (1.16.0)
Requirement already satisfied: webencodings in /usr/local/lib/python3.10/dist-packages (from html5lib) (0.5.1)
Requirement already satisfied: soupsieve>1.2 in /usr/local/lib/python3.10/dist-packages (from beautifulsoup4) (2.5)
url = 'https://www.data.jma.go.jp/sakura/data/sakura003_06.html'
all_tables = pd.read_html(url)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-81e4b315e617> in <cell line: 2>()
      1 url = 'https://www.data.jma.go.jp/sakura/data/sakura003_06.html'
----> 2 all_tables = pd.read_html(url)

/usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    329                     stacklevel=find_stack_level(),
    330                 )
--> 331             return func(*args, **kwargs)
    332
    333         # error: "Callable[[VarArg(Any), KwArg(Any)], Any]" has no

/usr/local/lib/python3.10/dist-packages/pandas/io/html.py in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, thousands, encoding, decimal, converters, na_values, keep_default_na, displayed_only, extract_links)
   1203     io = stringify_path(io)
   1204
-> 1205     return _parse(
   1206         flavor=flavor,
   1207         io=io,

/usr/local/lib/python3.10/dist-packages/pandas/io/html.py in _parse(flavor, io, match, attrs, encoding, displayed_only, extract_links, **kwargs)
   1004         else:
   1005             assert retained is not None  # for mypy
-> 1006             raise retained
   1007
   1008     ret = []

/usr/local/lib/python3.10/dist-packages/pandas/io/html.py in _parse(flavor, io, match, attrs, encoding, displayed_only, extract_links, **kwargs)
    984
    985         try:
--> 986             tables = p.parse_tables()
    987         except ValueError as caught:
    988             # if `io` is an io-like object, check if it's seekable

/usr/local/lib/python3.10/dist-packages/pandas/io/html.py in parse_tables(self)
    260         list of parsed (header, body, footer) tuples from tables.
    261         """
--> 262         tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
    263         return (self._parse_thead_tbody_tfoot(table) for table in tables)
    264

/usr/local/lib/python3.10/dist-packages/pandas/io/html.py in _parse_tables(self, doc, match, attrs)
    616
    617         if not tables:
--> 618             raise ValueError("No tables found")
    619
    620         result = []

ValueError: No tables found
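`pd.read_html` only extracts `<table>` elements, so "No tables found" means the parser saw none in the document it received. A quick diagnostic is to count the tags in the page, sketched here with the standard library. The `TagCounter` class, the `count_tags` helper, and the sample HTML are all hypothetical; the real page was not inspected for this example.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how often each opening tag appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tags(html_text):
    counter = TagCounter()
    counter.feed(html_text)
    return counter.counts


# Hypothetical page body for illustration only.
sample = "<html><body><pre>2011 4 5</pre></body></html>"
counts = count_tags(sample)

# Zero <table> tags would explain the ValueError; other tags hint at where the data sits.
print(counts.get("table", 0), counts.get("pre", 0))
```

To run this against the real page, the HTML would first have to be downloaded and decoded; the page's encoding is not known here. If the count of `table` tags is zero, the data may sit in preformatted text or be filled in by a script, and a different parsing approach would be needed.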
🍇Grape Harvest in France¶
Where are the numbers coming from? Who produced them? When?¶
The dataset is from NOAA, produced by the World Data Center for Paleoclimatology, Boulder, and the NOAA Paleoclimatology Program. It spans 1354-2007 and was produced in 2012. We are still looking for a reliable dataset that covers 2007 to the present day. We have requested access to Euro-Climhist, which contains many different datasets, including wine harvest dates. Once access is granted, this section will be updated with more information.
Which information do these datasets contain? In which format?¶
The NOAA dataset contains grape harvest dates from 1354-2007 for multiple regions of France (for example, Alsace, Auvergne, Auxerre-Avalon, Beaujolais and Maconnais, Bordeaux, Burgundy, Champagne 1, Champagne 2, Gaillac-South-West), as well as Germany and Switzerland. The data is published as a web page.
Is there any appropriate license?¶
No licenses found
What are the interesting rows and columns within datasets? Why?¶
The columns are the wine-growing areas, many of them in France, which will be the focus of our analysis. The rows are the harvest years.
Are there any missing data points? If yes, how are they coded?¶
There are many missing data points; they are represented by blank spaces.
What is the appropriate numerical transformation or normalization?¶
The dates are presented as the number of days after August 31st, so they will have to be transformed into normal calendar dates.
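The transformation can be sketched with the standard library. The helper name `harvest_date` and the sample offsets are hypothetical; rounding fractional offsets to whole days and treating blank cells as missing are assumptions made for this example.

```python
from datetime import date, timedelta


def harvest_date(year, days_after_aug31):
    """Convert an offset counted from August 31 into a calendar date.

    Returns None for a missing value (None or a blank cell).
    """
    if days_after_aug31 is None or str(days_after_aug31).strip() == "":
        return None
    # Fractional offsets are rounded to a whole day (an assumption of this sketch).
    offset = round(float(days_after_aug31))
    return date(year, 8, 31) + timedelta(days=offset)


# Hypothetical offsets for illustration only, not taken from the dataset.
print(harvest_date(2003, 1))    # one day after August 31
print(harvest_date(2003, -12))  # a negative offset falls before August 31
print(harvest_date(2003, ""))   # blank cell, treated as missing
```

Keeping the original offset column alongside the converted dates is useful, since the offset is already a convenient numeric value for plotting trends.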
What is the best way to visualize these data? Why?¶
The data will be best visualized with a time series plot showing grape harvest dates over the years for specific regions. This would allow for the display of patterns or anomalies related to climate conditions.
What do these data tell us about climate change?¶
TBD