import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # keep `plt` bound to pyplot; `import matplotlib as plt` would shadow it
from matplotlib.ticker import FuncFormatter
import plotly
import bokeh

🌸 Cherry Blossom in Kyoto

Jupyter notebook(s) that tell stories about climate-related datasets:

The first dataset was collected by Yasuyuki Aono, Associate Professor, as part of the Ecological Meteorology Research Group at Osaka Prefecture University. It covers cherry blossom phenology and temperature reconstructions at Kyoto. The second dataset is from the Japan Meteorological Agency: the sakura flowering dates (2011-2020) published in its phenological observation information under Global environment/climate.

Where are the numbers coming from? Who produced them? When?

Our dataset combines data from two separate sources. The first spans 812-2015 and is from the Ecological Meteorology Research Group at Osaka Prefecture University. The second is from the Japan Meteorological Agency and spans 2015-2024.

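Which information do these datasets contain? In which format?

The Ecological Meteorology Research Group (EMRG) dataset is an Excel file. Its columns are the year (A.D.), the full-flowering date as a day of the year, the full-flowering calendar date, a source code, a data type code, and the name of the reference document. The Japan Meteorological Agency (JMA) dataset is displayed on a web page, so it requires an HTML reader.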

Is there any appropriate license?

Have to check

What are the interesting rows and columns within datasets? Why?

There are three interesting columns. The first is the full-flowering date as a day of the year. The second is the full-flowering calendar date; the month is typically April, so mostly the day varies, which shows in which week full bloom tends to occur. The third is the twenty-year average day of the year with peak cherry blossom.

Are there any missing data points? If yes, how are they coded?

The missing data points are mainly the full-flowering dates: many years have no recorded date, and these appear as NaN once the file is loaded into pandas. How to handle them is still to be discussed.

What is the appropriate numerical transformation or normalization?

One option is to decompose the series into trend, seasonal, and residual components. To compare the flowering dates with other numeric variables, or to use them in predictive models, normalization (such as min-max scaling or z-score normalization) is appropriate.
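As a rough illustration of the two normalizations named above, the sketch below applies them to the five day-of-year values that appear at the top of the merged table in this notebook. The variable names are ours, and using the population standard deviation (`ddof=0`) is an assumption, not a choice made elsewhere in the notebook.

```python
import pandas as pd

# First five day-of-year values from the merged table shown in this notebook.
doy = pd.Series([92, 105, 96, 108, 104], name="Full-flowering date (DOY)")

# Min-max scaling maps the earliest date to 0 and the latest to 1.
min_max = (doy - doy.min()) / (doy.max() - doy.min())

# Z-score normalization centers on the mean and divides by the standard deviation.
# ddof=0 (population standard deviation) is an assumption made for this sketch.
z_score = (doy - doy.mean()) / doy.std(ddof=0)

print(min_max.round(3).tolist())
print(z_score.round(3).tolist())
```

Min-max scaling keeps the values in a fixed range, while z-scores make it easier to compare the flowering date with variables measured in other units.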

What is the best way to visualize these data? Why?

A time series plot is a natural starting point, because it shows how the flowering date changes over the years. A heat map, histogram, or density plot would help to understand the distribution of the dates, since some days of the year occur far more often than others.

What do these data tell us about climate change?

Earlier Flowering Dates: If the data shows a trend of increasingly earlier flowering dates, it could indicate warming temperatures. Year-to-Year Variability: Increased variability in flowering dates might suggest more erratic weather patterns, potentially linked to climate change. Long-Term Trends: Comparing century-long periods can provide insights into how climate patterns have shifted over longer timescales.
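The three ideas above (trend, variability, and century-long comparisons) could be checked along the following lines. This is only a sketch on synthetic data: the `demo` table, its values, and the size of the built-in trend are invented for illustration and say nothing about the real Kyoto series.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the merged table; all values are invented for illustration.
rng = np.random.default_rng(0)
years = np.arange(1800, 2000)
doy = 105 - 0.02 * (years - 1800) + rng.normal(0, 3, size=years.size)
demo = pd.DataFrame({"Year": years, "Full-flowering date (DOY)": doy})

# Slope of a least-squares line: shift of the flowering date in days per year.
slope, intercept = np.polyfit(demo["Year"], demo["Full-flowering date (DOY)"], deg=1)

# Mean and spread per century, to compare long periods and their variability.
century = (demo["Year"] // 100) * 100
by_century = demo.groupby(century)["Full-flowering date (DOY)"].agg(["mean", "std"])

print(f"trend: {slope:.3f} days per year")
print(by_century)
```

A negative slope would mean earlier flowering over time, and a growing `std` from one century to the next would point to increased year-to-year variability.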

d1 = pd.read_excel('/content/KyotoFullFlower7.xls')
d2 = pd.read_csv('/content/date-of-the-peak-cherry-tree-blossom-in-kyoto.csv')
d1.head(25)
Full-flowering dates of Japanese cherry (Prunus jamasakura) at Kyoto, Japan. (Latest version, Jun. 12, 2012) Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5
0 This phenological data was acquired by followi... NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN
2 Column A; A.D. NaN NaN NaN NaN NaN
3 Column B; Full-flowering date (day of year). NaN NaN NaN NaN NaN
4 Column C; Full-flowering date (calender date, ... NaN NaN NaN NaN NaN
5 Column D; Source code NaN NaN NaN NaN NaN
6 1; Reported by Taguchi (1939), J. Marine Mete... NaN NaN NaN NaN NaN
7 2; Added by Sekiguchi (1969), Tokyo Geography... NaN NaN NaN NaN NaN
8 3; Added by Aono and Omoto (1994), J. Agric. ... NaN NaN NaN NaN NaN
9 4; Added by Aono and Kazui (2008), Int. J. Cl... NaN NaN NaN NaN NaN
10 5: Cherry phenological data, Added by Aono an... NaN NaN NaN NaN NaN
11 6: Added by Aono (2011), Time Studies, 4, 17-... NaN NaN NaN NaN NaN
12 7: Added by Aono (2012), Chikyu Kankyo, 17, 2... NaN NaN NaN NaN NaN
13 8: Found after the last publication of articles. NaN NaN NaN NaN NaN
14 Column E; Data type code NaN NaN NaN NaN NaN
15 0 : data from modern times (full-bloom date s... NaN NaN NaN NaN NaN
16 1 : from diary description about full-bloom NaN NaN NaN NaN NaN
17 2 : from diary description about cherry bloss... NaN NaN NaN NaN NaN
18 3 : from diary description about presents of ... NaN NaN NaN NaN NaN
19 4 : title in Japanese poety NaN NaN NaN NaN NaN
20 8 : Deduced from wisteria phenology, using th... NaN NaN NaN NaN NaN
21 9 : Deduced from Japanese kerria phenology, u... NaN NaN NaN NaN NaN
22 Column F; Names of old documents NaN NaN NaN NaN NaN
23 NaN NaN NaN NaN NaN NaN
24 AD Full-flowering date (DOY) Full-flowering date Source code Data type code Reference Name
d1 = d1.iloc[25:]
d1.head(3)
Full-flowering dates of Japanese cherry (Prunus jamasakura) at Kyoto, Japan. (Latest version, Jun. 12, 2012) Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5
25 801 NaN NaN NaN NaN -
26 802 NaN NaN NaN NaN -
27 803 NaN NaN NaN NaN -
d1 = d1.rename(columns={
    'Full-flowering dates of Japanese cherry (Prunus jamasakura) at Kyoto, Japan. (Latest version, Jun. 12, 2012)': 'Year',
    'Unnamed: 1': 'Full-flowering date (DOY)',
    'Unnamed: 2': 'Full-flowering date',
    'Unnamed: 3': 'Source Code',
    'Unnamed: 4': 'Data type code',
    'Unnamed: 5': 'Reference Name',
})
d1.columns
Index(['Year', 'Full-flowering date (DOY)', 'Full-flowering date',
       'Source Code', 'Data type code', 'Reference Name'],
      dtype='object')
d1 = d1[['Year', 'Full-flowering date (DOY)', 'Full-flowering date']]
d2 = d2[['Year', 'Twenty-year average day of the year with peak cherry blossom']]

merged_df = pd.merge(d1, d2, on='Year', how='inner')

merged_df['Name of country'] = 'Japan'

final_df = merged_df[['Name of country', 'Year', 'Full-flowering date (DOY)', 'Full-flowering date', 'Twenty-year average day of the year with peak cherry blossom']]

final_df.head()
Name of country Year Full-flowering date (DOY) Full-flowering date Twenty-year average day of the year with peak cherry blossom
0 Japan 812 92 401 NaN
1 Japan 815 105 415 NaN
2 Japan 831 96 406 NaN
3 Japan 851 108 418 NaN
4 Japan 853 104 414 NaN
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

year_column = 'Year'
flowering_doy_column = 'Full-flowering date (DOY)'

def format_year(x, _):
    return f'{int(x)}'

plt.style.use('dark_background')
plt.figure(figsize=(12, 8))

# Plotting 'Full-flowering date (DOY)' over 'Year'
plt.plot(final_df[year_column], final_df[flowering_doy_column], linestyle='-', markersize=3, color='red')

plt.xlabel('Year')
plt.ylabel('Full-flowering date (Day of Year)')
plt.suptitle('Cherry Blossom Full-Flowering Dates in Kyoto')
plt.title('Historical Full-Flowering Dates over the Years')
plt.gca().xaxis.set_major_formatter(FuncFormatter(format_year))
plt.grid(True, axis='y', linestyle='--', alpha=0.7)
plt.show()
[Figure: line plot of the full-flowering date (day of year) against year, titled "Cherry Blossom Full-Flowering Dates in Kyoto"]
final_df.head(3)
Name of country Year Full-flowering date (DOY) Full-flowering date Twenty-year average day of the year with peak cherry blossom
0 Japan 812 92 401 NaN
1 Japan 815 105 415 NaN
2 Japan 831 96 406 NaN

Full-flowering date (DOY): the full-flowering date expressed as a day of the year.

Full-flowering date: the calendar date encoded as month and day (e.g. 402 --> April 2).
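The two encodings described above can be decoded and cross-checked with a few lines of standard-library Python. The helper names `decode_mmdd` and `doy_to_month_day` are hypothetical, and the sketch assumes that the day of year counts from January 1 of the same year in Python's proleptic Gregorian calendar. The sample rows are the first three rows of the merged table.

```python
import datetime as dt

def decode_mmdd(code):
    """Split an integer such as 402 into (month, day) = (4, 2)."""
    return divmod(int(code), 100)

def doy_to_month_day(year, doy):
    """Convert a day of year into (month, day) for the given year."""
    date = dt.date(int(year), 1, 1) + dt.timedelta(days=int(doy) - 1)
    return date.month, date.day

# (Year, day of year, calendar-date code) taken from the merged table above.
rows = [(812, 92, 401), (815, 105, 415), (831, 96, 406)]
for year, doy, code in rows:
    print(year, decode_mmdd(code), doy_to_month_day(year, doy))
```

For these rows both columns decode to the same month and day, which suggests the two encodings carry the same information.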

import pandas as pd
!pip install lxml html5lib beautifulsoup4
Requirement already satisfied: lxml in /usr/local/lib/python3.10/dist-packages (4.9.4)
Requirement already satisfied: html5lib in /usr/local/lib/python3.10/dist-packages (1.1)
Requirement already satisfied: beautifulsoup4 in /usr/local/lib/python3.10/dist-packages (4.12.3)
Requirement already satisfied: six>=1.9 in /usr/local/lib/python3.10/dist-packages (from html5lib) (1.16.0)
Requirement already satisfied: webencodings in /usr/local/lib/python3.10/dist-packages (from html5lib) (0.5.1)
Requirement already satisfied: soupsieve>1.2 in /usr/local/lib/python3.10/dist-packages (from beautifulsoup4) (2.5)
url = 'https://www.data.jma.go.jp/sakura/data/sakura003_06.html'
all_tables = pd.read_html(url)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-81e4b315e617> in <cell line: 2>()
      1 url = 'https://www.data.jma.go.jp/sakura/data/sakura003_06.html'
----> 2 all_tables = pd.read_html(url)

/usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    329                     stacklevel=find_stack_level(),
    330                 )
--> 331             return func(*args, **kwargs)
    332 
    333         # error: "Callable[[VarArg(Any), KwArg(Any)], Any]" has no

/usr/local/lib/python3.10/dist-packages/pandas/io/html.py in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, thousands, encoding, decimal, converters, na_values, keep_default_na, displayed_only, extract_links)
   1203     io = stringify_path(io)
   1204 
-> 1205     return _parse(
   1206         flavor=flavor,
   1207         io=io,

/usr/local/lib/python3.10/dist-packages/pandas/io/html.py in _parse(flavor, io, match, attrs, encoding, displayed_only, extract_links, **kwargs)
   1004     else:
   1005         assert retained is not None  # for mypy
-> 1006         raise retained
   1007 
   1008     ret = []

/usr/local/lib/python3.10/dist-packages/pandas/io/html.py in _parse(flavor, io, match, attrs, encoding, displayed_only, extract_links, **kwargs)
    984 
    985         try:
--> 986             tables = p.parse_tables()
    987         except ValueError as caught:
    988             # if `io` is an io-like object, check if it's seekable

/usr/local/lib/python3.10/dist-packages/pandas/io/html.py in parse_tables(self)
    260         list of parsed (header, body, footer) tuples from tables.
    261         """
--> 262         tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
    263         return (self._parse_thead_tbody_tfoot(table) for table in tables)
    264 

/usr/local/lib/python3.10/dist-packages/pandas/io/html.py in _parse_tables(self, doc, match, attrs)
    616 
    617         if not tables:
--> 618             raise ValueError("No tables found")
    619 
    620         result = []

ValueError: No tables found

🍇 Grape Harvest in France

Where are the numbers coming from? Who produced them? When?

The dataset is from NOAA, produced by the World Data Center for Paleoclimatology, Boulder, and the NOAA Paleoclimatology Program. It spans 1354-2007 and was produced in 2012. We are still looking for a reliable dataset that covers 2007 to the present day. We have requested access to euroclimhist, which contains many different datasets, including wine harvest dates. Once access is granted, this section will be updated with more information.

Which information do these datasets contain? In which format?

The NOAA dataset contains grape harvest dates from 1354-2007 for multiple regions of France (for example, Alsace, Auvergne, Auxerre-Avalon, Beaujolais and Maconnais, Bordeaux, Burgundy, Champagne 1, Champagne 2, Gaillac-South-West), as well as Germany and Switzerland. The data is published as a web page.

Is there any appropriate license?

No licenses found

What are the interesting rows and columns within datasets? Why?

The columns cover many wine-harvest regions in France, which will be the focus of our analysis. The rows are the harvest years.

Are there any missing data points? If yes, how are they coded?

There are many missing data points. They are represented by blank spaces.

What is the appropriate numerical transformation or normalization?

The dates are given as the number of days after August 31st, so they will have to be converted into ordinary calendar dates.
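The conversion described above could look like the sketch below. The helper name `harvest_date` and the sample year and offsets are hypothetical, and the sketch assumes whole-number offsets, with negative values meaning a harvest before August 31st.

```python
import datetime as dt

def harvest_date(year, days_after_aug31):
    """Convert 'days after August 31' into a calendar date for the given year."""
    return dt.date(int(year), 8, 31) + dt.timedelta(days=int(days_after_aug31))

# Hypothetical offsets, for illustration only (not taken from the NOAA dataset).
print(harvest_date(2003, 1))    # one day after August 31
print(harvest_date(2003, 30))
print(harvest_date(2003, -12))  # a harvest before August 31
```

Keeping the original offset column alongside the converted date is probably worthwhile, since the numeric offset is easier to plot and average than a calendar date.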

What is the best way to visualize these data? Why?

The data will be best visualized with a time series plot showing grape harvest dates over the years for specific regions. This would allow for the display of patterns or anomalies related to climate conditions.

What do these data tell us about climate change?

TBD
