import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
import plotly
import bokeh

🌸Cherry Blossom in Kyoto¶

The first dataset was compiled by Yasuyuki Aono (Associate Professor) and covers cherry blossom phenology and temperature reconstructions at Kyoto. It was produced as part of the Ecological Meteorology Research Group at Osaka Prefecture University. The second dataset is from the Japan Meteorological Agency. It is taken from the phenological observation information under "Global environment/climate" and covers the sakura flowering dates (2011-2020).

Where are the numbers coming from? Who produced them? When?¶

Our dataset combines data from two separate sources. The first spans 812-2015 and comes from the Ecological Meteorology Research Group at Osaka Prefecture University. The second comes from the Japan Meteorological Agency and spans 2015-2024.

Which information do these datasets contain? In which format?¶

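The Ecological Meteorology Research Group (EMRG) dataset is an Excel file (.xls) with six columns: the year (AD), the full-flowering date as a day of the year, the full-flowering date as a calendar date, a source code, a data type code, and the name of the old document used as reference. The Japan Meteorological Agency (JMA) dataset is displayed on a web page, so it requires an HTML reader.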

Is there any appropriate license?¶

Have to check

What are the interesting rows and columns within datasets? Why?¶

There are three interesting columns. The first is the day of the year on which the cherry trees reached full flowering. The second is the same event as a calendar date; in the rows shown in this notebook the month is always April and only the day changes, so we can see in which week full bloom tends to occur. The last column gives the twenty-year average day of the year with peak cherry blossom.

Are there any missing data points? If yes, how are they coded?¶

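The missing data points are mainly in the full-flowering date columns. Years without a record (for example 801-803) appear as NaN after loading, with "-" in the reference-name column. How to handle them is still to be discussed.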
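
As a minimal sketch of how the gaps could be counted, the cell below builds a small hand-made frame that mimics rows shown in this notebook (years 801-803 without a record, year 812 with DOY 92 and date 401). The name `sample` is hypothetical; on the real data the same calls would be applied to `d1`.

```python
import numpy as np
import pandas as pd

# Hand-made stand-in for a few rows of the Kyoto table (not the full data).
sample = pd.DataFrame({
    'Year': [801, 802, 803, 812],
    'Full-flowering date (DOY)': [np.nan, np.nan, np.nan, 92],
    'Full-flowering date': [np.nan, np.nan, np.nan, 401],
})

# Number of missing values in each column.
missing_per_column = sample.isna().sum()

# Years for which no full-flowering day was recorded.
missing_years = sample.loc[sample['Full-flowering date (DOY)'].isna(), 'Year'].tolist()

print(missing_per_column)
print(missing_years)
```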

What is the appropriate numerical transformation or normalization?¶

One option is to decompose the series into trend, seasonal, and residual components. To compare the flowering dates with other numeric variables, or to use them in predictive models, normalization (such as Min-Max scaling or Z-score normalization) is appropriate.
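
The two normalizations named here can be sketched as follows. The input is only the five day-of-year values that appear in the `final_df.head()` output of this notebook, so the numbers are illustrative; the variable names are assumptions.

```python
import pandas as pd

# The five day-of-year values from the final_df.head() output.
doy = pd.Series([92, 105, 96, 108, 104], name='Full-flowering date (DOY)')

# Min-Max scaling: the smallest value maps to 0 and the largest to 1.
doy_minmax = (doy - doy.min()) / (doy.max() - doy.min())

# Z-score normalization: subtract the mean, divide by the sample standard deviation.
doy_zscore = (doy - doy.mean()) / doy.std()

print(pd.DataFrame({'doy': doy, 'minmax': doy_minmax, 'zscore': doy_zscore}))
```

Min-Max scaling is sensitive to a single extreme year, while the Z-score keeps the spread of the series interpretable in units of standard deviations.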

What is the best way to visualize these data? Why?¶

Time series plots are useful for showing how the flowering date changes over the years. A heat map, histogram, or density plot helps to understand the distribution of the dates, since specific days of the year have many occurrences.
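
A histogram of the flowering day could look like the sketch below. It uses only the five day-of-year values from the `final_df.head()` output and an assumed bin width of one week; with the full series the same code would give a more meaningful picture.

```python
import numpy as np
import matplotlib.pyplot as plt

# The five day-of-year values from the final_df.head() output.
doy = np.array([92, 105, 96, 108, 104])

# One bin per week (7 days), starting at the earliest observed day.
bin_edges = np.arange(doy.min(), doy.max() + 7, 7)
counts, edges = np.histogram(doy, bins=bin_edges)

fig, ax = plt.subplots(figsize=(8, 4))
ax.hist(doy, bins=bin_edges, edgecolor='black')
ax.set_xlabel('Full-flowering date (Day of Year)')
ax.set_ylabel('Number of years')
ax.set_title('Distribution of full-flowering dates (illustrative sample)')
plt.show()
```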

What do these data tell us about climate change?¶

Earlier Flowering Dates: If the data shows a trend of increasingly earlier flowering dates, it could indicate warming temperatures. Year-to-Year Variability: Increased variability in flowering dates might suggest more erratic weather patterns, potentially linked to climate change. Long-Term Trends: Comparing century-long periods can provide insights into how climate patterns have shifted over longer timescales.
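
A simple way to test for earlier flowering dates is to fit a straight line to the day of year against the year and read off the slope. The sketch below is one possible approach, not a full trend analysis; the helper name `trend_days_per_century` is hypothetical, and the five sample rows from `final_df.head()` are far too few to support any conclusion.

```python
import numpy as np

def trend_days_per_century(years, doy):
    """Least-squares slope of day-of-year against year, in days per 100 years."""
    years = np.asarray(years, dtype=float)
    doy = np.asarray(doy, dtype=float)
    slope, _intercept = np.polyfit(years, doy, deg=1)
    return slope * 100

# Rows from the final_df.head() output (illustration only).
sample_years = [812, 815, 831, 851, 853]
sample_doy = [92, 105, 96, 108, 104]

# A negative value would mean flowering is getting earlier over time.
print(f'{trend_days_per_century(sample_years, sample_doy):.1f} days per century')
```

Rows with missing flowering dates would need to be dropped before the fit, since `np.polyfit` does not skip NaN values.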

d1 = pd.read_excel('/content/KyotoFullFlower7.xls')

d2 = pd.read_csv('/content/date-of-the-peak-cherry-tree-blossom-in-kyoto.csv')

d1.head(25)

	Full-flowering dates of Japanese cherry (Prunus jamasakura) at Kyoto, Japan. (Latest version, Jun. 12, 2012)	Unnamed: 1	Unnamed: 2	Unnamed: 3	Unnamed: 4	Unnamed: 5
0	This phenological data was acquired by followi...	NaN	NaN	NaN	NaN	NaN
1	NaN	NaN	NaN	NaN	NaN	NaN
2	Column A; A.D.	NaN	NaN	NaN	NaN	NaN
3	Column B; Full-flowering date (day of year).	NaN	NaN	NaN	NaN	NaN
4	Column C; Full-flowering date (calender date, ...	NaN	NaN	NaN	NaN	NaN
5	Column D; Source code	NaN	NaN	NaN	NaN	NaN
6	1; Reported by Taguchi (1939), J. Marine Mete...	NaN	NaN	NaN	NaN	NaN
7	2; Added by Sekiguchi (1969), Tokyo Geography...	NaN	NaN	NaN	NaN	NaN
8	3; Added by Aono and Omoto (1994), J. Agric. ...	NaN	NaN	NaN	NaN	NaN
9	4; Added by Aono and Kazui (2008), Int. J. Cl...	NaN	NaN	NaN	NaN	NaN
10	5: Cherry phenological data, Added by Aono an...	NaN	NaN	NaN	NaN	NaN
11	6: Added by Aono (2011), Time Studies, 4, 17-...	NaN	NaN	NaN	NaN	NaN
12	7: Added by Aono (2012), Chikyu Kankyo, 17, 2...	NaN	NaN	NaN	NaN	NaN
13	8: Found after the last publication of articles.	NaN	NaN	NaN	NaN	NaN
14	Column E; Data type code	NaN	NaN	NaN	NaN	NaN
15	0 : data from modern times (full-bloom date s...	NaN	NaN	NaN	NaN	NaN
16	1 : from diary description about full-bloom	NaN	NaN	NaN	NaN	NaN
17	2 : from diary description about cherry bloss...	NaN	NaN	NaN	NaN	NaN
18	3 : from diary description about presents of ...	NaN	NaN	NaN	NaN	NaN
19	4 : title in Japanese poety	NaN	NaN	NaN	NaN	NaN
20	8 : Deduced from wisteria phenology, using th...	NaN	NaN	NaN	NaN	NaN
21	9 : Deduced from Japanese kerria phenology, u...	NaN	NaN	NaN	NaN	NaN
22	Column F; Names of old documents	NaN	NaN	NaN	NaN	NaN
23	NaN	NaN	NaN	NaN	NaN	NaN
24	AD	Full-flowering date (DOY)	Full-flowering date	Source code	Data type code	Reference Name

d1 = d1.iloc[25:]
d1.head(3)

	Full-flowering dates of Japanese cherry (Prunus jamasakura) at Kyoto, Japan. (Latest version, Jun. 12, 2012)	Unnamed: 1	Unnamed: 2	Unnamed: 3	Unnamed: 4	Unnamed: 5
25	801	NaN	NaN	NaN	NaN	-
26	802	NaN	NaN	NaN	NaN	-
27	803	NaN	NaN	NaN	NaN	-

d1 = d1.rename(columns={
    'Full-flowering dates of Japanese cherry (Prunus jamasakura) at Kyoto, Japan. (Latest version, Jun. 12, 2012)': 'Year',
    'Unnamed: 1': 'Full-flowering date (DOY)',
    'Unnamed: 2': 'Full-flowering date',
    'Unnamed: 3': 'Source Code',
    'Unnamed: 4': 'Data type code',
    'Unnamed: 5': 'Reference Name',
})

d1.columns

Index(['Year', 'Full-flowering date (DOY)', 'Full-flowering date',
       'Source Code', 'Data type code', 'Reference Name'],
      dtype='object')
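
Because the first 25 rows of the spreadsheet hold descriptive text, the data columns of `d1` are probably stored with `object` dtype rather than as numbers (an assumption; `d1.dtypes` would confirm it). A minimal sketch of coercing such columns to numeric values, shown on a hypothetical stand-in frame named `demo`:

```python
import pandas as pd

# Hypothetical stand-in for d1 after slicing: numbers stored as generic objects.
demo = pd.DataFrame({
    'Year': [812, 815, 831],
    'Full-flowering date (DOY)': [92, 105, 96],
}, dtype=object)

# Convert each column to a numeric dtype; unparseable entries become NaN.
for column in demo.columns:
    demo[column] = pd.to_numeric(demo[column], errors='coerce')

print(demo.dtypes)
```

Numeric columns make the later merge on `Year` and the plot of the day of year less dependent on how pandas handles mixed-type data.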

d1 = d1[['Year', 'Full-flowering date (DOY)', 'Full-flowering date']]
d2 = d2[['Year', 'Twenty-year average day of the year with peak cherry blossom']]

merged_df = pd.merge(d1, d2, on='Year', how='inner')

merged_df['Name of country'] = 'Japan'

final_df = merged_df[['Name of country', 'Year', 'Full-flowering date (DOY)', 'Full-flowering date', 'Twenty-year average day of the year with peak cherry blossom']]

final_df.head()

	Name of country	Year	Full-flowering date (DOY)	Full-flowering date	Twenty-year average day of the year with peak cherry blossom
0	Japan	812	92	401	NaN
1	Japan	815	105	415	NaN
2	Japan	831	96	406	NaN
3	Japan	851	108	418	NaN
4	Japan	853	104	414	NaN

import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

year_column = 'Year'
flowering_doy_column = 'Full-flowering date (DOY)'

def format_year(x, _):
    return f'{int(x)}'

plt.style.use('dark_background')
plt.figure(figsize=(12, 8))

# Plotting 'Full-flowering date (DOY)' over 'Year'
plt.plot(final_df[year_column], final_df[flowering_doy_column], linestyle='-', markersize=3, color='red')

plt.xlabel('Year')
plt.ylabel('Full-flowering date (Day of Year)')
plt.suptitle('Cherry Blossom Full-Flowering Dates in Kyoto')
plt.title('Historical Full-Flowering Dates over the Years')
plt.gca().xaxis.set_major_formatter(FuncFormatter(format_year))
plt.grid(True, axis='y', linestyle='--', alpha=0.7)
plt.show()

[Figure: Cherry Blossom Full-Flowering Dates in Kyoto. Full-flowering date (day of year) plotted against year.]

final_df.head(3)

	Name of country	Year	Full-flowering date (DOY)	Full-flowering date	Twenty-year average day of the year with peak cherry blossom
0	Japan	812	92	401	NaN
1	Japan	815	105	415	NaN
2	Japan	831	96	406	NaN

Full-flowering date (DOY): the full-flowering date as a day of the year.

Full-flowering date: the calendar date written as month and day, e.g. 402 --> April 2.
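
Both encodings can be turned into calendar dates with the standard library. The sketch below assumes that the years follow the proleptic Gregorian calendar used by Python's `datetime.date`; the helper names `decode_mmdd` and `doy_to_date` are hypothetical.

```python
from datetime import date, timedelta

def decode_mmdd(year, code):
    """Turn a month-day code such as 402 into a calendar date (April 2)."""
    month, day = divmod(int(code), 100)
    return date(int(year), month, day)

def doy_to_date(year, doy):
    """Turn a day-of-year number into a calendar date (1 = January 1)."""
    return date(int(year), 1, 1) + timedelta(days=int(doy) - 1)

# First row of final_df: year 812, DOY 92, calendar code 401.
print(decode_mmdd(812, 401))
print(doy_to_date(812, 92))
```

Comparing the two results row by row is a cheap consistency check on the two date columns.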

import pandas as pd
!pip install lxml html5lib beautifulsoup4

url = 'https://www.data.jma.go.jp/sakura/data/sakura003_06.html'
all_tables = pd.read_html(url)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-81e4b315e617> in <cell line: 2>()
      1 url = 'https://www.data.jma.go.jp/sakura/data/sakura003_06.html'
----> 2 all_tables = pd.read_html(url)

/usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    329                     stacklevel=find_stack_level(),
    330                 )
--> 331             return func(*args, **kwargs)
    332 
    333         # error: "Callable[[VarArg(Any), KwArg(Any)], Any]" has no

/usr/local/lib/python3.10/dist-packages/pandas/io/html.py in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, thousands, encoding, decimal, converters, na_values, keep_default_na, displayed_only, extract_links)
   1203     io = stringify_path(io)
   1204 
-> 1205     return _parse(
   1206         flavor=flavor,
   1207         io=io,

/usr/local/lib/python3.10/dist-packages/pandas/io/html.py in _parse(flavor, io, match, attrs, encoding, displayed_only, extract_links, **kwargs)
   1004     else:
   1005         assert retained is not None  # for mypy
-> 1006         raise retained
   1007 
   1008     ret = []

/usr/local/lib/python3.10/dist-packages/pandas/io/html.py in _parse(flavor, io, match, attrs, encoding, displayed_only, extract_links, **kwargs)
    984 
    985         try:
--> 986             tables = p.parse_tables()
    987         except ValueError as caught:
    988             # if `io` is an io-like object, check if it's seekable

/usr/local/lib/python3.10/dist-packages/pandas/io/html.py in parse_tables(self)
    260         list of parsed (header, body, footer) tuples from tables.
    261         """
--> 262         tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
    263         return (self._parse_thead_tbody_tfoot(table) for table in tables)
    264 

/usr/local/lib/python3.10/dist-packages/pandas/io/html.py in _parse_tables(self, doc, match, attrs)
    616 
    617         if not tables:
--> 618             raise ValueError("No tables found")
    619 
    620         result = []

ValueError: No tables found
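
`pd.read_html` raises this error when it finds no `<table>` elements in the parsed page. The cause for this URL has not been confirmed. One way to investigate is to count the relevant tags in the downloaded HTML, sketched below with the standard library only. The `example_html` string is a made-up placeholder, not the content of the JMA page, and the names `TagCounter` and `count_tags` are hypothetical.

```python
from html.parser import HTMLParser

class TagCounter(HTMLParser):
    """Count how often selected tags are opened in an HTML document."""

    def __init__(self, tags):
        super().__init__()
        self.counts = {tag: 0 for tag in tags}

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in self.counts:
            self.counts[tag] += 1

def count_tags(html_text, tags=('table', 'pre', 'iframe')):
    """Return a dict mapping each tag name to its number of occurrences."""
    parser = TagCounter(tags)
    parser.feed(html_text)
    return parser.counts

# Placeholder page: the data sits in a <pre> block, so there is no <table>.
example_html = '<html><body><pre>year month day</pre></body></html>'
print(count_tags(example_html))
```

If the real page shows zero tables, the data is presumably delivered in another form (preformatted text, an embedded frame, or script-generated content) and needs a different parser.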

🍇Grape Harvest in France¶

Where are the numbers coming from? Who produced them? When?¶

The dataset is from NOAA, produced by the World Data Center for Paleoclimatology, Boulder, and the NOAA Paleoclimatology Program. It spans 1354-2007 and was produced in 2012. We are still looking for a reliable dataset that covers 2007 to the present day. We have requested access to Euro-Climhist, which contains many different datasets, including wine harvest dates. Once access is granted, this section will be updated with more information.

Which information do these datasets contain? In which format?¶

The NOAA dataset contains grape harvest dates from 1354-2007 for multiple regions of France (for example Alsace, Auvergne, Auxerre-Avalon, Beaujolais and Maconnais, Bordeaux, Burgundy, Champagne 1, Champagne 2, Gaillac-South-West), as well as Germany and Switzerland. The data is published as a web page.

Is there any appropriate license?¶

No licenses found

What are the interesting rows and columns within datasets? Why?¶

The columns include many areas of wine harvest in France, which will be our focus for the analysis. The rows are the years of harvest

Are there any missing data points? If yes, how are they coded?¶

There are many missing data points. They are represented by blank spaces.

What is the appropriate numerical transformation or normalization?¶

The dates are presented as number of days after August 31st, so they will have to be transformed into normal calendar dates.
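
The transformation can be sketched with the standard library: add the offset to August 31 of the harvest year. The helper name `harvest_date` and the example offsets are hypothetical, and rounding fractional offsets to whole days is an assumption.

```python
from datetime import date, timedelta

def harvest_date(year, days_after_aug31):
    """Convert 'days after August 31' into a calendar date for the given year."""
    # Round to whole days in case the source reports fractional values.
    offset = timedelta(days=round(days_after_aug31))
    return date(int(year), 8, 31) + offset

# Hypothetical offsets, not values taken from the NOAA file.
print(harvest_date(2003, 1))    # 1 day after August 31
print(harvest_date(2003, 30))   # 30 days after August 31
print(harvest_date(2003, -5))   # a harvest before September 1
```

Negative offsets are handled naturally, which matters for very early harvests that fall in August.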

What is the best way to visualize these data? Why?¶

The data will be best visualized with a time series plot showing grape harvest dates over the years for specific regions. This would allow for the display of patterns or anomalies related to climate conditions.

What do these data tell us about climate change?¶

TBD
