Ocean and Land Temperature Anomalies¶
The Boyz are back¶
Team Members¶
- Saffian Asghar
- Alexis Culpin
- Romaric Sallustre
- Emilio Espinosa S.
Datasets¶
NOAA dataset¶
The dataset is hosted by NOAA's National Centers for Environmental Information (NCEI). It contains temperature anomaly data, representing deviations from a reference temperature over time.
Technical information
- Data is collected from 1850 - 2023.
- The data is in JSON format.
- Columns of interest: year and data (yearly anomaly).
| Field | Description |
|---|---|
| DATE | Period of time in years. |
| DESCRIPTION | Description of the dataset itself. |
| DATA | Anomaly in degrees Celsius. |
Data transformation required
- Read JSON.
- Drop the description column.
- Ensure every value in the date and data columns is numeric; non-numeric values must be dropped.
- Make sure the index (date) is an integer type value.
- Rename the data column to NOAAGlobalTemp, appending the minimum and maximum years to the name.
Link
License
Creative Commons Attribution 4.0 International license (CC-BY-4.0)
Berkeley dataset¶
The dataset comes from the Berkeley Earth project, an independent climate science organization. Hosted on Amazon S3, it contains raw maximum temperature (TMAX) data, i.e. the highest recorded temperatures; this analysis focuses on the "Annual Anomaly" column.
Technical information
- Data collected from 1850 - 2023.
- The dataset has missing values in 2023.
- The data is in TXT format.
- Columns of interest: year and annual anomaly (difference of temperature from a base reference).
| Field | Description |
|---|---|
| YEAR | Period of time in years. |
| MONTH | Period of time in months. |
| MONTHLY ANOMALY | Monthly anomaly in degrees Celsius. |
| ANNUAL ANOMALY | Yearly anomaly in degrees Celsius. |
| FIVE YEAR ANOMALY | 5-year rolling average anomaly in degrees Celsius. |
| TEN YEAR ANOMALY | 10-year rolling average anomaly in degrees Celsius. |
| TWENTY YEAR ANOMALY | 20-year rolling average anomaly in degrees Celsius. |
Data transformation required
- Read a space-delimited text file into a pandas DataFrame, ignoring lines that start with "%".
- Group the DataFrame by the 'year' column.
- Calculate the mean of the 'anomaly' column for each year.
- Reset the DataFrame index.
- Set the 'year' column as the new index.
- Convert the index values to integers.
Link
License
Creative Commons Attribution 4.0 International license (CC-BY-4.0)
HadCRUT5 dataset¶
The dataset comes from the Hadley Centre for Climate Science and Services at the UK Met Office. It provides global monthly time-series summaries for climate analysis, including columns for both the lower and upper confidence limits of the anomaly.
Technical information
- Data collected from 1850 - 2023.
- The data is in CSV format.
- Columns of interest: year and anomaly in degrees Celsius (temperature anomaly).
| Field | Description |
|---|---|
| YEAR_MONTH | Period of time in "YYYY-MM" format. |
| ANOMALY IN DEGREES CELSIUS | Monthly anomaly in degrees Celsius. |
| LOWER CONFIDENCE LIMIT (2.5%) | Values at the lower end of the confidence interval. |
| UPPER CONFIDENCE LIMIT (97.5%) | Values at the upper end of the confidence interval. |
Data transformation required
- Read a CSV file into a pandas DataFrame, parsing the 'Time' column as dates.
- Group the DataFrame by the year part of the 'Time' column.
- Calculate the mean of the 'Anomaly (deg C)' column for each year.
- Reset the DataFrame index.
- Set the 'Time' column as the new index.
- Convert the index values to integers.
Link
License
Open Government License (OGL) for Public Sector Information.
Météo France dataset¶
Meteo Ardennes provides localized weather data for the Ardennes region, encompassing essential parameters like temperature, precipitation, and wind speed. Sourced from regional weather stations and collaborative efforts with meteorological agencies, the data offers valuable insights for informed decision-making. Available in CSV.GZ format, these compressed files store tabular weather information efficiently. The dataset contains climatological data from all French and overseas stations since their opening, for all available parameters.
Technical information
- Daily data are available for download, by department and by period batch, in compressed csv format.
- All parameters are provided for all weather stations.
- Times are expressed in UTC for mainland France and in FU for overseas territories.
- Files are updated annually for historical data prior to 1950, monthly for data from 1950 to year -2, and daily for the last two years.
| Field | Description |
|---|---|
| NUM_POSTE | Météo-France station number (8 digits) |
| NOM_USUEL | Common name of the station |
| LAT | Latitude, negative in the south (in degrees and millionths of a degree) |
| LON | Longitude, negative west of Greenwich (in degrees and millionths of a degree) |
| ALTI | Altitude of the shelter base, or of the rain gauge if there is no shelter (in meters) |
| AAAAMMJJ | Measurement date (year month day) |
| RR | Amount of precipitation fallen in 24 hours (from 06h UTC on day J to 06h UTC on day J+1); the value for day J is recorded at J+1 (in mm and tenths) |
| TN | Minimum temperature under shelter (in °C and tenths) |
| HTN | Time of TN (hhmm) |
| TX | Maximum temperature under shelter (in °C and tenths) |
| HTX | Time of TX (hhmm) |
| TM | Daily average of hourly temperatures under shelter (in °C and tenths) |
| TNTXM | Daily average (TN+TX)/2 (in °C and tenths) |
| TAMPLI | Daily thermal amplitude: difference between daily TX and TN (TX-TN) (in °C and tenths) |
| TNSOL | Daily minimum temperature 10 cm above ground (in °C and tenths) |
| TN50 | Daily minimum temperature 50 cm above ground (in °C and tenths) |
| DG | Duration of frost under shelter (T ≤ 0 °C) (in minutes) |
| FFM | Daily average wind speed averaged over 10 minutes, at 10 m (in m/s and tenths) |
| FF2M | Daily average wind speed averaged over 10 minutes, at 2 m (in m/s and tenths) |
| FXY | Daily maximum of the maximum hourly wind speed averaged over 10 minutes, at 10 m (in m/s and tenths) |
| DXY | Direction of FXY (in compass points of 360) |
| HXY | Time of FXY (hhmm) |
| FXI | Daily maximum of the maximum hourly instantaneous wind speed, at 10 m (in m/s and tenths) |
| DXI | Direction of FXI (in compass points of 360) |
| HXI | Time of FXI (hhmm) |
| FXI2 | Daily maximum of the maximum hourly instantaneous wind speed, at 2 m (in m/s and tenths) |
| DXI2 | Direction of FXI2 (in compass points of 360) |
| HXI2 | Time of FXI2 (hhmm) |
| FXI3S | Daily maximum of the maximum hourly wind speed averaged over 3 seconds, at 10 m (in m/s and tenths) |
| DXI3S | Direction of FXI3S (in compass points of 360) |
| HXI3S | Time of FXI3S (hhmm) |
Quality codes associated with each data point (e.g., T;QT):
- 9: Filtered data (the data has passed first-level filters/controls)
- 0: Protected data (the data has been definitively validated by the climatologist)
- 1: Validated data (the data has been validated by automatic control or by the climatologist)
- 2: Doubtful data under review (the data has been questioned by automatic control)
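The quality codes make it straightforward to keep only trustworthy observations. A minimal sketch, using a toy frame whose column names (`TX`, `QTX`) are assumed from the field description above:

```python
import pandas as pd

# Toy frame mimicking the Météo France daily data: maximum temperature TX
# with its associated quality code QTX (column names are assumptions based
# on the field-description table; values are made up).
df = pd.DataFrame({
    "AAAAMMJJ": [20230101, 20230102, 20230103],
    "TX": [8.4, 9.1, 7.2],
    "QTX": [1, 2, 9],
})

# Keep filtered (9), protected (0) and validated (1) values;
# drop doubtful data under review (code 2).
trusted = df[df["QTX"].isin([9, 0, 1])]
```

The same mask can be applied per parameter, since each measured field carries its own quality-code column.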
Link
- https://object.files.data.gouv.fr/meteofrance/data/synchro_ftp/BASE/QUOT/Q_08_1871-1949_RR-T-Vent.csv.gz
- https://object.files.data.gouv.fr/meteofrance/data/synchro_ftp/BASE/QUOT/Q_08_latest-2023-2024_autres-parametres.csv.gz
- https://object.files.data.gouv.fr/meteofrance/data/synchro_ftp/BASE/QUOT/Q_08_latest-2023-2024_RR-T-Vent.csv.gz
License
Etalab Open Licence 2.0.
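The compressed daily files above can be passed straight to `pd.read_csv`, which decompresses `.gz` transparently. A minimal sketch using a tiny in-memory sample in place of a download; the `;` delimiter, the sample station number, and the exact column layout are assumptions about the file format:

```python
import gzip
import io
import pandas as pd

# Tiny gzipped sample standing in for one of the data.gouv.fr archives
# (station number and values are hypothetical).
sample = "NUM_POSTE;NOM_USUEL;AAAAMMJJ;TX\n08001001;DOUZY;20230101;8.4\n"
buf = io.BytesIO(gzip.compress(sample.encode()))

# Read the compressed CSV; keep the station id as a string to preserve
# the leading zero of the 8-digit code.
ardennes = pd.read_csv(buf, sep=";", compression="gzip", dtype={"NUM_POSTE": str})

# Turn the AAAAMMJJ (year-month-day) field into a proper datetime
ardennes["DATE"] = pd.to_datetime(ardennes["AAAAMMJJ"], format="%Y%m%d")
```

For the real files, the `data.gouv.fr` URL can replace `buf` directly in the `pd.read_csv` call.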
```python
# Import libraries
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import urllib.request
```
```python
# Dataset URLs
NOAA_URL = "https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/global/time-series/globe/land_ocean/1/12/1850-2023/data.json"
BERKLEY_URL = "https://berkeley-earth-temperature.s3.us-west-1.amazonaws.com/Global/Raw_TMAX_complete.txt"
HAD_CRUT5_URL = "https://www.metoffice.gov.uk/hadobs/hadcrut5/data/HadCRUT.5.0.2.0/analysis/diagnostics/HadCRUT.5.0.2.0.analysis.summary_series.global.monthly.csv"

# Daily RR (rain) - T (temperature) - Vent (wind) data for department 08, 1871-1949
ardennes_RR_T_wind_1871_1949_url = "https://object.files.data.gouv.fr/meteofrance/data/synchro_ftp/BASE/QUOT/Q_08_1871-1949_RR-T-Vent.csv.gz"
# Daily data, other parameters, for department 08, 2023-2024
ardennes_other_params_2023_2024_url = "https://object.files.data.gouv.fr/meteofrance/data/synchro_ftp/BASE/QUOT/Q_08_latest-2023-2024_autres-parametres.csv.gz"
# Daily RR - T - Vent data for department 08, 2023-2024
ardennes_RR_T_wind_2023_2024_url = "https://object.files.data.gouv.fr/meteofrance/data/synchro_ftp/BASE/QUOT/Q_08_latest-2023-2024_RR-T-Vent.csv.gz"

# Field description: https://object.files.data.gouv.fr/meteofrance/data/synchro_ftp/BASE/QUOT/Q_descriptif_champs_RR-T-Vent.csv
```
Function to get data if it doesn't exist¶
```python
from pathlib import Path

def save_data(urls, folder):
    subfolder = Path(f"./data/{folder}")
    subfolder.mkdir(parents=True, exist_ok=True)
    for key, url in urls.items():
        website_url = url.split("/")[2]
        file_extension = url.split(".")[-1]
        filepath = subfolder / f"{key}.{file_extension}"
        if not filepath.exists():
            urllib.request.urlretrieve(url, filepath)
            print(f"Data saved for {website_url} at {filepath}")
        else:
            print(f"Data already exists for {website_url} at {filepath}")
```
Global Temperature data¶
```python
urls = {"noaa_df": NOAA_URL, "berkley_df": BERKLEY_URL, "had_crut5_df": HAD_CRUT5_URL}
# Save into a subfolder called global_temperature
save_data(urls, "global_temperature")
```
```
Data saved for www.ncei.noaa.gov at data\global_temperature\noaa_df.json
Data saved for berkeley-earth-temperature.s3.us-west-1.amazonaws.com at data\global_temperature\berkley_df.txt
Data saved for www.metoffice.gov.uk at data\global_temperature\had_crut5_df.csv
```
```python
subfolder = "./data/global_temperature/"
```
```python
# Read the JSON file
df_noaa = pd.read_json("./data/global_temperature/noaa_df.json")

# Drop the description column and keep only rows with a numeric index
df_noaa = (
    df_noaa
    .drop('description', axis=1)
    .loc[pd.to_numeric(df_noaa.index, errors='coerce').notna()]
)

# Set the index as int
df_noaa.index = df_noaa.index.astype(int)

# Rename the data column, appending the minimum and maximum years
df_noaa = df_noaa.rename(columns=lambda x: f"NOAAGlobalTemp ({df_noaa.index.min()} - {df_noaa.index.max()})")
df_noaa.head()
```
| | NOAAGlobalTemp (1850 - 2023) |
|---|---|
| 1850 | -0.06 |
| 1851 | -0.08 |
| 1852 | -0.01 |
| 1853 | -0.12 |
| 1854 | 0.02 |
```python
# Read the second dataset, parsing the 'Time' column as dates
had_crut5_df = pd.read_csv('./data/global_temperature/had_crut5_df.csv', parse_dates=['Time'])

# Group by year and calculate the average anomaly per year
had_crut5_df = (
    had_crut5_df
    .groupby(had_crut5_df['Time'].dt.year)['Anomaly (deg C)'].mean().reset_index()
    .set_index('Time')
)

# Set the index as int
had_crut5_df.index = had_crut5_df.index.astype(int)
had_crut5_df.head()
```
| | Anomaly (deg C) |
|---|---|
| Time | |
| 1850 | -0.417711 |
| 1851 | -0.233350 |
| 1852 | -0.229399 |
| 1853 | -0.270354 |
| 1854 | -0.291521 |
```python
# Read the third dataset: space-delimited, lines starting with "%" are comments
berkley_df = pd.read_csv('./data/global_temperature/berkley_df.txt', comment="%", sep=r"\s+",
                         names=["year", "month", "anomaly", "yearAvgAnomaly",
                                "5yearAvgAnomaly", "10yearAvgAnomaly", "20yearAvgAnomaly"])

# Group by year and calculate the average anomaly per year
berkley_df = (
    berkley_df
    .groupby(berkley_df['year'])['anomaly'].mean().reset_index()
    .set_index('year')
)

# Set the index as int
berkley_df.index = berkley_df.index.astype(int)
berkley_df
```
| | anomaly |
|---|---|
| year | |
| 1850 | -1.141667 |
| 1851 | -0.971583 |
| 1852 | -1.007917 |
| 1853 | -0.382333 |
| 1854 | -0.170500 |
| ... | ... |
| 2019 | 1.199167 |
| 2020 | 1.391000 |
| 2021 | 1.160500 |
| 2022 | 1.205750 |
| 2023 | 1.610091 |
174 rows × 1 columns
```python
# Join the datasets on the common index (year)
merged_df = (
    df_noaa
    .join(had_crut5_df.rename(columns={'Anomaly (deg C)': 'HadCRUT5_Anomaly'}), how='left')
    .join(berkley_df.rename(columns={'anomaly': 'Berkley_anomaly'}), how='left')
)

# Rename the new columns to include the year range
merged_df = (
    merged_df
    .rename(columns={'HadCRUT5_Anomaly': f"HAD_CRUT5 ({had_crut5_df.index.min()} - {had_crut5_df.index.max()})"})
    .rename(columns={'Berkley_anomaly': f"BerkleyEarth ({berkley_df.index.min()} - {berkley_df.index.max()})"})
)
```
```python
merged_df
```
| | NOAAGlobalTemp (1850 - 2023) | HAD_CRUT5 (1850 - 2023) | BerkleyEarth (1850 - 2023) |
|---|---|---|---|
| 1850 | -0.06 | -0.417711 | -1.141667 |
| 1851 | -0.08 | -0.233350 | -0.971583 |
| 1852 | -0.01 | -0.229399 | -1.007917 |
| 1853 | -0.12 | -0.270354 | -0.382333 |
| 1854 | 0.02 | -0.291521 | -0.170500 |
| ... | ... | ... | ... |
| 2019 | 1.12 | 0.891073 | 1.199167 |
| 2020 | 0.83 | 0.922921 | 1.391000 |
| 2021 | 0.90 | 0.761906 | 1.160500 |
| 2022 | 0.83 | 0.801305 | 1.205750 |
| 2023 | 1.39 | 1.100057 | 1.610091 |
174 rows × 3 columns
```python
merged_df.plot(kind='line', figsize=(8, 4), title='Global Temperature change')
```
<Axes: title={'center': 'Global Temperature change'}>
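Beyond the plot, a quick way to check how closely the three records track each other is a pairwise Pearson correlation. A minimal sketch on a toy stand-in for `merged_df` (column names shortened; values taken from the first rows of the table above):

```python
import pandas as pd

# Toy stand-in for merged_df: the three anomaly series indexed by year
toy = pd.DataFrame(
    {
        "NOAA": [-0.06, -0.08, -0.01, -0.12, 0.02],
        "HadCRUT5": [-0.417711, -0.233350, -0.229399, -0.270354, -0.291521],
        "Berkeley": [-1.141667, -0.971583, -1.007917, -0.382333, -0.170500],
    },
    index=[1850, 1851, 1852, 1853, 1854],
)

# Pairwise Pearson correlation between the three records
corr = toy.corr()
```

On the full `merged_df`, `merged_df.corr()` works the same way and quantifies the visual agreement between the curves.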