Data science of climate change¶
Data source¶
National Oceanic and Atmospheric Administration (NOAA)¶
NOAA is an agency of the U.S. department of commerce that “enriches life through science”. They work to keep the public informed of the changing environment around them. In 1972, they've created The Global Monitoring Laboratory (GML), as part of the NOAA Environmental Research Laboratories. It's mission is focused on geophysical monitoring for climatic change.
GML conducts research that addresses three major challenges: greenhouse gas and carbon cycle feedbacks, changes in clouds, aerosols, and surface radiation, and recovery of stratospheric ozone. One of four programs created by GML is the Carbon Cycle Greenhouse Gases (CCGG), in which the spatial and temporal distributions of greenhouse gases are measured and modeled.
The CCGG research area operates the Global Greenhouse Gas Reference Network, measuring the atmospheric distribution and trends of the three main long-term drivers of climate change, carbon dioxide (CO2), methane (CH4), and nitrous oxide (N2O), as well as carbon monoxide (CO) which is an important indicator of air pollution.
CO2 measurements in the 70s¶
NOAA Carbon Dioxide (CO2) measurements first began in 1967 at the Niwot Ridge high altitude research station, Colorado. Since then, the CCGG cooperative air sampling network measures CO2 from more than 50 sites world wide. In-Situ continuous measurements of CO2 at the NOAA GML Atmospheric Baseline Observatories began in 1973. Additional in-situ measurements from tall towers and light aircraft in the USA began in the 1990's, to support studies on continental sources and sinks of greenhouse gases. Vertical profile data from the AirCore Sampling System began in 2012 to monitor greenhouse gases at high altitudes and to help evaluate remote sensing measurements.
Flask (1968-2022)¶
GML measures greenhouse gases through a cooperative air sampling network effort which began in 1967. Air samples are collected approximately weekly with flasks from a globally distributed network of sites, ranging from being collected by volunteers to being collected by aircrafts.
Measurement data are used to identify long-term trends, seasonal variability, and spatial distribution of carbon cycle gases.
This data can be found in GML's CO2 database.
In situ (1973-2022)¶
Greenhouse gases are also measured at four Atmospheric Baseline Observatories and multiple tall towers in the United States. It first began measurements of CO2 at the observatories in 1973. Continuous in-situ measurements of these gases provides great detail in their long term trends, seasonal and short-term variations, and diurnal cycles.
This data can also be found in GML's CO2 database.
CO2 estimates before real measurements, paleoclimatology¶
The most direct method of investigating past variations of the atmospheric CO2 concentration before 1958, when continuous direct atmospheric CO2 measurements started, is the analysis of air extracted from suitable ice cores (Siegenthaler et al., 2005)
The measurement of the gas composition is direct: trapped in deep ice cores are tiny bubbles of ancient air, which we can extract and analyze using mass spectrometers (https://www.scientificamerican.com/article/how-are-past-temperatures/).
From 1000 to 2004¶
In 2010, pre-industrial CO2 data was estimated using two ice cores. Measurements for Law Dome (LD) and Dronning Maud Land (DML) were sufficient to yield smoothed, with 50-year splines, estimates for CO2 evolution between 1000 and 2004 (Frank et al., 2010).
This data can be found in NOAA's Paleoclimatology database, as study 10437.
800,000 years before present¶
In 2008, two years prior, the Antarctic Vostok and EPICA Dome C ice cores had provided a composite record of atmospheric carbon dioxide levels over the past 800,000 years (Lüthi et al., 2008).
This data can also be found in NOAA's Paleoclimatology database, as study 6091.
Unit of measurement¶
When measuring gases the term concentration is used to describe the amount of gas by volume in the air. The two most common units of measurement are parts-per-million, and percent concentration. All of our data uses parts-per-million as their unit of measurement.
Parts-per-million (abbreviated ppm) is the ratio of one gas to another. For example, 1,000 ppm of CO2 means that if you could count a million gas molecules, 1,000 of them would be of carbon dioxide and 999,000 molecules would be some other gases (https://www.co2meter.com/fr-fr/blogs/news/15164297-co2-gas-concentration-defined).
Data analysis¶
1 2 3 4 5 6 | import matplotlib.pyplot as plt import pandas as pd from pathlib import Path from urllib.request import urlretrieve import urllib.parse from scipy import stats |
In situ (1973-2022)¶
The dataset is stored in a text format, uses spaces as the separator and contains 20 columns for 593 lines. It contains a site code, which refers to the observatory where the data was recorded, the date, from the year down to the second, the CO2 record and other attributes of the measurements. Therefore, we have four different dataset, each specific to an observatory :
| Barrow | Mauna Loa | American Samoa | South Pole |
|---|---|---|---|
| BRW | MLO | SMO | SPO |
1 2 3 4 5 6 | observatories = {"brw": "Barrow", "mlo": "Mauna Loa", "smo": "American Samoa", "spo": "South Pole"} for obs_name in observatories.keys(): filename = f"co2_{obs_name}_surface-insitu_1_ccgg_MonthlyData.txt" url = f"https://gml.noaa.gov/aftp/data/trace_gases/co2/in-situ/surface/txt/{filename}" if not Path(filename).is_file(): urlretrieve(url, filename) |
- By default, missing values are displayed as -999,99.
# value:_FillValue : -999.99
To prevent these values from dragging the graph down, they will be replaced by Na when we import the datasets.
- In order to smooth the CO2 records, we also decided to add a rolling average of eleven adjacent seasonal cycles, centered on the month to be corrected. This number was chosen to center the average value over the year, erasing the seasonal variability. At any point in time, the CO2 recorded is adjusted by the values of the nearest four-season cycle.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | def read_noaa_data(filename): """Reads a dataframe from a NOAA file and adds a rolling mean to the value. Parameters ---------- filename : string A dataframe in a text file from NOAA. Returns ------- dataframe : The dataframe. """ df = pd.read_csv(filename, comment='#', sep=' ', header=0, na_values = -999.99) df["rolling_mean"] = df['value'].rolling(11, center=True).mean() return df filename = "co2_mlo_surface-insitu_1_ccgg_MonthlyData.txt" df_noaa = read_noaa_data(filename) df_noaa.head(15) |
| site_code | year | month | day | hour | minute | second | datetime | time_decimal | midpoint_time | value | value_std_dev | nvalue | latitude | longitude | altitude | elevation | intake_height | qcflag | rolling_mean | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | MLO | 1973 | 8 | 1 | 0 | 0 | 0 | 1973-08-01T00:00:00Z | 1973.580822 | 114325200 | NaN | -99.99 | 0 | 19.536 | -155.576 | 3397.0 | 3397.0 | 0.0 | *.. | NaN |
| 1 | MLO | 1973 | 9 | 1 | 0 | 0 | 0 | 1973-09-01T00:00:00Z | 1973.665753 | 117003600 | NaN | -99.99 | 0 | 19.536 | -155.576 | 3397.0 | 3397.0 | 0.0 | *.. | NaN |
| 2 | MLO | 1973 | 10 | 1 | 0 | 0 | 0 | 1973-10-01T00:00:00Z | 1973.747945 | 119595600 | NaN | -99.99 | 0 | 19.536 | -155.576 | 3397.0 | 3397.0 | 0.0 | *.. | NaN |
| 3 | MLO | 1973 | 11 | 1 | 0 | 0 | 0 | 1973-11-01T00:00:00Z | 1973.832877 | 122274000 | NaN | -99.99 | 0 | 19.536 | -155.576 | 3397.0 | 3397.0 | 0.0 | *.. | NaN |
| 4 | MLO | 1973 | 12 | 1 | 0 | 0 | 0 | 1973-12-01T00:00:00Z | 1973.915068 | 124866000 | NaN | -99.99 | 0 | 19.536 | -155.576 | 3397.0 | 3397.0 | 0.0 | *.. | NaN |
| 5 | MLO | 1974 | 1 | 1 | 0 | 0 | 0 | 1974-01-01T00:00:00Z | 1974.000000 | 127544400 | NaN | -99.99 | 0 | 19.536 | -155.576 | 3387.1 | 3397.0 | -9.9 | *.. | NaN |
| 6 | MLO | 1974 | 2 | 1 | 0 | 0 | 0 | 1974-02-01T00:00:00Z | 1974.084932 | 130222800 | NaN | -99.99 | 0 | 19.536 | -155.576 | 3387.1 | 3397.0 | -9.9 | *.. | NaN |
| 7 | MLO | 1974 | 3 | 1 | 0 | 0 | 0 | 1974-03-01T00:00:00Z | 1974.161644 | 132642000 | NaN | -99.99 | 0 | 19.536 | -155.576 | 3387.1 | 3397.0 | -9.9 | *.. | NaN |
| 8 | MLO | 1974 | 4 | 1 | 0 | 0 | 0 | 1974-04-01T00:00:00Z | 1974.246575 | 135320400 | NaN | -99.99 | 0 | 19.536 | -155.576 | 3387.1 | 3397.0 | -9.9 | *.. | NaN |
| 9 | MLO | 1974 | 5 | 1 | 0 | 0 | 0 | 1974-05-01T00:00:00Z | 1974.328767 | 137912400 | 333.16 | 0.36 | 13 | 19.536 | -155.576 | 3422.0 | 3397.0 | 25.0 | ... | NaN |
| 10 | MLO | 1974 | 6 | 1 | 0 | 0 | 0 | 1974-06-01T00:00:00Z | 1974.413699 | 140590800 | 332.17 | 0.41 | 25 | 19.536 | -155.576 | 3422.0 | 3397.0 | 25.0 | ... | NaN |
| 11 | MLO | 1974 | 7 | 1 | 0 | 0 | 0 | 1974-07-01T00:00:00Z | 1974.495890 | 143182800 | 331.11 | 0.49 | 24 | 19.536 | -155.576 | 3422.0 | 3397.0 | 25.0 | ... | NaN |
| 12 | MLO | 1974 | 8 | 1 | 0 | 0 | 0 | 1974-08-01T00:00:00Z | 1974.580822 | 145861200 | 329.11 | 0.64 | 26 | 19.536 | -155.576 | 3422.0 | 3397.0 | 25.0 | ... | NaN |
| 13 | MLO | 1974 | 9 | 1 | 0 | 0 | 0 | 1974-09-01T00:00:00Z | 1974.665753 | 148539600 | 327.30 | 0.64 | 22 | 19.536 | -155.576 | 3422.0 | 3397.0 | 25.0 | ... | NaN |
| 14 | MLO | 1974 | 10 | 1 | 0 | 0 | 0 | 1974-10-01T00:00:00Z | 1974.747945 | 151131600 | 327.30 | 0.27 | 24 | 19.536 | -155.576 | 3422.0 | 3397.0 | 25.0 | ... | 330.195455 |
1 | df_noaa.describe() |
| year | month | day | hour | minute | second | time_decimal | midpoint_time | value | value_std_dev | nvalue | latitude | longitude | altitude | elevation | intake_height | rolling_mean | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 593.000000 | 593.000000 | 593.0 | 593.0 | 593.0 | 593.0 | 593.000000 | 5.930000e+02 | 582.000000 | 593.00000 | 593.000000 | 5.930000e+02 | 5.930000e+02 | 593.000000 | 593.0 | 593.000000 | 562.000000 |
| mean | 1997.789207 | 6.529511 | 1.0 | 0.0 | 0.0 | 0.0 | 1998.248087 | 8.927452e+08 | 369.691718 | -1.23145 | 25.160202 | 1.953600e+01 | -1.555760e+02 | 3430.063406 | 3397.0 | 33.063406 | 370.289376 |
| std | 14.278755 | 3.457692 | 0.0 | 0.0 | 0.0 | 0.0 | 14.277380 | 4.505603e+08 | 25.716157 | 13.59032 | 5.347673 | 5.333570e-14 | 6.826969e-13 | 11.486558 | 0.0 | 11.486558 | 24.883814 |
| min | 1973.000000 | 1.000000 | 1.0 | 0.0 | 0.0 | 0.0 | 1973.580822 | 1.143252e+08 | 327.300000 | -99.99000 | 0.000000 | 1.953600e+01 | -1.555760e+02 | 3387.100000 | 3397.0 | -9.900000 | 330.191818 |
| 25% | 1985.000000 | 4.000000 | 1.0 | 0.0 | 0.0 | 0.0 | 1985.915068 | 5.035572e+08 | 348.110000 | 0.47000 | 24.000000 | 1.953600e+01 | -1.555760e+02 | 3422.000000 | 3397.0 | 25.000000 | 349.289091 |
| 50% | 1998.000000 | 7.000000 | 1.0 | 0.0 | 0.0 | 0.0 | 1998.246575 | 8.927028e+08 | 366.815000 | 0.60000 | 26.000000 | 1.953600e+01 | -1.555760e+02 | 3437.000000 | 3397.0 | 40.000000 | 368.157273 |
| 75% | 2010.000000 | 10.000000 | 1.0 | 0.0 | 0.0 | 0.0 | 2010.580822 | 1.281935e+09 | 390.397500 | 0.75000 | 28.000000 | 1.953600e+01 | -1.555760e+02 | 3437.000000 | 3397.0 | 40.000000 | 390.379773 |
| max | 2022.000000 | 12.000000 | 1.0 | 0.0 | 0.0 | 0.0 | 2022.915068 | 1.671167e+09 | 420.970000 | 1.40000 | 31.000000 | 1.953600e+01 | -1.555760e+02 | 3437.000000 | 3397.0 | 40.000000 | 418.484545 |
With such datasets, we can observe CO2 records from 1973 up until now for all four observatories.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 | observatories = {"brw": "Barrow", "mlo": "Mauna Loa", "smo": "American Samoa", "spo": "South Pole"} fig, axs = plt.subplots(4, 2) axes = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1], [3, 0], [3, 1]] fig.suptitle("Monthly atmospheric CO2 record") cpt = 0 for obs_name in observatories.keys() : # Importing the data filename = f"co2_{obs_name}_surface-insitu_1_ccgg_MonthlyData.txt" df_noaa = read_noaa_data(filename) # First plot, since the beginning axs[axes[cpt][0], axes[cpt][1]].plot(df_noaa["time_decimal"], df_noaa["value"], '-o', color="red", markersize = 1) axs[axes[cpt][0], axes[cpt][1]].plot(df_noaa["time_decimal"], df_noaa["rolling_mean"], color="black" ) # Second plot, over the last five years start_year = df_noaa["year"][df_noaa.year.argmax()] - 5 axs[axes[cpt+1][0], axes[cpt+1][1]].plot(df_noaa[df_noaa["year"] >= start_year]["time_decimal"], df_noaa[df_noaa["year"] >= start_year]["value"], '-o', color="red", markersize = 3, label = "Real value") axs[axes[cpt+1][0], axes[cpt+1][1]].plot(df_noaa[df_noaa["year"] >= start_year]["time_decimal"], df_noaa[df_noaa["year"] >= start_year]["rolling_mean"], color="black") # Displays the name of the observatory for which the records are shown obs_name = observatories[df_noaa["site_code"][0].lower()] axs[axes[cpt][0], axes[cpt][1]].set(ylabel=obs_name) axs[axes[cpt][0], axes[cpt][1]].yaxis.get_label().set_fontsize(8) cpt+=2 # Hide x labels and tick labels for top plots and y ticks for right plots. for ax in axs.flat: ax.label_outer() fig.supxlabel('Year') fig.supylabel('CO2 (ppm)') fig.tight_layout() plt.show() |
Each line represents an observatory, with its name written to the left. Columns show the full period of recordings on the left and the last 5 years on the right. The real value is displayed in red, whereas the rolling mean calculated earlier is shown in black.
If we zoom on the last five years for each observatory, we can see why the rolling average was necessary.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 | colors = ["#9d0208", "#dc2f02", "#f48c06", "#ffba08"] cpt=0 fig, (ax1, ax2) = plt.subplots(1, 2, layout='constrained', sharey=True, sharex=True) for obs_name in observatories.keys() : filename = f"co2_{obs_name}_surface-insitu_1_ccgg_MonthlyData.txt" df_noaa = read_noaa_data(filename) obs_name = observatories[df_noaa["site_code"][0].lower()] start_year = df_noaa["year"][df_noaa.year.argmax()] - 5 ax1.plot(df_noaa[df_noaa["year"] >= start_year]["time_decimal"], df_noaa[df_noaa["year"] >= start_year]["value"], '-o', markersize=4, color = colors[cpt], label = obs_name) ax1.set_title("Monthly CO2 record") ax1.legend(loc="upper left") ax2.plot(df_noaa[df_noaa["year"] >= start_year]["time_decimal"], df_noaa[df_noaa["year"] >= start_year]["rolling_mean"], color = colors[cpt], label = obs_name) ax2.set_title("Averaged CO2 record") ax2.legend(loc="upper left") cpt+=1 fig.supxlabel('Year') fig.supylabel('CO2 (ppm)') plt.show() |
If we focus on the raw value, they are all very different depending on the observatory. However if we focus on the averaged value, we can assume they all increase in the same way. To make sure of this, we will calculate the slope of a linear least-squares regression fitted to each observatory, and compare the values obtained.
1 2 3 4 5 6 7 8 9 10 | cpt=0 for obs_name in observatories.keys() : filename = f"co2_{obs_name}_surface-insitu_1_ccgg_MonthlyData.txt" df_noaa = read_noaa_data(filename) obs_name = observatories[df_noaa["site_code"][0].lower()] df_noaa = df_noaa.dropna() res = stats.linregress(df_noaa["time_decimal"], df_noaa["rolling_mean"]) plt.plot(df_noaa["time_decimal"], res.intercept + res.slope*df_noaa["time_decimal"], color=colors[cpt], label=f"{obs_name}, a: {res.slope:.4f}") plt.legend(loc="upper left") cpt+=1 |
We can conclude that although the CO2 recorded seems vastly different depending on the observatory, it rises in the same way no matter where it's measured.
1 2 3 4 | import numpy as np def exp_model(x, a, b): y = np.exp(a) * (np.exp(b) ** x) return y |
Exponential Regression in Python method found here.
1 2 3 4 5 6 7 8 9 10 | cpt=0 for obs_name in observatories.keys() : filename = f"co2_{obs_name}_surface-insitu_1_ccgg_MonthlyData.txt" df_noaa = read_noaa_data(filename) obs_name = observatories[df_noaa["site_code"][0].lower()] df_noaa = df_noaa.dropna() fit = np.polyfit(df_noaa["time_decimal"], np.log(df_noaa["rolling_mean"]), 1) plt.plot(df_noaa["time_decimal"], exp_model(df_noaa["time_decimal"], fit[1], fit[0]), color=colors[cpt], label=f"{obs_name}, a: {fit[1]:.4f}, b: {fit[0]:.4f}") plt.legend(loc="upper left") cpt+=1 |
Paleoclimatology¶
From the year 1000 to 2004¶
The dataset is also stored in a text format, but uses tabulations as the separator and contains 23 columns for 1005 lines. It contains the year of recording, from 1000 to 2004, and three mains values : the record at Law Dome (LD), the record at Dronning Maud Land (DML) and the average record over thoses two sites. Each of theses values is smoothed with splines ranging from 50 to 200 years with a 25 year step. We will only take into account, the years and the value for both sites with a 50 year splines, where data is missing the least.
1 2 3 4 5 6 7 8 9 | filename = "smoothedco2.txt" if not Path(filename).is_file(): urlretrieve( "https://www.ncei.noaa.gov/pub/data/paleo/contributions_by_author/frank2010/smoothedco2.txt", filename, ) df_indus = pd.read_csv(filename, sep='\t', header=0) df_indus.head() |
| Year | ALL_50_full | LD_050 | DML_050 | ALL_050 | LD_075 | DML_075 | ALL_075 | LD_100 | DML_100 | ... | ALL_125 | LD_150 | DML_150 | ALL_150 | LD_175 | DML_175 | ALL_175 | LD_200 | DML_200 | ALL_200 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1000 | 278.66 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 1001 | 278.68 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 1002 | 278.69 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 1003 | 278.71 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 1004 | 278.72 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 23 columns
1 | df_indus.describe() |
| Year | ALL_50_full | LD_050 | DML_050 | ALL_050 | LD_075 | DML_075 | ALL_075 | LD_100 | DML_100 | ... | ALL_125 | LD_150 | DML_150 | ALL_150 | LD_175 | DML_175 | ALL_175 | LD_200 | DML_200 | ALL_200 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1005.000000 | 1005.000000 | 751.000000 | 751.000000 | 751.000000 | 751.000000 | 751.000000 | 751.000000 | 751.000000 | 751.000000 | ... | 751.000000 | 751.000000 | 751.000000 | 751.000000 | 751.000000 | 751.000000 | 751.000000 | 751.000000 | 751.000000 | 751.000000 |
| mean | 1502.000000 | 284.887801 | 280.554288 | 280.167523 | 280.053888 | 280.537457 | 280.164940 | 280.055286 | 280.493475 | 280.158362 | ... | 280.028415 | 280.373249 | 280.166059 | 280.006871 | 280.302783 | 280.179867 | 279.975979 | 280.228282 | 280.197137 | 279.925846 |
| std | 290.262812 | 14.310892 | 2.657925 | 1.407282 | 1.844578 | 2.576056 | 1.387695 | 1.800978 | 2.486121 | 1.375995 | ... | 1.744265 | 2.326101 | 1.328713 | 1.714550 | 2.290978 | 1.287739 | 1.689981 | 2.286184 | 1.239137 | 1.692015 |
| min | 1000.000000 | 276.450000 | 274.610000 | 277.480000 | 276.450000 | 275.470000 | 277.640000 | 276.890000 | 275.760000 | 277.600000 | ... | 276.700000 | 275.930000 | 277.670000 | 276.690000 | 275.920000 | 277.820000 | 276.710000 | 275.890000 | 277.990000 | 276.710000 |
| 25% | 1251.000000 | 279.600000 | 278.440000 | 279.085000 | 278.560000 | 278.580000 | 279.105000 | 278.635000 | 278.670000 | 279.170000 | ... | 278.755000 | 278.605000 | 279.200000 | 278.755000 | 278.510000 | 279.190000 | 278.685000 | 278.375000 | 279.175000 | 278.540000 |
| 50% | 1502.000000 | 280.980000 | 281.460000 | 280.460000 | 280.620000 | 281.380000 | 280.310000 | 280.580000 | 281.230000 | 280.260000 | ... | 280.510000 | 281.020000 | 280.370000 | 280.480000 | 280.950000 | 280.390000 | 280.450000 | 280.940000 | 280.410000 | 280.460000 |
| 75% | 1753.000000 | 282.100000 | 282.585000 | 281.315000 | 281.360000 | 282.495000 | 281.180000 | 281.330000 | 282.170000 | 281.235000 | ... | 281.315000 | 282.175000 | 281.285000 | 281.305000 | 282.160000 | 281.295000 | 281.295000 | 282.145000 | 281.310000 | 281.280000 |
| max | 2004.000000 | 372.930000 | 284.000000 | 282.410000 | 282.630000 | 283.850000 | 282.150000 | 282.450000 | 283.690000 | 282.040000 | ... | 282.230000 | 283.280000 | 281.950000 | 282.130000 | 283.060000 | 281.890000 | 282.030000 | 282.860000 | 281.830000 | 281.930000 |
8 rows × 23 columns
We can display the CO2 estimated from 1000 to 2004.
1 2 3 4 5 6 7 8 9 10 | plt.plot(df_indus["Year"], df_indus["ALL_50_full"], color="lightblue", label = "Ice core" ) plt.title("Global atmospheric CO2 since 1000") plt.xlabel("Year") plt.ylabel("CO2 (ppm)") plt.show() |
Theses values only going up to 2004, but we'd like to track the current CO2 records. Therefore, we will manually add the averaged values recorded at Mauna Loa (calculated same as before) since 2004.
Values range from 1000 to 2004, but we'd like it to go up to now. We'll use the values observed at Mauna Loa.
1 2 3 4 | filename = "co2_mlo_surface-insitu_1_ccgg_MonthlyData.txt" df_mlo = read_noaa_data(filename) df_mlo = df_mlo[["year", "rolling_mean"]] df_mlo.head() |
| year | rolling_mean | |
|---|---|---|
| 0 | 1973 | NaN |
| 1 | 1973 | NaN |
| 2 | 1973 | NaN |
| 3 | 1973 | NaN |
| 4 | 1973 | NaN |
1 2 3 4 | df_mlo = df_mlo.groupby(["year"])[["rolling_mean"]].mean() df_mlo = df_mlo.rename_axis("year").reset_index() df_mlo = df_mlo.rename(columns={"year": "Year", "rolling_mean": "ALL_50_full"}) df_mlo.head() |
| Year | ALL_50_full | |
|---|---|---|
| 0 | 1973 | NaN |
| 1 | 1974 | 330.246970 |
| 2 | 1975 | 331.001364 |
| 3 | 1976 | 332.148831 |
| 4 | 1977 | 333.823409 |
1 2 | df_indus_new = pd.concat([df_indus, df_mlo[df_mlo["Year"] > 2004]], axis=0) df_indus_new.tail() |
| Year | ALL_50_full | LD_050 | DML_050 | ALL_050 | LD_075 | DML_075 | ALL_075 | LD_100 | DML_100 | ... | ALL_125 | LD_150 | DML_150 | ALL_150 | LD_175 | DML_175 | ALL_175 | LD_200 | DML_200 | ALL_200 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 45 | 2018 | 408.782197 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 46 | 2019 | 411.648636 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 47 | 2020 | 414.177424 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 48 | 2021 | 416.408636 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 49 | 2022 | 418.103485 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 23 columns
We can now see atmospheric CO2 records from 1000 to now. This way, we can also focus on atmospheric CO2 records after the pre-industrial era, around 1750.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True) ax1.plot(df_indus["Year"], df_indus["ALL_50_full"], color="lightblue") ax1.set_title("Since 1000") ax2.plot(df_indus[df_indus["Year"] >= 1750]["Year"], df_indus[df_indus["Year"] >= 1750]["ALL_50_full"], color="lightblue") ax2.set_title("Since 1750") fig.suptitle("Global atmospheric CO2") fig.supxlabel('Year') fig.supylabel('CO2 (ppm)') plt.show() |
800,000 years before present¶
The dataset is once again stored in a text format, also uses tabulations as the separator and contains 2 columns for 1096 lines. It contains the year before present, meaning before 1950, and the CO2 records. This means that for instance, the first year being 137 BP is actually the year 1813. This has to changed in order for the plot to be displayed correctly.
1 2 3 4 5 6 7 8 9 | filename = "edc3-composite-co2-2008-noaa.txt" if not Path(filename).is_file(): urlretrieve( "https://www.ncei.noaa.gov/pub/data/paleo/icecore/antarctica/epica_domec/edc3-composite-co2-2008-noaa.txt", filename, ) df_icecore = pd.read_csv(filename, comment='#', sep='\t', header=0) df_icecore.head() |
| gas_ageBP | CO2 | |
|---|---|---|
| 0 | 137 | 280.4 |
| 1 | 268 | 274.9 |
| 2 | 279 | 277.9 |
| 3 | 395 | 279.1 |
| 4 | 404 | 281.9 |
1 | df_icecore.describe() |
| gas_ageBP | CO2 | |
|---|---|---|
| count | 1096.000000 | 1096.000000 |
| mean | 390905.979015 | 230.835675 |
| std | 262092.947239 | 27.573616 |
| min | 137.000000 | 171.600000 |
| 25% | 137133.500000 | 207.500000 |
| 50% | 423206.500000 | 231.450000 |
| 75% | 627408.000000 | 251.525000 |
| max | 798512.000000 | 298.600000 |
1 2 | df_icecore["gas_ageBP"] = 1950 - df_icecore["gas_ageBP"] df_icecore.head() |
| gas_ageBP | CO2 | |
|---|---|---|
| 0 | 1813 | 280.4 |
| 1 | 1682 | 274.9 |
| 2 | 1671 | 277.9 |
| 3 | 1555 | 279.1 |
| 4 | 1546 | 281.9 |
Like before, we'd like to look at the values up until now. We will manually add the previous dataframe (from 1000 to now) to our current one.
1 2 3 | df_current = df_indus_new.rename(columns={"Year": "gas_ageBP", "ALL_50_full": "CO2"}) df_full = pd.concat([df_icecore, df_current.iloc[:, :2]], axis=0) df_full.tail() |
| gas_ageBP | CO2 | |
|---|---|---|
| 45 | 2018 | 408.782197 |
| 46 | 2019 | 411.648636 |
| 47 | 2020 | 414.177424 |
| 48 | 2021 | 416.408636 |
| 49 | 2022 | 418.103485 |
We also want to order the years chronologically. This will be sorting by ordering the dataset based on the year.
1 2 | df_full = df_full.sort_values(by=["gas_ageBP"]) df_full.head() |
| gas_ageBP | CO2 | |
|---|---|---|
| 1095 | -796562 | 191.0 |
| 1094 | -795149 | 188.4 |
| 1093 | -794517 | 189.3 |
| 1092 | -793252 | 195.2 |
| 1091 | -792658 | 199.4 |
1 2 3 4 5 6 7 8 9 10 11 12 13 | plt.plot(df_full["gas_ageBP"] / 1000, df_full["CO2"], color="brown" ) plt.xlim([-800, 5]) plt.title("CO2 record for the last 800 000 year") plt.xlabel("Year (10**3)") plt.ylabel("CO2 (ppm)") plt.axhline(300, linestyle='--', color = "grey", label ="Highest historical CO2 level") plt.axhline(float(df_full[df_full["gas_ageBP"] == 1950]["CO2"]) , linestyle='--', color = "red", label ="1950") plt.axhline(float(df_full[df_full["gas_ageBP"] == df_full["gas_ageBP"].max()]["CO2"]), linestyle='--', color = "orange", label ="current") plt.legend(loc='upper center') plt.show() |
Zooming on CO2 records after 1750, we can see how the highest historical CO2 record was surpassed soon after 1900. After 1950, it rose even moreso than before, and keeps on rising.
1 2 3 4 5 6 7 8 9 10 11 | plt.plot(df_full[df_full["gas_ageBP"] >= 1750]["gas_ageBP"], df_full[df_full["gas_ageBP"] >= 1750]["CO2"], color="brown") plt.title("CO2 record since 1750") plt.xlabel("Year") plt.ylabel("CO2 (ppm)") plt.axhline(300, linestyle='--', color = "grey", label ="Highest historical CO2 level") plt.axhline(float(df_full[df_full["gas_ageBP"] == 1950]["CO2"]) , linestyle='--', color = "red", label ="1950") plt.axhline(float(df_full[df_full["gas_ageBP"] == df_full["gas_ageBP"].max()]["CO2"]), linestyle='--', color = "orange", label ="current") plt.legend(loc='upper center') plt.show() |
1 |