Data science of climate change¶

Data source¶

National Oceanic and Atmospheric Administration (NOAA)¶

NOAA is an agency of the U.S. department of commerce that “enriches life through science”. They work to keep the public informed of the changing environment around them. In 1972, they've created The Global Monitoring Laboratory (GML), as part of the NOAA Environmental Research Laboratories. It's mission is focused on geophysical monitoring for climatic change.

GML conducts research that addresses three major challenges: greenhouse gas and carbon cycle feedbacks, changes in clouds, aerosols, and surface radiation, and recovery of stratospheric ozone. One of four programs created by GML is the Carbon Cycle Greenhouse Gases (CCGG), in which the spatial and temporal distributions of greenhouse gases are measured and modeled.

The CCGG research area operates the Global Greenhouse Gas Reference Network, measuring the atmospheric distribution and trends of the three main long-term drivers of climate change, carbon dioxide (CO2), methane (CH4), and nitrous oxide (N2O), as well as carbon monoxide (CO) which is an important indicator of air pollution.

CO2 measurements in the 70s¶

NOAA Carbon Dioxide (CO2) measurements first began in 1967 at the Niwot Ridge high altitude research station, Colorado. Since then, the CCGG cooperative air sampling network measures CO2 from more than 50 sites world wide. In-Situ continuous measurements of CO2 at the NOAA GML Atmospheric Baseline Observatories began in 1973. Additional in-situ measurements from tall towers and light aircraft in the USA began in the 1990's, to support studies on continental sources and sinks of greenhouse gases. Vertical profile data from the AirCore Sampling System began in 2012 to monitor greenhouse gases at high altitudes and to help evaluate remote sensing measurements.

Flask (1968-2022)¶

GML measures greenhouse gases through a cooperative air sampling network effort which began in 1967. Air samples are collected approximately weekly with flasks from a globally distributed network of sites, ranging from being collected by volunteers to being collected by aircrafts.
Measurement data are used to identify long-term trends, seasonal variability, and spatial distribution of carbon cycle gases.

This data can be found in GML's CO2 database.

In situ (1973-2022)¶

Greenhouse gases are also measured at four Atmospheric Baseline Observatories and multiple tall towers in the United States. It first began measurements of CO2 at the observatories in 1973. Continuous in-situ measurements of these gases provides great detail in their long term trends, seasonal and short-term variations, and diurnal cycles.

This data can also be found in GML's CO2 database.

CO2 estimates before real measurements, paleoclimatology¶

The most direct method of investigating past variations of the atmospheric CO2 concentration before 1958, when continuous direct atmospheric CO2 measurements started, is the analysis of air extracted from suitable ice cores (Siegenthaler et al., 2005)

The measurement of the gas composition is direct: trapped in deep ice cores are tiny bubbles of ancient air, which we can extract and analyze using mass spectrometers (https://www.scientificamerican.com/article/how-are-past-temperatures/).

From 1000 to 2004¶

In 2010, pre-industrial CO2 data was estimated using two ice cores. Measurements for Law Dome (LD) and Dronning Maud Land (DML) were sufficient to yield smoothed, with 50-year splines, estimates for CO2 evolution between 1000 and 2004 (Frank et al., 2010).

This data can be found in NOAA's Paleoclimatology database, as study 10437.

800,000 years before present¶

In 2008, two years prior, the Antarctic Vostok and EPICA Dome C ice cores had provided a composite record of atmospheric carbon dioxide levels over the past 800,000 years (Lüthi et al., 2008).

This data can also be found in NOAA's Paleoclimatology database, as study 6091.

Unit of measurement¶

When measuring gases the term concentration is used to describe the amount of gas by volume in the air. The two most common units of measurement are parts-per-million, and percent concentration. All of our data uses parts-per-million as their unit of measurement.
Parts-per-million (abbreviated ppm) is the ratio of one gas to another. For example, 1,000 ppm of CO2 means that if you could count a million gas molecules, 1,000 of them would be of carbon dioxide and 999,000 molecules would be some other gases (https://www.co2meter.com/fr-fr/blogs/news/15164297-co2-gas-concentration-defined).

Data analysis¶

import matplotlib.pyplot as plt
import pandas as pd
from pathlib import Path
from urllib.request import urlretrieve
import urllib.parse
from scipy import stats

In situ (1973-2022)¶

The dataset is stored in a text format, uses spaces as the separator and contains 20 columns for 593 lines. It contains a site code, which refers to the observatory where the data was recorded, the date, from the year down to the second, the CO2 record and other attributes of the measurements. Therefore, we have four different dataset, each specific to an observatory :

Barrow	Mauna Loa	American Samoa	South Pole
BRW	MLO	SMO	SPO

observatories = {"brw": "Barrow", "mlo": "Mauna Loa", "smo": "American Samoa", "spo": "South Pole"}
for obs_name in observatories.keys():
    filename = f"co2_{obs_name}_surface-insitu_1_ccgg_MonthlyData.txt"
    url = f"https://gml.noaa.gov/aftp/data/trace_gases/co2/in-situ/surface/txt/{filename}"
    if not Path(filename).is_file():
        urlretrieve(url, filename)

By default, missing values are displayed as -999,99.

# value:_FillValue : -999.99

To prevent these values from dragging the graph down, they will be replaced by Na when we import the datasets.

In order to smooth the CO2 records, we also decided to add a rolling average of eleven adjacent seasonal cycles, centered on the month to be corrected. This number was chosen to center the average value over the year, erasing the seasonal variability. At any point in time, the CO2 recorded is adjusted by the values of the nearest four-season cycle.

def read_noaa_data(filename):
    """Reads a dataframe from a NOAA file and adds a rolling mean to the value.

    Parameters
    ----------
    filename : string
        A dataframe in a text file from NOAA.

    Returns
    -------
    dataframe :
        The dataframe.
    """
    df = pd.read_csv(filename, comment='#', sep=' ', header=0, na_values = -999.99)
    df["rolling_mean"] = df['value'].rolling(11, center=True).mean()
    return df 

filename = "co2_mlo_surface-insitu_1_ccgg_MonthlyData.txt"
df_noaa = read_noaa_data(filename)
df_noaa.head(15)

	site_code	year	month	day	datetime	time_decimal	midpoint_time	value	value_std_dev	nvalue	latitude	longitude	altitude	elevation	intake_height	qcflag	rolling_mean
0	MLO	1973	8	1	1973-08-01T00:00:00Z	1973.580822	114325200	NaN	-99.99	0	19.536	-155.576	3397.0	3397.0	0.0	*..	NaN
1	MLO	1973	9	1	1973-09-01T00:00:00Z	1973.665753	117003600	NaN	-99.99	0	19.536	-155.576	3397.0	3397.0	0.0	*..	NaN
2	MLO	1973	10	1	1973-10-01T00:00:00Z	1973.747945	119595600	NaN	-99.99	0	19.536	-155.576	3397.0	3397.0	0.0	*..	NaN
3	MLO	1973	11	1	1973-11-01T00:00:00Z	1973.832877	122274000	NaN	-99.99	0	19.536	-155.576	3397.0	3397.0	0.0	*..	NaN
4	MLO	1973	12	1	1973-12-01T00:00:00Z	1973.915068	124866000	NaN	-99.99	0	19.536	-155.576	3397.0	3397.0	0.0	*..	NaN
5	MLO	1974	1	1	1974-01-01T00:00:00Z	1974.000000	127544400	NaN	-99.99	0	19.536	-155.576	3387.1	3397.0	-9.9	*..	NaN
6	MLO	1974	2	1	1974-02-01T00:00:00Z	1974.084932	130222800	NaN	-99.99	0	19.536	-155.576	3387.1	3397.0	-9.9	*..	NaN
7	MLO	1974	3	1	1974-03-01T00:00:00Z	1974.161644	132642000	NaN	-99.99	0	19.536	-155.576	3387.1	3397.0	-9.9	*..	NaN
8	MLO	1974	4	1	1974-04-01T00:00:00Z	1974.246575	135320400	NaN	-99.99	0	19.536	-155.576	3387.1	3397.0	-9.9	*..	NaN
9	MLO	1974	5	1	1974-05-01T00:00:00Z	1974.328767	137912400	333.16	0.36	13	19.536	-155.576	3422.0	3397.0	25.0	...	NaN
10	MLO	1974	6	1	1974-06-01T00:00:00Z	1974.413699	140590800	332.17	0.41	25	19.536	-155.576	3422.0	3397.0	25.0	...	NaN
11	MLO	1974	7	1	1974-07-01T00:00:00Z	1974.495890	143182800	331.11	0.49	24	19.536	-155.576	3422.0	3397.0	25.0	...	NaN
12	MLO	1974	8	1	1974-08-01T00:00:00Z	1974.580822	145861200	329.11	0.64	26	19.536	-155.576	3422.0	3397.0	25.0	...	NaN
13	MLO	1974	9	1	1974-09-01T00:00:00Z	1974.665753	148539600	327.30	0.64	22	19.536	-155.576	3422.0	3397.0	25.0	...	NaN
14	MLO	1974	10	1	1974-10-01T00:00:00Z	1974.747945	151131600	327.30	0.27	24	19.536	-155.576	3422.0	3397.0	25.0	...	330.195455

df_noaa.describe()

	year	month	day	hour	minute	second	time_decimal	midpoint_time	value	value_std_dev	nvalue	latitude	longitude	altitude	elevation	intake_height	rolling_mean
count	593.000000	593.000000	593.0	593.0	593.0	593.0	593.000000	5.930000e+02	582.000000	593.00000	593.000000	5.930000e+02	5.930000e+02	593.000000	593.0	593.000000	562.000000
mean	1997.789207	6.529511	1.0	0.0	0.0	0.0	1998.248087	8.927452e+08	369.691718	-1.23145	25.160202	1.953600e+01	-1.555760e+02	3430.063406	3397.0	33.063406	370.289376
std	14.278755	3.457692	0.0	0.0	0.0	0.0	14.277380	4.505603e+08	25.716157	13.59032	5.347673	5.333570e-14	6.826969e-13	11.486558	0.0	11.486558	24.883814
min	1973.000000	1.000000	1.0	0.0	0.0	0.0	1973.580822	1.143252e+08	327.300000	-99.99000	0.000000	1.953600e+01	-1.555760e+02	3387.100000	3397.0	-9.900000	330.191818
25%	1985.000000	4.000000	1.0	0.0	0.0	0.0	1985.915068	5.035572e+08	348.110000	0.47000	24.000000	1.953600e+01	-1.555760e+02	3422.000000	3397.0	25.000000	349.289091
50%	1998.000000	7.000000	1.0	0.0	0.0	0.0	1998.246575	8.927028e+08	366.815000	0.60000	26.000000	1.953600e+01	-1.555760e+02	3437.000000	3397.0	40.000000	368.157273
75%	2010.000000	10.000000	1.0	0.0	0.0	0.0	2010.580822	1.281935e+09	390.397500	0.75000	28.000000	1.953600e+01	-1.555760e+02	3437.000000	3397.0	40.000000	390.379773
max	2022.000000	12.000000	1.0	0.0	0.0	0.0	2022.915068	1.671167e+09	420.970000	1.40000	31.000000	1.953600e+01	-1.555760e+02	3437.000000	3397.0	40.000000	418.484545

With such datasets, we can observe CO2 records from 1973 up until now for all four observatories.

observatories = {"brw": "Barrow", "mlo": "Mauna Loa", "smo": "American Samoa", "spo": "South Pole"}
fig, axs = plt.subplots(4, 2)
axes = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1], [3, 0], [3, 1]]
fig.suptitle("Monthly atmospheric CO2 record")

cpt = 0
for obs_name in observatories.keys() :
    # Importing the data
    filename = f"co2_{obs_name}_surface-insitu_1_ccgg_MonthlyData.txt"
    df_noaa = read_noaa_data(filename)
    
    # First plot, since the beginning
    axs[axes[cpt][0], axes[cpt][1]].plot(df_noaa["time_decimal"], 
                df_noaa["value"], '-o',
                color="red", markersize = 1)
    axs[axes[cpt][0], axes[cpt][1]].plot(df_noaa["time_decimal"], 
            df_noaa["rolling_mean"],
            color="black"
            )    
    # Second plot, over the last five years
    start_year = df_noaa["year"][df_noaa.year.argmax()] - 5
    axs[axes[cpt+1][0], axes[cpt+1][1]].plot(df_noaa[df_noaa["year"] >= start_year]["time_decimal"], 
                                     df_noaa[df_noaa["year"] >= start_year]["value"], '-o',
                                     color="red", markersize = 3, label = "Real value")
    axs[axes[cpt+1][0], axes[cpt+1][1]].plot(df_noaa[df_noaa["year"] >= start_year]["time_decimal"], 
                                     df_noaa[df_noaa["year"] >= start_year]["rolling_mean"],
                                     color="black")
    
    # Displays the name of the observatory for which the records are shown
    obs_name = observatories[df_noaa["site_code"][0].lower()]
    axs[axes[cpt][0], axes[cpt][1]].set(ylabel=obs_name)
    axs[axes[cpt][0], axes[cpt][1]].yaxis.get_label().set_fontsize(8)
    cpt+=2

# Hide x labels and tick labels for top plots and y ticks for right plots.
for ax in axs.flat:
    ax.label_outer()

fig.supxlabel('Year')
fig.supylabel('CO2 (ppm)')
fig.tight_layout()
plt.show()

No description has been provided for this image

Each line represents an observatory, with its name written to the left. Columns show the full period of recordings on the left and the last 5 years on the right. The real value is displayed in red, whereas the rolling mean calculated earlier is shown in black.

If we zoom on the last five years for each observatory, we can see why the rolling average was necessary.

colors = ["#9d0208", "#dc2f02", "#f48c06", "#ffba08"]
cpt=0
fig, (ax1, ax2) = plt.subplots(1, 2, layout='constrained', sharey=True, sharex=True)
for obs_name in observatories.keys() :
    filename = f"co2_{obs_name}_surface-insitu_1_ccgg_MonthlyData.txt"
    df_noaa = read_noaa_data(filename)
    obs_name = observatories[df_noaa["site_code"][0].lower()]
    start_year = df_noaa["year"][df_noaa.year.argmax()] - 5
    
    ax1.plot(df_noaa[df_noaa["year"] >= start_year]["time_decimal"], 
                df_noaa[df_noaa["year"] >= start_year]["value"], '-o',
                markersize=4, color = colors[cpt], label = obs_name)
    
    ax1.set_title("Monthly CO2 record")
    ax1.legend(loc="upper left")
    
    ax2.plot(df_noaa[df_noaa["year"] >= start_year]["time_decimal"], 
            df_noaa[df_noaa["year"] >= start_year]["rolling_mean"],
            color = colors[cpt], label = obs_name)
    
    ax2.set_title("Averaged CO2 record")
    ax2.legend(loc="upper left")
    
    cpt+=1
fig.supxlabel('Year')
fig.supylabel('CO2 (ppm)')
plt.show()

If we focus on the raw value, they are all very different depending on the observatory. However if we focus on the averaged value, we can assume they all increase in the same way. To make sure of this, we will calculate the slope of a linear least-squares regression fitted to each observatory, and compare the values obtained.

cpt=0
for obs_name in observatories.keys() :
    filename = f"co2_{obs_name}_surface-insitu_1_ccgg_MonthlyData.txt"
    df_noaa = read_noaa_data(filename)
    obs_name = observatories[df_noaa["site_code"][0].lower()]
    df_noaa = df_noaa.dropna()
    res = stats.linregress(df_noaa["time_decimal"], df_noaa["rolling_mean"])
    plt.plot(df_noaa["time_decimal"], res.intercept + res.slope*df_noaa["time_decimal"], color=colors[cpt], label=f"{obs_name}, a: {res.slope:.4f}")
    plt.legend(loc="upper left")
    cpt+=1

We can conclude that although the CO2 recorded seems vastly different depending on the observatory, it rises in the same way no matter where it's measured.

import numpy as np
def exp_model(x, a, b):
    y = np.exp(a) * (np.exp(b) ** x)
    return y

Exponential Regression in Python method found here.

cpt=0
for obs_name in observatories.keys() :
    filename = f"co2_{obs_name}_surface-insitu_1_ccgg_MonthlyData.txt"
    df_noaa = read_noaa_data(filename)
    obs_name = observatories[df_noaa["site_code"][0].lower()]
    df_noaa = df_noaa.dropna()
    fit = np.polyfit(df_noaa["time_decimal"], np.log(df_noaa["rolling_mean"]), 1)
    plt.plot(df_noaa["time_decimal"], exp_model(df_noaa["time_decimal"], fit[1], fit[0]), color=colors[cpt], label=f"{obs_name}, a: {fit[1]:.4f}, b: {fit[0]:.4f}")
    plt.legend(loc="upper left")
    cpt+=1

Paleoclimatology¶

From the year 1000 to 2004¶

The dataset is also stored in a text format, but uses tabulations as the separator and contains 23 columns for 1005 lines. It contains the year of recording, from 1000 to 2004, and three mains values : the record at Law Dome (LD), the record at Dronning Maud Land (DML) and the average record over thoses two sites. Each of theses values is smoothed with splines ranging from 50 to 200 years with a 25 year step. We will only take into account, the years and the value for both sites with a 50 year splines, where data is missing the least.

filename = "smoothedco2.txt"
if not Path(filename).is_file():
    urlretrieve(
        "https://www.ncei.noaa.gov/pub/data/paleo/contributions_by_author/frank2010/smoothedco2.txt",
        filename,
    )

df_indus = pd.read_csv(filename, sep='\t', header=0)
df_indus.head()

	Year	ALL_50_full	LD_050	DML_050	ALL_050	LD_075	DML_075	ALL_075	LD_100	DML_100	...	ALL_125	LD_150	DML_150	ALL_150	LD_175	DML_175	ALL_175	LD_200	DML_200	ALL_200
0	1000	278.66	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	1001	278.68	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	1002	278.69	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	1003	278.71	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	1004	278.72	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

5 rows × 23 columns

df_indus.describe()

	Year	ALL_50_full	LD_050	DML_050	ALL_050	LD_075	DML_075	ALL_075	LD_100	DML_100	...	ALL_125	LD_150	DML_150	ALL_150	LD_175	DML_175	ALL_175	LD_200	DML_200	ALL_200
count	1005.000000	1005.000000	751.000000	751.000000	751.000000	751.000000	751.000000	751.000000	751.000000	751.000000	...	751.000000	751.000000	751.000000	751.000000	751.000000	751.000000	751.000000	751.000000	751.000000	751.000000
mean	1502.000000	284.887801	280.554288	280.167523	280.053888	280.537457	280.164940	280.055286	280.493475	280.158362	...	280.028415	280.373249	280.166059	280.006871	280.302783	280.179867	279.975979	280.228282	280.197137	279.925846
std	290.262812	14.310892	2.657925	1.407282	1.844578	2.576056	1.387695	1.800978	2.486121	1.375995	...	1.744265	2.326101	1.328713	1.714550	2.290978	1.287739	1.689981	2.286184	1.239137	1.692015
min	1000.000000	276.450000	274.610000	277.480000	276.450000	275.470000	277.640000	276.890000	275.760000	277.600000	...	276.700000	275.930000	277.670000	276.690000	275.920000	277.820000	276.710000	275.890000	277.990000	276.710000
25%	1251.000000	279.600000	278.440000	279.085000	278.560000	278.580000	279.105000	278.635000	278.670000	279.170000	...	278.755000	278.605000	279.200000	278.755000	278.510000	279.190000	278.685000	278.375000	279.175000	278.540000
50%	1502.000000	280.980000	281.460000	280.460000	280.620000	281.380000	280.310000	280.580000	281.230000	280.260000	...	280.510000	281.020000	280.370000	280.480000	280.950000	280.390000	280.450000	280.940000	280.410000	280.460000
75%	1753.000000	282.100000	282.585000	281.315000	281.360000	282.495000	281.180000	281.330000	282.170000	281.235000	...	281.315000	282.175000	281.285000	281.305000	282.160000	281.295000	281.295000	282.145000	281.310000	281.280000
max	2004.000000	372.930000	284.000000	282.410000	282.630000	283.850000	282.150000	282.450000	283.690000	282.040000	...	282.230000	283.280000	281.950000	282.130000	283.060000	281.890000	282.030000	282.860000	281.830000	281.930000

8 rows × 23 columns

We can display the CO2 estimated from 1000 to 2004.

plt.plot(df_indus["Year"], 
         df_indus["ALL_50_full"],
         color="lightblue", label = "Ice core"
         )


plt.title("Global atmospheric CO2 since 1000")
plt.xlabel("Year")
plt.ylabel("CO2 (ppm)")
plt.show()

Theses values only going up to 2004, but we'd like to track the current CO2 records. Therefore, we will manually add the averaged values recorded at Mauna Loa (calculated same as before) since 2004.

Values range from 1000 to 2004, but we'd like it to go up to now. We'll use the values observed at Mauna Loa.

filename = "co2_mlo_surface-insitu_1_ccgg_MonthlyData.txt"
df_mlo = read_noaa_data(filename)
df_mlo = df_mlo[["year", "rolling_mean"]]
df_mlo.head()

	year	rolling_mean
0	1973	NaN
1	1973	NaN
2	1973	NaN
3	1973	NaN
4	1973	NaN

df_mlo = df_mlo.groupby(["year"])[["rolling_mean"]].mean()
df_mlo = df_mlo.rename_axis("year").reset_index()
df_mlo = df_mlo.rename(columns={"year": "Year", "rolling_mean": "ALL_50_full"})
df_mlo.head()

	Year	ALL_50_full
0	1973	NaN
1	1974	330.246970
2	1975	331.001364
3	1976	332.148831
4	1977	333.823409

df_indus_new = pd.concat([df_indus, df_mlo[df_mlo["Year"] > 2004]], axis=0)
df_indus_new.tail()

	Year	ALL_50_full	LD_050	DML_050	ALL_050	LD_075	DML_075	ALL_075	LD_100	DML_100	...	ALL_125	LD_150	DML_150	ALL_150	LD_175	DML_175	ALL_175	LD_200	DML_200	ALL_200
45	2018	408.782197	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
46	2019	411.648636	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
47	2020	414.177424	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
48	2021	416.408636	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
49	2022	418.103485	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

5 rows × 23 columns

We can now see atmospheric CO2 records from 1000 to now. This way, we can also focus on atmospheric CO2 records after the pre-industrial era, around 1750.

fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True)

ax1.plot(df_indus["Year"], 
         df_indus["ALL_50_full"],
         color="lightblue")
ax1.set_title("Since 1000")

ax2.plot(df_indus[df_indus["Year"] >= 1750]["Year"], 
         df_indus[df_indus["Year"] >= 1750]["ALL_50_full"],
         color="lightblue")
ax2.set_title("Since 1750")

fig.suptitle("Global atmospheric CO2")
fig.supxlabel('Year')
fig.supylabel('CO2 (ppm)')
plt.show()

800,000 years before present¶

The dataset is once again stored in a text format, also uses tabulations as the separator and contains 2 columns for 1096 lines. It contains the year before present, meaning before 1950, and the CO2 records. This means that for instance, the first year being 137 BP is actually the year 1813. This has to changed in order for the plot to be displayed correctly.

filename = "edc3-composite-co2-2008-noaa.txt"
if not Path(filename).is_file():
    urlretrieve(
        "https://www.ncei.noaa.gov/pub/data/paleo/icecore/antarctica/epica_domec/edc3-composite-co2-2008-noaa.txt",
        filename,
    )

df_icecore = pd.read_csv(filename, comment='#', sep='\t', header=0)
df_icecore.head()

	gas_ageBP	CO2
0	137	280.4
1	268	274.9
2	279	277.9
3	395	279.1
4	404	281.9

df_icecore.describe()

	gas_ageBP	CO2
count	1096.000000	1096.000000
mean	390905.979015	230.835675
std	262092.947239	27.573616
min	137.000000	171.600000
25%	137133.500000	207.500000
50%	423206.500000	231.450000
75%	627408.000000	251.525000
max	798512.000000	298.600000

df_icecore["gas_ageBP"] = 1950 - df_icecore["gas_ageBP"]
df_icecore.head()

	gas_ageBP	CO2
0	1813	280.4
1	1682	274.9
2	1671	277.9
3	1555	279.1
4	1546	281.9

Like before, we'd like to look at the values up until now. We will manually add the previous dataframe (from 1000 to now) to our current one.

df_current = df_indus_new.rename(columns={"Year": "gas_ageBP", "ALL_50_full": "CO2"})
df_full = pd.concat([df_icecore, df_current.iloc[:, :2]], axis=0)
df_full.tail()

	gas_ageBP	CO2
45	2018	408.782197
46	2019	411.648636
47	2020	414.177424
48	2021	416.408636
49	2022	418.103485

We also want to order the years chronologically. This will be sorting by ordering the dataset based on the year.

df_full = df_full.sort_values(by=["gas_ageBP"])
df_full.head()

	gas_ageBP	CO2
1095	-796562	191.0
1094	-795149	188.4
1093	-794517	189.3
1092	-793252	195.2
1091	-792658	199.4

plt.plot(df_full["gas_ageBP"] / 1000, 
            df_full["CO2"],
            color="brown"
            )
plt.xlim([-800, 5])
plt.title("CO2 record for the last 800 000 year")
plt.xlabel("Year (10**3)")
plt.ylabel("CO2 (ppm)")
plt.axhline(300, linestyle='--', color = "grey", label ="Highest historical CO2 level")
plt.axhline(float(df_full[df_full["gas_ageBP"] == 1950]["CO2"]) , linestyle='--', color = "red", label ="1950")
plt.axhline(float(df_full[df_full["gas_ageBP"] == df_full["gas_ageBP"].max()]["CO2"]), linestyle='--', color = "orange", label ="current")
plt.legend(loc='upper center')
plt.show()

Zooming on CO2 records after 1750, we can see how the highest historical CO2 record was surpassed soon after 1900. After 1950, it rose even moreso than before, and keeps on rising.

plt.plot(df_full[df_full["gas_ageBP"] >= 1750]["gas_ageBP"], 
         df_full[df_full["gas_ageBP"] >= 1750]["CO2"],
         color="brown")
plt.title("CO2 record since 1750")
plt.xlabel("Year")
plt.ylabel("CO2 (ppm)")
plt.axhline(300, linestyle='--', color = "grey", label ="Highest historical CO2 level")
plt.axhline(float(df_full[df_full["gas_ageBP"] == 1950]["CO2"]) , linestyle='--', color = "red", label ="1950")
plt.axhline(float(df_full[df_full["gas_ageBP"] == df_full["gas_ageBP"].max()]["CO2"]), linestyle='--', color = "orange", label ="current")
plt.legend(loc='upper center')
plt.show()