Data science of climate change¶

Data source¶

National Oceanic and Atmospheric Administration (NOAA)¶

NOAA is an agency of the U.S. department of commerce that “enriches life through science”. They work to keep the public informed of the changing environment around them. In 1972, they've created The Global Monitoring Laboratory (GML), as part of the NOAA Environmental Research Laboratories. It's mission is focused on geophysical monitoring for climatic change.

GML conducts research that addresses three major challenges: greenhouse gas and carbon cycle feedbacks, changes in clouds, aerosols, and surface radiation, and recovery of stratospheric ozone. One of four programs created by GML is the Carbon Cycle Greenhouse Gases (CCGG), in which the spatial and temporal distributions of greenhouse gases are measured and modeled.

The CCGG research area operates the Global Greenhouse Gas Reference Network, measuring the atmospheric distribution and trends of the three main long-term drivers of climate change, carbon dioxide (CO2), methane (CH4), and nitrous oxide (N2O), as well as carbon monoxide (CO) which is an important indicator of air pollution.

CO2 measurements in the 70s¶

NOAA Carbon Dioxide (CO2) measurements first began in 1967 at the Niwot Ridge high altitude research station, Colorado. Since then, the CCGG cooperative air sampling network measures CO2 from more than 50 sites world wide. In-Situ continuous measurements of CO2 at the NOAA GML Atmospheric Baseline Observatories began in 1973. Additional in-situ measurements from tall towers and light aircraft in the USA began in the 1990's, to support studies on continental sources and sinks of greenhouse gases. Vertical profile data from the AirCore Sampling System began in 2012 to monitor greenhouse gases at high altitudes and to help evaluate remote sensing measurements.

Flask (1968-2022)¶

GML measures greenhouse gases through a cooperative air sampling network effort which began in 1967. Air samples are collected approximately weekly with flasks from a globally distributed network of sites, ranging from being collected by volunteers to being collected by aircrafts.
Measurement data are used to identify long-term trends, seasonal variability, and spatial distribution of carbon cycle gases.

This data can be found in GML's CO2 database.

In situ (1973-2022)¶

Greenhouse gases are also measured at four Atmospheric Baseline Observatories and multiple tall towers in the United States. It first began measurements of CO2 at the observatories in 1973. Continuous in-situ measurements of these gases provides great detail in their long term trends, seasonal and short-term variations, and diurnal cycles.

This data can also be found in GML's CO2 database.

CO2 estimates before real measurements, paleoclimatology¶

The most direct method of investigating past variations of the atmospheric CO2 concentration before 1958, when continuous direct atmospheric CO2 measurements started, is the analysis of air extracted from suitable ice cores (Siegenthaler et al., 2005)

The measurement of the gas composition is direct: trapped in deep ice cores are tiny bubbles of ancient air, which we can extract and analyze using mass spectrometers (https://www.scientificamerican.com/article/how-are-past-temperatures/).

From 1000 to 2004¶

In 2010, pre-industrial CO2 data was estimated using two ice cores. Measurements for Law Dome (LD) and Dronning Maud Land (DML) were sufficient to yield smoothed, with 50-year splines, estimates for CO2 evolution between 1000 and 2004 (Frank et al., 2010).

This data can be found in NOAA's Paleoclimatology database, as study 10437.

800,000 years before present¶

In 2008, two years prior, the Antarctic Vostok and EPICA Dome C ice cores had provided a composite record of atmospheric carbon dioxide levels over the past 800,000 years (Lüthi et al., 2008).

This data can also be found in NOAA's Paleoclimatology database, as study 6091.

Unit of measurement¶

When measuring gases the term concentration is used to describe the amount of gas by volume in the air. The two most common units of measurement are parts-per-million, and percent concentration. All of our data uses parts-per-million as their unit of measurement.
Parts-per-million (abbreviated ppm) is the ratio of one gas to another. For example, 1,000 ppm of CO2 means that if you could count a million gas molecules, 1,000 of them would be of carbon dioxide and 999,000 molecules would be some other gases (https://www.co2meter.com/fr-fr/blogs/news/15164297-co2-gas-concentration-defined).

Data analysis¶

1
2
3
4
5
6
import matplotlib.pyplot as plt
import pandas as pd
from pathlib import Path
from urllib.request import urlretrieve
import urllib.parse
from scipy import stats

In situ (1973-2022)¶

The dataset is stored in a text format, uses spaces as the separator and contains 20 columns for 593 lines. It contains a site code, which refers to the observatory where the data was recorded, the date, from the year down to the second, the CO2 record and other attributes of the measurements. Therefore, we have four different dataset, each specific to an observatory :

Barrow Mauna Loa American Samoa South Pole
BRW MLO SMO SPO
1
2
3
4
5
6
observatories = {"brw": "Barrow", "mlo": "Mauna Loa", "smo": "American Samoa", "spo": "South Pole"}
for obs_name in observatories.keys():
    filename = f"co2_{obs_name}_surface-insitu_1_ccgg_MonthlyData.txt"
    url = f"https://gml.noaa.gov/aftp/data/trace_gases/co2/in-situ/surface/txt/{filename}"
    if not Path(filename).is_file():
        urlretrieve(url, filename)
  • By default, missing values are displayed as -999,99.
# value:_FillValue : -999.99

To prevent these values from dragging the graph down, they will be replaced by Na when we import the datasets.

  • In order to smooth the CO2 records, we also decided to add a rolling average of eleven adjacent seasonal cycles, centered on the month to be corrected. This number was chosen to center the average value over the year, erasing the seasonal variability. At any point in time, the CO2 recorded is adjusted by the values of the nearest four-season cycle.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
def read_noaa_data(filename):
    """Reads a dataframe from a NOAA file and adds a rolling mean to the value.

    Parameters
    ----------
    filename : string
        A dataframe in a text file from NOAA.

    Returns
    -------
    dataframe :
        The dataframe.
    """
    df = pd.read_csv(filename, comment='#', sep=' ', header=0, na_values = -999.99)
    df["rolling_mean"] = df['value'].rolling(11, center=True).mean()
    return df 

filename = "co2_mlo_surface-insitu_1_ccgg_MonthlyData.txt"
df_noaa = read_noaa_data(filename)
df_noaa.head(15)
site_code year month day hour minute second datetime time_decimal midpoint_time value value_std_dev nvalue latitude longitude altitude elevation intake_height qcflag rolling_mean
0 MLO 1973 8 1 0 0 0 1973-08-01T00:00:00Z 1973.580822 114325200 NaN -99.99 0 19.536 -155.576 3397.0 3397.0 0.0 *.. NaN
1 MLO 1973 9 1 0 0 0 1973-09-01T00:00:00Z 1973.665753 117003600 NaN -99.99 0 19.536 -155.576 3397.0 3397.0 0.0 *.. NaN
2 MLO 1973 10 1 0 0 0 1973-10-01T00:00:00Z 1973.747945 119595600 NaN -99.99 0 19.536 -155.576 3397.0 3397.0 0.0 *.. NaN
3 MLO 1973 11 1 0 0 0 1973-11-01T00:00:00Z 1973.832877 122274000 NaN -99.99 0 19.536 -155.576 3397.0 3397.0 0.0 *.. NaN
4 MLO 1973 12 1 0 0 0 1973-12-01T00:00:00Z 1973.915068 124866000 NaN -99.99 0 19.536 -155.576 3397.0 3397.0 0.0 *.. NaN
5 MLO 1974 1 1 0 0 0 1974-01-01T00:00:00Z 1974.000000 127544400 NaN -99.99 0 19.536 -155.576 3387.1 3397.0 -9.9 *.. NaN
6 MLO 1974 2 1 0 0 0 1974-02-01T00:00:00Z 1974.084932 130222800 NaN -99.99 0 19.536 -155.576 3387.1 3397.0 -9.9 *.. NaN
7 MLO 1974 3 1 0 0 0 1974-03-01T00:00:00Z 1974.161644 132642000 NaN -99.99 0 19.536 -155.576 3387.1 3397.0 -9.9 *.. NaN
8 MLO 1974 4 1 0 0 0 1974-04-01T00:00:00Z 1974.246575 135320400 NaN -99.99 0 19.536 -155.576 3387.1 3397.0 -9.9 *.. NaN
9 MLO 1974 5 1 0 0 0 1974-05-01T00:00:00Z 1974.328767 137912400 333.16 0.36 13 19.536 -155.576 3422.0 3397.0 25.0 ... NaN
10 MLO 1974 6 1 0 0 0 1974-06-01T00:00:00Z 1974.413699 140590800 332.17 0.41 25 19.536 -155.576 3422.0 3397.0 25.0 ... NaN
11 MLO 1974 7 1 0 0 0 1974-07-01T00:00:00Z 1974.495890 143182800 331.11 0.49 24 19.536 -155.576 3422.0 3397.0 25.0 ... NaN
12 MLO 1974 8 1 0 0 0 1974-08-01T00:00:00Z 1974.580822 145861200 329.11 0.64 26 19.536 -155.576 3422.0 3397.0 25.0 ... NaN
13 MLO 1974 9 1 0 0 0 1974-09-01T00:00:00Z 1974.665753 148539600 327.30 0.64 22 19.536 -155.576 3422.0 3397.0 25.0 ... NaN
14 MLO 1974 10 1 0 0 0 1974-10-01T00:00:00Z 1974.747945 151131600 327.30 0.27 24 19.536 -155.576 3422.0 3397.0 25.0 ... 330.195455
1
df_noaa.describe()
year month day hour minute second time_decimal midpoint_time value value_std_dev nvalue latitude longitude altitude elevation intake_height rolling_mean
count 593.000000 593.000000 593.0 593.0 593.0 593.0 593.000000 5.930000e+02 582.000000 593.00000 593.000000 5.930000e+02 5.930000e+02 593.000000 593.0 593.000000 562.000000
mean 1997.789207 6.529511 1.0 0.0 0.0 0.0 1998.248087 8.927452e+08 369.691718 -1.23145 25.160202 1.953600e+01 -1.555760e+02 3430.063406 3397.0 33.063406 370.289376
std 14.278755 3.457692 0.0 0.0 0.0 0.0 14.277380 4.505603e+08 25.716157 13.59032 5.347673 5.333570e-14 6.826969e-13 11.486558 0.0 11.486558 24.883814
min 1973.000000 1.000000 1.0 0.0 0.0 0.0 1973.580822 1.143252e+08 327.300000 -99.99000 0.000000 1.953600e+01 -1.555760e+02 3387.100000 3397.0 -9.900000 330.191818
25% 1985.000000 4.000000 1.0 0.0 0.0 0.0 1985.915068 5.035572e+08 348.110000 0.47000 24.000000 1.953600e+01 -1.555760e+02 3422.000000 3397.0 25.000000 349.289091
50% 1998.000000 7.000000 1.0 0.0 0.0 0.0 1998.246575 8.927028e+08 366.815000 0.60000 26.000000 1.953600e+01 -1.555760e+02 3437.000000 3397.0 40.000000 368.157273
75% 2010.000000 10.000000 1.0 0.0 0.0 0.0 2010.580822 1.281935e+09 390.397500 0.75000 28.000000 1.953600e+01 -1.555760e+02 3437.000000 3397.0 40.000000 390.379773
max 2022.000000 12.000000 1.0 0.0 0.0 0.0 2022.915068 1.671167e+09 420.970000 1.40000 31.000000 1.953600e+01 -1.555760e+02 3437.000000 3397.0 40.000000 418.484545

With such datasets, we can observe CO2 records from 1973 up until now for all four observatories.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
observatories = {"brw": "Barrow", "mlo": "Mauna Loa", "smo": "American Samoa", "spo": "South Pole"}
fig, axs = plt.subplots(4, 2)
axes = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1], [3, 0], [3, 1]]
fig.suptitle("Monthly atmospheric CO2 record")

cpt = 0
for obs_name in observatories.keys() :
    # Importing the data
    filename = f"co2_{obs_name}_surface-insitu_1_ccgg_MonthlyData.txt"
    df_noaa = read_noaa_data(filename)
    
    # First plot, since the beginning
    axs[axes[cpt][0], axes[cpt][1]].plot(df_noaa["time_decimal"], 
                df_noaa["value"], '-o',
                color="red", markersize = 1)
    axs[axes[cpt][0], axes[cpt][1]].plot(df_noaa["time_decimal"], 
            df_noaa["rolling_mean"],
            color="black"
            )    
    # Second plot, over the last five years
    start_year = df_noaa["year"][df_noaa.year.argmax()] - 5
    axs[axes[cpt+1][0], axes[cpt+1][1]].plot(df_noaa[df_noaa["year"] >= start_year]["time_decimal"], 
                                     df_noaa[df_noaa["year"] >= start_year]["value"], '-o',
                                     color="red", markersize = 3, label = "Real value")
    axs[axes[cpt+1][0], axes[cpt+1][1]].plot(df_noaa[df_noaa["year"] >= start_year]["time_decimal"], 
                                     df_noaa[df_noaa["year"] >= start_year]["rolling_mean"],
                                     color="black")
    
    # Displays the name of the observatory for which the records are shown
    obs_name = observatories[df_noaa["site_code"][0].lower()]
    axs[axes[cpt][0], axes[cpt][1]].set(ylabel=obs_name)
    axs[axes[cpt][0], axes[cpt][1]].yaxis.get_label().set_fontsize(8)
    cpt+=2

# Hide x labels and tick labels for top plots and y ticks for right plots.
for ax in axs.flat:
    ax.label_outer()

fig.supxlabel('Year')
fig.supylabel('CO2 (ppm)')
fig.tight_layout()
plt.show()
No description has been provided for this image

Each line represents an observatory, with its name written to the left. Columns show the full period of recordings on the left and the last 5 years on the right. The real value is displayed in red, whereas the rolling mean calculated earlier is shown in black.

If we zoom on the last five years for each observatory, we can see why the rolling average was necessary.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
colors = ["#9d0208", "#dc2f02", "#f48c06", "#ffba08"]
cpt=0
fig, (ax1, ax2) = plt.subplots(1, 2, layout='constrained', sharey=True, sharex=True)
for obs_name in observatories.keys() :
    filename = f"co2_{obs_name}_surface-insitu_1_ccgg_MonthlyData.txt"
    df_noaa = read_noaa_data(filename)
    obs_name = observatories[df_noaa["site_code"][0].lower()]
    start_year = df_noaa["year"][df_noaa.year.argmax()] - 5
    
    ax1.plot(df_noaa[df_noaa["year"] >= start_year]["time_decimal"], 
                df_noaa[df_noaa["year"] >= start_year]["value"], '-o',
                markersize=4, color = colors[cpt], label = obs_name)
    
    ax1.set_title("Monthly CO2 record")
    ax1.legend(loc="upper left")
    
    ax2.plot(df_noaa[df_noaa["year"] >= start_year]["time_decimal"], 
            df_noaa[df_noaa["year"] >= start_year]["rolling_mean"],
            color = colors[cpt], label = obs_name)
    
    ax2.set_title("Averaged CO2 record")
    ax2.legend(loc="upper left")
    
    cpt+=1
fig.supxlabel('Year')
fig.supylabel('CO2 (ppm)')
plt.show()
No description has been provided for this image

If we focus on the raw value, they are all very different depending on the observatory. However if we focus on the averaged value, we can assume they all increase in the same way. To make sure of this, we will calculate the slope of a linear least-squares regression fitted to each observatory, and compare the values obtained.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
cpt=0
for obs_name in observatories.keys() :
    filename = f"co2_{obs_name}_surface-insitu_1_ccgg_MonthlyData.txt"
    df_noaa = read_noaa_data(filename)
    obs_name = observatories[df_noaa["site_code"][0].lower()]
    df_noaa = df_noaa.dropna()
    res = stats.linregress(df_noaa["time_decimal"], df_noaa["rolling_mean"])
    plt.plot(df_noaa["time_decimal"], res.intercept + res.slope*df_noaa["time_decimal"], color=colors[cpt], label=f"{obs_name}, a: {res.slope:.4f}")
    plt.legend(loc="upper left")
    cpt+=1
No description has been provided for this image

We can conclude that although the CO2 recorded seems vastly different depending on the observatory, it rises in the same way no matter where it's measured.

1
2
3
4
import numpy as np
def exp_model(x, a, b):
    y = np.exp(a) * (np.exp(b) ** x)
    return y

Exponential Regression in Python method found here.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
cpt=0
for obs_name in observatories.keys() :
    filename = f"co2_{obs_name}_surface-insitu_1_ccgg_MonthlyData.txt"
    df_noaa = read_noaa_data(filename)
    obs_name = observatories[df_noaa["site_code"][0].lower()]
    df_noaa = df_noaa.dropna()
    fit = np.polyfit(df_noaa["time_decimal"], np.log(df_noaa["rolling_mean"]), 1)
    plt.plot(df_noaa["time_decimal"], exp_model(df_noaa["time_decimal"], fit[1], fit[0]), color=colors[cpt], label=f"{obs_name}, a: {fit[1]:.4f}, b: {fit[0]:.4f}")
    plt.legend(loc="upper left")
    cpt+=1
No description has been provided for this image

Paleoclimatology¶

From the year 1000 to 2004¶

The dataset is also stored in a text format, but uses tabulations as the separator and contains 23 columns for 1005 lines. It contains the year of recording, from 1000 to 2004, and three mains values : the record at Law Dome (LD), the record at Dronning Maud Land (DML) and the average record over thoses two sites. Each of theses values is smoothed with splines ranging from 50 to 200 years with a 25 year step. We will only take into account, the years and the value for both sites with a 50 year splines, where data is missing the least.

1
2
3
4
5
6
7
8
9
filename = "smoothedco2.txt"
if not Path(filename).is_file():
    urlretrieve(
        "https://www.ncei.noaa.gov/pub/data/paleo/contributions_by_author/frank2010/smoothedco2.txt",
        filename,
    )

df_indus = pd.read_csv(filename, sep='\t', header=0)
df_indus.head()
Year ALL_50_full LD_050 DML_050 ALL_050 LD_075 DML_075 ALL_075 LD_100 DML_100 ... ALL_125 LD_150 DML_150 ALL_150 LD_175 DML_175 ALL_175 LD_200 DML_200 ALL_200
0 1000 278.66 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 1001 278.68 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 1002 278.69 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 1003 278.71 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 1004 278.72 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 23 columns

1
df_indus.describe()
Year ALL_50_full LD_050 DML_050 ALL_050 LD_075 DML_075 ALL_075 LD_100 DML_100 ... ALL_125 LD_150 DML_150 ALL_150 LD_175 DML_175 ALL_175 LD_200 DML_200 ALL_200
count 1005.000000 1005.000000 751.000000 751.000000 751.000000 751.000000 751.000000 751.000000 751.000000 751.000000 ... 751.000000 751.000000 751.000000 751.000000 751.000000 751.000000 751.000000 751.000000 751.000000 751.000000
mean 1502.000000 284.887801 280.554288 280.167523 280.053888 280.537457 280.164940 280.055286 280.493475 280.158362 ... 280.028415 280.373249 280.166059 280.006871 280.302783 280.179867 279.975979 280.228282 280.197137 279.925846
std 290.262812 14.310892 2.657925 1.407282 1.844578 2.576056 1.387695 1.800978 2.486121 1.375995 ... 1.744265 2.326101 1.328713 1.714550 2.290978 1.287739 1.689981 2.286184 1.239137 1.692015
min 1000.000000 276.450000 274.610000 277.480000 276.450000 275.470000 277.640000 276.890000 275.760000 277.600000 ... 276.700000 275.930000 277.670000 276.690000 275.920000 277.820000 276.710000 275.890000 277.990000 276.710000
25% 1251.000000 279.600000 278.440000 279.085000 278.560000 278.580000 279.105000 278.635000 278.670000 279.170000 ... 278.755000 278.605000 279.200000 278.755000 278.510000 279.190000 278.685000 278.375000 279.175000 278.540000
50% 1502.000000 280.980000 281.460000 280.460000 280.620000 281.380000 280.310000 280.580000 281.230000 280.260000 ... 280.510000 281.020000 280.370000 280.480000 280.950000 280.390000 280.450000 280.940000 280.410000 280.460000
75% 1753.000000 282.100000 282.585000 281.315000 281.360000 282.495000 281.180000 281.330000 282.170000 281.235000 ... 281.315000 282.175000 281.285000 281.305000 282.160000 281.295000 281.295000 282.145000 281.310000 281.280000
max 2004.000000 372.930000 284.000000 282.410000 282.630000 283.850000 282.150000 282.450000 283.690000 282.040000 ... 282.230000 283.280000 281.950000 282.130000 283.060000 281.890000 282.030000 282.860000 281.830000 281.930000

8 rows × 23 columns

We can display the CO2 estimated from 1000 to 2004.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
plt.plot(df_indus["Year"], 
         df_indus["ALL_50_full"],
         color="lightblue", label = "Ice core"
         )


plt.title("Global atmospheric CO2 since 1000")
plt.xlabel("Year")
plt.ylabel("CO2 (ppm)")
plt.show()
No description has been provided for this image

Theses values only going up to 2004, but we'd like to track the current CO2 records. Therefore, we will manually add the averaged values recorded at Mauna Loa (calculated same as before) since 2004.

Values range from 1000 to 2004, but we'd like it to go up to now. We'll use the values observed at Mauna Loa.

1
2
3
4
filename = "co2_mlo_surface-insitu_1_ccgg_MonthlyData.txt"
df_mlo = read_noaa_data(filename)
df_mlo = df_mlo[["year", "rolling_mean"]]
df_mlo.head()
year rolling_mean
0 1973 NaN
1 1973 NaN
2 1973 NaN
3 1973 NaN
4 1973 NaN
1
2
3
4
df_mlo = df_mlo.groupby(["year"])[["rolling_mean"]].mean()
df_mlo = df_mlo.rename_axis("year").reset_index()
df_mlo = df_mlo.rename(columns={"year": "Year", "rolling_mean": "ALL_50_full"})
df_mlo.head()
Year ALL_50_full
0 1973 NaN
1 1974 330.246970
2 1975 331.001364
3 1976 332.148831
4 1977 333.823409
1
2
df_indus_new = pd.concat([df_indus, df_mlo[df_mlo["Year"] > 2004]], axis=0)
df_indus_new.tail()
Year ALL_50_full LD_050 DML_050 ALL_050 LD_075 DML_075 ALL_075 LD_100 DML_100 ... ALL_125 LD_150 DML_150 ALL_150 LD_175 DML_175 ALL_175 LD_200 DML_200 ALL_200
45 2018 408.782197 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
46 2019 411.648636 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
47 2020 414.177424 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
48 2021 416.408636 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
49 2022 418.103485 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 23 columns

We can now see atmospheric CO2 records from 1000 to now. This way, we can also focus on atmospheric CO2 records after the pre-industrial era, around 1750.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True)

ax1.plot(df_indus["Year"], 
         df_indus["ALL_50_full"],
         color="lightblue")
ax1.set_title("Since 1000")

ax2.plot(df_indus[df_indus["Year"] >= 1750]["Year"], 
         df_indus[df_indus["Year"] >= 1750]["ALL_50_full"],
         color="lightblue")
ax2.set_title("Since 1750")

fig.suptitle("Global atmospheric CO2")
fig.supxlabel('Year')
fig.supylabel('CO2 (ppm)')
plt.show()
No description has been provided for this image

800,000 years before present¶

The dataset is once again stored in a text format, also uses tabulations as the separator and contains 2 columns for 1096 lines. It contains the year before present, meaning before 1950, and the CO2 records. This means that for instance, the first year being 137 BP is actually the year 1813. This has to changed in order for the plot to be displayed correctly.

1
2
3
4
5
6
7
8
9
filename = "edc3-composite-co2-2008-noaa.txt"
if not Path(filename).is_file():
    urlretrieve(
        "https://www.ncei.noaa.gov/pub/data/paleo/icecore/antarctica/epica_domec/edc3-composite-co2-2008-noaa.txt",
        filename,
    )

df_icecore = pd.read_csv(filename, comment='#', sep='\t', header=0)
df_icecore.head()
gas_ageBP CO2
0 137 280.4
1 268 274.9
2 279 277.9
3 395 279.1
4 404 281.9
1
df_icecore.describe()
gas_ageBP CO2
count 1096.000000 1096.000000
mean 390905.979015 230.835675
std 262092.947239 27.573616
min 137.000000 171.600000
25% 137133.500000 207.500000
50% 423206.500000 231.450000
75% 627408.000000 251.525000
max 798512.000000 298.600000
1
2
df_icecore["gas_ageBP"] = 1950 - df_icecore["gas_ageBP"]
df_icecore.head()
gas_ageBP CO2
0 1813 280.4
1 1682 274.9
2 1671 277.9
3 1555 279.1
4 1546 281.9

Like before, we'd like to look at the values up until now. We will manually add the previous dataframe (from 1000 to now) to our current one.

1
2
3
df_current = df_indus_new.rename(columns={"Year": "gas_ageBP", "ALL_50_full": "CO2"})
df_full = pd.concat([df_icecore, df_current.iloc[:, :2]], axis=0)
df_full.tail()
gas_ageBP CO2
45 2018 408.782197
46 2019 411.648636
47 2020 414.177424
48 2021 416.408636
49 2022 418.103485

We also want to order the years chronologically. This will be sorting by ordering the dataset based on the year.

1
2
df_full = df_full.sort_values(by=["gas_ageBP"])
df_full.head()
gas_ageBP CO2
1095 -796562 191.0
1094 -795149 188.4
1093 -794517 189.3
1092 -793252 195.2
1091 -792658 199.4
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
plt.plot(df_full["gas_ageBP"] / 1000, 
            df_full["CO2"],
            color="brown"
            )
plt.xlim([-800, 5])
plt.title("CO2 record for the last 800 000 year")
plt.xlabel("Year (10**3)")
plt.ylabel("CO2 (ppm)")
plt.axhline(300, linestyle='--', color = "grey", label ="Highest historical CO2 level")
plt.axhline(float(df_full[df_full["gas_ageBP"] == 1950]["CO2"]) , linestyle='--', color = "red", label ="1950")
plt.axhline(float(df_full[df_full["gas_ageBP"] == df_full["gas_ageBP"].max()]["CO2"]), linestyle='--', color = "orange", label ="current")
plt.legend(loc='upper center')
plt.show()
No description has been provided for this image

Zooming on CO2 records after 1750, we can see how the highest historical CO2 record was surpassed soon after 1900. After 1950, it rose even moreso than before, and keeps on rising.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
plt.plot(df_full[df_full["gas_ageBP"] >= 1750]["gas_ageBP"], 
         df_full[df_full["gas_ageBP"] >= 1750]["CO2"],
         color="brown")
plt.title("CO2 record since 1750")
plt.xlabel("Year")
plt.ylabel("CO2 (ppm)")
plt.axhline(300, linestyle='--', color = "grey", label ="Highest historical CO2 level")
plt.axhline(float(df_full[df_full["gas_ageBP"] == 1950]["CO2"]) , linestyle='--', color = "red", label ="1950")
plt.axhline(float(df_full[df_full["gas_ageBP"] == df_full["gas_ageBP"].max()]["CO2"]), linestyle='--', color = "orange", label ="current")
plt.legend(loc='upper center')
plt.show()
No description has been provided for this image
1