{ "cells": [ { "cell_type": "markdown", "id": "fe17af6b-6678-4ee6-866c-fd1c53787c15", "metadata": {}, "source": [ "# Access Patterns to Remote Data with *fsspec*\n", "\n", "Accessing remote data with xarray usually means working with cloud-optimized formats such as Zarr or Cloud-Optimized GeoTIFFs (COGs); the [CMIP6 tutorial](remote-data.ipynb) shows this pattern in detail. These formats are designed for efficient access over the internet; however, in many cases we need to access data that is not available in such formats.\n", "\n", "This notebook explores how to leverage xarray's backends to access remote files. To do this, we use [`fsspec`](https://github.com/fsspec/filesystem_spec), a powerful Python library that hides the implementation details of remote storage systems behind a uniform, file-like API that many format-specific libraries can consume.\n", "\n", "Before turning to remote data, it helps to understand how xarray opens local files and how its backends work. Consider a local NetCDF4 file containing gridded data. NetCDF is a common format in scientific research for storing array-oriented data." ] }, { "cell_type": "code", "execution_count": 1, "id": "6c527e9e-cf5f-46e8-bfb4-72301ee51037", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<xarray.Dataset> Size: 8MB\n", "Dimensions: (lat: 89, lon: 180, time: 128)\n", "Coordinates:\n", " * lat (lat) float32 356B 88.0 86.0 84.0 82.0 ... -82.0 -84.0 -86.0 -88.0\n", " * lon (lon) float32 720B 0.0 2.0 4.0 6.0 8.0 ... 352.0 354.0 356.0 358.0\n", " * time (time) datetime64[ns] 1kB 2010-01-01 2010-02-01 ... 2020-08-01\n", "Data variables:\n", " sst (time, lat, lon) float32 8MB ...\n", "Attributes: (12/37)\n", " climatology: Climatology is based on 1971-2000 SST, Xue, Y....\n", " description: In situ data: ICOADS2.5 before 2007 and NCEP i...\n", " keywords_vocabulary: NASA Global Change Master Directory (GCMD) Sci...\n", " keywords: Earth Science > Oceans > Ocean Temperature > S...\n", " instrument: Conventional thermometers\n", " source_comment: SSTs were observed by conventional thermometer...\n", " ... ...\n", " creator_url_original: https://www.ncei.noaa.gov\n", " license: No constraints on data access or use\n", " comment: SSTs were observed by conventional thermometer...\n", " summary: ERSST.v5 is developed based on v4 after revisi...\n", " dataset_title: NOAA Extended Reconstructed SST V5\n", " data_modified: 2020-09-07
def open_dataset() at xarray/backends/api.py:391 \n",
"def guess_engine() at xarray/backends/plugins.py:147 \n",
"def guess_can_open() at xarray/backends/netCDF4_.py:607 \n",
"def is_remote_uri() at xarray/core/utils.py:641 \n",
"def try_read_magic_number_from_path() at xarray/core/utils.py:664 \n",
"def read_magic_number_from_file() at xarray/core/utils.py:650 \n",
"def get_backend() at xarray/backends/plugins.py:200 \n",
"def open_dataset() at xarray/backends/netCDF4_.py:624 \n",
"def is_remote_uri() at xarray/core/utils.py:641 \n",
"def open() at xarray/backends/netCDF4_.py:361\n",
"
def open_dataset() at xarray/backends/api.py:391 \n",
"def get_backend() at xarray/backends/plugins.py:200 \n",
"def open_dataset() at xarray/backends/h5netcdf_.py:383 \n",
"def is_remote_uri() at xarray/core/utils.py:641 \n",
"def open() at xarray/backends/h5netcdf_.py:135 \n",
"def increment() at xarray/backends/file_manager.py:307 \n",
"def ds() at xarray/backends/h5netcdf_.py:193 \n",
"def acquire_context() at xarray/backends/file_manager.py:196 \n",
"def acquire_context() at xarray/backends/file_manager.py:196 \n",
"def find_root_and_group() at xarray/backends/common.py:141\n",
"
PermanentRedirect
<xarray.Dataset> Size: 8MB\n", "Dimensions: (lat: 89, lon: 180, time: 128)\n", "Coordinates:\n", " * lat (lat) float32 356B 88.0 86.0 84.0 82.0 ... -82.0 -84.0 -86.0 -88.0\n", " * lon (lon) float32 720B 0.0 2.0 4.0 6.0 8.0 ... 352.0 354.0 356.0 358.0\n", " * time (time) datetime64[ns] 1kB 2010-01-01 2010-02-01 ... 2020-08-01\n", "Data variables:\n", " sst (time, lat, lon) float32 8MB -1.8 -1.8 -1.8 -1.8 ... nan nan nan\n", "Attributes: (12/37)\n", " climatology: Climatology is based on 1971-2000 SST, Xue, Y....\n", " description: In situ data: ICOADS2.5 before 2007 and NCEP i...\n", " keywords_vocabulary: NASA Global Change Master Directory (GCMD) Sci...\n", " keywords: Earth Science > Oceans > Ocean Temperature > S...\n", " instrument: Conventional thermometers\n", " source_comment: SSTs were observed by conventional thermometer...\n", " ... ...\n", " creator_url_original: https://www.ncei.noaa.gov\n", " license: No constraints on data access or use\n", " comment: SSTs were observed by conventional thermometer...\n", " summary: ERSST.v5 is developed based on v4 after revisi...\n", " dataset_title: NOAA Extended Reconstructed SST V5\n", " data_modified: 2020-09-07