{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "bce30523",
   "metadata": {},
   "source": [
    "## Installation and set up\n",
    "To run the code locally, install the following tools and libraries at the command line. \n",
    "\n",
    "**Jupyter Notebook, if not already installed**\n",
    "- pip install notebook\n",
    "\n",
    "**Supporting libraries**\n",
    "- pip install pandas\n",
    "- pip install geopandas\n",
    "- pip install shapely\n",
    "  \n",
    "**DuckDB and associated tools to improve its performance in the notebook**\n",
    "- pip install duckdb \n",
    "- pip install jupysql\n",
    "- pip install duckdb-engine"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "id": "dee1c5a9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# import our toolkit\n",
    "import pandas as pd\n",
    "import geopandas as gpd\n",
    "from shapely import wkt\n",
    "import duckdb"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "905569c7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# install and load DuckDB extensions to work with spatial data and AWS\n",
    "%sql INSTALL spatial;\n",
    "%sql INSTALL httpfs;\n",
    "%sql LOAD spatial;\n",
    "%sql LOAD httpfs;\n",
    "%sql SET s3_region='us-west-2'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "143d3ff8-f985-489a-ae24-ac55805581ef",
   "metadata": {},
   "source": [
    "The DuckDB documentation offers tips and examples for [running DuckDB queries in Jupyter notebooks](https://duckdb.org/docs/guides/python/jupyter.html). In this example, we're using duckdb-engine and JupySQL. You can also connect to DuckDB natively. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9b581e62-b7ba-4394-b8a4-15284c003389",
   "metadata": {},
   "outputs": [],
   "source": [
    "# load (or reload) jupysql to create SQL cells\n",
    "# no need to import duckdb_engine, JupySQL will auto-detect driver \n",
    "# load (or reload) jupysql Jupyter extension to create SQL cells\n",
    "%reload_ext sql"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "id": "54849525-7d26-4734-b2f1-38c0fe1b1e7c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# configure cell output -> query to Pandas\n",
    "%config SqlMagic.autopandas = True\n",
    "%config SqlMagic.feedback = False\n",
    "%config SqlMagic.displaycon = False"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "id": "51aa45e8-2282-4286-872d-e1b25a31bca2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# connection string\n",
    "%sql duckdb:///:memory:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "32622561",
   "metadata": {},
   "source": [
    "## Getting Overture data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "721512de",
   "metadata": {},
   "source": [
    "Now we're going to extract data from Overture's base theme for an area along the Gulf Coast. The magic %%sql command turns the notebook cell into a SQL cell and allows us to dump our query results in a Pandas DataFrame. Note: this query take a couple minutes to run."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d2cc9e11",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql gulf_water <<\n",
    "SELECT \n",
    "    id, \n",
    "    names.primary AS primary_name,\n",
    "    ST_AsText(ST_GeomFromWKB(geometry)) as geometry\n",
    "FROM \n",
    "    read_parquet('s3://overturemaps-us-west-2/release/2024-07-22.0/theme=base/type=water/*', filename=true, hive_partitioning=1)\n",
    "WHERE \n",
    "    bbox.xmin >= -91.3994\n",
    "\t\tand bbox.xmax <= -89.3864\n",
    "\t\tand bbox.ymin >= 29.152\n",
    "\t\tand bbox.ymax <= 30.5161\n",
    "\n",
    " "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bfe92a03",
   "metadata": {},
   "outputs": [],
   "source": [
    "gulf_water.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c8f7e066",
   "metadata": {},
   "source": [
    "Before we move on, let's deal with the geometry we pulled out of Overture's GeoParquet file. Geometries in GeoParquet are stored as well-known binary (WKB). In our query, we transformed that geometry into text. The next step is to convert it to a shapely geometry before we create the GeoDataFrame. Here's how we do that."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "id": "23e89e37",
   "metadata": {},
   "outputs": [],
   "source": [
    "gulf_water['geometry'] = gulf_water['geometry'].apply(wkt.loads)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "id": "66f2e3bd",
   "metadata": {},
   "outputs": [],
   "source": [
    "# dataframe to geodataframe, set crs\n",
    "gulf_water_gdf = gpd.GeoDataFrame(\n",
    "    gulf_water\n",
    "    , geometry='geometry', crs=\"EPSG:4326\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a4027629-4c5f-4dd7-8abd-47fc5afbc962",
   "metadata": {},
   "source": [
    "Let's work with just the water polygons and lines, not the points."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "id": "2eb3593c-fe76-40d1-98b7-77c6a4a70e44",
   "metadata": {},
   "outputs": [],
   "source": [
    "#Apply a lambda to remove point geometries\n",
    "gulf_water_gdf = gulf_water_gdf[gulf_water_gdf['geometry'].apply(lambda x : x.geom_type!='Point' )]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5666c927",
   "metadata": {},
   "outputs": [],
   "source": [
    "gulf_water_gdf.plot(facecolor=\"#628290\", edgecolor=\"#006064\", lw=0.05)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e804e9ee-0b1a-4d79-9262-66586f860c25",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}