{ "cells": [ { "cell_type": "markdown", "id": "bce30523", "metadata": {}, "source": [ "## Installation and set up\n", "To run the code locally, install the following tools and libraries at the command line. \n", "\n", "**Jupyter Notebook, if not already installed**\n", "- pip install notebook\n", "\n", "**Supporting libraries**\n", "- pip install pandas\n", "- pip install geopandas\n", "- pip install shapely\n", " \n", "**DuckDB and associated tools to improve its performance in the notebook**\n", "- pip install duckdb \n", "- pip install jupysql\n", "- pip install duckdb-engine" ] }, { "cell_type": "code", "execution_count": 69, "id": "dee1c5a9", "metadata": {}, "outputs": [], "source": [ "# import our toolkit\n", "import pandas as pd\n", "import geopandas as gpd\n", "from shapely import wkt\n", "import duckdb" ] }, { "cell_type": "code", "execution_count": null, "id": "905569c7", "metadata": {}, "outputs": [], "source": [ "# install and load DuckDB extensions to work with spatial data and AWS\n", "%sql INSTALL spatial;\n", "%sql INSTALL httpfs;\n", "%sql LOAD spatial;\n", "%sql LOAD httpfs;\n", "%sql SET s3_region='us-west-2'" ] }, { "cell_type": "markdown", "id": "143d3ff8-f985-489a-ae24-ac55805581ef", "metadata": {}, "source": [ "The DuckDB documentation offers tips and examples for [running DuckDB queries in Jupyter notebooks](https://duckdb.org/docs/guides/python/jupyter.html). In this example, we're using duckdb-engine and JupySQL. You can also connect to DuckDB natively. " ] }, { "cell_type": "code", "execution_count": null, "id": "9b581e62-b7ba-4394-b8a4-15284c003389", "metadata": {}, "outputs": [], "source": [ "# load (or reload) jupysql to create SQL cells\n", "# no need to import duckdb_engine, JupySQL will auto-detect driver \n", "# load (or reload) jupysql Jupyter extension to create SQL cells\n", "%reload_ext sql" ] }, { "cell_type": "code", "execution_count": 72, "id": "54849525-7d26-4734-b2f1-38c0fe1b1e7c", "metadata": {}, "outputs": [], "source": [ "# configure cell output -> query to Pandas\n", "%config SqlMagic.autopandas = True\n", "%config SqlMagic.feedback = False\n", "%config SqlMagic.displaycon = False" ] }, { "cell_type": "code", "execution_count": 73, "id": "51aa45e8-2282-4286-872d-e1b25a31bca2", "metadata": {}, "outputs": [], "source": [ "# connection string\n", "%sql duckdb:///:memory:" ] }, { "cell_type": "markdown", "id": "32622561", "metadata": {}, "source": [ "## Getting Overture data" ] }, { "cell_type": "markdown", "id": "721512de", "metadata": {}, "source": [ "Now we're going to extract data from Overture's base theme for an area along the Gulf Coast. The magic %%sql command turns the notebook cell into a SQL cell and allows us to dump our query results in a Pandas DataFrame. Note: this query take a couple minutes to run." ] }, { "cell_type": "code", "execution_count": null, "id": "d2cc9e11", "metadata": {}, "outputs": [], "source": [ "%%sql gulf_water <<\n", "SELECT \n", " id, \n", " names.primary AS primary_name,\n", " ST_AsText(ST_GeomFromWKB(geometry)) as geometry\n", "FROM \n", " read_parquet('s3://overturemaps-us-west-2/release/2024-07-22.0/theme=base/type=water/*', filename=true, hive_partitioning=1)\n", "WHERE \n", " bbox.xmin >= -91.3994\n", "\t\tand bbox.xmax <= -89.3864\n", "\t\tand bbox.ymin >= 29.152\n", "\t\tand bbox.ymax <= 30.5161\n", "\n", " " ] }, { "cell_type": "code", "execution_count": null, "id": "bfe92a03", "metadata": {}, "outputs": [], "source": [ "gulf_water.head()" ] }, { "cell_type": "markdown", "id": "c8f7e066", "metadata": {}, "source": [ "Before we move on, let's deal with the geometry we pulled out of Overture's GeoParquet file. Geometries in GeoParquet are stored as well-known binary (WKB). In our query, we transformed that geometry into text. The next step is to convert it to a shapely geometry before we create the GeoDataFrame. Here's how we do that." ] }, { "cell_type": "code", "execution_count": 76, "id": "23e89e37", "metadata": {}, "outputs": [], "source": [ "gulf_water['geometry'] = gulf_water['geometry'].apply(wkt.loads)" ] }, { "cell_type": "code", "execution_count": 78, "id": "66f2e3bd", "metadata": {}, "outputs": [], "source": [ "# dataframe to geodataframe, set crs\n", "gulf_water_gdf = gpd.GeoDataFrame(\n", " gulf_water\n", " , geometry='geometry', crs=\"EPSG:4326\"\n", ")" ] }, { "cell_type": "markdown", "id": "a4027629-4c5f-4dd7-8abd-47fc5afbc962", "metadata": {}, "source": [ "Let's work with just the water polygons and lines, not the points." ] }, { "cell_type": "code", "execution_count": 79, "id": "2eb3593c-fe76-40d1-98b7-77c6a4a70e44", "metadata": {}, "outputs": [], "source": [ "#Apply a lambda to remove point geometries\n", "gulf_water_gdf = gulf_water_gdf[gulf_water_gdf['geometry'].apply(lambda x : x.geom_type!='Point' )]" ] }, { "cell_type": "code", "execution_count": null, "id": "5666c927", "metadata": {}, "outputs": [], "source": [ "gulf_water_gdf.plot(facecolor=\"#628290\", edgecolor=\"#006064\", lw=0.05)" ] }, { "cell_type": "code", "execution_count": null, "id": "e804e9ee-0b1a-4d79-9262-66586f860c25", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.2" } }, "nbformat": 4, "nbformat_minor": 5 }