U.S. County-to-County Migration

This notebook is derived from the original deck.gl example in JavaScript, which you can see here.

This dataset originally came from the U.S. Census Bureau and represents people moving into and out of each county between 2009 and 2013.

This also serves as a notebook for day 10 of the 30 Day Map Challenge.

Imports

import geopandas as gpd
import numpy as np
import pandas as pd
import pyarrow as pa
import requests
import shapely
from matplotlib.colors import Normalize

from lonboard import Map, ScatterplotLayer
from lonboard.experimental import ArcLayer
from lonboard.layer_extension import BrushingExtension

Fetch the data from the version in the deck.gl-data repository.

url = "https://raw.githubusercontent.com/visgl/deck.gl-data/master/examples/arc/counties.json"
r = requests.get(url)
source_data = r.json()

The following cell may be a little hard to follow. It takes the raw data, which represents the migration flows as a graph, and normalizes it into a table where each row represents one "arc" between a source and a target county.

This is ported from the original JavaScript here.
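
To make the graph-to-table step easier to follow, here is a minimal sketch of the same idea on toy data. The `toy_counties` list, the `flows_to_arcs` helper, and the reading of each flow value as a net gain are assumptions made for illustration only; they are not part of the dataset or of lonboard.

```python
# Hypothetical toy data: flows[j] on county i is assumed to be the net number
# of people county i gained from county j (negative means a net loss).
toy_counties = [
    {"name": "A", "centroid": [0.0, 0.0], "flows": {"1": 120, "2": -30}},
    {"name": "B", "centroid": [1.0, 0.0], "flows": {"0": -120, "2": 75}},
    {"name": "C", "centroid": [0.0, 1.0], "flows": {"0": 30, "1": -75}},
]


def flows_to_arcs(counties, min_flow=50):
    """Flatten per-county flow dicts into one row per unordered county pair."""
    arcs = []
    seen_pairs = set()
    for i, county in enumerate(counties):
        for to_id, value in county["flows"].items():
            j = int(to_id)
            # Skip small flows, mirroring the threshold used in the notebook
            if abs(value) < min_flow:
                continue
            # (i, j) and (j, i) describe the same arc, so keep only the first
            pair_key = (min(i, j), max(i, j))
            if pair_key in seen_pairs:
                continue
            seen_pairs.add(pair_key)
            # Orient the arc so that it points toward the gaining county
            if value > 0:
                source, target = counties[j], county
            else:
                source, target = county, counties[j]
            arcs.append(
                {"source": source["name"], "target": target["name"], "value": value}
            )
    return arcs


toy_arcs = flows_to_arcs(toy_counties)
for arc in toy_arcs:
    print(arc)
```

With three counties and a threshold of 50, the six flow entries collapse to two arcs: the A/C flow is dropped as too small, and each remaining pair is kept only once.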

arcs = []
targets = []
sources = []
pairs = {}

features = source_data["features"]
for i, county in enumerate(features):
    flows = county["properties"]["flows"]
    target_centroid = county["properties"]["centroid"]
    total_value = {
        "gain": 0,
        "loss": 0,
    }

    for to_id, value in flows.items():
        if value > 0:
            total_value["gain"] += value
        else:
            total_value["loss"] += value

        # If number is too small, ignore it
        if abs(value) < 50:
            continue

        pair_key = "-".join(map(str, sorted([i, int(to_id)])))
        source_centroid = features[int(to_id)]["properties"]["centroid"]
        gain = np.sign(flows[to_id])

        # add point at arc source
        sources.append(
            {
                "position": source_centroid,
                "target": target_centroid,
                "name": features[int(to_id)]["properties"]["name"],
                "radius": 3,
                "gain": -gain,
            }
        )
        # eliminate duplicate arcs
        if pair_key in pairs:
            continue

        pairs[pair_key] = True

        if gain > 0:
            arcs.append(
                {
                    "target": target_centroid,
                    "source": source_centroid,
                    "value": flows[to_id],
                }
            )
        else:
            arcs.append(
                {
                    "target": source_centroid,
                    "source": target_centroid,
                    "value": flows[to_id],
                }
            )

    # add point at arc target
    targets.append(
        {
            **total_value,
            "position": [target_centroid[0], target_centroid[1], 10],
            "net": total_value["gain"] + total_value["loss"],
            "name": county["properties"]["name"],
        }
    )

# sort targets by radius large -> small
targets = sorted(targets, key=lambda d: abs(d["net"]), reverse=True)
normalizer = Normalize(0, abs(targets[0]["net"]))

We define some color constants, as well as a color lookup array.

A nice trick in numpy is that if you have a two-dimensional array like:

[
    [166,   3,   3],
    [ 35, 181, 184]
]

you can index it with an array of row indices to get a new array with one row per index. In this case, we'll use 0 and 1, the two valid indices along the array's first dimension, to create an array of colors.

So when we call COLORS[colors_lookup], that creates an output array like:

[
    [166,   3,   3],
    [ 35, 181, 184],
    [166,   3,   3],
    [166,   3,   3]
]

with as many rows as our dataset. We can then pass this to any parameter that accepts a ColorAccessor.
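
As a small standalone illustration of this indexing trick, the sketch below builds a lookup array from the sign of some made-up values and uses it to index a two-row palette. The names `toy_palette`, `toy_values`, and `toy_lookup` are hypothetical and exist only for this example.

```python
import numpy as np

# Two-row palette: row 0 is the "migrate out" color, row 1 the "migrate in" color
toy_palette = np.array([[166, 3, 3], [35, 181, 184]], dtype=np.uint8)

# Made-up per-row values; only their sign matters here
toy_values = np.array([-5, 12, -1, -40])

# Map each value to a row index of the palette: 1 if positive, else 0
toy_lookup = np.where(toy_values > 0, 1, 0)

# Indexing with an integer array selects one palette row per element
toy_colors = toy_palette[toy_lookup]

print(toy_lookup)
print(toy_colors.shape)
```

The result keeps the palette's `uint8` dtype and has one RGB row per input value, so a lookup array of length N produces a color array of shape (N, 3).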

# migrate out
SOURCE_COLOR = [166, 3, 3]
# migrate in
TARGET_COLOR = [35, 181, 184]
# Combine into a single array to use as a lookup table
COLORS = np.vstack(
    [np.array(SOURCE_COLOR, dtype=np.uint8), np.array(TARGET_COLOR, dtype=np.uint8)]
)
SOURCE_LOOKUP = 0
TARGET_LOOKUP = 1

brushing_extension = BrushingExtension()
brushing_radius = 200000

Convert the sources list of dictionaries into a GeoPandas GeoDataFrame to pass into a ScatterplotLayer.

source_arr = np.array([source["position"] for source in sources])
source_positions = shapely.points(source_arr[:, 0], source_arr[:, 1])
source_gdf = gpd.GeoDataFrame(
    pd.DataFrame.from_records(sources)[["name", "radius", "gain"]],
    geometry=source_positions,
    crs="EPSG:4326"
)
# We use a lookup table (`COLORS`) to assign each row either the target color or the
# source color
source_colors_lookup = np.where(source_gdf["gain"] > 0, TARGET_LOOKUP, SOURCE_LOOKUP)
source_fill_colors = COLORS[source_colors_lookup]

Create a ScatterplotLayer for source points:

source_layer = ScatterplotLayer.from_geopandas(
    source_gdf,
    get_fill_color=source_fill_colors,
    radius_scale=3000,
    pickable=False,
    extensions=[brushing_extension],
    brushing_radius=brushing_radius,
)

Convert the targets list of dictionaries into a GeoDataFrame in the same way.

targets_arr = np.array([target["position"] for target in targets])
target_positions = shapely.points(targets_arr[:, 0], targets_arr[:, 1])
target_gdf = gpd.GeoDataFrame(
    pd.DataFrame.from_records(targets)[["name", "gain", "loss", "net"]],
    geometry=target_positions,
    crs="EPSG:4326"
)
# We use a lookup table (`COLORS`) to assign each row either the target color or the
# source color
target_line_colors_lookup = np.where(target_gdf["net"] > 0, TARGET_LOOKUP, SOURCE_LOOKUP)
target_line_colors = COLORS[target_line_colors_lookup]

Create a ScatterplotLayer for target points:

target_ring_layer = ScatterplotLayer.from_geopandas(
    target_gdf,
    get_line_color=target_line_colors,
    radius_scale=4000,
    pickable=True,
    stroked=True,
    filled=False,
    line_width_min_pixels=2,
    extensions=[brushing_extension],
    brushing_radius=brushing_radius,
)

Note: the ArcLayer can't currently be created from a GeoDataFrame because it needs two point columns, not one. This is a large part of why it still lives in the "experimental" module.

Here we pass a numpy array for each point column. This is allowed as long as the shape of the array is (N, 2) or (N, 3) (i.e. 2D or 3D coordinates).
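
If you build these arrays from your own data, it can help to check the shape before handing them to a layer. The sketch below uses a hypothetical helper, `as_positions`, which is not part of lonboard; it only uses NumPy to enforce the (N, 2) or (N, 3) requirement described above.

```python
import numpy as np


def as_positions(coords):
    """Hypothetical helper: convert a list of [x, y] or [x, y, z] to a float array.

    Raises ValueError if the result is not of shape (N, 2) or (N, 3).
    """
    arr = np.asarray(coords, dtype=np.float64)
    if arr.ndim != 2 or arr.shape[1] not in (2, 3):
        raise ValueError(f"expected shape (N, 2) or (N, 3), got {arr.shape}")
    return arr


# Two made-up 2D points given as [longitude, latitude]
toy_positions = as_positions([[-122.4, 37.8], [-73.9, 40.7]])
print(toy_positions.shape)
```

A flat list such as `[1.0, 2.0]` fails the check because it becomes a one-dimensional array rather than a single-row (1, 2) array.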

value = np.array([arc["value"] for arc in arcs])
get_source_position = np.array([arc["source"] for arc in arcs])
get_target_position = np.array([arc["target"] for arc in arcs])
table = pa.table({"value": value})

arc_layer = ArcLayer(
    table=table,
    get_source_position=get_source_position,
    get_target_position=get_target_position,
    get_source_color=SOURCE_COLOR,
    get_target_color=TARGET_COLOR,
    get_width=1,
    opacity=0.4,
    pickable=False,
    extensions=[brushing_extension],
    brushing_radius=brushing_radius,
)

Now we can create a map from the three layers we've created.

As you hover over the map, it should render only the arcs near your cursor.

You can modify the brushing_radius value passed to each layer to control how large the brush is around your cursor.

map_ = Map(layers=[source_layer, target_ring_layer, arc_layer], picking_radius=10)
map_