Catchment Area Analysis
How to calculate demographics of catchment areas surrounding points of interest, like the population living within 1 mile of a Starbucks, using mundipy.
Because it’s hard to estimate how many people might visit a particular retail location without observed data, geospatial analysts use catchment areas to approximate how many people could reasonably visit one.
In this tutorial, we’ll learn to calculate the average demographics of catchment areas. We’ll use mundipy with population data from the U.S. Census Bureau and retail location (POI) data from OpenStreetMap.
Getting Set Up
POI and Population Data
We’ll analyze the demographics surrounding various Starbucks coffeeshops in Seattle.
We can download this POI (“points of interest”) data by filtering for name=Starbucks
on OpenStreetMap via Overpass Turbo. You can download this data as starbucks_in_seattle.geojson
from this gist.
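If you’d prefer to script this step, the sketch below queries the public Overpass API with the same name=Starbucks filter. The Seattle area filter is an assumption, and note that the response is OSM JSON rather than GeoJSON (Overpass Turbo’s GeoJSON export is the simpler route):
import requests

# Hypothetical sketch: fetch Starbucks nodes in Seattle from the public Overpass API.
# The area filter is an assumption; Overpass returns OSM JSON, not GeoJSON.
query = '''
[out:json];
area["name"="Seattle"]["admin_level"="8"]->.seattle;
node["name"="Starbucks"](area.seattle);
out;
'''
resp = requests.post('https://overpass-api.de/api/interpreter', data={'data': query})
resp.raise_for_status()
print(len(resp.json()['elements']))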
The U.S. Census Bureau offers free geospatial demographic datasets at varying resolutions. Here, we’ll use population counts aggregated by ZIP code (ZCTAs). You can create this on your own with the Census data downloader, or download this data as population_zip_code.fgb
(WARNING: 1GB).
You’re ready to continue with population_zip_code.fgb
and starbucks_in_seattle.geojson
in your root directory.
Installing mundipy
You can install mundipy normally with pip via its PyPI project:
pip install mundipy
If you run into issues with mundipy’s shapely dependency, this shapely installation guide may be useful in debugging.
Datasets
Loading datasets
We’ll start with the Starbucks dataset by creating a Dataset
object and passing it the GeoJSON file location.
from mundipy.dataset import Dataset
starbucks = Dataset('starbucks_in_seattle.geojson')
zip_codes = Dataset('population_zip_code.fgb')
We can check the number of features we loaded with len
:
len(starbucks)
# 128
len(zip_codes)
# 32923
If you’re in a Jupyter notebook, you can leave starbucks as the last expression in a code cell and it’ll visualize the dataset as an SVG, confirming we have a bunch of points.
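As a minimal sketch (exact rendering depends on your notebook setup), the cell would look like this:
# leaving the Dataset as the last expression in the cell renders it as an SVG preview
starbucks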
Files in the Flatgeobuf format (.fgb) aren’t immediately loaded into memory. This drastically reduces memory consumption in this case, as population_zip_code.fgb is 750MB.
Create catchment area
We’ll consider the catchment area of a coffeeshop to be all residents living within one mile of it. People are more likely to visit a coffeeshop that’s convenient.
We can take a single location (Point
) and expand it by one mile, or 1,609 meters. This is done with the buffer
geometric operation.
catchment_areas = [ coffeeshop.buffer(1609) for coffeeshop in starbucks ]
This creates a circle for any single catchment area (e.g. catchment_areas[0]
):
We can confirm that the catchment area is one mile in radius by comparing its area in square meters to the area we’d expect for a circle (pi * r^2):
catchment_areas[0].area / (3.14 * 1609**2)
# 1.012157112539878
ZIP codes in area
We want to sum the population living in the catchment area. To do this, we combine each catchment area with the ZIP codes that intersect it.
For any given catchment area, we can retrieve a list of the features in a Dataset
that intersect with the region:
# take an example
example_area = catchment_areas[0]
# find all zip codes that intersect
is_touching = zip_codes.intersects(example_area)
# sum population; in this file it's called "universe"
population = sum([ float(zip_code['universe']) for zip_code in is_touching ])
print('total population = %f' % population)
# total population = 125737.000000
However, we are counting population in parts of the ZIP codes that fall outside the catchment area. We can check this by calculating the total area of is_touching
and comparing it to example_area
:
sum([ t.area for t in is_touching ]) / example_area.area
# 10.50135574070474
Whoa, we’re overcounting by 10x!
Overlap percentage
Let’s factor in how much a ZIP code overlaps with the catchment area. If 20% of a ZIP code overlaps with the catchment area, we can approximate that ZIP code’s contribution as 20% of its population.
overlap_pct = [ (zip_code.intersection(example_area).area / zip_code.area) for zip_code in is_touching ]
print(overlap_pct)
# [ 0.0126, 0.198, 0.306 ]
Good to know that one of the ZIP codes practically doesn’t overlap. Let’s account for this in our analysis by multiplying each ZIP code’s population by its overlap percentage:
# find all zip codes that intersect
is_touching = zip_codes.intersects(example_area)
# sum population, weighted by overlap
overlap_pct = [ (zip_code.intersection(example_area).area / zip_code.area) for zip_code in is_touching ]
population = sum([ overlap_pct[i] * float(zip_code['universe']) for i, zip_code in enumerate(is_touching) ])
print('total population = %f' % population)
# total population = 14648.251489
Great, a much more reasonable answer. 14.6k people in a one-mile radius is comparable to Seattle’s average population density of 9k people per square mile (or roughly 28.5k in a one-mile radius).
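As a quick arithmetic sanity check on that comparison (the density figure is an approximation), a one-mile-radius circle covers about pi square miles:
import math

# ~9k people per square mile * pi square miles ≈ 28k people within a one-mile radius
print(round(9_000 * math.pi))
# 28274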
Looping
We can easily change this into a for loop and run the same analysis for every catchment area:
for catchment_area in catchment_areas:
    is_touching = zip_codes.intersects(catchment_area)
    # sum population, weighted by overlap
    overlap_pct = [ (zip_code.intersection(catchment_area).area / zip_code.area) for zip_code in is_touching ]
    population = sum([ overlap_pct[i] * float(zip_code['universe']) for i, zip_code in enumerate(is_touching) ])
    # assign as property
    catchment_area['population'] = population
Saving the results
We can save the results as a GeoJSON file:
import json
from mundipy.geometry import dumps
with open('catchment_areas.geojson', 'w') as f:
    f.write(json.dumps(dumps(catchment_areas)))
Plotting the results
We can load this file into a GeoJSON exploration tool like geojson.io and visualize the results.
The population falling inside each catchment area is available as a property on every feature:
{
  "type": "Feature",
  "geometry": {
    "type": "Polygon",
    "coordinates": [
      [
        [
          -122.14194734935859,
          47.63499748134119
        ],
        ...
      ]
    ]
  },
  "properties": {
    "population": 14648.251488983871,
    ...
  }
}
Further reading
In this tutorial, we calculated the catchment area as a radius around a point of interest.
More accurate catchment areas are often calculated using travel time, e.g., 10 minutes of walking or 20 minutes of driving. You can use third-party isochrone APIs to do more complex calculations like these.
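As a hedged sketch (not part of mundipy), a request to one such service, here assumed to be openrouteservice’s isochrone endpoint, might look roughly like this; the endpoint path, request body, and ORS_API_KEY placeholder are assumptions to verify against your provider’s documentation:
import requests

# Hypothetical sketch: fetch a 10-minute walking isochrone around a point.
# The endpoint, parameters, and API key placeholder are assumptions.
ORS_API_KEY = 'your-api-key'

def walking_isochrone(lon, lat, seconds=600):
    resp = requests.post(
        'https://api.openrouteservice.org/v2/isochrones/foot-walking',
        headers={'Authorization': ORS_API_KEY},
        json={'locations': [[lon, lat]], 'range': [seconds]},
    )
    resp.raise_for_status()
    # the response is a GeoJSON FeatureCollection; the first feature's geometry
    # is the isochrone polygon, which could stand in for the one-mile buffer above
    return resp.json()['features'][0]['geometry']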