In retail and commercial real estate, a catchment area of a store represents the typical area where visitors of the store live.

Because it’s hard to estimate how many people might visit a particular retail location without observed data, geospatial analysts use catchment areas to approximate how many people could reasonably visit a retail location.

Above: catchment area around a Starbucks in Seattle

In this tutorial, we’ll learn to calculate the average demographics of catchment areas. We’ll use mundipy with population data from the US census bureau and retail location data (POI) from OpenStreetMap.

Following along with this tutorial takes about ten minutes.

Getting Setup

Population Data

We’ll analyze the demographics surrounding various Starbucks coffeeshops in Seattle. We can download this POI (“points of interest”) data by filtering for name=Starbucks on OpenStreetMap via Overpass Turbo. You can download this data as starbucks_in_seattle.geojson from this gist.

U.S. Census data offers free geospatial demographic datasets of varying resolutions. Here, we can use population counts aggregated by zip code (ZCTAs). You can create this on your own with the Census data downloader, or download this data as population_zip_code.fgb (WARNING: 1GB).

You’re ready to continue with population_zip_code.fgb and starbucks_in_seattle.geojson in your root directory.

Installing mundipy

You can install mundipy normally with pip via its PyPi project:

pip install mundipy
If you have issues installing the shapely dependency, this shapely installation guide may be useful in debugging.

Datasets

Loading datasets

We’ll start with the Starbucks dataset by creating a Dataset object and passing it the GeoJSON file location.

from mundipy.dataset import Dataset

starbucks = Dataset('starbucks_in_seattle.geojson')
zip_codes = Dataset('population_zip_code.fgb')

We can check the number of features we loaded with len:

len(starbucks)
# 128

len(zip_codes)
# 32923

If you’re in a Jupyter notebook, you can leave starbucks as the last variable in a code block and it’ll visualize the dataset as an SVG, confirming we have a bunch of points:

While the datasets are instantiated, cloud-native formats like FlatGeobuf (.fgb) aren’t immediately loaded into memory. This drastically reduces memory consumption in this case, as population_zip_code.fgb is 750MB.

Create catchment area

We’ll consider the catchment area of a coffeeshop to be all residents living one mile from it. People are more likely to visit a coffeeshop that’s convenient.

We can take a single location (Point) and expand it by one mile, or 1,609 meters. This is done with the buffer geometric operation.

catchment_areas = [ coffeeshop.buffer(1609) for coffeeshop in starbucks ]
All geometric operations are in meters, or square meters, by default.

This creates a circle for any single catchment area (e.g. catchment_areas[0]):

We can confirm that the catchment area is one mile in radius by comparing its area in square meters to the area we’d expect from the area of a circle (pi * r^2):

catchment_areas[0].area / (3.14 * 1609**2)
# 1.012157112539878

ZIP codes in area

We want to get a sum of the population living in the catchment area. We must combine each catchment area with the ZIP codes that intersect with it.

For any given catchment area, we can retrieve a list of the features in a Dataset that intersects with the region:

# take an example
example_area = catchment_areas[0]

# find all zip codes that intersect
is_touching = zip_codes.intersects(example_area)

# sum population; in this file it's called "universe"
population = sum([ float(zip_code['universe']) for zip_code in is_touching ])

print('total population = %f' % population)
# total population = 125737.000000

However, we are summing area in the ZIP codes that isn’t inside the catchment area. We can check this by calculating the area of is_touching and comparing it to example_area:

sum([ t.area for t in is_touching ]) / example_area.area
# 10.50135574070474

Woah, we’re over counting by 10x!

Overlap percentage

Let’s factor in how much a ZIP code overlaps with the catchment area. If 20% of the ZIP code overlaps with the catchment area, we can approximate the population in the catchment area to be 20% of the ZIP code population.

overlap_pct = [ (zip_code.intersection(example_area).area / zip_code.area) for zip_code in is_touching ]

print(overlap_pct)
# [ 0.0126, 0.198, 0.306 ]

Good to know that one of the ZIP codes practically doesn’t overlap. Let’s account for this in our analysis by multiplying the population in that ZIP code by the overlap percentage:

# find all zip codes that intersect
is_touching = zip_codes.intersects(example_area)

# sum population, weighted by overlap
overlap_pct = [ (zip_code.intersection(example_area).area / zip_code.area) for zip_code in is_touching ]
population = sum([ overlap_pct[i] * float(zip_code['universe']) for i, zip_code in enumerate(is_touching) ])

print('total population = %f' % population)
# total population = 14648.251489

Great, a much more reasonable answer. 14.6k people in a one mile radius is comparable to Seattle’s average population density of 9k people per square mile (or 28.5k in a one mile radius).

Looping

We can change this to a for loop easily and complete the same analysis:

for catchment_area in catchment_areas:
  
  is_touching = zip_codes.intersects(catchment_area)
  
  # sum population, weighted by overlap
  overlap_pct = [ (zip_code.intersection(catchment_area).area / zip_code.area) for zip_code in is_touching ]
  population = sum([ overlap_pct[i] * float(zip_code['universe']) for i, zip_code in enumerate(is_touching) ])
  
  # assign as property
  catchment_area['population'] = population

Saving the results

We can save the results as a GeoJSON:

import json
from mundipy.geometry import dumps

with open('catchment_areas.geojson', 'w') as f:
  f.write(json.dumps(dumps(catchment_areas)))

Plotting the results

We can load this file into a GeoJSON exploration tool like geojson.io and visualize the results.

The population falling inside each catchment area is available as a property on every feature:

{
  "type": "Feature",
  "geometry": {
    "type": "Polygon",
    "coordinates": [
      [
        [
          -122.14194734935859,
          47.63499748134119
        ],
        ...
      ]
    ]
  },
  "properties": {
    "population": 14648.251488983871,
    ...
  }
}

Further reading

In this tutorial, we calculated the catchment area as a radius around a point of interest.

More accurate catchment areas are often calculated by using travel time, e.g., 10 minutes of walking or 20 minutes of driving time. You can use the isochrone third-party APIs to do more complex calculations like these.