
Building a hexagonal cartogram with Python

This post and any related projects to come are inspired by Ralph Straumann's blog post about creating a hexagonal cartogram to visualize the population of Swiss cantons and the Guardian's use of a hexagonal cartogram to display the 2017 U.K. General Election results. Both maps are aesthetically pleasing and offer a clever way of visualizing the underlying data, so naturally I wanted to come up with an easy way to create my own! In his blog post, Straumann describes the steps for prepping the geospatial data for the cartogram. His workflow relies partly on ArcGIS, so I wanted to see how much of it I could translate into a reusable workflow with Python. As a proof of concept, I created a hexagonal cartogram of the United States with the size of each state rescaled in proportion to the size of its congressional delegation. I describe the process for making this map below.

You can view the demo visualization here.

Cartograms

First off, what is a cartogram? And what is up with the hexagons? A cartogram is a map in which the feature geometry is distorted in proportion to some non-spatial variable, like population or alcohol consumption per capita. Cartograms come in a variety of styles, ranging from contiguous cartograms, which preserve topology but can leave features looking melted, to non-contiguous cartograms, which sacrifice topology but maintain more recognizable feature shapes.

Inspired by Straumann and the Guardian, I created a contiguous cartogram discretized by hexagons to alleviate the occasional Dalí-esque horror caused by the Gastner-Newman diffusion-based method used to generate it. I also think hexagons are an aesthetically pleasing shape and give the cartograms an artistic quality.

Data preparation

Let's dive right into the data processing steps. You will need the following Python packages in order to process the data: NumPy, pandas, GeoPandas, and Shapely. Additionally, I used several command line utilities as well as QGIS to hand-edit the eventual output file, though these tools are optional.

To create the map, I used the following two data sets:

  • U.S. state boundaries with the lowest-available level of detail from the U.S. Census Bureau
  • The size of each state's congressional delegation, which I copied from Wikipedia. Note: I added DC's delegation to Virginia's, since the geometric simplification applied later removes DC from the map entirely.
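I didn't reproduce the apportionment file here, but the columns the rest of this post relies on are state, state_code, and apportionment, with the two-letter codes matching the Census Bureau's STUSPS field. If you'd rather build the table in code than assemble a CSV, a stand-in along these lines (values shown only for illustration) would work:

import pandas as pd

# Hypothetical stand-in for apportionment.csv; one row per state, with DC's
# delegation folded into Virginia's row as described above.
apportionment = pd.DataFrame([
    {"state": "California", "state_code": "CA", "apportionment": 53},
    {"state": "Wyoming", "state_code": "WY", "apportionment": 1},
    # ... and so on for the remaining states
])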
In [2]:
%%bash
curl 'https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_state_20m.zip' -o cb_2018_us_state_20m.zip
unzip -o cb_2018_us_state_20m.zip
Archive:  cb_2018_us_state_20m.zip
  inflating: cb_2018_us_state_20m.shp.ea.iso.xml  
  inflating: cb_2018_us_state_20m.shp.iso.xml  
  inflating: cb_2018_us_state_20m.shp  
  inflating: cb_2018_us_state_20m.shx  
  inflating: cb_2018_us_state_20m.dbf  
  inflating: cb_2018_us_state_20m.prj  
 extracting: cb_2018_us_state_20m.cpg  
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  186k    0  186k    0     0  95609      0 --:--:--  0:00:02 --:--:-- 80007
In [3]:
%matplotlib inline
import geopandas as gpd

gpd.read_file("cb_2018_us_state_20m.shp").plot()
Out[3]:
<matplotlib.axes._subplots.AxesSubplot at 0x137f2b70>

Using the command-line tool shp2json, I converted the shapefile to GeoJSON and then applied geoproject to project the output to Albers USA, a U.S.-centric composite projection that insets Alaska and Hawaii below the Southwestern states. Read Mike Bostock's Command-Line Cartography to learn how to install and use these and related tools.

In [5]:
%%bash
shp2json cb_2018_us_state_20m.shp | geoproject 'd3.geoAlbersUsa()' > us-state-20m-albers.geojson
In [6]:
gpd.read_file("us-state-20m-albers.geojson").plot()
Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x13976978>

Don't be alarmed that the map is "upside down". The projected geometry uses SVG coordinates, so the y-axis is reversed compared to Cartesian coordinates. My goal is to eventually create an SVG map with D3 to display online, so this is fine. If the orientation bothers you, you can always reverse the y-coordinates with geoproject like so:

In [7]:
%%bash
geoproject 'd3.geoIdentity().reflectY(true)' < us-state-20m-albers.geojson > flipped.geojson
In [8]:
gpd.read_file("flipped.geojson").plot()
Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x139bd4a8>

I loaded the geospatial data into a GeoDataFrame and the apportionment data into a plain ol' pandas DataFrame, then joined the two.

In [10]:
import pandas as pd

states = gpd.read_file("us-state-20m-albers.geojson")
apportionment = pd.read_csv("apportionment.csv", dtype={"apportionment": int})
merged = states.merge(apportionment, left_on="STUSPS", right_on="state_code")
merged.to_file("us-state-20m-albers-data.geojson", driver='GeoJSON')

The next step is optional: I uploaded the resulting file to Mapshaper and applied topology-aware geometric simplifications in order to smooth out the data. You can also use Mike Bostock's TopoJSON command-line tools to achieve the same effect, but I like the visual feedback from Mapshaper as I adjust the simplification slider. The following is the result of applying Visvalingam simplification at 5% point retention. Notice that DC has disappeared!
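If you would rather stay entirely in Python and can live with simplification that is not topology-aware (it can open small gaps or overlaps between neighboring states), GeoPandas exposes Shapely's Douglas-Peucker simplification. A rough stand-in for the Mapshaper step might look like the following, where the tolerance value is a guess you would need to tune by eye:

simplified = gpd.read_file("us-state-20m-albers-data.geojson")
# Douglas-Peucker simplification; tolerance is in projected (SVG pixel) units
simplified["geometry"] = simplified.geometry.simplify(tolerance=2)
simplified.plot()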

In [11]:
gpd.read_file("us-state-20m-albers-ms-5pct.shp").plot()
Out[11]:
<matplotlib.axes._subplots.AxesSubplot at 0x13be1898>

Cartogram creation with ScapeToad

I loaded the geospatial data into ScapeToad, a free and open-source application written in Java that uses the Gastner-Newman diffusion-based algorithm to generate cartograms from a Shapefile input. You will have to play around with the settings to see what produces the best results for your purposes. Once I was satisfied, I loaded the output into a GeoPandas GeoDataFrame.

In [12]:
gdf = gpd.read_file("us-state-20m-albers-ms-5pct-scapetoad.shp")
# I drop the empty record for DC since it was removed during the simplification process
gdf.drop(gdf[gdf.STUSPS == "DC"].index, inplace=True)
gdf.plot()
Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0x13a8ba20>
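As a quick numerical check on the cartogram output (this wasn't part of my original workflow), you can compare each state's share of the distorted area to its share of the 435 House seats; the closer those two numbers, the better the diffusion algorithm did:

# Compare each state's share of cartogram area to its share of House seats
check = gdf.merge(apportionment[["state_code", "apportionment"]], on="state_code")
area_share = check.area / check.area.sum()
seat_share = check.apportionment / check.apportionment.sum()
print((area_share - seat_share).abs().describe())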

Creating the Hexagonal Grid

Now come the hexagons. I created a hexagonal grid with Shapely, which I then loaded into a GeoDataFrame so that it could be joined with the ScapeToad output. First, I did a little rough math to determine a decent resolution for the grid: the area of a regular hexagon with side length u is 3 · sin(60°) · u², or roughly 2.6 · u², so dividing the total area of the distorted geometry by the hexagon area estimates how many hexagons it would take to cover it. For a few different unit lengths, I estimated the total number of hexagons that would cover the distorted U.S. geometry and the number that would represent each congressional district.

In [13]:
import numpy as np

for u in (3, 4, 5, 6):
  # Area of a regular hexagon with side length u: 3 * sin(60 degrees) * u^2
  hex_area = 3 * np.sin(np.pi / 3) * (u ** 2)
  tot_hex = gdf.area.sum() / hex_area
  print("Unit length: {}".format(u))
  print("Total hexagons: {}".format(tot_hex))
  print("Hexagons per cd: {}\n".format(tot_hex / 435))
Unit length: 3
Total hexagons: 9630.27831839
Hexagons per cd: 22.1385708469

Unit length: 4
Total hexagons: 5417.03155409
Hexagons per cd: 12.4529461014

Unit length: 5
Total hexagons: 3466.90019462
Hexagons per cd: 7.96988550487

Unit length: 6
Total hexagons: 2407.5695796
Hexagons per cd: 5.53464271172

I settled on a grid of regular hexagons with a side length of four. About 12 hexagons per congressional district strikes a good balance between too many hexagons overall and too few to produce recognizable state shapes. The resolution you choose will depend on how much detail you want to preserve in your final map. One of the issues I have with many of the hexagonal cartograms published by political news outlets like FiveThirtyEight and Daily Kos is that they use too coarse a grid, so either the shape of the U.S. (Daily Kos) or the shape of each state (FiveThirtyEight) becomes unrecognizable. What I admire about the Guardian's U.K. election map is how well it preserves the shape of the U.K. and its constituencies.

To actually generate the grid, I determined the coordinates of each hexagon using known properties of regular hexagons. If your math is a little hazy, Wikipedia is very useful here. The list of coordinates passed to the Polygon object is ordered counter-clockwise, starting from the lower left-hand corner of a regular hexagon with two of its sides parallel to the x-axis. Each iteration of the outer for-loop produces two slightly staggered but adjacent columns of hexagons.

In [14]:
from shapely.geometry import Polygon

xmin, ymin, xmax, ymax = gdf.total_bounds

unit = 4                  # side length of each regular hexagon (in projected SVG units)
a = np.sin(np.pi / 3)     # sin(60°), the vertical scaling factor for hexagon rows
cols = np.arange(np.floor(xmin), np.ceil(xmax), 3 * unit)
rows = np.arange(np.floor(ymin) / a, np.ceil(ymax) / a, unit)

hexagons = []
for x in cols:
  for i, y in enumerate(rows):
    # Shift every other row right by 1.5 * unit to stagger the two columns
    if (i % 2 == 0):
      x0 = x
    else:
      x0 = x + 1.5 * unit

    # Vertices listed counter-clockwise from the lower left-hand corner
    hexagons.append(Polygon([
      (x0, y * a),
      (x0 + unit, y * a),
      (x0 + (1.5 * unit), (y + unit) * a),
      (x0 + unit, (y + (2 * unit)) * a),
      (x0, (y + (2 * unit)) * a),
      (x0 - (0.5 * unit), (y + unit) * a),
    ]))

grid = gpd.GeoDataFrame({'geometry': hexagons})
grid["grid_area"] = grid.area
grid = grid.reset_index().rename(columns={"index": "grid_id"})
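Before intersecting the grid with the cartogram, it's worth eyeballing the two layers together. This quick plot wasn't in my original notebook, but it makes an easy sanity check that the grid fully covers the distorted states:

# Draw the hexagon outlines on top of the distorted state polygons
ax = gdf.plot()
grid.plot(ax=ax, facecolor="none", edgecolor="gray")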

I then performed a set intersection on the ScapeToad output and the hexagonal grid using GeoPandas' overlay function and assigned each hexagon to the state with which it had the largest intersection. I also played around with only assigning hexagons along the coasts and the national border when the intersection was sufficiently large, but in the end opted to assign those hexagons to a state if there was any intersection at all.

In [16]:
intersected = gpd.overlay(gdf, grid, how='intersection', use_sindex=True)
intersected["area_"] = intersected.area
max_intersection = intersected.loc[intersected.index.isin(intersected.groupby(["grid_id"]).area_.idxmax())]
tagged_grid = grid.merge(max_intersection[["grid_id", "state_code"]], how="left", on="grid_id")
tagged_grid = tagged_grid.fillna("")
tagged_grid.to_file("grid.geojson", driver="GeoJSON")
In [17]:
hexmap = tagged_grid[~tagged_grid.grid_id.isin(tagged_grid[tagged_grid.state_code == ""].grid_id)]
hexmap.plot()
Out[17]:
<matplotlib.axes._subplots.AxesSubplot at 0x16195860>

Editing the Grid

In the final steps of the hex-cartogram-making process, I manually edited the hexagon assignments with QGIS to ensure each state was allocated the same number of hexagons per congressional district. QGIS is a free and open-source GIS application that supports editing vector data, though it is nowhere near as fancy (or expensive!) as ArcGIS. The automatic hexagon assignment process did a pretty decent job, though it is not perfect, in part because the cartogram creation in ScapeToad isn't perfect either.

I did a little diagnostic check to see how well hexagons were assigned to each state. First, I checked how many hexagons per congressional district each state was assigned, ignoring Alaska and Hawaii since they were assigned a disproportionately large number of hexagons.

In [18]:
hex_per_state = hexmap[["grid_id", "state_code"]].groupby("state_code").count().reset_index()
hex_per_state.rename(columns={"grid_id": "hex_count"}, inplace=True)
diagnostic = hex_per_state.merge(apportionment, on="state_code")
diagnostic["hex_per_cd"] = diagnostic.hex_count / diagnostic.apportionment
AK_HI_mask = ~diagnostic.state_code.isin(["AK", "HI"])
diagnostic[AK_HI_mask].hex_per_cd.describe()
Out[18]:
count    48.000000
mean     13.498554
std       1.877753
min      12.000000
25%      12.666667
50%      13.000000
75%      13.559524
max      22.000000
Name: hex_per_cd, dtype: float64

Not bad! Most states were assigned between 12 and 14 hexagons per congressional district, so a number in that range makes a sensible target. Next, I checked the number of hexagons that would need to be added to or removed from the cartogram for all states to have exactly 12, 13, or 14 hexagons per CD. This gave me a sense of how much manual work I would need to do in QGIS.

In [19]:
for i in (12, 13, 14):
  col_name = "r-{}".format(i) # "r" for "residual hexagons"
  diagnostic[col_name] = (i * diagnostic.apportionment) - diagnostic.hex_count
  print(col_name)
  print("Hex to remove: {}".format(diagnostic.loc[AK_HI_mask & (diagnostic[col_name] < 0), col_name].sum()))
  print("Hex to Add: {}".format(diagnostic.loc[AK_HI_mask & (diagnostic[col_name] > 0), col_name].sum()))
r-12
Hex to remove: -483
Hex to Add: 0
r-13
Hex to remove: -133
Hex to Add: 83
r-14
Hex to remove: -42
Hex to Add: 425

In the end, I allocated 13 hexagons per CD, since it required the fewest manual edits. The DataFrame below enumerates the number of hexagons to be removed from or added to each state.

In [20]:
diagnostic[["state", "r-13"]].sort_values("r-13")
Out[20]:
state r-13
0 Alaska -44
21 Michigan -28
10 Hawaii -17
8 Florida -15
20 Maine -12
22 Minnesota -10
25 Montana -9
33 New York -8
42 Texas -8
46 Washington -7
17 Louisiana -7
45 Vermont -5
27 North Dakota -4
36 Oregon -4
39 South Carolina -4
37 Pennsylvania -3
31 New Mexico -3
1 Alabama -2
40 South Dakota -1
12 Idaho -1
3 Arizona -1
28 Nebraska -1
35 Oklahoma 0
43 Utah 0
6 Connecticut 0
48 West Virginia 0
47 Wisconsin 0
29 New Hampshire 0
15 Kansas 0
44 Virginia+DC 1
32 Nevada 1
24 Mississippi 1
26 North Carolina 1
7 Delaware 1
11 Iowa 1
49 Wyoming 1
23 Missouri 1
38 Rhode Island 2
14 Indiana 2
18 Massachusetts 3
41 Tennessee 3
2 Arkansas 3
16 Kentucky 5
9 Georgia 5
4 California 5
5 Colorado 7
19 Maryland 7
34 Ohio 8
30 New Jersey 11
13 Illinois 14

After manually editing the grid in QGIS, I reloaded it into a GeoDataFrame and removed the hexagons not assigned to any state. At this point, you could dissolve the hexagons into states, join data to the cartogram, and so on, as sketched after the code below.

In [23]:
grid2 = gpd.read_file("grid-edited.geojson")
hexmap = grid2[~grid2.grid_id.isin(grid2[grid2.state_code.isna()].grid_id)]
hexmap.to_file("hexmap.geojson", driver="GeoJSON")
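For example, dissolving the hexagons by state collapses each state's hexagons into a single polygon, which is handy for drawing state borders over the grid or for joining in other data. A rough sketch of what that could look like (not part of the original notebook):

# Merge each state's hexagons into one polygon and attach the apportionment data
states_dissolved = hexmap.dissolve(by="state_code").reset_index()
states_dissolved = states_dissolved.merge(apportionment, on="state_code")
states_dissolved.plot(column="apportionment")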

I created a quick final visualization with D3 that shows the hex grid and the state borders, which you can view here! In the future, I plan on adapting this process to create a hex cartogram of U.S. House of Representatives election results. Hopefully the steps I outlined above are not too daunting, and you can make your own flashy cartograms too!

Thanks to the Guardian for the inspiration and to Ralph Straumann for outlining his hexagonal cartogram creation process so I could adapt it to Python. Be sure to check out his blog for more great posts about GIS and data visualization.