This post and any related projects to come are inspired by Ralph Straumann's blog post about creating a hexagonal cartogram to visualize the population of Swiss cantons and the Guardian's use of a hexagonal cartogram to display the 2017 U.K. General Election results. Both maps are aesthetically pleasing and a clever way of visualizing the underlying data, so naturally I wanted to come up with an easy way to create my own! In his blog post, Straumann describes the steps for prepping the geospatial data for the cartogram. His workflow relies partly on ArcGIS, so I wanted to see how much of it I could translate into a reusable workflow with Python. As a proof of concept, I created a hexagonal cartogram of the United States with the size of each state rescaled in proportion to the size of its congressional delegation. I describe the process for making this map below.
You can view the demo visualization here.
Cartograms¶
First off, what is a cartogram? And what is up with the hexagons? A cartogram is a map in which the feature geometry is distorted in proportion to some non-spatial variable, like population or alcohol consumption per capita. Cartograms come in a variety of styles, ranging from contiguous cartograms with preserved topology but melted-looking features to topologically distorted non-contiguous cartogram which maintain more recognizable shapes.
Inspired by Straumann and the Guardian, I created a contiguous cartogram discretized by hexagons to alleviate the occasional DalĂ-esque horror caused by the Gastner-Newman diffusion-based method used to generate it. I also think hexagons are an aesthetically pleasing shape and give the cartograms an artistic quality.
Data preparation¶
Let's dive right into the data processing steps. You will need the following Python packages in order to process the data: NumPy, pandas, GeoPandas, and Shapely. Additionally, I used several command line utilities as well as QGIS to hand-edit the eventual output file, though these tools are optional.
To create the map, I used the following two data sets:
- U.S. state boundaries with the lowest-available level of detail from the U.S. Census Bureau
- The size of each state's congressional delegation, which I copied from Wikipedia. Note: I added DC's delegation to Virginia's, since later I simplified the geospatial data so much that DC is removed from the map.
%%bash
curl 'https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_state_20m.zip' -o cb_2018_us_state_20m.zip
unzip -o cb_2018_us_state_20m.zip
%matplotlib inline
import geopandas as gpd
gpd.read_file("cb_2018_us_state_20m.shp").plot()
Using the command-line tool svg2json, I converted the shapefile to GeoJSON and then applied geoproject to project the output to Albers USA, a U.S.-centric composite projection that insets Alaska and Hawaii below the Southwestern states. Read Mike Bostock's Command-Line Cartography to learn how to install and use these and related tools.
%%bash
shp2json cb_2018_us_state_20m.shp | geoproject 'd3.geoAlbersUsa()' > us-state-20m-albers.geojson
gpd.read_file("us-state-20m-albers.geojson").plot()
Don't be alarmed that the map is "upside down". The projected geometry uses SVG coordinates, so the y-axis is reversed compared to Cartesian coordinates. My goal is to eventually create an SVG map with D3 to display online, so this is fine. If the orientation bothers you, you can always reverse the y-coordinates with geoproject like so:
%%bash
geoproject 'd3.geoIdentity().reflectY(true)' < us-state-20m-albers.geojson > flipped.geojson
gpd.read_file("flipped.geojson").plot()
I loaded geospatial data into a GeoDataFrame
and the apportionment data into a plain ol' pandas DataFrame
and joined the two together.
import pandas as pd
states = gpd.read_file("us-state-20m-albers.geojson")
apportionment = pd.read_csv("apportionment.csv", dtype={"apportionment": int})
merged = states.merge(apportionment, left_on="STUSPS", right_on="state_code")
merged.to_file("us-state-20m-albers-data.geojson", driver='GeoJSON')
The next step is optional: I uploaded the resulting file to Mapshaper and applied topology-aware geometric simplifications in order to smooth out the data. You can also use Mike Bostock's TopoJSON command-line tools to achieve the same effect, but I like the visual feedback from Mapshaper as I adjust the simplification slider. The following is the result of applying Visvalingam simplification at 5% point retention. Notice that DC has disappeared!
gpd.read_file("us-state-20m-albers-ms-5pct.shp").plot()
Cartogram creation with Scapetoad¶
I loaded the geospatial data into ScapeToad, a free and open-source application written in Java that uses the Gastner-Newman diffusion-based algorithm to generate cartograms from a Shapefile input. You will have to play around with the settings to see what produces the best results for your purposes. Once I was satisfied, I loaded the output into a GeoPandas GeoDataFrame
.
gdf = gpd.read_file("us-state-20m-albers-ms-5pct-scapetoad.shp")
# I drop the empty record for DC since it was removed during simplification process
gdf.drop(gdf[gdf.STUSPS == "DC"].index, inplace=True)
gdf.plot()
Creating the Hexagonal Grid¶
Now come the hexagons. I created a hexagonal grid with Shapely which I then loaded into a GeoDataFrame
, so that it could be joined with the ScapeToad output. First, I did a little rough math to determine a decent resolution for the grid. For different unit lengths, I estimated the number of hexagons that would cover the distorted US geometry and the number that would form each congressional district.
import numpy as np
for u in (3, 4, 5, 6):
hex_area = 3 * np.sin(np.pi / 3) * (u ** 2)
tot_hex = gdf.area.sum() / hex_area
print "Unit length: {}".format(u)
print "Total hexagons: {}".format(tot_hex)
print "Hexagons per cd: {}\n".format(tot_hex / 435)
I settled on a grid of regular hexagons with a side length of four. ~12 hexagons per congressional district strikes a good balance between too many hexagons and not enough to produce recognizable state shapes. The resolution you choose will depend on how much detail you want to preserve in your final map. One of the issues I have with many of the hexagonal cartograms being published on political news outlets like FiveThirtyEight and the Daily Kos is that they use too low of a resolution grid for their maps, so that either the shape of the U.S. (DailyKos) or the shape of each state (FiveThrityEight) becomes unrecognizable. What I admire about U.K. election map from the the Guardian is how well it preserves the shape of the U.K. and its constituencies.
To actually generate the grid, I determined the coordinates of each hexagon using known properties of regular hexagons. If your math is a little hazy, Wikipedia is very useful here. The list of coordinates passed to the Polygon object are ordered counter-clockwise starting from the lower left-hand corner of a regular hexagon with two of its side parallel to the x-axis. Each iteration of the outer for-loop produces two slightly staggered but adjacent columns of hexagons.
from shapely.geometry import Polygon
xmin, ymin, xmax, ymax = gdf.total_bounds
unit = 4
a = np.sin(np.pi / 3)
cols = np.arange(np.floor(xmin), np.ceil(xmax), 3 * unit)
rows = np.arange(np.floor(ymin) / a, np.ceil(ymax) / a, unit)
hexagons = []
for x in cols:
for i, y in enumerate(rows):
if (i % 2 == 0):
x0 = x
else:
x0 = x + 1.5 * unit
hexagons.append(Polygon([
(x0, y * a),
(x0 + unit, y * a),
(x0 + (1.5 * unit), (y + unit) * a),
(x0 + unit, (y + (2 * unit)) * a),
(x0, (y + (2 * unit)) * a),
(x0 - (0.5 * unit), (y + unit) * a),
]))
grid = gpd.GeoDataFrame({'geometry': hexagons})
grid["grid_area"] = grid.area
grid = grid.reset_index().rename(columns={"index": "grid_id"})
I then performed a set intersection on the ScapeToad output and the hexagonal grid using GeoPandas' overlay
function and assigned hexagons to the state with which they had the largest intersection. I also played around with assigning hexagons along the coasts and national border to states based on the size of the intersection, but in the end opted to assign those hexagons to a state if there was any intersection at all.
intersected = gpd.overlay(gdf, grid, how='intersection', use_sindex=True)
intersected["area_"] = intersected.area
max_intersection = intersected.loc[intersected.index.isin(intersected.groupby(["grid_id"]).area_.idxmax())]
tagged_grid = grid.merge(max_intersection[["grid_id", "state_code"]], how="left", on="grid_id")
tagged_grid = tagged_grid.fillna("")
tagged_grid.to_file("grid.geojson", driver="GeoJSON")
hexmap = tagged_grid[~tagged_grid.grid_id.isin(tagged_grid[tagged_grid.state_code == ""].grid_id)]
hexmap.plot()
Editing the Grid¶
In the final steps of the hex-cartogram-making process, I manually edited the hexagon assignments with QGIS to ensure each state was allocated the same number of hexagon per congressional district. QGIS is a free and open-source GIS application that supports editing vector data, though it is nowhere near as fancy (or expensive!) as ArcGIS. The automatic hexagon assignment process did a pretty decent job, though it is not perfect, in part because the cartogram creation in ScapeToad isn't perfect either.
I did a little diagnostic check to see how well hexagons were assigned to each state. First, I checked how many hexagons per congressional district each state was assigned, ignoring Alaska and Hawaii since they were assigned a disproportionately large number of hexagons.
hex_per_state = hexmap[["grid_id", "state_code"]].groupby("state_code").count().reset_index()
hex_per_state.rename(columns={"grid_id": "hex_count"}, inplace=True)
diagnostic = hex_per_state.merge(apportionment, on="state_code")
diagnostic["hex_per_cd"] = diagnostic.hex_count / diagnostic.apportionment
AK_HI_mask = ~diagnostic.state_code.isin(["AK", "HI"])
diagnostic[AK_HI_mask].hex_per_cd.describe()
Not bad! Most states were assigned between 12 and 14 hexagons for each congressional district. A number in this range is the target number of hexagons per CD. Next, I checked the number of hexagons that would need to be added or removed from the cartogram for all states to have exactly 12, 13, or 14 hexagons per CD. This gave me a sense of how much manual work I would need to do in QGIS.
for i in (12, 13, 14):
col_name = "r-{}".format(i) # "r" for "residual hexagons"
diagnostic[col_name] = (i * diagnostic.apportionment) - diagnostic.hex_count
print col_name
print "Hex to remove: {}".format(diagnostic.loc[AK_HI_mask & (diagnostic[col_name] < 0), col_name].sum())
print "Hex to Add: {}".format(diagnostic.loc[AK_HI_mask & (diagnostic[col_name] > 0), col_name].sum())
In the end, I allocated 13 hexagons per CD since it required the fewest number of manual edits. The DataFrame
below enumerates the number of hexagons to be removed from or added to each state.
diagnostic[["state", "r-13"]].sort_values("r-13")
After manually editing the grid in QGIS, I reloaded it into a GeoDataFrame
and removed the hexagons not assigned to any states. At this point, you could dissolve the hexagons into states, join data to the cartogram, etc.
grid2 = gpd.read_file("grid-edited.geojson")
hexmap = grid2[~grid2.grid_id.isin(grid2[grid2.state_code.isna()].grid_id)]
hexmap.to_file("hexmap.geojson", driver="GeoJSON")
I created a quick final visualization with D3 that shows the hex grid and the state borders, which you can view here! In the future, I plan on adapting this process to create a hex cartogram of U.S. House of Representatives election results. Hopefully the steps I outlined above are not too daunting so you can make your own flashy cartograms too!
Thanks to the Guardian for the inspiration and to Ralph Straumann for outlining his hexagonal cartogram creation process so I could adapt it to Python. Be sure to check out his blog for more great posts about GIS and data visualization.