Computing the Voronoi diagram of points on a street network

Posted on: Fri 13 September 2019

Tags: #python, #geopandas, #gis, #data visualization

Recently, I created an Edge Network Voronoi Diagram that partitions Chicago's walkable network into sets of street segments with a shorter walking distance to a particular 'L' station than any other. In other words, given a location along a street, alley, park pathway, etc. somewhere in Chicago, you can find the nearest 'L' station to that point by foot. Click here to view the final visualization over an interactive map of Chicago. Read the repository's README to learn more about the project, including issues and plans for future work. In this blog post, I will explain how I computed the Edge Network Voronoi Diagram using Python.

Gathering the data¶

First, I downloaded a list of CTA 'L' Stations, which can be found on the Chicago Open Data Portal, and loaded it into a pandas DataFrame. Stations serviced by multiple lines appear multiple times in the dataset, so I removed any duplicative records. The Location column provides the geographical coordinates of each station as a string formatted as "(lat, lon)".

I'm not sure precisely what part of a station the coordinates represent: perhaps its "average" location? Each station in the dataset includes only one location, so the points don't necessarily represent the locations of the station entrances. This is important to keep in mind during the analysis. Some stations have multiple spatially distributed entrances. The Damen and Paulina entrances to the Illinois Medical District station on the Blue Line are half a mile apart, for example. Having easy access to an auxiliary entrance could impact which 'L' station is nearest to you. I plan to address this issue in future versions of the map.

In [1]:

import osmnx as ox
import networkx as nx
import pandas as pd

In [2]:

stations = pd.read_csv("CTA_-_System_Information_-_List_of__L__Stops.csv")
stations.drop_duplicates(subset=['MAP_ID'], inplace=True)
stations["x"] = stations.Location.map(lambda x: float(x.split(",")[1][:-1]))
stations["y"] = stations.Location.map(lambda x: float(x.split(",")[0][1:]))

Next, I downloaded the walkable street network for Chicago and surrounding suburbs using OSMnx, a Python package for downloading street networks, political boundaries, and building footprints from OpenStreetMap's APIs. The package leverages NetworkX to allow you to easily model, visualize, and analyze complex street networks. The author, Geoff Boeing, provides a good introduction to using OSMnx in this blog post.

I downloaded the network for a rectangular region encompassing Chicago and all of the suburban 'L' stations. I also downloaded the networks for specific cities and villages in Cook County, including Chicago, Evanston, Skokie, Lincolnwood, Rosemont, Norridge, Harwood Heights, Forest Park, Oak Park, and Cicero. I computed the initial Voronoi diagram on the larger rectangular region first and then restricted the final network to Chicago and those suburbs. Had I computed the Voronoi diagram on just the municipalities of interest, then the Voronoi cells on the periphery of the network might not be true to life: sometimes the shortest path to the 'L' involves walking on streets outside of the restricted network. I chose to include each of the listed municipalities in my final analysis in order to avoid holes in the diagram (Norridge, Harwood Heights, Lincolnwood) or because it has at least one 'L' station and several nearby. I didn't include O'Hare Airport or Wilmette in the final diagram, even though both places have an 'L' station, since the Voronoi cells around each station would be large, and they wouldn't add much value to the analysis. Most people probably don't access the O'Hare station from the street and the Linden station is fairly isolated in Wilmette.

Downloading all of these networks and generating the corresponding Python objects could take 10-20 minutes, so make yourself a cup of coffee, do the dishes, and come back in a little bit.

In [3]:

GREATER_CHI = ox.graph_from_bbox(42.074942, 41.643919, -87.523984, -87.913758, network_type='walk')
CHI = ox.graph_from_place('Chicago, Illinois', network_type='walk')
EV = ox.graph_from_place('Evanston, Illinois', network_type='walk')
SK = ox.graph_from_place('Skokie, Illinois', network_type='walk')
LW = ox.graph_from_place('Lincolnwood, Illinois', network_type='walk')
RO = ox.graph_from_place('Rosemont, Illinois', network_type='walk')
NO = ox.graph_from_place('Norridge, Illinois', network_type='walk')
HH = ox.graph_from_place('Harwood Heights, Illinois', network_type='walk')
FP = ox.graph_from_place('Forest Park, Illinois', network_type='walk')
OP = ox.graph_from_place('Oak Park, Illinois', network_type='walk')
CI = ox.graph_from_place('Cicero, Illinois', network_type='walk')

In [4]:

FINAL_NETWORK = nx.compose_all([CHI, EV, SK, LW, RO, NO, HH, FP, OP, CI])

Open Street Maps data is constantly being updated thanks to a dedicated community of contributors. The result of a query made now might not be the same in an hour. I saved the network data as a GraphML file, so I can extend or tweak my analysis later on a consistent set of data. Also, it helped not to need to download the data again in the event of a mistake.

In [5]:

ox.save_graphml(GREATER_CHI, "greater_chi.gml")
ox.save_graphml(FINAL_NETWORK, "final_network.gml")

Computing the Node Network Voronoi Diagram¶

Once I gathered all of the necessary data, I used the station coordinates to identify the nearest nodes in the network to each 'L' station. These nodes served as the seeds for the Voronoi partitioning. Since the seeds corresponded to the nodes nearest to the stations, and not the stations themselves, they were sometimes located in an alley or road off of the main street. On the map, this is most evident at the 35th-Bronzeville-IIT Green Line Station. The station's seed is in a nearby parking lot, which causes the cell for the (very nearby) Sox-35th Red Line Station to encroach on the Green Line station's actual entrance. I plan to improve the accuracy of seed selection in future versions of the map.

In [6]:

nearest_nodes = ox.utils.get_nearest_nodes(GREATER_CHI, stations.x, stations.y, method='balltree')

The NetworkX package comes with a number of algorithms, including one to compute the Voronoi cells of a graph! Given a graph and a set of center nodes (seeds), nx.voronoi_cells returns a dictionary mapping center nodes to the set of all nodes in the graph closer to that center node than to any other with respect to the shortest-path distance metric. Shortest paths are calculated using Dijkstra's algorithm. The values of the dictionary form a partition of the nodes of the graph called a Node Network Voronoi Diagram (NNVD).

I eventually want to partition the graph edges (street segments) by nearest 'L' station. That is, I want to compute the Edge Network Voronoi Diagram (ENVD) of my street network. I will explain how to do this in a moment. In anticipation of that goal, I modified the source for nx.voronoi_cells to also return the lengths of the shortest paths between each node and its corresponding center node. I also included the sequence of nodes constituting the shortest paths (geodesics) in the return statement, so I can plot them later.

In [7]:

from networkx.utils import groups

# modification of nx.voronoi_cells to each node in the Voronoi cell centered at 
# `center_nodes`, along with the length of the path to each node
def voronoi_cells(G, center_nodes, weight='weight'):
  # shortest paths from center nodes to every other node
  lengths, paths = nx.multi_source_dijkstra(G, center_nodes, weight=weight)

  # center node from which shortest path originates
  nearest = {n: p[0] for n, p in paths.items()}

  # map center nodes to all nodes closer to it than any other center nodes
  # along with the length of the path to each of those nodes
  cells = groups(nearest)

  # check for unreachable nodes
  unreachable = set(G) - set(nearest)
  if unreachable:
    cells['unreachable'] = unreachable
  return cells, paths, lengths

So first, I computed the Voronoi diagram on the full network.

In [8]:

cells, geodesics, graph_distances = voronoi_cells(GREATER_CHI, list(nearest_nodes), weight='length')

voronoi_cells stores the nodes that are unreachable from any of the given seeds in the cell dictionary under the key "unreachable". I deleted these nodes from the dictionary and the final network. I also filtered the Voronoi cells dictionary so the partitions only contain nodes that are in the final network.

In [9]:

if 'unreachable' in cells:
  FINAL_NETWORK.remove_nodes_from(cells['unreachable'])
  del cells['unreachable']

cells = {cn: [n for n in cell if n in FINAL_NETWORK.nodes()] for cn, cell in cells.items()}

In order to visualize the NNVD, I needed to plot the nodes and color them. To distinguish between cells, I wanted to color them such that all nodes in the same cell have the same color and that no two adjacent cells use the same color. I determined which cells are adjacent to each other by iterating the network edges and checking which cells the to- and from-nodes belonged to. If the nodes were partitioned into different cells, then those two cells are adjacent and should be colored differently. I kept track of cell adjacency by building an adjacency graph.

In [10]:

# invert voronoi mapping from node -> seed
node_to_seed = {n: cn for cn, cell in cells.items() for n in cell}

adjacency = set()
for n1, n2 in FINAL_NETWORK.edges():
  cn1 = node_to_seed[n1]
  cn2 = node_to_seed[n2]

  if cn1 != cn2:
    adjacency.add((cn1, cn2))

adjacency_G = nx.Graph()
adjacency_G.add_edges_from(adjacency)

A successful graph coloring should use as few colors as possible, both for aesthetic and accessibility purposes. Thankfully, NetworkX also has an algorithm for generating graph colorings! The function nx.coloring.greedy_color attempts to color a given graph using as few colors as possible, such that neighboring nodes do not share the same color. The strategy argument determines the order in which the nodes are colored. By default, nx.coloring.greedy_color uses a "largest first" strategy, which colors the nodes with the most connections first.

In [11]:

coloring = nx.coloring.greedy_color(adjacency_G, strategy='smallest_last')

nx.coloring.greedy_color returns a dictionary mapping nodes in the adjacency graph to integers (which can later be assigned colors). Next, I constructed a dictionary mapping each node in the street network to the integer/coloring assigned to its cell.

In [12]:

node_colors = {n: coloring[cn] for cn, cell in cells.items() for n in cell}

OSMnx's plot module has a number of useful functions to help you plot and style your street networks. For example, ox.get_colors returns an n-length list of colors from the given colormap, which I used to assign Hex color codes to integers. Here's a look at the NNVD restricted to just Evanston. The larger nodes correspond to stops along the Purple Line.

In [13]:

cell_colors = ox.get_colors(n=len(set(coloring.values())), cmap='viridis', return_hex=True)
nc = [cell_colors[node_colors[n]] if n in node_colors else 'none' for n in EV.nodes()]
ns = [50 if n in cells else 5 for n in EV.nodes()]
fig, ax = ox.plot_graph(EV, fig_height=12, node_color=nc, node_size=ns, node_zorder=2, edge_color="#000000", edge_alpha=0.2)

Creating the Edge Network Voronoi Diagram¶

Now comes the fun part! The previous steps partitioned and colored the network nodes (i.e. street and alley intersections) based on the nearest 'L' station. That's great, but the diagram would be even better if the street segments were partitioned into cells and colored too. The next steps outline a process for doing just that.

I iterated the edges in the network and inspected their to- and from-nodes. Nodes partitioned into the same cell—which have the same coloring—should have an edge between them also with the same coloring. If the nodes do not belong the same cell, then some fraction of the edge should be colored the same as the to-node, and the other part of the edge should be colored the same as the from-node.

In the latter case, I calculated the point along the edge where the center nodes are equidistant—i.e. where the two partitions meet. At this distance, I cut the edge into two segments. The Shapely docs provide a recipe for a function that cuts a LineString at a specified distance. In order to "cut" the edge in the network graph, I had to delete the original edge and insert two new edges and two new nodes in its place. I added both new nodes at the point at which the original edge was cut, and I added an edge from the original to-node to one of the new nodes, and the other edge from the second new node to the original from-node. As I deleted and added new geometries from the network, I updated the list of nodes in the corresponding Voronoi partition, the sequence of nodes in the geodesic, and the length of the shortest path in the cells, geodesics, and graph_distances dictionaries, respectively.

In [14]:

from shapely.geometry import LineString, Point

# from shapely docs
def cut(line, distance):
  # Cuts a line in two at a distance from its starting point
  if distance <= 0.0 or distance >= line.length:
    return [LineString(line)]
  coords = list(line.coords)
  for i, p in enumerate(coords):
    pd = line.project(Point(p))
    if pd == distance:
      return [
        LineString(coords[:i+1]),
        LineString(coords[i:])]
    if pd > distance:
      cp = line.interpolate(distance)
      return [
        LineString(coords[:i] + [(cp.x, cp.y)]),
        LineString([(cp.x, cp.y)] + coords[i:])]

In [15]:

def update_voronoi_cells(old_nid, new_nid, add_dist):
  cn = node_to_seed[old_nid]
  cells[cn] = cells[cn] + [new_nid]
  geodesics[new_nid] = geodesics[old_nid] + [new_nid]
  graph_distances[new_nid] = graph_distances[old_nid] + add_dist

def new_node(nid, x, y, **kwargs):
  data = {'x': x, 'y': y, 'osmid': nid}
  data.update(kwargs)
  return (nid, data)

def new_edge(to_n, from_n, eid, geometry, length, **kwargs):
  data = {'osmid': eid, 'geometry': geometry, 'length': length}
  data.update(kwargs)
  return (to_n, from_n, data)

In [16]:

delete_edges = []
add_edges = []
add_nodes = []
nodes = FINAL_NETWORK.nodes()
for n1, n2, data in FINAL_NETWORK.edges.data():
  cell1_color = node_colors[n1]
  cell2_color = node_colors[n2]
  if cell1_color == cell2_color:
    data['color'] = cell1_color
  else:
    if 'geometry' in data:
      line = data['geometry']
    else:
      # if edge doesn't have geometry data, construct LineString from node coordinates
      line = LineString([Point(nodes[n1]['x'], nodes[n1]['y']),
                         Point(nodes[n2]['x'], nodes[n2]['y'])])
    cut_dist = (graph_distances[n2] + data['length'] - graph_distances[n1]) * 0.5
    scaled_cut_dist = line.length * (cut_dist / data['length'])
    if scaled_cut_dist <= 0.0:
      data['color'] = cell2_color
    elif scaled_cut_dist >= line.length:
      data['color'] = cell1_color
    else:
      seg1, seg2 = cut(line, scaled_cut_dist)
      new_n1 = -1 * n1 
      new_n2 = -1 * n2
      n2_cut_dist = data['length'] - cut_dist
      
      add_nodes.append(new_node(new_n1, seg1.coords[-1][0], seg2.coords[-1][1]))
      update_voronoi_cells(n1, new_n1, cut_dist)
      add_nodes.append(new_node(new_n2, seg1.coords[-1][0], seg2.coords[-1][1]))
      update_voronoi_cells(n2, new_n2, n2_cut_dist)

      delete_edges.append((n1, n2))
      add_edges.append(new_edge(n1, new_n1, data['osmid'], seg1, cut_dist, color=cell1_color))
      add_edges.append(new_edge(new_n2, n2, -1 * data['osmid'], seg2, n2_cut_dist, color=cell2_color))

Finalizing the map¶

I removed the unwanted edges from the final network graph and added the new nodes and edges.

In [17]:

FINAL_NETWORK.add_nodes_from(add_nodes)
FINAL_NETWORK.remove_edges_from(delete_edges)
FINAL_NETWORK.add_edges_from(add_edges)
print ""

I did the same with the original network, as it is needed later to draw the geodesics between nodes and 'L' stations in some cases.

In [18]:

GREATER_CHI.add_nodes_from(add_nodes)
GREATER_CHI.remove_edges_from(delete_edges)
GREATER_CHI.add_edges_from(add_edges)
print ""

Here's an updated view of the Voronoi diagram! The large black dots are the center nodes of each partition (the 'L' stations). nx.coloring.greedy_color almost found a four-coloring of the graph! Unfortunately, one of the cells was assigned a renegade fifth color.

In [19]:

ns2 = [50 if node in cells else 0 for node in FINAL_NETWORK.nodes()]
ec2 = [cell_colors[c] if c is not None else 'none' for n1, n2, c in FINAL_NETWORK.edges.data('color')]
fig, ax = ox.plot_graph(FINAL_NETWORK, fig_height=20, node_color="#000000", node_size=ns2, node_zorder=2, edge_color=ec2)

Finally, I saved a number of files. I removed unwanted attributes from the graphs to help reduce the file sizes and saved the edited street networks as GraphML files.

In [20]:

remove_attrs = [
    'access',
    'area',
    'bridge',
    'highway',
    'from',
    'junction',
    'key',
    'landuse',
    'lanes',
    'maxspeed',
    'name',
    'oneway',
    'ref',
    'service',
    'to',
    'tunnel',
    'width',
]

for n1, n2, d in FINAL_NETWORK.edges.data():
    for attr in remove_attrs:
        if attr in d:
            del d[attr]
            
for n, d in FINAL_NETWORK.node.data():
    for attr in remove_attrs:
        if attr in d:
            del d[attr]

In [21]:

ox.save_graphml(GREATER_CHI, "greater_chi_edited.gml")
ox.save_graphml(FINAL_NETWORK, "final_network_edited.gml")

I saved the Voronoi diagram as a GeoJSON, first converting the graph to a GeoDataFrame and dissolving the individual edge geometries into a single feature for each color in order to reduce the file size. The finished visualization renders the GeoJSON with D3/Leaflet.

In [22]:

n_gdf, e_gdf = ox.graph_to_gdfs(FINAL_NETWORK)
dissolved = e_gdf.dissolve(by="color").reset_index()

In [23]:

dissolved[["color", "geometry"]].to_file("data/voronoi.geojson", driver="GeoJSON")

I also saved the Voronoi cell data as Python objects with Pickle, so that I can plot and analyze the geodesics later.

In [24]:

import pickle

for data, fn in ((cells, "cells"), (geodesics, "geodesics"), (graph_distances, "graph_distances")):
  with open("data/{}.pkl".format(fn), "w") as f:
    pickle.dump(data, f)

Hopefully the steps to create an Edge Network Voronoi Diagram are straightforward to follow. Designing an actual process that worked was fairly complicated and took some tinkering, especially determining the best way to color edges crossing between partitions.

I hope to improve this map in the future so that the Voronoi cells are more accurate. In particular, I plan to choose more accurate center nodes for each 'L' station, as well as to add center nodes corresponding to alternate station entrances. I also hope to analyze the size and shape of Voronoi cells to see who in Chicago lives the closest to and further from the L and to see the impact of adding new stations, such as those outlined in the CTA's Red Line Extension project, to transit accessibility.