Mentoring with ChiPy (Part 1)

Posted on: Tue 01 October 2019

A little over two years ago, I was a mentee in the Spring 2017 ChiPy mentorship program. ChiPy, one of the most active Python user groups in the world, organizes a free, 13-week, one-on-one mentorship program twice each year. Individuals of all skill levels are welcome to participate. During the program, mentees learn Python programming best practices while working on a project of personal interest. It was through the ChiPy mentorship that I began learning Python in earnest and started my still-ongoing CTA Bus Data Analysis project. Since participating in the program, I've continued to develop my Python and data …

Computing the Voronoi diagram of points on a street network

Posted on: Fri 13 September 2019

Tags: #python, #geopandas, #gis, #data visualization

Recently, I created an Edge Network Voronoi Diagram that partitions Chicago's walkable network into sets of street segments, each set having a shorter walking distance to one particular 'L' station than to any other. In other words, given a location along a street, alley, park pathway, etc. somewhere in Chicago, you can find the nearest 'L' station to that point by foot. The final visualization is displayed over an interactive map of Chicago. Read the repository's README to learn more about the project, including known issues and plans for future work. In this blog post, I will explain how I computed the Edge Network Voronoi Diagram using Python.
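The post's own method isn't reproduced in this excerpt, so the following is only a minimal sketch of one way to compute an edge network Voronoi diagram. It assumes a NetworkX graph whose edges carry a `length` attribute (e.g. in meters) and stations that sit on graph nodes; the function name `edge_network_voronoi` is hypothetical. A single multi-source Dijkstra search finds each node's nearest station, and any edge whose two endpoints have different nearest stations is split at the point where the walking distances to both stations are equal.

```python
import networkx as nx


def edge_network_voronoi(G, stations, weight="length"):
    """Assign every edge (or part of an edge) to its nearest station.

    Returns tuples (u, v, start, end, station): the stretch of edge u-v from
    `start` to `end` (measured from u along the edge) is closest to `station`.
    """
    # One Dijkstra run seeded from all stations at once: dist[n] is the distance
    # to the nearest station and paths[n][0] is that station.
    dist, paths = nx.multi_source_dijkstra(G, set(stations), weight=weight)
    nearest = {node: path[0] for node, path in paths.items()}

    pieces = []
    for u, v, data in G.edges(data=True):
        if u not in dist or v not in dist:
            continue  # edge is unreachable from every station
        length = data[weight]
        su, sv = nearest[u], nearest[v]
        if su == sv:
            # Every point on the edge reaches su fastest via u or v.
            pieces.append((u, v, 0.0, length, su))
        else:
            # Solve dist[u] + x == dist[v] + (length - x) for the split point x.
            x = (dist[v] + length - dist[u]) / 2
            x = min(max(x, 0.0), length)  # guard against rounding
            pieces.append((u, v, 0.0, x, su))
            pieces.append((u, v, x, length, sv))
    return pieces
```

For example, on a path graph A-B-C-D with unit-length edges and stations at A and D, the middle edge B-C is split at its midpoint, with the first half assigned to A and the second half to D.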

Building a hexagonal cartogram with Python

Posted on: Sat 07 September 2019

Tags: #python, #geopandas, #gis, #data visualization

This post and any related projects to come are inspired by Ralph Straumann's blog post about creating a hexagonal cartogram to visualize the population of Swiss cantons and by The Guardian's use of a hexagonal cartogram to display the 2017 U.K. General Election results. Both maps are aesthetically pleasing and a clever way of visualizing the underlying data, so naturally I wanted to come up with an easy way to create my own! In his blog post, Straumann describes the steps for prepping the geospatial data for the cartogram. His workflow relies partly on ArcGIS, so I wanted to see how much of it I could translate into a reusable workflow with Python. As a proof of concept, I created a hexagonal cartogram of the United States with the size of each state rescaled in proportion to the size of its congressional delegation.
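As a rough illustration of the geometry involved (not Straumann's workflow or the code from this post), here is a sketch that tiles a bounding box with pointy-top hexagons using Shapely. The helpers `hexagon` and `hex_grid` are hypothetical. Assigning hexagons to states in proportion to each congressional delegation is a separate step that this sketch does not attempt.

```python
import math

from shapely.geometry import Polygon


def hexagon(cx, cy, size):
    """Pointy-top hexagon centered at (cx, cy) with circumradius `size`."""
    return Polygon([
        (cx + size * math.cos(math.radians(60 * i + 30)),
         cy + size * math.sin(math.radians(60 * i + 30)))
        for i in range(6)
    ])


def hex_grid(xmin, ymin, xmax, ymax, size):
    """Cover a bounding box with hexagon centers laid out row by row."""
    width = math.sqrt(3) * size  # horizontal spacing between centers in a row
    height = 1.5 * size          # vertical spacing between rows
    hexes = []
    row = 0
    y = ymin
    while y <= ymax:
        # Shift every other row by half a hexagon so the rows interlock.
        x = xmin + (width / 2 if row % 2 else 0.0)
        while x <= xmax:
            hexes.append(hexagon(x, y, size))
            x += width
        y += height
        row += 1
    return hexes
```

With this spacing, neighboring hexagons share an edge exactly, so the tiles cover the plane without gaps or overlaps.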

Finding area of intersection with GeoPandas

Posted on: Mon 14 January 2019

Tags: #python, #pandas, #geopandas, #data science, #gis

A friend recently asked me if there were an easy way to determine which Chicago ward someone lives in given their ZIP Code. The sign-in sheets at an event she helped plan solicited event-goers' ZIP Codes. Later on, she decided she wanted to follow up with attendees about an upcoming aldermanic candidate forum, but only with those attendees living in a ward where one of the speaking candidates was running. I answered her question with disappointing but expected news: because Chicago's wards are so gerrymandered, it's not easy to guess which ward someone lives in based on their ZIP Code. A single ZIP Code could have four or more wards twisting through it.
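The technique in the title can be sketched with `geopandas.overlay`: intersect the ZIP Code polygons with the ward polygons and compute what fraction of each ZIP Code's area falls inside each ward. This is a minimal sketch, assuming both layers share a projected CRS (so `.area` is meaningful), that ZIP Codes are unique, and that the columns are named `zip` and `ward`; those column names and the `ward_shares` helper are assumptions for illustration.

```python
import geopandas as gpd


def ward_shares(zips, wards):
    """Fraction of each ZIP Code's area that lies within each ward."""
    # Cut every ZIP Code polygon by every ward it overlaps; the result keeps
    # the attribute columns of both layers.
    pieces = gpd.overlay(zips, wards, how="intersection")
    # Total area of each ZIP Code polygon, indexed by ZIP Code.
    zip_area = zips.set_index("zip").geometry.area
    pieces["share"] = pieces.geometry.area / pieces["zip"].map(zip_area)
    return (
        pieces[["zip", "ward", "share"]]
        .sort_values(["zip", "share"], ascending=[True, False])
        .reset_index(drop=True)
    )
```

Sorting by share puts the ward covering most of each ZIP Code first, which is a reasonable (but not guaranteed) guess for where an attendee lives.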

Exploring urban tree data

Posted on: Mon 07 January 2019

Tags: #python, #pandas, #geopandas, #data science, #trees

The New York City Parks Department maintains an interactive Street Tree Map that details every tree growing under NYC Parks jurisdiction, as identified by a team of volunteers in 2015. The map is both impressive and thorough and even allows users to create an account where they can favorite trees and record their stewardship activities. Unfortunately, the City of Chicago does not maintain a similar map or publicly available dataset. On a smaller scale, the University of Chicago in Hyde Park published an online database from a tree inventory conducted on its campus in autumn 2015. The tree inventory is published as a searchable and filterable map. UChicago's map is not as nice as NYC's Street Tree Map: among other reasons, it's slow, cumbersome to navigate, and inconvenient for data analysis. With a little work, however, the data can be scraped for our own perusal.

Splitting CSV files by column values with pandas

Posted on: Sun 30 December 2018

Tags: #python, #pandas, #data science

In my first real post, I'm going to share a construct that I use often. Below is a recipe for splitting delimited text files into separate files based on a chosen column's values using pandas. Suppose, for example, that you had a dataset of all reported Bigfoot sightings in the US over a 50-year period, and that one of the columns in the dataset listed the state where each sighting occurred. The following script could split the dataset into separate files for each state with a Bigfoot sighting.

First, load the data. Change the delimiter as needed.

import pandas as pd …
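The rest of the recipe is cut off in this excerpt, so here is a minimal sketch of how such a split might look using `DataFrame.groupby` and `to_csv`. The function name `split_by_column`, its parameters, and the choice to name each output file after the column value are assumptions for illustration, not the post's actual script.

```python
import os

import pandas as pd


def split_by_column(path, column, out_dir, sep=","):
    """Write one file per distinct value of `column`; return the paths written."""
    df = pd.read_csv(path, sep=sep)  # change `sep` for tab- or pipe-delimited files
    os.makedirs(out_dir, exist_ok=True)
    written = []
    # groupby yields (value, sub-DataFrame) pairs; rows where `column` is
    # missing are dropped by default.
    for value, group in df.groupby(column):
        out_path = os.path.join(out_dir, f"{value}.csv")
        group.to_csv(out_path, index=False, sep=sep)
        written.append(out_path)
    return written
```

For the Bigfoot example, `split_by_column("sightings.csv", "state", "by_state")` would produce one file per state, such as `WA.csv`. Values containing characters that aren't allowed in file names would need to be sanitized first.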

Post #0: Background Knowledge

Posted on: Sat 29 December 2018

Tags: #personal, #python, #data science

I have been gradually developing skills as a programmer and data scientist since graduating from college two years ago. I started by teaching myself Python in my free time, because I thought it would be useful to know. After all, many people I knew were using it! My learning was initially haphazard. I learned and understood basic Python techniques, but my knowledge felt purposeless. It wasn't until I tried to solve a major problem from my day-to-day life with Python (a story I'll share later) that my knowledge gained purpose. By applying Python to something personal, my Python skills …