
Understanding CTA’s Bus Tracker APIs

Much of the data analysis I am interested in involves public transportation, especially the Chicago Transit Authority (CTA) and data collected from the CTA’s Bus Tracker APIs. The Bus Tracker APIs are a collection of APIs that provide access to the near-real-time locations and estimated arrival times of CTA buses. The APIs also provide access to data detailing the paths of all of the CTA’s bus routes, the geospatial locations of bus stops, and more.

In my next post, I plan to introduce one of the major projects I am working on that involves analyzing bus location data …


Finding area of intersection with GeoPandas

A friend recently asked me whether there was an easy way to determine which Chicago ward someone lives in given their ZIP Code. The sign-in sheets at an event she helped plan had asked event-goers for their ZIP Codes. Later on, she decided she wanted to follow up with attendees about an upcoming aldermanic candidate forum, but only with those attendees living in a ward where one of the speaking candidates was running. I answered her question with disappointing but expected news: because Chicago's wards are so gerrymandered, it's not easy to guess which ward someone lives in based on their ZIP Code. A single ZIP Code could have four or more wards twisting through it.
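One way to quantify how a ZIP Code is divided among wards is to intersect the two sets of polygons and compare areas. The sketch below is only an illustration of that idea, under some assumptions: `zips` and `wards` are GeoDataFrames that already share a projected coordinate system, each ZIP Code appears in exactly one row, the geometry column has the default name `geometry`, and the column names `zip` and `ward` are hypothetical.

```python
import geopandas as gpd


def ward_shares_by_zip(zips, wards, zip_col="zip", ward_col="ward"):
    """For each ZIP Code, compute the fraction of its area that falls in each ward.

    Assumes both layers share a projected CRS (areas are not meaningful in
    degrees) and that each ZIP Code occupies exactly one row of `zips`.
    """
    # Cut the ZIP Code polygons along ward boundaries; each resulting row is
    # the piece of one ZIP Code that lies inside one ward.
    pieces = gpd.overlay(
        zips[[zip_col, "geometry"]],
        wards[[ward_col, "geometry"]],
        how="intersection",
    )
    pieces["piece_area"] = pieces.geometry.area

    # Total area of each ZIP Code, indexed by ZIP Code for lookup.
    zip_area = zips.set_index(zip_col).geometry.area
    pieces["share"] = pieces["piece_area"] / pieces[zip_col].map(zip_area)

    # Largest share first within each ZIP Code.
    result = pieces[[zip_col, ward_col, "share"]].sort_values(
        [zip_col, "share"], ascending=[True, False]
    )
    return result.reset_index(drop=True)
```

A ZIP Code whose largest share is close to 1 sits almost entirely in one ward; one with several comparable shares cannot be assigned to a ward from the ZIP Code alone.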


Exploring urban tree data

The New York City Parks Department maintains an interactive Street Tree Map that details every tree growing under NYC Parks jurisdiction as identified by a team of volunteers in 2015. The map is both impressive and thorough and even allows users to create an account where they can favorite trees and record their stewardship activities. Unfortunately, the city of Chicago does not maintain a similar map or publicly available dataset. On a smaller scale, the University of Chicago in Hyde Park published an online database from a tree inventory conducted on its campus in autumn 2015. The tree inventory is published as a searchable and filterable map. UChicago's map is not as nice as NYC's Street Tree Map: among other reasons, it's slow, cumbersome to navigate, and not convenient for conducting data analysis. With a little work, however, the data can be scraped for our own perusal.


Splitting CSV files by column values with pandas

In my first real post, I'm going to share a construct that I use often. Below is a recipe for splitting delimited text files into separate files based on a chosen column's values using pandas. Suppose, for example, that you had a dataset of all reported Bigfoot sightings in the US over a 50-year period, and that one of the columns in the dataset listed the state where each sighting occurred. The following script could split the dataset into separate files, one for each state with a Bigfoot sighting.

First, load the data. Change the delimiter as needed.

import pandas as pd …
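As a minimal sketch of one way to do the split described above (not necessarily the script from the post), the function below groups a DataFrame by a chosen column and writes one file per value. The function name `split_by_column`, the file name `bigfoot_sightings.csv`, and the column name `state` are hypothetical, and the sketch assumes the column's values are safe to use as file names.

```python
from pathlib import Path

import pandas as pd


def split_by_column(df, column, out_dir, sep=","):
    """Write one delimited file per distinct value in `column`.

    Returns the list of paths written. Rows where `column` is missing are
    skipped, because groupby drops missing keys by default.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    # groupby yields one (value, sub-DataFrame) pair per distinct value.
    for value, group in df.groupby(column):
        # Assumption: the value contains no characters that are invalid in a file name.
        path = out_dir / f"{value}.csv"
        group.to_csv(path, sep=sep, index=False)
        written.append(path)
    return written


# Example usage (file and column names are hypothetical):
# sightings = pd.read_csv("bigfoot_sightings.csv", sep=",")
# split_by_column(sightings, "state", "sightings_by_state")
```

Grouping once and writing each group avoids filtering the full dataset repeatedly, which matters when the column has many distinct values.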

Post #0: Background Knowledge

I have been gradually developing skills as a programmer and data scientist since graduating from college two years ago. I started by teaching myself Python in my free time because I thought it would be useful to know. Many people I knew were using it, after all! My learning was initially haphazard. I learned and understood basic Python techniques, but my knowledge felt purposeless. It wasn’t until I tried to solve a major problem from my day-to-day life with Python (a story I'll share later) that my knowledge gained purpose. By applying Python to something personal, my Python skills …