Much of the data analysis I am interested in involves public transportation, especially the Chicago Transit Authority (CTA) and data collected from the CTA’s Bus Tracker APIs. The Bus Tracker APIs are a collection of APIs that provide access to the near-real-time locations and estimated arrival times of CTA buses. The APIs also provide access to data detailing the paths of all of the CTA’s bus routes, the geospatial location of bus stops, and more.
In my next post, I plan to introduce one of the major projects I am working on that involves analyzing bus location data from the APIs. Because I anticipate many future posts will focus on this data, I’m writing this post to provide a brief overview of both the CTA and Bus Tracker terminology. I hope this post establishes a common language that readers can reference as needed, so that I do not have to redefine the same terms in each subsequent post. For a more thorough understanding of the APIs, feel free to read the documentation.
getvehicles
The majority of the data in my project comes from the Bus Tracker's getvehicles API. A request to getvehicles returns the vehicle information (including the locations) of all active CTA buses on up to 10 routes at once.
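Because a single request covers at most 10 routes, collecting data for more routes means splitting them into batches. Here is a minimal sketch of that batching. The function name build_getvehicles_urls is hypothetical, and passing several routes as a comma-separated rt value is my reading of the API documentation rather than something shown in this post.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.ctabustracker.com/bustime/api/v2/getvehicles"
MAX_ROUTES_PER_REQUEST = 10  # the per-request route limit described above


def build_getvehicles_urls(api_key, routes):
    """Return one getvehicles URL per batch of at most 10 routes.

    Assumption: several routes are sent as a comma-separated `rt` value.
    """
    urls = []
    for start in range(0, len(routes), MAX_ROUTES_PER_REQUEST):
        batch = routes[start:start + MAX_ROUTES_PER_REQUEST]
        # safe="," keeps the commas readable instead of encoding them as %2C
        query = urlencode(
            {"key": api_key, "rt": ",".join(batch), "format": "json"},
            safe=",",
        )
        urls.append(f"{BASE_URL}?{query}")
    return urls
```

Calling build_getvehicles_urls("APIKEY", ["56"]) gives a single URL of the same form as the example request that follows.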
For example, the request http://www.ctabustracker.com/bustime/api/v2/getvehicles?key=APIKEY&rt=56&format=json returns the following response:
{
"bustime-response": {
"vehicle": [
{
"vid": "8283",
"tmstmp": "20190120 18:27",
"lat": "41.96198637094071",
"lon": "-87.75562491345761",
"hdg": "328",
"pid": 1970,
"rt": "56",
"des": "Jefferson Park Blue Line",
"pdist": 48406,
"dly": false,
"tatripid": "1079547",
"tablockid": "56 -408",
"zone": ""
},
{
"vid": "8293",
"tmstmp": "20190120 18:26",
"lat": "41.938211582325124",
"lon": "-87.722391905608",
"hdg": "129",
"pid": 1971,
"rt": "56",
"des": "Washington/Michigan",
"pdist": 16288,
"dly": false,
"tatripid": "1079554",
"tablockid": "56 -451",
"zone": ""
}
]
}
}
Note that the response has been edited for brevity. I will walk through some of the fields in the response, as well as a couple of other terms that do not appear in it.
A pattern is a unique sequence of stops and waypoints an active bus visits during its service. Each pattern is assigned a unique pattern ID or pid.
A stop is the location along a pattern where a bus picks up and drops off passengers. Each stop is assigned a stopid. A waypoint is an intermediate point along a pattern, such as where a bus turns down a different street. Stops and waypoints are not included in the response to getvehicles.
A route (rt) is a set of one or more patterns that forms a single service. For example, service on Route 56 Milwaukee, shown in the sample response, consists of six possible patterns:
Jefferson Park Transit Center to Madison/Wabash (southbound), Jeff Park to Milwaukee/Kedzie (southbound), Milwaukee/Addison to Ogilvie (southbound), Madison/Wabash to Jeff Park (northbound), Milwaukee/Kedzie to Jeff Park (northbound), and Madison/Wabash to Ogilvie Station (northbound).
The pattern distance (pdist) gives the linear distance in feet a bus has traveled along the pattern it is executing. Bus locations are also given as latitude (lat) and longitude (lon) coordinates.
The timestamp (tmstmp) reports the last time the vehicle updated its position.
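In the sample response, lat and lon arrive as strings and tmstmp uses the form "YYYYMMDD HH:MM", so a little conversion helps before analysis. The following is a sketch only: the VehiclePosition class and parse_vehicle function are hypothetical names, and the timestamp is kept as a naive datetime because I am assuming, not confirming, that it is local Chicago time.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class VehiclePosition:
    vid: str
    rt: str
    pid: int
    tatripid: str
    lat: float
    lon: float
    pdist: int           # feet traveled along the pattern
    timestamp: datetime  # naive; assumed (not confirmed) to be Chicago local time


def parse_vehicle(raw):
    """Convert one entry of the "vehicle" list into typed values."""
    return VehiclePosition(
        vid=raw["vid"],
        rt=raw["rt"],
        pid=int(raw["pid"]),
        tatripid=raw["tatripid"],
        lat=float(raw["lat"]),
        lon=float(raw["lon"]),
        pdist=int(raw["pdist"]),
        # Matches the "20190120 18:27" format seen in the sample response
        timestamp=datetime.strptime(raw["tmstmp"], "%Y%m%d %H:%M"),
    )
```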
A bus executing a pattern is making a trip. Each trip is assigned a tatripid (transit authority trip id). I have yet to determine if there is a pattern to the way tatripids are assigned. I know that tatripids are not unique across bus routes. So, Route 1 Bronzeville/Union Station and Route 2 Hyde Park Express could in theory make a trip at some point with the same tatripid. They are also not unique on a single route from day to day. Route 3 King Drive could make a trip today and tomorrow that are both assigned the same tatripid. I’ve also seen the same tatripid assigned to two different trips on the same route in the same day, provided each trip executes a different pattern. The non-uniqueness of tatripids makes it slightly tricky to separate raw data into individual trips, especially for buses with Owl Service (24-hour service).
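One way to work around this is to group records by a composite key rather than by tatripid alone. The sketch below combines the calendar date, route, pattern, and tatripid, which covers the three kinds of collision described above. It is an illustration with hypothetical function names, not a guaranteed-unique key: in particular, using the calendar date from tmstmp would split a trip that runs past midnight, so Owl routes would need a smarter service-day rule that I have left out.

```python
from collections import defaultdict


def trip_key(record, service_date):
    """Composite key: tatripid alone repeats across routes, days, and patterns."""
    return (service_date, record["rt"], record["pid"], record["tatripid"])


def group_by_trip(records):
    """Group raw getvehicles records (dicts) by the composite trip key."""
    groups = defaultdict(list)
    for record in records:
        # Assumption: the calendar date ("YYYYMMDD") stands in for the service day
        service_date = record["tmstmp"][:8]
        groups[trip_key(record, service_date)].append(record)
    return dict(groups)
```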
The remaining fields are not important to the analysis I do, but feel free to read about them in the documentation.
getpatterns
Another important API from Bus Tracker is getpatterns. This API provides the location of each stop and waypoint for a given pattern. The locations are given as latitude and longitude coordinates, as well as in terms of their pdist.
The request http://www.ctabustracker.com/bustime/api/v2/getpatterns?key=APIKEY&pid=1971&format=json returns the following response:
{
"bustime-response": {
"ptr": [
{
"pid": 1971,
"ln": 52672.0,
"rtdir": "Southbound",
"pt": [
{
"seq": 1,
"lat": 41.969575000001,
"lon": -87.761570000002,
"typ": "S",
"stpid": "14101",
"stpnm": "Jefferson Park Transit Center",
"pdist": 0.0
},
{
"seq": 2,
"lat": 41.968860000001,
"lon": -87.761665,
"typ": "S",
"stpid": "3725",
"stpnm": "Milwaukee & Higgins",
"pdist": 469.0
},
{
"seq": 3,
"lat": 41.967627999999,
"lon": -87.760632,
"typ": "S",
"stpid": "17175",
"stpnm": "Milwaukee & Lawrence",
"pdist": 943.0
}
]
}
]
}
}
Note the response has been edited for brevity. Most of the fields are self-explanatory. ln gives the total length of the pattern in feet. pt is a list of points—stops and waypoints—that make up the pattern. For each stop or waypoint, in addition to its location, we are given its sequence (seq) in the pattern, the type (typ) of point it is (with possible values “S” for “stop” and “W” for “waypoint”), and the stop name (stpnm) and stop id (stpid) if the point happens to be a stop. For whatever reason, all waypoints have a pdist of 0 feet.
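Since a vehicle's pdist and a stop's pdist are both measured along the same pattern, the two responses can be combined to find the next stop ahead of a bus. Here is a rough sketch, with hypothetical function names, that takes one entry of the ptr list. It drops waypoints first, because their pdist of 0 would otherwise place them all at the start of the pattern.

```python
from bisect import bisect_right


def stops_in_pattern(pattern):
    """Return only the stops (typ == "S") of one "ptr" entry, ordered by pdist."""
    stops = [pt for pt in pattern["pt"] if pt["typ"] == "S"]
    return sorted(stops, key=lambda pt: pt["pdist"])


def next_stop(pattern, vehicle_pdist):
    """Return the first stop strictly beyond vehicle_pdist, or None at the end."""
    stops = stops_in_pattern(pattern)
    distances = [pt["pdist"] for pt in stops]
    # bisect_right finds the first stop whose pdist is greater than the vehicle's
    index = bisect_right(distances, vehicle_pdist)
    return stops[index] if index < len(stops) else None
```

With the three stops shown above, a bus on pattern 1971 reporting a pdist of 500 would be between Milwaukee & Higgins and Milwaukee & Lawrence, so next_stop returns the Milwaukee & Lawrence entry.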
Other data sources: GTFS feed
The CTA also provides scheduled service data via General Transit Feed Specification (GTFS). There are differences between the information provided by the APIs and the GTFS feed. For instance, CTA’s FAQ notes that routes are called “patterns” in the APIs, but are called “shapes” in the GTFS feed. One thing worth noting is that the tatripids assigned in the APIs and the trip_ids in the GTFS feed are completely different. I have not been able to figure out if there is a way to relate an active trip in the API to a scheduled trip in the GTFS feed. Because of this, there is an added challenge when trying to compare scheduled service with the performance of bus service in practice. This is a data analysis project I hope to work on in the future.