Recently I was asked about using calculating the amount of scheduled service. This post is the result of a quick experiment to prove the process with PostGIS.
Following along assumes:
- Basic knowledge of SQL and git,
- Basic familiarity with GTFS, and
- A copy of PostgreSQL with the PostGIS extension.
I’m going to use AC Transit for an example, as they have an upcoming service change:
This March, AC Transit is continuing a series of service improvements that began in December. AC Transit’s financial situation has improved and ridership has grown, allowing the District to make modest service increases. In March, the changes primarily involve additional service on weekends.
Highlights of the changes:
The 72R San Pablo Rapid will now operate on Saturdays and Sundays as well as weekdays, helping people get around more quickly from Richmond to Oakland.
With new service on lines 46 and 339, visitors can now take public transit to the Oakland Zoo and Chabot Space and Science Center on weekends.
Line 31 will extend further out on Alameda Point on weekends, giving better access to the area’s growing number of popular destinations.
Staying out later on Saturday nights will be easier in Fremont, where lines 200 and 212 will have extended service until midnight.
Let’s use PostGIS as a way to analyze these changes in service. PostGIS is a spatially enabled database that essentially bridges the gap between a relational database and GIS. This will make sense at the end of the post.
First, install PostGIS, or create and use the OsGeoLive USB stick/ Virtualbox image.
Now, it’s time to grab some data. I used the AC Transit site to get the upcoming change information and the GTFS Data Exchange to grab the penultimate set.
Then, we’ll need to clone my fork (or download the zip) of the GTFS_SQL_Importer. It adds some spatial summary statistics on shapes that are not present in the original. ((If you’ve never used Git before, Sourcetree (Windows, OS X) and Github (Windows, OS X) are great graphical clients to get started with.)) Follow the directions section for spatial data.
The next step is to clone another repo of mine which contains the SQL to perform the actual analysis. The script is sum-kms-of-service.sql.
After loading up the data into two databases, what do we find? Here are the top ten rows from that query run on the weekday schedule, before and after, prettified via a spreadsheet for this post.
Route | Pre-3/2015 | 3/2015 |
Grand Total | 98754 | 98756 |
1 | 3697 | 3697 |
1R | 3687 | 3687 |
18 | 3029 | 3029 |
72R | 2997 | 2997 |
99 | 2972 | 2972 |
40 | 2959 | 2959 |
51A | 2912 | 2912 |
97 | 2612 | 2612 |
57 | 2546 | 2546 |
Save for a negligible 2km increase in the grand total, the weekday numbers didn’t change much. So what’s the deal? Looking back on the original message, the following is clear:
the changes primarily involve additional service on weekends
Knowing that, can we do the same for Saturday? Changing the text after service_id ilike to ‘{8472c33f139a04d7902a1525cca677786370fef6b48c8e38f5cec86fa878d628}sat{8472c33f139a04d7902a1525cca677786370fef6b48c8e38f5cec86fa878d628}’ returns the Saturday schedule.
Route | Pre-3/2015 | 3/2015 |
Grand Total | 55030 | 58672 |
1 | 3345 | 3345 |
57 | 2385 | 2385 |
18 | 2304 | 2304 |
72R | 0 | 2192 |
NL | 2070 | 2070 |
72 | 2060 | 2060 |
40 | 1980 | 1980 |
51A | 1955 | 1955 |
99 | 1904 | 1904 |
72M | 1858 |
Whoa! There’s a fair bit of service increase, and comparing the two tables we can see that the addition of the 72R is a huge boost to the San Pablo corridor. Let’s look at that on one table:
San Pablo Corridor (72*) service, before and after 3/2015 Service Changes
Route | Pre-3/2015 | 3/2015 |
Total 72 Service | 3918 | 6109 |
72R | 2192 | |
72 | 2060 | 2060 |
72M | 1858 | 1858 |
A 56{8472c33f139a04d7902a1525cca677786370fef6b48c8e38f5cec86fa878d628} increase in service! “helping people get around more quickly from Richmond to Oakland” is an understatement. Keep it up ACT!