Using GTFS and PostGIS to calculate levels of scheduled service

Recently I was asked about using calculating the amount of scheduled service. This post is the result of a quick experiment to prove the process with PostGIS.

Recently I was asked about using calculating the amount of scheduled service. This post is the result of a quick experiment to prove the process with PostGIS.

Following along assumes:

  • Basic knowledge of SQL and git,
  • Basic familiarity with GTFS, and
  • A copy of PostgreSQL with the PostGIS extension.

I’m going to use AC Transit for an example, as they have an upcoming service change:

This March, AC Transit is continuing a series of service improvements that began in December. AC Transit’s financial situation has improved and ridership has grown, allowing the District to make modest service increases. In March, the changes primarily involve additional service on weekends.

Highlights of the changes:

The 72R San Pablo Rapid will now operate on Saturdays and Sundays as well as weekdays, helping people get around more quickly from Richmond to Oakland.
With new service on lines 46 and 339, visitors can now take public transit to the Oakland Zoo and Chabot Space and Science Center on weekends.
Line 31 will extend further out on Alameda Point on weekends, giving better access to the area’s growing number of popular destinations.
Staying out later on Saturday nights will be easier in Fremont, where lines 200 and 212 will have extended service until midnight.

Let’s use PostGIS as a way to analyze these changes in service. PostGIS is a spatially enabled database that essentially bridges the gap between a relational database and GIS. This will make sense at the end of the post.

First, install PostGIS, or create and use the OsGeoLive USB stick/ Virtualbox image.

Now, it’s time to grab some data. I used the AC Transit site to get the upcoming change information and the GTFS Data Exchange to grab the penultimate set.

Then, we’ll need to clone my fork (or download the zip) of the GTFS_SQL_Importer. It adds some spatial summary statistics on shapes that are not present in the original. ((If you’ve never used Git before, Sourcetree (Windows, OS X) and Github (Windows, OS X) are great graphical clients to get started with.)) Follow the directions section for spatial data.

The next step is to clone another repo of mine which contains the SQL to perform the actual analysis. The script is sum-kms-of-service.sql.

After loading up the data into two databases, what do we find? Here are the top ten rows from that query run on the weekday schedule, before and after, prettified via a spreadsheet for this post.

Route Pre-3/2015 3/2015
Grand Total 98754 98756
1 3697 3697
1R 3687 3687
18 3029 3029
72R 2997 2997
99 2972 2972
40 2959 2959
51A 2912 2912
97 2612 2612
57 2546 2546

Save for a negligible 2km increase in the grand total, the weekday numbers didn’t change much. So what’s the deal? Looking back on the original message, the following is clear:

the changes primarily involve additional service on weekends

Knowing that, can we do the same for Saturday? Changing the text after service_id ilike to ‘{8472c33f139a04d7902a1525cca677786370fef6b48c8e38f5cec86fa878d628}sat{8472c33f139a04d7902a1525cca677786370fef6b48c8e38f5cec86fa878d628}’ returns the Saturday schedule.

Route Pre-3/2015 3/2015
Grand Total 55030 58672
1 3345 3345
57 2385 2385
18 2304 2304
72R 0 2192
NL 2070 2070
72 2060 2060
40 1980 1980
51A 1955 1955
99 1904 1904
72M 1858

 

Whoa! There’s a fair bit of service increase, and comparing the two tables we can see that the addition of the 72R is a huge boost to the San Pablo corridor. Let’s look at that on one table:

San Pablo Corridor (72*) service, before and after 3/2015 Service Changes

Route Pre-3/2015 3/2015
Total 72 Service 3918 6109
72R 2192
72 2060 2060
72M 1858 1858

A 56{8472c33f139a04d7902a1525cca677786370fef6b48c8e38f5cec86fa878d628} increase in service! “helping people get around more quickly from Richmond to Oakland” is an understatement. Keep it up ACT!