Mommy, what’s a Protobuf?

A question that continually pops up when someone learns that GTFS-Realtime is intended to be formatted as Protocol Buffers (or Protobuf) is, understandably, “why?”

The documentation does little to answer this question for laypeople.

“Protocol buffers are a language- and platform-neutral mechanism for serializing structured data (think XML, but smaller, faster, and simpler). The data structure is defined in a gtfs-realtime.proto file, which then is used to generate source code to easily read and write your structured data from and to a variety of data streams, using a variety of languages – e.g. Java, C++ or Python.”

What does that mean? Why not JSON (which is smaller, faster and simpler than XML)? I’ve heard a number of complaints along the lines of Protobuf being “entirely unnecessary.” But it’s not: it serves a purpose.

In short, Protobuf is designed to be:
* efficiently generated,
* efficiently transferred,
* efficiently processed (machine readable), and
* unambiguously understood by programs written in many languages.
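
To make the "efficiently transferred" point concrete, here is a minimal sketch that hand-encodes a single integer field the way the Protobuf wire format does (a one-byte tag followed by a base-128 varint) and compares its size with the same value as JSON. It uses only the Python standard library, not the real Protobuf bindings. The helper names, the field number 1, and the sample timestamp are assumptions chosen for illustration; they are not taken from gtfs-realtime.proto.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(low7)         # last byte: continuation bit stays clear
            return bytes(out)


def encode_uint_field(field_number: int, value: int) -> bytes:
    """Encode one varint field: tag (field number plus wire type 0), then the value."""
    tag = (field_number << 3) | 0    # wire type 0 means "varint"
    return encode_varint(tag) + encode_varint(value)


# Hypothetical message with a single integer field (field number 1 is illustrative).
timestamp = 1400000000
binary = encode_uint_field(1, timestamp)
text = json.dumps({"timestamp": timestamp}).encode("utf-8")

print(len(binary), "bytes in Protobuf-style binary")
print(len(text), "bytes as JSON")
```

The binary form carries no field name at all, only a number that both sides look up in the shared .proto definition. That is where much of the size saving comes from, and also why the result is not human readable.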


Pro Tip: Running FeedValidator and ScheduleViewer faster

The Google-supported transitfeed Python package includes several useful tools, among them a Feed Validator and a Schedule Viewer. But under the standard Python interpreter they’re slow. With a moderately sized feed (Santa Clara VTA), it’s problematic:

python2.7 transitfeed-1.2.15/feedvalidator_googletransit.py vta/ > /dev/null
103.32s user 4.67s system 75% cpu 2:22.28 total

That took 103 seconds of CPU time, and 2 minutes 22 seconds on the clock. Larger feeds take far longer. What to do?

Enter pypy, a faster implementation of Python, itself written in a restricted subset of Python (RPython). The differences are puzzling, but explained here. The same code run under pypy is much faster:

pypy transitfeed-1.2.15/feedvalidator_googletransit.py vta/ > /dev/null
30.48s user 3.05s system 74% cpu 44.928 total

That’s about 30% of the time to do the same work.
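
For anyone who wants to check the arithmetic, here is a small sketch that derives the ratios from the two timing lines above. The to_seconds helper is a hypothetical convenience for the "M:SS.ss" format that the shell's time prints; the numbers themselves are copied from the output shown.

```python
def to_seconds(clock: str) -> float:
    """Convert 'M:SS.ss' or plain 'SS.ss' into seconds."""
    seconds = 0.0
    for part in clock.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# Figures copied from the two timing lines above.
cpython = {"user": 103.32, "total": to_seconds("2:22.28")}
pypy = {"user": 30.48, "total": to_seconds("44.928")}

for key in ("user", "total"):
    ratio = pypy[key] / cpython[key]
    print(f"{key}: pypy takes {ratio:.1%} of the time ({1 / ratio:.2f}x speedup)")
```

Both the CPU time and the elapsed time land close to the 30% mark, so the claim holds whichever figure you prefer.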