Don’t believe the hype: Swyft

TLDR: Swyft shows their capacity for the three kinds of lies: lies, damned lies, and statistics.

Earlier this week, startup Swyft hit hard at NextBus, the arrival prediction service for San Francisco’s NextMuni:

NextBus predictions can be inaccurate 40 percent of the time if a bus is 20 minutes or more away, according to a Swyft study, released Thursday. “NextBus accuracy plummets as it tries to predict arrivals further out in time,” Swyft wrote in a summary of its study findings.

Below the fold, I will argue that the conclusions reached in the aforementioned ‘study’ are easily debunked. I’m going to discuss why Swyft’s criteria for ‘incorrect’ are wrong from a rider’s standpoint, then walk through the math behind predictions in layman’s terms and show why their analysis is innumerate. Finally, their post, while focusing on objectivity for their competitors, says nothing about the actual performance of their own product.

Disclaimer in 3 parts:

  1. I have spent my entire career in the design, implementation, and analysis of Real-Time Automatic Vehicle Location data. I am a former NextBus employee, and my current day job involves working on a competing product.
  2. Arguments presented below are common sense and math, and in no way influenced by any employer, past, present, or future.
  3. In all honesty, I’m a little bitter about regurgitations of these ‘findings’ as an impartial study, when clearly Swyft has a lot to gain in an ever-growing marketplace of similar apps. Quoting figures based on questionable criteria (“predictions on average are accurate 70% of the time”) is in itself a questionable practice.

From the Rider’s Perspective, This Doesn’t Mean Anything.

As a transit rider, I take issue with three claims in Swyft’s post.

  1. We categorize a given prediction as accurate if the actual arrival time of the vehicle is anywhere between 30 seconds earlier and 4 minutes later than the predicted arrival time…

This metric was applied equally to predictions 1-30 minutes out. But it doesn’t make sense!

By this logic, these make up “accurate” predictions:

  • A bus predicted in 1 minute, but arrived in 5 (a 400% error)
  • A bus predicted in 30 minutes, but arrived in 29:30 (negligible in the real world)

And these are inaccurate:

  • A bus predicted in 1 minute, but arrived in 29 seconds (negligible in the real world)
  • A bus predicted in 30 minutes, but arrived in 29 (a ~3% error)

Put yourself in a rider’s shoes. Should all predictions have the same weight? Do their criteria actually matter? Most riders would argue that the first prediction is actually inaccurate, and that the final one is accurate. Another reader pointed this out in a comment on Swyft’s post, and I agree.
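To make the comparison concrete, here is a minimal Python sketch that scores the four examples above two ways: with the fixed window Swyft describes (30 seconds early to 4 minutes late), and with a relative error. The function names and the 10% relative threshold are my own assumptions for illustration; nothing here comes from Swyft’s or NextBus’s code.

```python
# Illustrative only: function names and the 10% threshold are assumptions.

def fixed_window_accurate(predicted_s, actual_s, early_s=30, late_s=240):
    """Swyft's stated rule: 30 s early to 4 min late counts as accurate."""
    error_s = actual_s - predicted_s  # negative means the bus came early
    return -early_s <= error_s <= late_s


def relative_error(predicted_s, actual_s):
    """Absolute error as a fraction of the predicted wait."""
    return abs(actual_s - predicted_s) / predicted_s


def relative_accurate(predicted_s, actual_s, threshold=0.10):
    """Hypothetical alternative: accurate if within 10% of the predicted wait."""
    return relative_error(predicted_s, actual_s) <= threshold


# (label, predicted seconds, actual seconds) for the four examples in the text.
examples = [
    ("1 min predicted, arrived in 5 min", 60, 300),
    ("30 min predicted, arrived in 29:30", 1800, 1770),
    ("1 min predicted, arrived in 29 s", 60, 29),
    ("30 min predicted, arrived in 29 min", 1800, 1740),
]

for label, predicted, actual in examples:
    print(
        f"{label}: fixed window={fixed_window_accurate(predicted, actual)}, "
        f"relative={relative_accurate(predicted, actual)} "
        f"({relative_error(predicted, actual):.0%} error)"
    )
```

Under these assumptions the relative rule flips the first and last examples, which matches the rider’s intuition. It still flags the 29-second arrival, so a real metric would probably want a small absolute floor as well.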

Swyft’s response? “Ultimately, we decided to go with absolute error because we felt it would be easier for the general public to understand.” I translate that as “ultimately, we decided to go with absolute error because it allows us to engage in mathematical sophistry.”

  2. We found that when the bus is within 30 minutes from a desired stop, predictions on average are accurate 70% of the time

Based on the criteria above, what real person does this? Do riders actually get upset (or even notice) when a bus predicted in 30 minutes arrives in 29? When I check my phone or look at a NextMuni sign and see an arrival 30 minutes out, I don’t set my clock and step to the curb 29:30 later. I doubt many other riders do; I’m going to check again in 20 minutes.

  3. In addition, NextBus predictions become much less accurate during commute hours when riders need them most.

This is arguably not the case. During commute hours, there is more service. Riders need real-time predictions to be accurate most at hours when there is less service, because a missed bus means a much longer wait. Take a look at the number of Muni’s trips that are scheduled to start every hour on a weekday (calculated from Muni’s GTFS):

[Image: muni_tph]

If I’m waiting for a bus or train between 8PM and 6AM, I want precision because buses are few and far between. High frequency (almost two thirds of Muni’s routes have scheduled headways of 12 minutes or less during the 07:00 hour, per Muni’s GTFS) and reasonable reliability (note that Muni is legendarily unreliable) diminish the need for accuracy in predictions greater than, say, 20 minutes out. In fact, if more than 3 or 4 arrivals are predicted for a stop, the NextMuni site, as well as nearly every real-time app, truncates the list. If there are 3 buses in the next 15 minutes, why should a rider care about the accuracy of the prediction for the 4th?
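For anyone who wants to reproduce a trips-per-hour count of this kind, here is a rough Python sketch. It assumes a standard GTFS `stop_times.txt` layout (columns `trip_id`, `departure_time`, `stop_sequence`) and runs on a tiny made-up sample rather than Muni’s real feed. It also skips filtering by weekday service, which a real analysis would need to do via `trips.txt` and `calendar.txt`. Treat it as an illustration, not necessarily how the chart was produced; the function name is hypothetical.

```python
import csv
import io
from collections import Counter

# Made-up sample in GTFS stop_times.txt layout (NOT real Muni data).
SAMPLE_STOP_TIMES = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
t1,07:05:00,07:05:00,s1,1
t1,07:15:00,07:15:00,s2,2
t2,07:40:00,07:40:00,s1,1
t2,07:50:00,07:50:00,s2,2
t3,23:30:00,23:30:00,s1,1
t3,24:10:00,24:10:00,s2,2
"""


def trips_starting_per_hour(stop_times_file):
    """Count trips by the clock hour of their first stop's departure."""
    first_stop = {}  # trip_id -> (lowest stop_sequence seen, its departure_time)
    for row in csv.DictReader(stop_times_file):
        trip = row["trip_id"]
        seq = int(row["stop_sequence"])
        if trip not in first_stop or seq < first_stop[trip][0]:
            first_stop[trip] = (seq, row["departure_time"])

    counts = Counter()
    for _, departure in first_stop.values():
        # GTFS times may exceed 24:00:00 for service past midnight,
        # so wrap the hour back onto a 24-hour clock.
        hour = int(departure.split(":")[0]) % 24
        counts[hour] += 1
    return counts


counts = trips_starting_per_hour(io.StringIO(SAMPLE_STOP_TIMES))
for hour in sorted(counts):
    print(f"{hour:02d}:00  {counts[hour]} trips")
```

With a real feed, you would pass an open file handle for `stop_times.txt` instead of the in-memory sample.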

The Math Behind Predicting Arrivals (in Simple Terms)

Riders aren’t going to see an effect of the magnitude that Swyft claims. What about looking at it from the other direction, letting math and statistics guide the analysis? It doesn’t fit either.

Prediction engines all have the same root, a formula that takes one back to fourth grade math:

[latex]Time_{Arrival} = \frac{Distance_{Arrival}}{Speed_{Arrival}}[/latex]

The time that is predicted is based upon the distance to the stop and the estimated speed of the vehicle over that distance.

The equation above is admittedly naive, as it reduces the travel speed to a single average value across the whole prediction span. The most common basic algorithm for a first implementation of predictions breaks the trip into s segments and takes into account the varying speed on each one. The origin of the speed variable is what sets algorithms apart, and getting into the varieties of how algorithms calculate speeds is several posts in itself.

[latex display="true"]Time_{Arrival} = \sum_{i=1}^{s} \frac{Distance_{i}}{Speed_{i}}[/latex]

But when looking at results, we must acknowledge that ‘prediction’ is a fancy term for ‘guess’, and no guesses are correct 100% of the time. Operations will have some degree of uncertainty in travel time when the rubber hits the road. Variations in traffic, bus operators, and the number of passengers boarding and alighting are some major elements of this uncertainty.

For this reason, when model accuracy is tested, you need to factor in an error term for each distinct piece that is included in that guess. That error term is going to depend heavily on the characteristics of the transit service (generally speaking, predictions for buses in mixed traffic are going to be less accurate than those for full-on BRT in dedicated lanes, which would still be less accurate than fully automated rail), not just the algorithm itself. And since we’re summing a number of pieces together, the total error grows the farther out in time the prediction is made.

[latex display="true"]Time_{Arrival} = \sum_{i=1}^{s} \left( \frac{Distance_{i}}{Speed_{i}} + \epsilon_{i} \right)[/latex]

This is why the criteria Swyft applied (an equal amount of time for each time period) does not reflect either good mathematical understanding or reality. The amount of error considered reasonable (i.e. the window of time around the prediction) should increase with the distance to prediction.
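As a rough illustration of why a fixed window is the wrong yardstick, here is a Python sketch of the segment-by-segment idea: travel time is summed over segments, and so is the uncertainty. It assumes each segment’s error is independent with a known standard deviation, which is a simplification (real segment errors are correlated). The segment lengths, speeds, and error sizes are invented numbers, and the function names are hypothetical.

```python
import math


def predicted_travel_time_s(segments):
    """Sum distance / speed over segments of (distance_m, speed_mps, error_sd_s)."""
    return sum(distance / speed for distance, speed, _ in segments)


def predicted_error_sd_s(segments):
    """Std dev of the total, ASSUMING independent segment errors (variances add)."""
    return math.sqrt(sum(sd ** 2 for _, _, sd in segments))


# Invented numbers: identical 400 m segments at 5 m/s, each with a 20 s error sd.
segment = (400.0, 5.0, 20.0)

for n in (1, 4, 16):
    route = [segment] * n
    minutes = predicted_travel_time_s(route) / 60
    sd = predicted_error_sd_s(route)
    print(f"{n:2d} segments: {minutes:5.1f} min predicted, error sd ~{sd:.0f} s")
```

Under these assumptions the error spread for the 16-segment prediction is four times that of the single-segment one, so a tolerance window that made sense one segment out would be far too strict at sixteen.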

So What Does it All Mean?

Can NextMuni be improved? The answer is certainly yes. There are issues that Swyft has pointed out regarding predicting departures from a terminal, for which NextMuni is configured to be optimistic. Lessening that is a question for Muni and NextBus. Whether ’tis nobler in the mind to suffer the slings and arrows of outrageous terminal performance…

For me, the real lede in Swyft’s post is buried:

The Swyft community is actually quite effective at finding and broadcasting transportation issues that can cause real-time predictions to be inaccurate. While this does not solve prediction accuracy problems, it demonstrates the potential for a broader community to work together to avoid common transit issues and delays.

Ah, “potential!” The post does little to qualify exactly what would be done with the information they collect other than “make predictions better.” I guess that’s the special sauce that they’re trying to cook up. I want to see the improvement in accuracy that whatever is on their drawing board will bring. And when it comes to evaluation, will Swyft submit their algorithm to the same ridiculous criteria that they have analyzed NextBus under? Specifically, I’d like to see how they handle vehicles predicted in 30 minutes that arrive in 29. My guess is that the criteria will change when that analysis comes.