May 2016 – Transit Data Blog

Via O’Reilly, a good short introduction statistical thinking for real world situations, and how to describe results without being misleading (hello, Swyft!).

Common sense tells us that 2 + 2 will always be 4. We can compile that code and run 2 + 2 over and over and the answer will always be 4. But when we try to measure some phenomenon in the real world it’s often impossible to measure everything, so we end up with some slice or sample and we have to accept an approximation. In other words, when observing 2 + 2 in a huge and complex system, it rarely adds up to precisely 4.

Consumers and producers of popular research each have a role to play. Consumers should expect a level of transparency, and researchers should work hard to earn both the trust and attention of the reader. A good popular report should:

State how much data is being analyzed: Using statements like “hundreds of data compromises” isn’t that helpful since the difference in strength between 200 and 900 samples is quite stark. Though it’s much worse when the sample size isn’t even mentioned, and this is a deal breaker. Popular research should discuss how much data was collected.

Describe the data collection effort: Researchers should not hide where their data comes from nor should they be afraid of discussing all the possible bias in their data. This is like a swimmer not wanting to discuss getting wet. Every data sample will have some bias, but it’s exactly because of this that we should welcome every sample we can get and have a dialogue about the perspective being represented.

Define terms and categorization methods: Even common terms like event, incident and breach may have drastically different meaning for different readers. Researchers need to be sure they aren’t creating confusion assuming the reader understands what they’re thinking.

Be honest and helpful: Researchers should remember that many readers will take the results they publish to heart: decisions will be made, driving time and money spent. Treat that power with the responsibility it deserves. Consumers would do well to engage the researchers and reach out with questions. One of the best experiences for a researcher is engaging with a reader who is both excited and willing to talk about the work and hopefully even make it better.

Finally, even though we are really good at public shaming and it’s so much easier to tear down than it is to build up, we need to encourage popular research because even though a research paper has bias from convenience sampling or doesn’t match up with the perspective you’ve been working with, it’s okay. Our ability to learn and improve is not going to come from any one research effort. Instead the strength in research comes from all of the samples taken together. So get out there, publish your data, share your research, and and celebrate the complexity.

Month: May 2016

Stats and Common Sense

So Many Transit Blogs, So Little Time