Building numbers ’n’ narrative blogs
Like many others, I was glued to fivethirtyeight.com during the election season and watched Nate Silver emerge as the poster boy for a new web publishing genre: “numbers ’n’ narrative” (3N for short). There’s a great interview with him at the Columbia Journalism Review.
There appear to be two types of 3N sites: trendspotting sites and prediction sites. Trendspotting sites monitor some metric, such as how often a term occurs in the blogosphere, and point out interesting patterns or events as they happen. A good example is Matthew Hurst’s Data Mining blog. Prediction sites, of which fivethirtyeight.com is a prime example, make (or enable) a prediction about the outcome of an event based on aggregated data, and provide a running commentary on how that prediction changes.
It got me thinking: what is the minimal set of elements you need for a successful 3N prediction site? One way would be to run a prediction market and blog about it. Another would be to follow Nate’s approach: build a predictive model, run it daily, and blog about the results.
The approach to prediction at fivethirtyeight.com seems to me, to a first approximation, to be something like the following. State-level polls were aggregated from the Web; a procedure then generated outcomes state by state, using probabilities derived from weighted combinations of those polls; and the resulting state-level outcomes were rolled up into an Electoral College outcome. On top of that sat a variety of visualizations, ranging from the Electoral College vote count down to fine-grained scenarios and state-level outcomes, coupled with insightful commentary and reports from the field.
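To make that first-order description concrete, here is a minimal Python sketch of such a pipeline. It is not Nate’s actual model: the sample-size weighting, the fixed 3-point polling error, the assumption that states vary independently, and the states "A", "B" and "C" with their polls and electoral votes are all hypothetical, chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Poll:
    state: str
    dem_share: float   # candidate D's share of the two-party vote, between 0 and 1
    sample_size: int


def weighted_average(polls):
    """Combine one state's polls, weighting each by its sample size (an assumed scheme)."""
    total = sum(p.sample_size for p in polls)
    return sum(p.dem_share * p.sample_size for p in polls) / total


def simulate_election(polls, electoral_votes, n_runs=10_000, error_sd=0.03, seed=0):
    """Return candidate D's electoral-vote total for each simulated election.

    States with no polls are ignored in this sketch.
    """
    rng = random.Random(seed)
    by_state = {}
    for p in polls:
        by_state.setdefault(p.state, []).append(p)
    means = {state: weighted_average(ps) for state, ps in by_state.items()}
    totals = []
    for _ in range(n_runs):
        ev = 0
        for state, mean in means.items():
            # Draw this run's vote share independently per state; D carries it above 50%.
            if rng.gauss(mean, error_sd) > 0.5:
                ev += electoral_votes[state]
        totals.append(ev)
    return totals


# Hypothetical states, polls, and electoral votes, for illustration only.
polls = [Poll("A", 0.53, 800), Poll("A", 0.51, 600),
         Poll("B", 0.48, 1000), Poll("C", 0.50, 500)]
electoral_votes = {"A": 20, "B": 15, "C": 10}

totals = simulate_election(polls, electoral_votes)
majority = sum(electoral_votes.values()) / 2
win_prob = sum(t > majority for t in totals) / len(totals)
print(f"P(D wins) ~ {win_prob:.3f}")
```

The list `totals` is the outcome distribution; a histogram of it is the kind of Electoral College chart the site displayed. A real model would also correlate polling errors across states rather than drawing each state independently.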
Abstracting from that, I’ve put together a draft list of what would be involved in a generic version of a fivethirtyeight-style site:
1. A highly anticipated event (or series of events) with multiple possible outcomes.
2. A deterministic way to compute outcomes for the event from a set of inputs.
3. A way of aggregating input distributions from the Web.
4. Nightly Monte Carlo runs to generate outcome distributions.
5. Compelling visualizations for outcome distributions.
6. Snappy daily blogging about model predictions and related topics.
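Items 2 to 4 suggest a generic shape: each input is described by a sampler for its aggregated distribution, the event by a deterministic function from one draw of the inputs to an outcome, and the nightly run by a loop that tallies outcomes. The sketch below assumes that shape; the names `run_monte_carlo`, `Sampler` and `OutcomeFn` are hypothetical, and the two contenders "X" and "Y" use made-up numbers.

```python
import random
from collections import Counter
from typing import Callable, Dict, Hashable, Optional

# Hypothetical type aliases for the generic template.
Sampler = Callable[[random.Random], float]            # item 3: draws one value of an input
OutcomeFn = Callable[[Dict[str, float]], Hashable]    # item 2: deterministic inputs -> outcome


def run_monte_carlo(inputs: Dict[str, Sampler], outcome: OutcomeFn,
                    n_runs: int = 10_000, seed: Optional[int] = None) -> Dict[Hashable, float]:
    """Item 4: sample every input, apply the outcome function, return outcome frequencies."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(n_runs):
        draw = {name: sample(rng) for name, sample in inputs.items()}
        counts[outcome(draw)] += 1
    return {result: n / n_runs for result, n in counts.items()}


# Toy example with made-up numbers: two contenders whose "strength" is uncertain.
inputs = {
    "X": lambda rng: rng.gauss(1.0, 0.5),
    "Y": lambda rng: rng.gauss(0.8, 0.5),
}


def winner(draw: Dict[str, float]) -> str:
    """Deterministic outcome rule: the stronger contender in this draw wins."""
    return "X" if draw["X"] >= draw["Y"] else "Y"


dist = run_monte_carlo(inputs, winner, seed=42)
print(dist)
```

Keeping the outcome function deterministic puts all the randomness in the input samplers, so swapping in a different data source only means replacing a sampler.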
Since #6 can be handled by any vanilla blogging software, what’s missing is a service that lets you specify #1-4 and choose #5 in the form of widgets that can be embedded in your blog template.
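As a rough illustration of what an embeddable widget for #5 might produce, here is a sketch that renders an outcome distribution (a mapping from outcome to probability) as a self-contained HTML snippet of horizontal bars that could be pasted into a blog template. The function name, CSS class names and styling are hypothetical, and the example probabilities are made up.

```python
from html import escape


def distribution_widget(dist, title, width_px=300):
    """Render {outcome: probability} as an HTML snippet, most likely outcome first."""
    rows = []
    for outcome, p in sorted(dist.items(), key=lambda kv: kv[1], reverse=True):
        bar = round(p * width_px)  # bar length proportional to probability
        rows.append(
            '<div class="outcome-row">'
            f'<span class="outcome-label">{escape(str(outcome))}</span> '
            f'<span class="outcome-bar" style="display:inline-block;width:{bar}px;'
            'height:1em;background:#4a7ab5"></span> '
            f'<span class="outcome-pct">{p:.1%}</span>'
            '</div>'
        )
    return (f'<div class="outcome-widget"><h4>{escape(title)}</h4>'
            + "".join(rows) + "</div>")


# Made-up distribution, for illustration only.
print(distribution_widget({"X": 0.61, "Y": 0.39}, "Who wins?"))
```

Outcome labels and titles are HTML-escaped because they may come from scraped data sources.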
The challenge is to let someone who doesn’t have Nate Silver’s level of expertise specify #1-4. One way to do that is to narrow the scope to events whose Monte Carlo process can be templatized, so that the user only has to select the data sources. What topics other than electoral politics could be handled this way? I’m thinking of entertainment awards, sports (where Nate started, of course) and financial market behavior. But I’ll have to knuckle down and scope out some examples in greater detail to be sure that this is really doable.