Most data-driven news applications I've encountered follow what I would call The Chicago Crime model, a name lifted from Adrian Holovaty's famous site. Steady streams of government-provided data are repurposed into a flexible interface that allows users to compare disparate sources ("the mashup") and easily localize the information so it can provide particulars to a wide body of users ("the long tail").
It's a brilliant model, the app that launched 1,000 ships. But it's not the only way to get things done.
In news terms, where minutes matter, it can still require a relatively long time to do. Especially when it comes to data acquisition. Let's face it, if you're using government data as your starting point, the idea of a SOAPy API is laughable. So don't get your hopes up. Goofing around with Delicious tags or Flickr photos is fun, but if you want to do something original from the public sector, they're only going to get you so far. You're going to be FOIA'ing, or, if you're lucky, scraping. And then you're going to be cleaning. Especially if you're invested in serving accurate and consistent information. Because if there's a government database out there that's ready to serve, I've yet to see it.
And there's usually not much of a news hook. Look, I appreciate Everyblock and Chicago Crime and that whole style. Hell, I've essentially remodeled my career to emulate them. But when you get down to it, they're essentially built around the idea that umpteen little news hooks ("Someone was robbed in my neighborhood," "A liquor store wants to open up on your block.") will add up to something greater than the sum of their parts. That "hyperlocal" or "long tail" philosophy, to use the parlance of our time, may ultimately be where a lot of us end up, but blockbuster news is still happening and there's no reason all the same tools that made Chicago Crime successful can't be used to cover the hell out of a big story when it breaks.
I had just such an opportunity last Friday at the L.A. Times. Late in the afternoon, news broke that a commuter train had crashed in the Valley, potentially killing many riders on board. We didn't know how many fatalities to expect, nor how long it would take for their identities to be released. But we knew that our audience was going to want to know, and as soon as possible. The typical newspaper.com way to handle this sort of thing is to publish a simple list, or "blob of text", when it's available. And then follow up later with a scattershot of obituaries, usually released as they appear in the paper. But, when you think about it in terms of the Holovaty manifesto and the general concept of the Internet, there's really no reason that information couldn't be better collected and presented as a browsable database application. It's a lesson the L.A. Times learned earlier this year when our ripoff of Adrian's Faces of the Fallen concept reinvigorated the way the paper covers military casualties.
It meant staying late at work on a Friday night, busting ass most of my weekend, and putting more faith in memcached than most IT people are comfortable with, but the result was that when the government finally did cough up the fatality list we were ready to immediately publish it as a linked database that, over time, has been filled in by further reporting to include greater detail, photos, and more than 1,600 user comments, many of them extremely moving. It's a long way from perfect, but it provided some amount of public service, was way ahead of the competition and generated a pretty goodly amount of traffic along the way. The site is called Chatsworth Metrolink Crash.
That's all my long way of saying that I think big events matter and that database journalists shouldn't be afraid to dive in when they happen. Whether it's posting the location of hurricane shelters, letting people know who the hell all those superdelegates are, or connecting survivors following a disaster, there are plenty of obvious opportunities to do our thing. But it's not going to happen if we don't see taking on big news as an opportunity, anticipate things like the next hot Google search term, or have the capability to deploy very very quickly.
I'm a long way from an authority on the whole deal, but I'm stumbling my way through it. And here are a couple things I've learned along the way.
Earlier this month, we released California Schools Guide, a collection of data about public and private schools across the state, at the very moment the government lifted its embargo on this year's scores. I didn't have the newsworthy data in hand until less than 24 hours before it would be publicly released. But by developing the site in advance using the previous year's data as dummy entries, I was able to pre-script the loading of the 2008 data after only a few minor changes to the code. This meant that we were able to get our product out when the news hook dropped, at the same time as the paper was otherwise promoting an investigative story on the topic and the state's propaganda arms were blasting their own message ("Things are getting better! Trust us!").
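To make the pre-scripting idea concrete, here's a rough sketch of what a rehearsed loader might look like. It is not the actual Schools Guide code: the column names, the `load_scores` function and the sample row are all made up for illustration. The point is only that a loader tested against last year's file can be pointed at the new file the moment it arrives, and will complain loudly if the layout has changed.

```python
import csv
import io

# Hypothetical column names; the real state file layout isn't described here.
REQUIRED_COLUMNS = {"school_id", "school_name", "score"}


def load_scores(csv_file, year):
    """Parse one year's score file into a list of dicts.

    Raises ValueError if an expected column is missing, so a changed
    layout is caught before anything is published.
    """
    reader = csv.DictReader(csv_file)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"{year} file is missing columns: {sorted(missing)}")

    rows = []
    for record in reader:
        raw_score = record["score"].strip()
        rows.append({
            "school_id": record["school_id"].strip(),
            "school_name": record["school_name"].strip(),
            "year": year,
            # Blank scores are stored as None rather than guessed at.
            "score": int(raw_score) if raw_score else None,
        })
    return rows


# Rehearse with last year's data standing in as dummy entries (invented row).
dummy_2007 = io.StringIO(
    "school_id,school_name,score\n"
    "001,Example Elementary,750\n"
)
rehearsal = load_scores(dummy_2007, 2007)
```

When the embargoed file shows up, the same function runs against it with `year=2008`, and the only work left is whatever "few minor changes" the new layout demands.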
Let me be clear. The DRY goal of elegance through efficiency is laudable. And, as a guiding principle for development, you probably can't get any better. It is the single point of truth. It's like natural selection, except for awesomeness. But when you're on a tight deadline, and you've already got a code implementation that works, sometimes you JDFWI, Just Don't Fuck With It. Yeah, so maybe you just copied and pasted and introduced a little redundancy. And maybe your CSS is just a hodgepodge of divs repurposed from other apps. But it works, right? And what's more important, trimming down your code base, or getting the news out ahead of your competition?
For anyone who's already doing this stuff, it probably goes without saying, but Django's admin is really great. As soon as your database models are written, you've instantly got a set of entry forms that are ready to deploy. This is incredibly useful when trying to turn around simple data apps on deadline. For instance, when it came to the Metrolink crash, I was able to get the models and admin up Friday night so that reporters on Metro desk could begin working on entry as I shifted to work on the views and templates.
You can have the greatest app in the world, but if you can't push it out to the web ASAP, you're nowhere. If you're going the Chicago Crime route, this isn't as big of a deal. But if you're trying to hit the big news hook, it's utterly essential. And treating big news like you would anything else on your "product schedule" or "iteration cycle" just isn't going to be good enough. You can call it a waterfall, you can call it reckless, you can call it news-driven development.