The definition of data journalism is at once painfully simple and frustratingly vague.

“When you say data journalism, it means something different to just about everyone,” said Aron Pilhofer, visual editor at The Guardian.

In his Tow Center paper, Alex Howard offered a more detailed definition of data journalism: “gathering, cleaning, organizing, analyzing, visualizing and publishing data to support the creation of acts of journalism.”

Those who practice it do tend to agree on one principle: data journalism is, first and foremost, journalism. It simply draws on data as a source, in addition to people.

Pilhofer and others generally divide the work into a few categories, each with its own skills and job descriptions. The categories vary or overlap depending on whom you ask, but they fall roughly along these lines:

  • Acquisition: Getting data, whether that means scraping a website, downloading a spreadsheet, filing a public records request or some other means
  • Analysis: Doing calculations or other manipulations on data you’ve got, to look for patterns, stories or clues
  • Presentation: Publishing data in an informative and engaging way. Infographics, news apps and web design are all examples of this
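To make the first two categories concrete, here is a minimal Python sketch, assuming the data arrives as a CSV spreadsheet. The county names, budget categories and amounts are hypothetical, and `load_rows` and `total_by` are illustrative helpers, not any newsroom’s actual tooling.

```python
import csv
import io
from collections import defaultdict

# Hypothetical spreadsheet contents. In practice this text might come from a
# downloaded file, a scraped web page or the response to a records request.
RAW_CSV = """county,category,amount
Adams,roads,120000
Adams,parks,30000
Baker,roads,95000
Baker,parks,45000
"""


def load_rows(text):
    """Acquisition: parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def total_by(rows, key, value):
    """Analysis: sum a numeric column for each distinct value of another column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += float(row[value])
    return dict(totals)


rows = load_rows(RAW_CSV)
print(total_by(rows, "county", "amount"))  # {'Adams': 150000.0, 'Baker': 140000.0}
```

Presentation, the third category, would then turn totals like these into a chart, a table or a news app.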

Not all of these categories fit a strict definition of reporting, but they all constitute journalism, said Sarah Cohen, who leads a data team at The New York Times. Even news app developers who spend their days writing code are journalists, Cohen said, because they write that code to explain and communicate information to the public.

“The necessary skill for a data journalist is journalism and some interest in data,” Cohen said.

What reporters can do with data that they can’t with traditional reporting

While data brings its own challenges, as we will discuss later, it also offers opportunities that are difficult or impossible to pursue through more traditional forms of reporting.

Data allows journalists to more authoritatively verify claims

The clearest advantage data has over other sources is that it’s fact, Cohen said: an actual count of fatalities, for instance, or of tax dollars or potholes. There’s less need to rely on anecdotal evidence when the evidence itself is in front of you.

Take an Associated Press story from earlier this year, which used a congressman’s Instagram account as a source for an investigation. The politician had been taking flights on his donors’ private jets and billing the public for them, suggesting an overly cozy or even illicit relationship with his top donors.

The reporters had found the scoop by comparing the location data on his Instagram posts to public data on flight records. These days, Cohen said, “anything can be data.”
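The cross-referencing step can be sketched as a join on date and place. This is a minimal Python illustration, assuming both sources have already been reduced to simple records; the places, dates and plane owners are hypothetical, and it is not the AP’s actual method. Any real match would still need to be confirmed through conventional reporting.

```python
from datetime import date

# Hypothetical, simplified records: where photos were posted, and where
# privately owned planes flew, on given days.
posts = [
    {"date": date(2015, 1, 10), "place": "City A"},
    {"date": date(2015, 2, 3), "place": "City B"},
    {"date": date(2015, 2, 20), "place": "City C"},
]
flights = [
    {"date": date(2015, 1, 10), "destination": "City A", "plane_owner": "Donor 1"},
    {"date": date(2015, 2, 20), "destination": "City C", "plane_owner": "Donor 2"},
    {"date": date(2015, 3, 1), "destination": "City B", "plane_owner": "Donor 3"},
]


def match_posts_to_flights(posts, flights):
    """Return (post, flight) pairs that share the same date and place."""
    # Index flights by (date, destination) so each post is a quick lookup.
    by_key = {}
    for flight in flights:
        by_key.setdefault((flight["date"], flight["destination"]), []).append(flight)
    matches = []
    for post in posts:
        for flight in by_key.get((post["date"], post["place"]), []):
            matches.append((post, flight))
    return matches


for post, flight in match_posts_to_flights(posts, flights):
    print(post["date"], post["place"], "->", flight["plane_owner"])
```

Only the January and late-February posts match a flight here; the City B post and the City B flight fall on different days, so they are not paired.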

Data allows journalists to tackle bigger stories

With data, scale is less of an obstacle: reporters can readily get hold of information ranging from the granular to the global. It might be just as easy to get budgets for every county in the state as it is to get the budget for your own county, opening up a wealth of new possibilities for exploration.

Hilary Niles said this capacity gives newsrooms, especially small and medium-sized ones, an “investigative edge” they wouldn’t otherwise have. Niles is a data consultant in Vermont who advises newsrooms on using data and does her own freelance reporting.

Back in 1992, in the wake of Hurricane Andrew, Miami Herald reporter Steve Doig used a statistical software program called SAS to examine millions of building code inspections. His investigation revealed that the state had been extraordinarily lax about its inspections. Such a monumental task would have been impossible if his team of reporters had had to work only with paper inspection reports.

Data makes it easier to find new stories

With data, reporters can suss out patterns and follow up on leads in a way they can’t with verbal stories or anecdotes.

While reporters should still use their journalistic judgment, Jue Yang said, data offers a view that doesn’t lean so heavily on instinct or personal judgment. Yang is a technologist-in-residence at the City University of New York, where she helps shape its innovative Social Journalism program. “Computers are great when it comes to discovering things faster or discovering things you didn’t expect,” she said.

Data enables journalists to better illuminate murky issues

Data can also support or refute an existing claim, a theory or even an urban legend. Kuang Keng Kuek Ser is a consultant who coaches small and medium-sized newsrooms that want to start using data.

Keng shared an example from The Guardian: after a series of riots in the UK drew international attention, the government claimed the riots were unrelated to poverty, and The Guardian set out to investigate.

“But the question is, how do we know?” The Guardian wrote in an article explaining its work. “If poverty affects health, education and crime, could it be a factor in the events of last week?”

Without hard evidence, it’s tough to say definitively, because a claim could easily be made either way. The Guardian’s solution was to obtain the police records of everyone arrested in the riots and map their home addresses. The paper then compared those addresses with a map of impoverished areas drawn from other public data. In the end, The Guardian found that some of the government’s claims were true, while others were not.
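The underlying comparison can be sketched in a few lines of Python: what share of arrestees live in deprived areas, versus what share of all residents do. This is a simplified illustration, not The Guardian’s actual method; the area codes, the set of deprived areas and the population figures are hypothetical, and a gap between the two shares shows an association, not a cause.

```python
# Hypothetical inputs: the home area of each arrestee, the areas classed as
# deprived, and the population of each area.
arrestee_areas = ["A", "A", "B", "C", "A", "D", "B", "A"]
deprived_areas = {"A", "B"}
population = {"A": 10_000, "B": 15_000, "C": 40_000, "D": 35_000}


def share_of_arrestees(areas, subset):
    """Fraction of arrestees whose home area is in `subset`."""
    return sum(1 for area in areas if area in subset) / len(areas)


def share_of_residents(population, subset):
    """Fraction of all residents who live in an area in `subset`."""
    return sum(population[a] for a in subset) / sum(population.values())


arrest_share = share_of_arrestees(arrestee_areas, deprived_areas)
resident_share = share_of_residents(population, deprived_areas)
# Prints: 75% of arrestees vs 25% of residents live in deprived areas
print(f"{arrest_share:.0%} of arrestees vs {resident_share:.0%} of residents "
      "live in deprived areas")
```

Comparing against the residents’ share matters: a large number of arrestees from deprived areas means little unless those areas are overrepresented relative to their population.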

Like Steve Doig, whose Miami Herald investigation found a clear connection between building inspections and hurricane damage, The Guardian used hard numbers to settle what had been a matter of finger-pointing.

“The core of data journalism, on at least the analysis end, is looking for patterns,” Doig said. “The patterns are going to be what tells the story.”

Just as data can illuminate a murky social issue, it can also quantify it, adding valuable information to public debate.

In 1989, a few years before Doig’s hurricane investigation, reporters in Atlanta set out to investigate rumors of racial discrimination in bank lending. Using six years’ worth of lender reports, the Atlanta Journal-Constitution showed that African-Americans were denied bank loans at rates far exceeding those for whites. The paper became one of the first to win a Pulitzer for an investigation using data.

The Atlanta reporters already had anecdotes about racial discrimination, Doig said, but the data allowed them to go beyond those and establish clear patterns, even measuring the scale of the problem.
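Quantifying a disparity like this comes down to comparing rates rather than raw counts. The sketch below is a hypothetical Python illustration; the group labels and application counts are invented and are not the Atlanta figures. A raw ratio like this is only a starting point, and a real analysis would also account for factors such as income before drawing conclusions.

```python
# Hypothetical loan-application counts, for illustration only.
applications = {
    "Group A": {"applied": 1200, "denied": 240},
    "Group B": {"applied": 3000, "denied": 300},
}


def denial_rate(counts):
    """Share of applications that were denied."""
    return counts["denied"] / counts["applied"]


rates = {group: denial_rate(counts) for group, counts in applications.items()}
ratio = rates["Group A"] / rates["Group B"]

for group, rate in rates.items():
    print(f"{group}: {rate:.0%} denied")
print(f"Group A applicants were denied {ratio:.1f} times as often as Group B")
```

Note that Group B has more denials in absolute terms (300 versus 240) yet the lower rate, which is why the comparison is made on rates.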

Data can offer detail and distance

Jacob Harris, who now works as an innovation specialist at the General Services Administration’s 18F project, said data makes it easier to show both the ‘near’ and the ‘far’ view of a topic. In the past, he said, a man-on-the-street interview would supply the ‘near’ and an expert interview the ‘far.’ Now there’s less need to rely solely on expert testimony, because data can provide the ‘far,’ or ‘macro,’ view more precisely.

On the other hand, the scale of the data itself can be overwhelming for the audience. While data on every police force in the United States can offer a “far” view for a story, no reader is actually going to sift through all that information if it’s put in front of them. But the web allows them to “look at their own ‘near,’” Harris said.

Harris gave the example of ProPublica’s “Surgeon Scorecard,” a news app that lets users look up data on their own doctor, hospital or town. In this way, ProPublica distills data on tens of thousands of doctors and millions of dollars of Medicare payments into whatever is relevant to each reader.
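The “near” view amounts to filtering a large data set down to the rows a single reader cares about. Here is a minimal Python sketch, assuming a hypothetical record layout; the field names, surgeons, towns and figures are invented and do not reflect ProPublica’s actual data or code.

```python
from dataclasses import dataclass


@dataclass
class SurgeonRecord:
    # Hypothetical fields, for illustration only.
    surgeon: str
    hospital: str
    town: str
    procedures: int


def near_view(records, town):
    """Return only the records for one reader's town, busiest surgeons first."""
    local = [r for r in records if r.town.casefold() == town.casefold()]
    return sorted(local, key=lambda r: r.procedures, reverse=True)


records = [
    SurgeonRecord("Surgeon 1", "Hospital X", "Town A", 120),
    SurgeonRecord("Surgeon 2", "Hospital Y", "Town B", 300),
    SurgeonRecord("Surgeon 3", "Hospital X", "Town A", 210),
]

for record in near_view(records, "town a"):
    print(record.surgeon, record.hospital, record.procedures)
```

The full `records` list is the “far” view; the app hands each reader only the slice for the town they enter, and the case-insensitive comparison keeps a lowercase search from missing matches.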

Data offers the potential to be more transparent

At the same time, there can be good reasons to share a huge data set with an audience. Data sources and web technology have made it possible for journalists to be more transparent than ever before. Reporters can share how they reached their conclusions, or let readers reach their own. “Transparency is the new objectivity” became a saying among journalists; blogger David Weinberger wrote about the idea for KMWorld in 2009.

“Outside of the realm of science, objectivity is discredited these days as anything but an aspiration,” he wrote. “If you don’t think objectivity is possible, then presenting information as objective means hiding the biases that inevitably are there. It’d be more accurate and truthful to acknowledge those biases, so that readers can account for them in what they read.”

Bill Kovach and Tom Rosenstiel made a similar case in 2001 in The Elements of Journalism, when they argued that scientific-style transparency was the lost meaning of objectivity.

Today, transparency is common practice at many news organizations. Jeremy Singer-Vine and his data team at BuzzFeed published an investigation earlier this year showing that migrants who came to the U.S. on skilled labor visas were being exploited by their employers. They published not just the raw data but also the calculations behind their findings, allowing readers to check their work and form their own conclusions.

“It’s important to show our work,” Singer-Vine said. “Readers should see where this is coming from and not just trust our word.”

Data can make reporting more efficient

Reporters frequently collect information from the same sources over and over again: building permits, police reports, census surveys. Tapping into the data behind those reports can make obtaining and organizing that information far more efficient, even fully automatic.

Derek Willis, a developer at ProPublica, found himself constantly checking the Federal Election Commission’s website for new campaign filings. He automated the process bit by bit, until he had a program that checked for new filings every 15 minutes and alerted him to interesting ones. “I don’t miss a thing,” he said.

A little programming knowledge not only made Willis’s task more accurate and efficient but also freed up his time for other reporting.
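The core of a watcher like Willis’s can be sketched as a loop that fetches the latest filings, skips ones it has already seen and flags the rest. This is a hypothetical Python sketch, not his program: `fetch_filings`, `is_interesting` and `notify` are placeholder callables the reader would supply (for example, a function that queries the FEC’s data), and the 15-minute interval mirrors the one Willis described.

```python
import time

POLL_SECONDS = 15 * 60  # check every 15 minutes


def check_once(fetch_filings, seen_ids, is_interesting):
    """Return newly seen filings that pass the `is_interesting` test.

    fetch_filings: hypothetical callable returning a list of dicts with an "id" key.
    seen_ids: set of filing IDs already processed; updated in place.
    is_interesting: callable that decides whether a new filing merits an alert.
    """
    alerts = []
    for filing in fetch_filings():
        if filing["id"] in seen_ids:
            continue
        seen_ids.add(filing["id"])
        if is_interesting(filing):
            alerts.append(filing)
    return alerts


def watch(fetch_filings, is_interesting, notify):
    """Poll forever, calling `notify` for each interesting new filing."""
    seen_ids = set()
    while True:
        for filing in check_once(fetch_filings, seen_ids, is_interesting):
            notify(filing)
        time.sleep(POLL_SECONDS)


def is_big(filing):
    """Hypothetical rule: flag filings reporting more than $100,000."""
    return filing["total"] > 100_000


# Example with a stand-in data source instead of a live feed.
sample = [{"id": "F1", "total": 500}, {"id": "F2", "total": 250_000}]
seen = set()
print(check_once(lambda: sample, seen, is_big))  # [{'id': 'F2', 'total': 250000}]
print(check_once(lambda: sample, seen, is_big))  # []
```

On its first pass the watcher treats every existing filing as new; seeding `seen_ids` with an initial fetch would avoid a burst of alerts at startup.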

What might a data journalism team look like?

Before ProPublica, Willis worked on the Upshot, a data and analysis blog at The New York Times. The Upshot is one of the Times’ four data teams, which fall roughly along the data journalism categories we discussed earlier: acquisition, analysis and presentation. The presentation side is split into separate teams for visualization and news apps.

Besides the Upshot, which analyzes and presents data in innovative and attention-grabbing ways, the Times has a data visualization team, a news apps team and a computer-assisted reporting team, which works mostly on data acquisition and analysis for investigations.

BuzzFeed has a single data team, consisting of Jeremy Singer-Vine and two other reporters, nested inside its investigative unit. The three team members spend much of their time helping other parts of the newsroom, such as the science desk, with data projects.

At The Guardian, a newsroom known for pushing data journalism forward, the data projects team consists of only two people, who spend most of their time working with other reporters. Dozens of other reporters on the Data Blog and Visuals teams also work with data.

Other newsrooms have a single data reporter.

Jaimi Dowdell, a training director at the Investigative Reporters and Editors organization, said having a single data reporter can be challenging because editors want that reporter to be everything: reporter and editor, features and daily writer, trainer and evangelist.

“I feel like that does set the data person up for failure a little bit,” she said, “because you just can’t [be everything].”

The most successful teams, our reporting suggests, tend to combine their own stories, collaborations with other reporters, training for other staff members and what Pilhofer called the “evangelism” role: raising the level of data literacy across the newsroom.

What it always boils down to, though, is not the size or caliber of your team – or even the existence of one. All that’s needed for data journalism is a journalist with a little interest in data.

For that reason, editors and publishers shouldn’t necessarily think of a “data journalist” as a unique person to be headhunted, or even as a separate team that needs to be assembled. Outlets that are more successful at sustaining data use in their stories tend to be those whose reporters incorporate data into the work they’re already doing.

“Thinking of it as a complement to everything else, rather than a standalone thing, probably helps,” BuzzFeed data editor Jeremy Singer-Vine said. Data, he emphasized, is just another skill that helps reporters tell stories.
