
The challenges and possible pitfalls of data journalism, and how you can avoid them

Now that we’ve gained some basic data knowledge and are putting it to use, we need to take care not to get entangled in a mistake or misunderstanding. The downsides to data reporting are, for the most part, identical to those of regular reporting: misleading or biased sources, honest error, and so on.

Data, however, does require some bulletproofing not necessary for more traditional projects. This section will address how data presents both new and traditional challenges for journalists.

Ethical concerns for reporting with data

The Associated Press recently announced it would be incorporating data standards into the 2017 AP Stylebook, further signaling data’s formal role in 21st century journalism. Significantly, these standards won’t be limited to language and style, but will include ethical standards.

The ethics standards of journalism – don’t break laws, don’t lie, lessen harm – all still apply to data use, Arizona State University professor Steve Doig said. “The data is just another source,” he said. “It doesn’t absolve you of the same kind of ethical considerations that you’re supposed to be taking.”

Just like stealing mail from a mailbox isn’t ethically acceptable, stealing data from a website isn’t, either.

One of the most important edicts for using data, Doig said, is not to use it out of context. As every practitioner I interviewed stressed, you have to think critically about your data. One paper he cited had published data on infant mortality in its city: the mortality rate was astronomically higher in one low-income neighborhood.

But after publishing it, he said, the paper discovered that the neighborhood housed a large teaching hospital, where sick infants were brought from all over the state. You have to do your shoe-leather reporting, Doig said.


Talking to the human sources behind the data is a must, practitioners told me. Doig shared an example from his own experience: his outlet had published data showing that a small number of convicted criminals had received no jail time. The reporters later learned from the court clerks that those defendants had simply been assigned community service instead; it just hadn’t been entered into the database.

All data, particularly the kind used for journalism, has its root in human sources. As a result, it is subject to human error, biases and fallacies.

Data also always requires context.

Alberto Cairo, Knight Chair at the University of Miami, recommends going a step further and talking to experts who are experienced in analyzing the data. “It’s not just a matter of asking a couple of researchers some questions while you write a blog post,” he wrote for Nieman Lab. “It’s also a matter of doing your reporting in collaboration with those researchers, as they’re the ones that know the data really well.”

Cairo added: “This isn’t a particularly groundbreaking notion: The publications who regularly conduct solid data and investigative journalism nowadays, like ProPublica, work this way on a regular basis.”

Earlier, we noted that one of the advantages of data is that it lets reporters process very large sets of information. But when it comes to using data as a source, the single most important thing to remember is that the data comes from and involves human beings. It is unwise, often inaccurate, and potentially unethical to simply obtain numbers and publish them. Former New York Times developer Jacob Harris addressed this in an essay called “Connecting with the dots.”

“It’s super easy to put dots on a map at this point,” Harris said in an interview with API. “It’s easy to forget that they’re still people.”

When Cheryl Phillips worked at the Seattle Times, a devastating mudslide killed 43 people. The paper could easily and quickly have mapped the houses affected by the disaster, but that could have come across as callous, she said.

“You have to remember there are individuals in those data points,” she said. The Seattle Times did not publish the map until a full week after the disaster, and when it did, the map included on-the-ground reporting with photos, profiles and stories of the victims.

Credit: Seattle Times

“We wanted to publish something more fully formed and that helped tell the story of the tragedy in a more sensitive way,” Phillips said.

On the flip side, there are privacy and sensitivity issues in publishing a data set that displays every single individual.

Phillips gave the example of salary databases: while public employees’ salaries are public information and therefore fair game, journalists should think about whether there’s a journalistic reason to publish them. The Seattle Times, she said, obtained the data but published only newsworthy items, such as excessive overtime or changes over time.

API’s Jeff Sonderman wrote a piece for Poynter outlining the difference between what journalists can and should publish when it comes to data. A map of gun owners, published by a newspaper in New York, was a case in point.

“Data can be wrong, misleading, harmful, embarrassing or invasive,” Sonderman wrote. “Presenting data as a form of journalism requires that we subject the data to a journalistic process.”

The Guardian’s digital editor Aron Pilhofer scorned what he called the “data porn” that circulates freely on the Internet: word clouds, pretty pictures, dots splattered on a map.

“Journalism has to have a nut graf,” Pilhofer said. “A reason for people to care.”

Tips for avoiding disaster

Most of the flaws with data are the same as with human sources, too: error, bias, unreliability, misunderstandings.

The following tips distill the counsel of data veterans on how to avoid being led astray.

Don’t jump to conclusions

Even after you think you’ve found a trend or a connection, remain as skeptical as possible. Think of FiveThirtyEight’s story on Nigerian kidnappings: how could its authors have avoided their mistake?

Once you’ve got the numbers nailed down, step outside them and look at your findings critically. Could there be confounding variables: hidden factors that produce a change you might wrongly attribute to something else? The teaching hospital in Doig’s infant-mortality example is exactly this kind of factor.

Practitioners suggest investigating data sources for biases, hidden variables, privacy or legality issues, or anything else that could possibly lead you to a wrong conclusion. “It’s easy to believe in the pretty spreadsheet,” Guardian data editor Helena Bengtsson said.

Most also say you should confer with an expert or another person who is familiar with the data. As with Steve Doig’s story on criminals apparently getting off scot-free, there may be something in the data you never thought of. If the stakes are high enough – for instance, if there may be legal liability – Pilhofer suggested sharing an entire finding and body of work with an expert or with the source itself.

Investigate the data before you report on it

Aron Pilhofer, at the Guardian, insists journalists should know the data “inside and out” before they analyze it. Don’t assume anything: what the column titles mean, what the outliers are, whether there are any parts missing.

Stanford’s Cheryl Phillips recommended figuring out what she called the “shape of the data”: blanks, outliers, patterns and limitations.
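For a spreadsheet exported to CSV, a first pass at the “shape of the data” can be scripted. The sketch below is one possible approach, not a method Phillips prescribed: it assumes the data loads with Python’s standard `csv` module, and the `profile_shape` helper and the 1.5 × interquartile-range outlier rule are illustrative choices. The sample records are made up. A blank cell like case 5 is exactly the kind of gap that, in Doig’s story, turned out to mean “community service not entered” rather than “no punishment.”

```python
import csv
import io
import statistics


def profile_shape(rows):
    """Summarize blanks, distinct values, range and outliers per column.

    rows: a list of dicts, as produced by csv.DictReader.
    Outliers use the common 1.5 * IQR rule (an illustrative choice).
    """
    columns = rows[0].keys() if rows else []
    report = {}
    for col in columns:
        values = [(row.get(col) or "").strip() for row in rows]
        blanks = sum(1 for v in values if v == "")
        numbers = []
        for v in values:
            try:
                numbers.append(float(v))
            except ValueError:
                pass  # blank or non-numeric cell; counted above if blank
        summary = {"blanks": blanks, "distinct": len(set(values))}
        if len(numbers) >= 4:  # too few points makes quartiles meaningless
            q1, _, q3 = statistics.quantiles(numbers, n=4)
            fence = 1.5 * (q3 - q1)
            summary["min"] = min(numbers)
            summary["max"] = max(numbers)
            summary["outliers"] = [
                x for x in numbers if x < q1 - fence or x > q3 + fence
            ]
        report[col] = summary
    return report


# Hypothetical sentencing records, made up for illustration only.
SAMPLE = """case_id,sentence_days
1,30
2,45
3,0
4,60
5,
6,30
7,90
8,3650
9,0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = profile_shape(rows)
for column, summary in report.items():
    print(column, summary)
```

A flagged outlier or a blank is a question for a human source, not a conclusion: the script only tells you where to start asking.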

As always, 18F innovation specialist Jacob Harris said, vet the data like you would any human source: “You (would) think, maybe the source has an agenda, maybe the source is lying, maybe the source doesn’t know what they’re talking about,” he said.

He also suggested keeping a detailed log of what you did with the data – what columns you moved, calculations you performed, and so on.
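One lightweight way to keep such a log is to route every transformation through a helper that records what was done and how many rows went in and out. This is a sketch, not Harris’s own tooling: the `AnalysisLog` class is hypothetical, and the salary rows are invented. Note that even dropping blank rows is a judgment call worth recording.

```python
import json
from datetime import datetime, timezone


class AnalysisLog:
    """Ordered record of every step applied to a data set (hypothetical helper)."""

    def __init__(self):
        self.entries = []

    def apply(self, description, func, rows):
        """Run func on rows, record the step and row counts, return the new rows."""
        result = func(rows)
        self.entries.append({
            "step": len(self.entries) + 1,
            "when": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            "what": description,
            "rows_in": len(rows),
            "rows_out": len(result),
        })
        return result

    def to_json(self):
        """Serialize the log so it can be saved alongside the story's files."""
        return json.dumps(self.entries, indent=2)


# Hypothetical salary rows, made up for illustration.
rows = [
    {"name": "Employee A", "overtime": "1200"},
    {"name": "Employee B", "overtime": ""},
    {"name": "Employee C", "overtime": "340"},
]

log = AnalysisLog()
rows = log.apply(
    "Drop rows with a blank overtime field",
    lambda rs: [r for r in rs if r["overtime"].strip()],
    rows,
)
rows = log.apply(
    "Convert overtime from text to a number",
    lambda rs: [{**r, "overtime": float(r["overtime"])} for r in rs],
    rows,
)
print(log.to_json())
```

Because each entry records row counts, a colleague checking the work can see immediately where records disappeared and ask why.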

Clearly explain the data to your audience

While data may be something of a miracle source of information, it still has its flaws. Be frank with your viewers or readers about incomplete data, differing interpretations, margin of error or anything else that could affect their understanding of your conclusions. Don’t overstate your case.
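When a finding comes from a survey, being frank about margin of error means knowing roughly how large it is. The sketch below uses the standard normal approximation for a proportion, MOE ≈ z·√(p(1 − p)/n) with z = 1.96 for about 95% confidence. It assumes a simple random sample; it says nothing about biased or self-selected samples, which no formula can fix. The poll figures are hypothetical.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p from n respondents.

    Assumes a simple random sample; z=1.96 corresponds to ~95% confidence.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be a proportion between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical poll: 52% of 1,000 respondents agree.
moe = margin_of_error(0.52, 1000)
print(f"52% +/- {moe * 100:.1f} percentage points")
```

With roughly three points of uncertainty either way, a 52-to-48 split in a poll of this size is not a clear majority, and a story should not describe it as one.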

When it comes down to it, Pilhofer said, you and your data analysis are your own source. “And you better be right. And that can be kind of scary.”

So many of these pitfalls sound obvious, Harris said, and yet anyone, even an experienced data journalist, could easily fall into them. “I still think that skepticism and paranoia are the best two things you could have on your side,” he said. “I know I could easily fall into similar mistakes myself.”

Don’t republish conclusions formed by someone else

Reporters should be extremely skeptical, Harris said, of surveys or studies that are given to them with the analysis and conclusions already done. Oftentimes, it’s just a startup or PR company trying to get some exposure.

He wrote about a particularly cringeworthy one in Source: a range of publications ran a very dubious study claiming that Democrats watched more porn than Republicans. The source was a porn website.

Credit: Source

“Remember that skepticism is your truest friend if you want to call yourself a journalist,” he wrote. “It’s not hard to see the flaws in a flimsy study if you are predisposed to contemplate all the ways in which the data is probably bad rather than tacitly accepting it as good.”

Ideally, he said, journalists would obtain the raw data behind each survey and study and do their own analysis, as well as investigating the source.

Most importantly, do the groundwork reporting

Every data journalist I spoke with stressed that no project can rely on data alone, without journalism. Hitting the pavement, making phone calls and talking to sources are always necessary.

Freelancer Hilary Niles said that when she bulletproofed her public radio story, the state’s disarray in understanding its own data became part of the story. “The conventional reporting required in order to compile the database also revealed real gaps in accountability,” she said. “I think this also illustrates the importance of coupling data reporting with traditional reporting in order to draw the most complete picture possible.”

As always, the key to reporting with data is that it’s simply reporting. And with data becoming ever more omnipresent, it can no longer be treated as a method separate from more old-fashioned reporting.

Luckily, the proliferation of tools and of data itself makes this kind of reporting easier and easier. API’s Strategy Study on encouraging innovation in the newsroom quoted the legendary 20th-century journalist Hodding Carter: “This is the most exciting time ever to be a journalist – if you are not in search of the past.”
