How to get started with data journalism in your newsroom

Once you’ve decided data is something your staff should be able to handle, the question is how to incorporate it into their workflow. Every newsroom is already busy, and many are strapped for funding and staff. This section will address how to train journalists in data journalism while ensuring that it gets folded into the work they’re already doing.

What skills should journalists have?

Across the board, those who practice it told me there are two basic skills needed to get started as a data journalist: the ability to engage in critical thinking and basic familiarity with spreadsheets.

Most reporters already have critical thinking skills (although, as we will see in the “challenges” section, they need to learn to apply them to data sources as well as traditional human sources).

In the case of data journalism, it means the ability to treat numbers as skeptically as you would any other source, said Cheryl Phillips, a professional in residence at Stanford.

One oft-cited example of what not to do is FiveThirtyEight’s story on kidnappings in Nigeria. The data blog published an article, an animated map and other story elements that purported to show a dramatic increase in kidnappings in that country, a topic in the news at the time because of the kidnapping of hundreds of teenage girls.

Credit: FiveThirtyEight

The problem was, the data was based on the number of recorded news stories, not kidnappings themselves.

“You cannot assert that there are more kidnappings just because the media is running more stories about them,” data visualization expert Alberto Cairo wrote for Nieman Lab. “It might be that you’re seeing more stories simply because news publications are increasingly interested in this beat.”

FiveThirtyEight had successfully analyzed the data it had, in the sense that the reporter had calculated and mapped out changes in the numbers. But it hadn’t thought critically about what the limits of the data were.

“(Journalists) need to know how to interview data, how to ask questions of data,” Cheryl Phillips at Stanford University said. “They can do all that with a spreadsheet, honestly.”

The second core skill is simpler: command of basic spreadsheet use.

What that means, according to Derek Willis of ProPublica, is learning to be a user or a creator of spreadsheets rather than simply a viewer of them. That means not just reading a table of data, but being able to manipulate and organize it into new forms. Knowing enough basic math to calculate, say, percent change, is another rudimentary skill.

At Arizona State University, Steve Doig leads an online course that teaches these skills in a few hours. He goes into a few more advanced tactics, but the spreadsheet basics are:

  • Sorting: Rearranging the rows of data in a certain order. This allows you to find, for example, the highest salary in the state or the lowest crime rate in the country.
  • Filtering: Narrowing down the data to only the parts you’re interested in. This allows you to see, for example, only campaign donors in your state rather than the whole country.
  • Basic math: Simple calculations like addition and division enable you to find, for example, how much a budget has increased over the year before.
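
For readers who prefer code to a spreadsheet, the three basics above can be sketched in a few lines of Python. This is an illustrative sketch, not part of Doig’s course: the budget rows, department names and dollar figures are invented, and `percent_change` is a hypothetical helper.

```python
# Hypothetical budget rows; every name and dollar figure here is invented.
budget = [
    {"department": "Police", "state": "AZ", "last_year": 1_000_000, "this_year": 1_150_000},
    {"department": "Parks", "state": "AZ", "last_year": 400_000, "this_year": 380_000},
    {"department": "Library", "state": "NM", "last_year": 250_000, "this_year": 300_000},
]


def percent_change(old, new):
    """Percent change from old to new: (new - old) / old * 100."""
    return (new - old) / old * 100


# Sorting: largest current budget first.
by_size = sorted(budget, key=lambda row: row["this_year"], reverse=True)

# Filtering: keep only the rows for one state.
az_only = [row for row in budget if row["state"] == "AZ"]

# Basic math: year-over-year change per department, rounded to one decimal.
changes = {
    row["department"]: round(percent_change(row["last_year"], row["this_year"]), 1)
    for row in budget
}

for department, change in changes.items():
    print(f"{department}: {change:+.1f}%")
```

Each step maps to a spreadsheet action: `sorted` is the Sort command, the list comprehension is a filter, and `percent_change` is the formula a reporter would type into a cell.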

Helena Bengtsson, who does data stories and staff training at The Guardian, said two of those functions – sorting and simple math – account for the work behind most data stories. “So I can teach anybody to do 80 percent of all data journalism in under half a day,” she said.

These basic functions can be done with Microsoft Excel or its free alternative Google Sheets. “Excel is still the tool I mostly use, and I’ve been doing this 20 years,” Bengtsson said.

If people are interested enough to go beyond Excel, Bengtsson said, they can move into something more specialized.

For instance, investigative reporting would lead them to the analysis side, learning to use tools like SQL and relational databases. If they’re more interested in the presentation side, they could explore visualization tools like Google Fusion Tables.

That’s the point where they would specialize in one of the previously mentioned categories of data journalism – acquisition, analysis and presentation. But at its foundation, data journalism requires only two skills: critical thinking and basic spreadsheet knowledge.

One of them journalists should already have. The other they can learn in an afternoon.

How to hire people with these skills, or train your existing staff

Hiring managers at newsrooms can require that new hires have data skills, ProPublica developer Derek Willis said, and they probably should, if only to send the message that this is something they’re invested in. But reality is not that simple.

“There’s a pipeline problem,” Willis said. Not many journalists learn these skills, at least not formally.

USNews’s Lindsey Cook wrote about the dearth of data teaching in journalism education for Source, a blog for journalism coders.

“It happens every year, just the same,” she wrote. “Papers are posted to a board at NICAR seeking journalists with tech skills; journalists tweet encouragements that any young person wanting a job in journalism should learn data and coding. Look at all these jobs! This is what the young whippersnappers should learn! If only there were more of this!”

The job posting board at NICAR. Credit: Tony DeBarros

Cook said the journalism industry – and education system – have a lot of catching up to do when it comes to data, just like every other form of technology. The landscape changed so fast that old methods are crashing and burning.

Take the stereotype that journalists are bad at math. Cook said as a journalism student she almost always heard that stereotype tossed out by visiting journalists who came to speak to the students.

Students “have been told by everyone they admire in journalism that you don’t need math, when that’s not the reality of the field,” she said. “And that’s really hurting us when it comes to data journalism.”

Adding to the problem, she said, is that the old model of hiring a college grad and working with them intensely for a year or so has diminished. Instead, institutions hire young journalists expecting them to have skills right off the bat – and, often, lay off older journalists who lack the digital skills newsrooms now seek.

Hiring managers, then, are left with experienced professionals and new journalism grads, both of whom may lack data training.

The next best thing is to train the staff already in place.

Newsrooms around the world offer different approaches to this kind of training, including:

  • Workshops taught by outside contractors
  • Data “boot camps”
  • Workshops taught by the data team
  • Collaborations between data teams and other reporters
  • Assigning reporters to teach themselves online
  • Calling on support networks

At The Guardian, Helena Bengtsson has had the most success with a combination of outreach attempts. Her team holds workshops for anyone who’s interested in learning Excel, pitches their own data stories, and does what Pilhofer called “aggressive collaboration:” working directly with individual reporters to make their stories better.

Nonetheless, Bengtsson said, it’s probably the wrong approach to try to get absolutely everyone on board with data reporting. Rather, the data “evangelists” should target people in the newsroom who seem most open to learning new skills, and most likely to actually use them.

Flor Coelho, a data editor at La Nacion, a large newspaper in Argentina, suggested that these candidates aren’t necessarily the most tech-savvy reporters, but people who like to innovate. They’re the ones that will actually try something new, and keep trying.

It’s not always the young “techie” reporters who pick it up, Bengtsson agreed. One of her most successful students, she said, was a social science reporter in her 50s. Bengtsson helped her quantify freedom of information responses related to sexual harassment on campuses, and the reporter took it from there. “She ‘got it,’” Bengtsson said. “How (data) could help her.”

When reaching out to work with individual reporters, Bengtsson said, it’s most important to truly collaborate, meaning use techniques and tools the reporter will understand and be able to use themselves. “Collaboration means trust,” she said.

That’s why she proposed newsrooms equip everyone – editors included – with the basic spreadsheet knowledge discussed earlier. Since spreadsheets are the necessary foundation for any data work, they act like a “gateway” to other forms of data journalism, like visualization, investigation or writing code for news apps.

Bengtsson said the voluntary training sessions at The Guardian have attracted people from all over the company, not just the editorial staff. “People are very receptive here,” she said. She teaches a few advanced tools like Pivot Tables and formulas, but focuses on the spreadsheet basics.

Lindsey Cook, at USNews, stressed that these workshops should be voluntary. For years, reporters were told by their bosses that they needed to learn to use Twitter. To many of them, it seemed like extra work just for the sake of extra work. “That’s kind of a dangerous loop to get into,” Cook said.

Instead, those doing the training should get a sense of the reporters’ workflow, and make sure the data skills fit into it and make it more efficient. She also recommended that data’s so-called evangelists pitch data skills as less work, rather than more, and work one on one as much as possible.

“It’s important to remember you can’t make anyone do something they don’t want to do,” she said. “It’s hard to make someone sit in a class who doesn’t want to sit in a class.”

An ideal model, she said, might be something similar to coding schools: an intensive series of classes that teach specific skills over a period of six or eight weeks. The only thing like that in journalism is data boot camps.

IRE offers computer-assisted reporting boot camps several times a year at its home base in Columbia, Mo. The week-long, intensive training sessions introduce professional journalists to data skills ranging from basic spreadsheet knowledge to visualization. The boot camps cost several hundred dollars, but fellowships and other financial aid are available.

Another way to teach data skills is to bring the trainer straight to the newsroom. IRE offers workshops like this, and Keng, the data consultant, does the same as a freelancer.

Keng visits small- and medium-sized newsrooms, where he coaches people on how to use data to report more deeply and efficiently. He said one of his first tasks is to help reporters and editors realize that data is for them.

“There’s a misconception or conception among them that they think data journalism is very hard to do, or expensive,” he said. “One of the challenges is actually to change the perception that only big organizations like The New York Times or Washington Post can do data journalism.”

The second challenge, he said, is pulling journalists out of their regular routine to spend time learning it.

After the workshops, Keng makes a point of putting the reporters in touch with networks like IRE, Hacks/Hackers or the Global Investigative Journalism Network, who can offer technical or logistical support.

Derek Willis at ProPublica suggested local journalism schools or the outlet’s own alumni network – staffers who have moved on to other newsrooms – as further options for support networks. If nothing else, he said, it’s important that reporters know their problems are not unique.

If all else fails, editors can simply give a reporter time to learn on their own. That takes dedication on the part of the reporter, but many of the best data journalists out there are self-taught. The next section addresses the best ways to tackle teaching yourself data skills.

Learning on your own: how reporters and students can get started in this field, or teach themselves

One way to start out, NICAR training director Jaimi Dowdell suggested, is with a municipal budget. In most cases, a data source like that is simple to understand, and in every case, it’s possible to get. In the U.S., at least, government budgets are always public.

Sarah Cohen, now at The New York Times, wrote the book “Numbers in the Newsroom” to help reporters get over a fear of math that kept them from learning data skills or even how to read reports like budgets. On another level, though, she said, she hoped it would change the culture a little bit by implying that this is part of a journalist’s job.

Luckily, she said, journalism schools today are so focused on finding people jobs that they aren’t propagating the “journalists are bad at math” stereotype as much as they used to. “It’s less common to joke about it as a charming thing,” she said.

“Numbers in the Newsroom” is an excellent walkthrough of data and simple calculations – “nothing above third-grade math.” It includes an entire chapter on how to analyze a budget. That tutorial, and others, can be found in the appendix to this paper.

Scott Klein, an editor at ProPublica, describes Cohen’s book as an invaluable resource in his classroom at the New School. Another good reason to start with a budget, he said, is that it’s journalistically valid. Klein recommended reporters start with a data set they’re legitimately interested in – and examine it for actual journalism, not just practice.

A lot of online learning courses, like those that teach you how to code, walk the learner through hypothetical exercises such as “how to make a peanut butter sandwich” or how to construct a simple game. Instead, Klein said, journalists should commit to doing a journalism project right from the start, and follow it through.

Jue Yang, who teaches at CUNY, said reporters need to tap into “that startup mentality: JFDI, just effing do it.”

“If you really want to be the innovator, rather than just catching up, you got to just start doing things,” she said.

The problem with jumpstarting data journalism efforts, 18F’s Jacob Harris said, is that editors want data stories to be fast, cheap and accurate. “It’s hard to get all three,” he said. That’s why you see so many stories using the same data sets over and over again, like:

  • The census
  • Bureau of Labor Statistics
  • FBI crime statistics
  • Campaign finance filings
  • Local budgets

These data sets can be obtained by any reporter, and any reporter is likely to find a story in them, because they can be localized. You can read about these data sources in more detail in the appendix to this paper.

The National Institute for Computer-Assisted Reporting, a subsidiary organization of IRE, offers many national data sets cleaned, organized and ready to be analyzed. NICAR charges for access to the data, but the fees are scaled to the size of the news organization.

NICAR training director Dowdell said the ideal training situation would be for mid-level managers – those who deal with copy on a day-to-day basis – to be at least a little familiar with data, so they can edit data stories and understand the limits placed on them. If nothing else, she said, reporters can take it upon themselves to learn.

“Sometimes you have to invest a little of your own time,” Dowdell said. “Over time you’re going to get more time, more support.”

Online courses, support networks and data sets with training wheels are all good ways to get started. The next challenge is to fit it into a reporter’s daily life.

Where in the workflow does this go?

A lot of reporters, and their editors, feel like they can’t fit in data reporting when they’re so busy covering news that’s already happening.

“And that’s a very real scenario,” ProPublica developer Derek Willis acknowledged. “What I would say, though, is… done right, this is not an either-or kind of thing.”

While there is an initial time investment in learning data skills, he said, data can actually help reporters become more efficient and free up more of their time. If they are really struggling to fit in data reporting, Willis said, they should look at what they’re already doing.

Business reporters, for example, often pull up new business permits. As it is, they’re pulling up data in the form of paper or PDF documents. If they requested that as structured data – that is, something more like a spreadsheet – it would be far easier to analyze and look for patterns. Every reporter, Willis said, has data like that they’re already looking at – they’re just not looking at it in the right way.
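
As a rough illustration of why structured data is easier to analyze, the sketch below counts permits by business type and by month from a CSV export. The column names and records are invented for the example, and a real export from any given agency will look different.

```python
import csv
import io
from collections import Counter

# Hypothetical permit records, as they might arrive in a CSV export.
# Every column name and value below is invented for illustration.
PERMITS_CSV = """issued,business_type,neighborhood
2015-01-12,Restaurant,Downtown
2015-01-20,Retail,Midtown
2015-02-03,Restaurant,Downtown
2015-02-17,Restaurant,Eastside
"""

# Each row becomes a dict keyed by the header line.
rows = list(csv.DictReader(io.StringIO(PERMITS_CSV)))

# Pattern 1: which kinds of businesses are opening most often?
by_type = Counter(row["business_type"] for row in rows)

# Pattern 2: how many permits were issued each month (first 7 chars = YYYY-MM)?
by_month = Counter(row["issued"][:7] for row in rows)

print(by_type.most_common())
print(sorted(by_month.items()))
```

The same counts would take manual tallying from paper or PDF copies. With a file on disk, `io.StringIO(PERMITS_CSV)` would be replaced by an opened file object.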

“Take a look at the way you’re already collecting information that you consider valuable,” Willis said. “You can make that task easier just by altering the way that you collect and store information in-house.”

Niles, the freelancer, said a few of her successful investigations have come from analyzing data the state had already collected, but simply hadn’t analyzed. “There’s just a ton of data that gets collected,” she said. “Just insights sitting there on the table waiting for somebody to find them.”

She suggested reporters ask themselves, “what is a question that comes up a lot on my beat?” Data could answer or illuminate it. Or, “what is a report I frequently read on my beat?” Data could automate it or make it simpler to look for trends.

If reporters make a habit of these things, she said, requesting and analyzing data should fit naturally into their workflow. “You want it to be as seamless as possible,” she said. “That’s the goal.”

Incorporating new skills into workflow, she said, also provides the key to the next challenge: sustainability.
