Data Journalism: A Primer for Science Journalists

A group of Lego people stand in a group.
Drew Maughan/Flickr (CC BY-NC-SA 2.0)


Did you ever know someone who was devastatingly handsome, made great conversation and whom everybody talked about even after he left the room?

That’s how I think about good computer-assisted reporting (CAR), also often referred to as data journalism. These projects turn spreadsheets into insightful infographics, support stories with concrete context, and keep people talking. Think of The New York Times’ recent interactive piece, “Can You Live on the Minimum Wage?” or its classic “Toxic Waters” series.

As great as that sounds, CAR can seem intimidating to the uninitiated. But your journalism will be better for using it, and somebody needs to liberate those data to tell the next big story. Once you get to know some methods, it’s likely you’ll get hooked on bringing added depth to your stories. For scientists-turned-journalists, CAR might even be more familiar than you think.

“I kind of see it as circling back to my roots,” says Peter Aldhous, a freelance journalist and lecturer in the Science Communication Program at the University of California, Santa Cruz, who has become a science CAR apostle in recent years. “Before I was a journalist, I was a scientist doing a Ph.D. in animal behavior,” he says. “I enjoyed doing experiments, analyzing data, and writing it up.” His recent CAR projects for New Scientist include charting exoplanets in “How Many Earths?” (details on the project’s data here) and visualizing localized temperature changes in “Your Warming World.”

For Aldhous, using CAR means adding new knowledge to the conversation: “Can I go into analyses to find a story, to contextualize a story, to turn it from ‘he-said-she-said’ into something a little more substantial? I’m not limited to interviewing a scientist who had done this research. I can talk to them, get their advice, and then actually work with their data to tell the story.”

CAR wasn’t always so accessible. Our tools have improved, and within the past two decades, data have gone from stacks of files tucked in dark shelves to searchable documents ready to download in seconds.


How Do You Get Started?

“Computer assisted reporting involves a small set of skills,” says David Herzog, academic advisor of the National Institute for Computer-Assisted Reporting (NICAR), part of Investigative Reporters and Editors (IRE). Journalists using CAR learn to identify, obtain, evaluate, clean, analyze, and visualize data.

Data journalism doesn’t start with data. Rather, it starts with a question. For instance, Oklahoma has recently experienced earthquakes; how well are Midwest buildings prepared for intraplate tectonic activity? Or maybe you wonder how wildlife populations have fared in recent droughts. Find a spreadsheet or database to get to your answer and concentrate your skills on “interviewing” those data.

Both Aldhous and Herzog recommend starting with reasonably small projects that you can work with in basic spreadsheets. How many exotic carnivores live in your state? What metal or mining sites have paid big fines in the past year? You can generate spreadsheets on those, and many other, questions right now online. Download what you can, and see what you can answer just using Excel or a similar program. “Get used to the idea of handling data, continually checking and rechecking, and build from there,” Herzog says.

There’s a learning curve, no doubt, but that shouldn’t keep you from trying. “Nothing in life that’s worth doing is going to fall into your lap and be easy,” Aldhous says. “But you don’t have to jump into it a whiz who can write elaborate code in [Structured Query Language]. Nobody does that. Start with manageable things,” he says, referring to SQL, the language used to query many databases.
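To give a sense of how small a first SQL query can be, here is a minimal sketch using Python’s built-in sqlite3 module. The table, its columns, and every row are invented for illustration; they do not come from any real dataset.

```python
import sqlite3

# Hypothetical table of mining-site fines; every value here is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fines (site TEXT, state TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO fines VALUES (?, ?, ?)",
    [
        ("Site A", "MO", 12000.0),
        ("Site B", "OK", 85000.0),
        ("Site C", "MO", 40000.0),
    ],
)

# One question, one query: which states had the largest total fines?
query = """
    SELECT state, SUM(amount) AS total
    FROM fines
    GROUP BY state
    ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
for state, total in totals:
    print(state, total)
```

The query reads almost like the question it answers, which is a good sign that a manageable starting point is within reach.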


Obtain Data

Maybe your scientist-sources hand over their spreadsheets, and if so, you’re golden. Otherwise, unless you create your own data using electronic sensors, you’ll need to learn where government datasets live. Some places to start are the National Freedom of Information Coalition or NICAR’s database library.

Most government agencies offer specific data web pages. Here’s a sample for science writers:

If you’re just fishing for stories, you can use advanced search features for .XLS documents and other formats, and narrow searches to .gov or .edu sites to see what’s out there.
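As a rough sketch of that kind of search, the snippet below assembles query strings you could paste into a search box. It assumes your search engine supports the filetype: and site: operators, as several major ones do; the function name and the example topic are hypothetical.

```python
def build_search(topic, filetype="xls", domain=".gov"):
    """Return a query string limited to one file type and one domain."""
    return f"{topic} filetype:{filetype} site:{domain}"

# Try the same topic against government and university sites.
queries = [
    build_search("drought wildlife survey", domain=d) for d in (".gov", ".edu")
]
for q in queries:
    print(q)
```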

To access some government databases, you may need to submit a FOIA request, but you should also try to talk to the database manager first. Sometimes, they can help you get what you need right away, or at least let you know what to ask for.

For local and regional reporting, it might help to hunt for documents in person. Herzog says that students in his class at the University of Missouri School of Journalism have better success obtaining datasets from nearby regional or state offices than by submitting FOIA requests to top-level federal agencies.

This is a good point to begin documenting your journey. Keep a log of the data you download, who you talk to, and when. You’ll also want to keep track of all of your queries as you go so you can replicate results later.
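A log like this can be as simple as a CSV file you append to. The sketch below is one possible layout, not a standard; the field names and the log_entry helper are assumptions made for illustration.

```python
import csv
from datetime import date
from pathlib import Path

# Hypothetical log columns; adjust to whatever you need to retrace your steps.
LOG_FIELDS = ["date", "dataset", "source", "contact", "notes"]

def log_entry(log_path, dataset, source, contact="", notes="", when=None):
    """Append one row to a CSV data log, writing a header if the file is new."""
    path = Path(log_path)
    is_new = not path.exists() or path.stat().st_size == 0
    row = {
        "date": (when or date.today()).isoformat(),
        "dataset": dataset,
        "source": source,
        "contact": contact,
        "notes": notes,
    }
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=LOG_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)
    return row
```

Calling log_entry("data_log.csv", "fines.xls", "state agency website", notes="filtered to last 12 months") each time you download or query something leaves a trail you can follow when an editor asks where a number came from.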


Evaluate Data to See What You Have

It starts with spreadsheets and database managers: programs such as Numbers, Excel, Microsoft Access, Navicat, or the open-source pgAdmin and LibreOffice Calc. See what columns your data fall into, where gaps exist, and whether you have all the data you really need.
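The same first look can be done in a few lines of code. This is a minimal sketch using only Python’s standard library; the sample table is invented, and profile_columns is a hypothetical helper name.

```python
import csv
import io

def profile_columns(rows):
    """Count blank cells per column in a list of dict rows."""
    blanks = {}
    for row in rows:
        for column, value in row.items():
            if column is None:
                continue  # extra cells beyond the header row
            blanks.setdefault(column, 0)
            if value is None or value.strip() == "":
                blanks[column] += 1
    return blanks

# Made-up sample standing in for a downloaded CSV file.
sample = io.StringIO(
    "site,state,fine\n"
    "Site A,MO,12000\n"
    "Site B,,85000\n"
    "Site C,MO,\n"
)
rows = list(csv.DictReader(sample))
gaps = profile_columns(rows)
print(len(rows), "rows")
for column, missing in gaps.items():
    print(f"{column}: {missing} blank")
```

For a real file, replace the sample with open("yourfile.csv", newline="") and check the blank counts against what the agency says the dataset should contain.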

Whatever rabbit hole you follow, you’ll have to learn the language of the database management system. This takes time, but the effort is worth it. “If you enjoy this type of work, and stories get published, and people pay attention to them, then that’s its own reward and you push yourself a little bit further,” Aldhous says.




Clean Your Data to Ensure Accuracy

Data journalists have to get used to working with messy and often incomplete data, says Aldhous. “If you’re dealing with any data that has to be collected because the government says it does, you can expect that data’s going to have some problems.”

Look for misspellings or inconsistent notations. This is hard to do by hand, especially with large datasets. You’ll want to use something like OpenRefine (formerly Google Refine), which is free, open-source software. At its most basic, OpenRefine can cluster similar spellings or abbreviations to identify entries that should be merged.
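To show the idea behind that kind of clustering, here is a simplified sketch loosely modeled on the “fingerprint” key-collision approach that OpenRefine documents. It is not OpenRefine’s actual code, and the company names are invented.

```python
import string
from collections import defaultdict

def fingerprint(value):
    """Normalize a string so that near-identical spellings share one key."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))  # ignore word order and repeats
    return " ".join(tokens)

def cluster(values):
    """Group values whose fingerprints collide; keep groups with variants."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]

# Made-up entries of the sort found in a hand-keyed column.
names = [
    "Acme Mining Co.",
    "ACME MINING CO",
    "Mining Co. Acme",
    "Ozark Lead",
    "acme mining co",
]
for group in cluster(names):
    print(group)
```

A collision only flags candidates. Two different companies can share a fingerprint, so a human still decides which entries truly belong together.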

Science journalists might have it easier. “In science, you have people lovingly curating datasets,” Aldhous says. He worked with NASA’s entire climate surface temperature records across the globe for “Your Warming World.”


Watch It All Come Together: Analyze and Visualize Your Data

Dozens of tools to animate data exist online. For example, creating visualizations or merging multiple data sets in Google Fusion Tables can help you discover trends or disparities in a more dynamic way than simple spreadsheets can. Combine different tables and databases, if necessary. Story Maps by Esri illustrates stories geographically, such as this feature on population density or “Visualizing Large Earthquakes in Southern California.” For more mapping, get into advanced ArcGIS, or use free and open-source software such as Quantum GIS, TileMill, and the Leaflet JavaScript library. If you can dream of a way to tell your story, a tool for it probably exists, and it’s likely free or cheap.
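Merging tables comes down to matching rows on a shared column. The sketch below shows that step in plain Python, assuming each key appears only once in the second table; the county names and numbers are invented, and merge_on is a hypothetical helper.

```python
def merge_on(left, right, key):
    """Inner-join two lists of dict rows on a shared key column."""
    lookup = {row[key]: row for row in right}  # assumes unique keys in `right`
    merged = []
    for row in left:
        match = lookup.get(row[key])
        if match is not None:
            merged.append({**row, **match})
    return merged

# Made-up tables: violations per county and population per county.
violations = [
    {"county": "County A", "violations": 3},
    {"county": "County B", "violations": 1},
    {"county": "County C", "violations": 4},
]
population = [
    {"county": "County A", "population": 1000},
    {"county": "County B", "population": 500},
]
combined = merge_on(violations, population, "county")
for row in combined:
    rate = row["violations"] / row["population"] * 1000
    print(row["county"], f"{rate:.1f} violations per 1,000 residents")
```

County C drops out because it has no population row. Rows that vanish in a merge are worth a second look before you publish anything.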


What If You Get Stuck?

We list some books below that you might want to keep on hand. Also, tap into online resources, including software documentation, message boards and listservs. Make friends with IT people, or plug into a chapter of Hacks/Hackers, in which journalists and technologists meet to find new ways to tell stories.


Why Bother?

The Poynter Institute puts it well: “CAR enables us to publish stories that our readers want and can’t get anywhere else … [and] CAR will help us create, or improve, the watchdog culture at our newspapers.” CAR projects could span epidemiology, climate change or biodiversity, natural hazards or chemical risks. Some of the most impressive environmental examples include USA Today’s “Ghost Factories” package and the “Toxic Waters” series by The New York Times.

But CAR doesn’t always have to be elaborate, or have much analysis at all. In December, Scientific American featured Jennifer Frazer’s story on a fungal disease creeping down the West Coast. The article’s designer got outbreak numbers from British Columbia over the phone; Canada doesn’t require its data to be digitized, so the data were simply in a record-keeper’s physical files. In the 11-and-3/4ths hour before publication, Frazer says, editors found a discrepancy between the numbers used for the graphic and those in the article, and no one at the magazine was sure how to resolve it. “The only person who could tell me was that woman in British Columbia with the big dusty books.”

In every case, seeking expert help is the best way to guard against errors. Before running the exoplanet piece, Aldhous’s team showed their product to Natalie Batalha, Jon Jenkins and other members of the Kepler mission team.


Take the Data Dive

If you are ready to get started, essentials from the IRE bookstore include:

  • The Investigative Reporter’s Handbook: A Guide to Documents, Databases and Techniques, Fifth Edition. More than 500 pages cover investigative reporting basics, with extensive advice on investigating government, business and issues. Journalists’ personal narratives on their investigative reporting stories bring the subject to life.
  • Numbers in the Newsroom: Using Math and Statistics in the News. A slim, spiral-bound quick-reference guide sure to refresh your college math skills and repackage them for the news.

You might also try MySQL Crash Course, a conversational yet thorough guide to Structured Query Language basics, available on Amazon and the finest bookstores everywhere.

Also, this week (February 27–March 2), IRE is hosting a CAR Conference in Baltimore. If you want to explore or commit to CAR, consider joining IRE, which offers thousands of tip sheets and cleaned-up databases, and other tools and resources for investigative reporting.

“That’s where you want to hang out,” says Aldhous, who gets most of his interaction through the NICAR discussion list. “They are just a fantastic group, utterly committed to use public data, open data, and they really opened my eyes to the possibilities in journalism.”



Tina Casagrand

Tina Casagrand is a TON fellow sponsored by the Burroughs Wellcome Fund. Tina is a freelance journalist and recent graduate of the University of Missouri School of Journalism, where she studied magazine writing and publishing, anthropology, biology, and art. Her dream is to start an independent publication covering environmental health and social justice in the lower Midwest and Ozarks. Follow Tina on Twitter @Gasconader and at her blog.
