Explore the Science in Evergreen Topics: Lesson 2/7
Navigating Scientific Data


Whether you’re writing about your city’s rental market, looking at temperature trends in your area, or covering flu hospitalizations in your community, you can ground your story in evidence by analyzing and including relevant scientific data.

It takes a little legwork to know where to look for publicly available data, how to track down less accessible data, and how to interpret what you find. But dipping your toes into data journalism pays off in a more informed audience.

Finding Data

Scientific data is available in many forms, some of which may be more easily accessible than others.
Where you go looking for data depends on your story and the questions you’re trying to answer. Sometimes, a simple online search can reveal useful datasets relevant to your story. Other times, you might stumble across an intriguing dataset that serves as the seed for an entirely new story. 
Resource | What it offers
General
DataCite Commons | A search engine journalists can use to find datasets and data repositories relevant to their field or beat.
DocumentCloud | An open-source platform for uploading, searching, annotating, and publishing public documents.
Dryad | A community-led open repository where researchers publish the data behind their studies.
Freedom of Information Act (FOIA) requests | A mechanism to request records from U.S. federal agencies (states have their own public-records laws).
Google’s Dataset Search | A search engine for datasets on any topic, with easy-to-use filters for dataset format and usage rights.
Information Is Beautiful | A publication dedicated to data visualization that has made the datasets behind its visualizations freely available.
Registry of Open Access Repositories | A searchable directory of open-access repositories run by institutions.
Tabula | A tool for extracting tables from PDFs into data files.
The Accountability Project | A tool for searching across public datasets that would otherwise be siloed.
Climate/Environment
Climate Central | A nonprofit climate research organization that supports local reporters and meteorologists through its Climate Matters program.
Global Biodiversity Information Facility (GBIF) | An open-access biodiversity platform with more than a billion species-occurrence records from institutions and citizen-science sources.
International Union for Conservation of Nature (IUCN) Red List | A comprehensive source of endangered-species data.
NOAA – National Centers for Environmental Information | The central U.S. source for weather, climate, and natural-disaster data.
Health/Medicine
Centers for Disease Control and Prevention (CDC) | The central U.S. source for public-health information, including data on topics ranging from the flu to wildfire smoke.
Cochrane Reviews | A library of systematic reviews of medical and health-care evidence.
Global Health Security Index | An index measuring countries’ preparedness for future epidemics and pandemics.
OpenNeuro | A searchable collection of neuroscience datasets.
WHO Global Health Observatory | A repository of international data on a wide variety of health indicators.

Vetting Data

Once you find an interesting dataset you’d like to explore, there are a few steps you can take to vet its usefulness and quality:

1/

Consider the logistics of accessing the dataset

Can the data be downloaded as a .CSV or .XLS file, for example? Is the dataset open access or released under a Creative Commons license? Will you need to contact a researcher to access it?

2/


Assess how the information was gathered and processed
Many datasets come with a text file describing the methods behind them. Ask yourself:
  • Is the sample representative of the population the researchers are studying?
  • Was the dataset collected using sound methods (e.g., a survey that was anonymous and sent to a large swath of a population)?
  • Was the information collected recently, or is this data from many years ago? (Older data may be less relevant.)

3/

Consider the dataset’s “cleanliness.”

Is the formatting consistent across the dataset? For example, are all dates or geographic areas coded the same way? Are there missing values (blank cells) or outliers (extreme values)? Widespread inconsistencies are a sign that you should steer clear.

4/

Ask experts to vet a dataset before you dive in

Just as you would get an outside source to comment on a research study, ask researchers to help you assess whether a dataset is worth using.
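If you work in Python (for example, in a Jupyter Notebook), you can check some of the cleanliness questions in step 3 automatically. The sketch below is a minimal illustration, not a standard procedure. The sample rows are invented. The expected date format (YYYY-MM-DD) and the outlier rule (a modified z-score above 3.5, a common rule of thumb) are assumptions you would adjust for your own data. The code only flags rows for a closer look and changes nothing.

```python
import csv
import io
import re
import statistics

# Invented sample data; in practice you would open your downloaded CSV file.
SAMPLE_CSV = """county,date,hospitalizations
Adams,2023-01-07,12
Baker,01/07/2023,15
Clark,2023-01-07,
Dane,2023-01-07,14
Essex,2023-01-07,13
Floyd,2023-01-07,410
"""

ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # assumed house format: YYYY-MM-DD


def check_cleanliness(rows, date_col, value_col, cutoff=3.5):
    """Return row numbers (1-based) with blank cells, odd date formats, and outliers."""
    blanks = [i for i, row in enumerate(rows, start=1)
              if any((cell or "").strip() == "" for cell in row.values())]
    bad_dates = [i for i, row in enumerate(rows, start=1)
                 if not ISO_DATE.match((row[date_col] or "").strip())]

    # Outliers via the modified z-score, built on the median absolute deviation,
    # which is less distorted by the extreme values it is trying to find.
    values = [(i, float(row[value_col])) for i, row in enumerate(rows, start=1)
              if (row[value_col] or "").strip() != ""]
    median = statistics.median(v for _, v in values)
    mad = statistics.median(abs(v - median) for _, v in values)
    outliers = []
    if mad > 0:
        outliers = [i for i, v in values if 0.6745 * abs(v - median) / mad > cutoff]
    return {"blank_cells": blanks, "inconsistent_dates": bad_dates, "outliers": outliers}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(check_cleanliness(rows, "date", "hospitalizations"))
```

On the sample rows, this prints `{'blank_cells': [3], 'inconsistent_dates': [2], 'outliers': [6]}`. A flagged outlier is a question to put to the data's creators, not proof of an error: row 6 could be a typo or a real spike.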

Turning a Spreadsheet into a Story

If a dataset passes your initial quality check, it’s time to roll up your sleeves and do some analysis.
This might sound daunting, but remember: you don’t need a degree in statistics to turn a dataset into a compelling story. There are concrete steps you can take and tools to assist you:

Copy the dataset into your own spreadsheet that you can customize according to your needs. Then, consider whether you need to “clean” the dataset. Do you need to standardize column names or delete irrelevant sections?

As you prep your dataset, be careful not to change the data points themselves!
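In code, the same prep work might look like the following minimal Python sketch. The column names and the `standardize_name` and `clean_copy` helpers are hypothetical. The cleaning step renames and drops columns in a new copy and leaves every value as it was. The final check confirms that.

```python
import re

# Invented rows, as they might arrive from a downloaded rental-market file.
raw_rows = [
    {"County Name ": "Adams", "Median Rent ($)": "1250", "Notes": "prelim"},
    {"County Name ": "Baker", "Median Rent ($)": "1410", "Notes": ""},
]


def standardize_name(name):
    """Lowercase a column name and turn spaces and symbols into single underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def clean_copy(rows, drop=()):
    """Build new rows with standardized column names, minus the columns in `drop`.

    The input rows are not modified, and the values are carried over unchanged.
    """
    return [
        {standardize_name(col): value for col, value in row.items() if col not in drop}
        for row in rows
    ]


cleaned = clean_copy(raw_rows, drop={"Notes"})
print(cleaned)

# Check that only the labels changed: the kept values match the originals exactly.
for before, after in zip(raw_rows, cleaned):
    kept = [value for col, value in before.items() if col != "Notes"]
    assert kept == list(after.values())
```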

Think of the data as a source. What questions can you “ask” of the data? What might the data be able to “tell” you? If you’re struggling, think about how numbers connect to real-world concepts and people’s experiences.

Try out multiple questions with your analysis. Ask yourself: What role might each variable play in a larger story?

Could one variable be an indicator of a larger trend? What could you learn by comparing two variables? As you would in reporting and writing, let yourself explore different avenues for telling the story.
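One way to compare two variables is to compute a correlation coefficient. The minimal Python sketch below calculates Pearson's r by hand for two invented series: weekly flu hospitalizations and average low temperatures. The numbers and the pairing are assumptions for illustration only. A value near -1 or 1 suggests the two move together. It does not show that one causes the other, and that kind of lead needs more reporting.

```python
import math

# Invented weekly figures for one county: flu hospitalizations and
# average low temperature (°F). For illustration only.
hospitalizations = [12, 15, 21, 30, 26, 18]
low_temps = [40, 36, 30, 22, 25, 33]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson_r(low_temps, hospitalizations)
print(f"r = {r:.2f}")  # negative: in these invented numbers, colder weeks have more hospitalizations
```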


Take it slow! It can be easy to be misled by data as you go down a rabbit hole.

Think about what the data aren’t showing—what are their limitations? Are there discrepancies between what’s there and the story you’d like to tell?


The same way you’d record interviews and hang onto your notes, keep track of the different things you try as you’re analyzing the dataset.

This will help you retrace your steps if you get stuck, and it will make it easier for a colleague or expert to review your work—which, like fact-checking, is good journalistic practice.
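If you analyze data in code, you can keep these notes as you go. The `AnalysisLog` class below is a hypothetical helper, sketched in Python on the assumption that a simple JSON file is enough for a colleague to follow. A Jupyter Notebook or a plain text file would serve the same purpose.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


class AnalysisLog:
    """A minimal log of analysis steps (a hypothetical helper, not a library API)."""

    def __init__(self):
        self.entries = []

    def record(self, step, result):
        """Store what you tried and what came out, stamped with the current UTC time."""
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            "step": step,
            "result": result,
        })
        return result  # pass the result through so it can be used right away

    def save(self, path):
        """Write the log as readable JSON so someone else can retrace your work."""
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.entries, f, indent=2)


# Invented median rents for four counties.
rents = [1250, 1410, 1180, 1600]
log = AnalysisLog()
avg = log.record("Mean of median rents across 4 counties", sum(rents) / len(rents))
top = log.record("Position of the county with the highest rent", rents.index(max(rents)))

# Saved to a temporary folder here; in practice, keep it with your project files.
log_path = os.path.join(tempfile.gettempdir(), "analysis_log.json")
log.save(log_path)
for entry in log.entries:
    print(entry["step"], "->", entry["result"])
```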

Data Analysis Tools

There are several tools you can use to organize and analyze data, many of which are free.

Tool | What it offers
Microsoft Excel | The classic software for organizing data; formulas, filters, and pivot tables can be used to clean and pare down large datasets. (Numerous online forums can help you get started.)
Google Sheets | Like Excel, but free and hosted online, which makes it especially useful for collaborative projects.
RStudio | A free, open-source, and popular program for working in R, a programming language used for exploratory analysis and graphics.
Jupyter Notebook | An open-source program, run through your web browser, that supports R, Python, and other programming languages. You can scrape data from the web, analyze it, and create visualizations.
QGIS | A free, open-source geographic information system (GIS). You can geocode data points, analyze them by location, and build maps and charts to present that analysis.
MySQL | An open-source relational database system that uses SQL, a query language for working with large databases.
OpenRefine | A free, open-source program that lets you clean and explore messy datasets without writing code, and reconcile your data with web services such as taxonomic databases and Wikidata.

Sharing Your Findings

Once you’ve analyzed your data and identified some threads for an interesting story, you’ll want to think about how to share those insights with your audience.

1 /



Do your findings illustrate a point in a larger story?
You may want to use a narrative format to weave in your evidence.

2 /



Do your findings hint at a trend that needs some ground-truthing?
It’s time for more reporting. Call a bunch of experts, vet your conclusions, and build a comprehensive story around your results and others’ takes.

3 /



Are your findings the basis for an entire story?

Data visualization can be a helpful storytelling tool.

Dipping into Data Visualization

You can display data in all kinds of visual formats, including infographics, charts, and maps. As you decide what works for your data, think about which format communicates the information most clearly. Focus on delivering a key takeaway, as you would in a written story. The resources below offer a quick introduction to data visualization:

Resource | What it offers
Datawrapper | An online platform (with free and premium versions) that can generate charts, tables, and graphs without coding.
Tableau Public | A free publication service for data visualizations that also features a blog, sample data, how-to videos, and other resources.
FlowingData | A blog that features exceptional data visualization projects as well as tutorials and courses for members.
The R Graph Gallery | A collection of visualizations and tutorials focusing on ggplot2 and other tidyverse packages for R.
D3 (Data-Driven Documents) | A JavaScript library for building interactive visualizations in a web browser.
Geojournalism | A collection of tutorials and examples geared toward helping environmental journalists use geographic data.
Flourish | A tool with free and paid options that can turn data into interactive, updatable visualizations with no coding required.
