Analyze Policy Impacts: Lesson 7/9
Deciphering Statistical Terms

Section 1 of 6

Scientists often use statistical tests to measure effects in their studies. For example, did an Alzheimer’s drug improve participants’ recall ability? Or, did a reading intervention boost students’ test scores?

Don’t panic when you come across a series of statistical tests as you read scientific papers. Even if math wasn’t your favorite subject in school, understanding a few key concepts can go a long way towards interpreting the results of a study—and generating questions to ask your sources.

Common Stats Terms

There are several statistical terms that come up frequently in scientific studies. Review the
definitions below to help you better understand what you’re looking at.

1. Sample size (n)
The number of people or other subjects (e.g., animals, cells, cities) being examined in a study.

2. Outlier
A number that’s extremely high or low compared with the other values in a dataset.
Researchers might exclude outliers from their analyses, potentially skewing their results.

3. Mean (x̄)
The average of a set of values.
For example, if the number of students in each of 6 classrooms is 12, 18, 22, 24, 32, and 33, the mean is 23.5 (the total divided by 6).

4. Median (M)
The midpoint of a range of values.
For example, if the number of students in each of 6 classrooms is 12, 18, 22, 24, 32, and 33, the median is 23 (half the classrooms have fewer than 23 students and half have more).
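Both definitions can be checked in a few lines of Python, using the same six classroom counts from the examples above:

```python
import statistics

# Number of students in each of 6 classrooms
class_sizes = [12, 18, 22, 24, 32, 33]

mean = statistics.mean(class_sizes)      # the total (141) divided by 6
median = statistics.median(class_sizes)  # midpoint of the sorted values

print(mean)    # 23.5
print(median)  # 23.0
```

With an even number of values, the median is the average of the two middle values (22 and 24), which is why it comes out to 23.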

5. Error
The degree of uncertainty around the researchers’ estimate, typically reported as a margin of error.
For example, a poll might show one political candidate ahead of another 53 percent to 48 percent, ± 3 percentage points.

6. Confidence interval (CI)
A range of values that likely includes the true value researchers are estimating in their study, determined by adding and subtracting the margin of error from the estimate.
For example, if the average reading score of 100 students who completed a tutoring program has a CI of 84% to 92%, the true average score among all such students likely falls in that range.

7. Correlation (r)
A relationship between two variables.
  • When two things increase or decrease together, they have a positive correlation.
  • When one increases and the other decreases, they have a negative correlation.
  • Correlation does not imply causation.

8. Percentage point change
The arithmetic difference between two percentages when you subtract them.
For example, if a flu outbreak infects 3 percent of the population one year and 5 percent the next, that’s a 2-percentage-point increase.

9. Percent change
The relative difference between two percentages. Percent change = (percentage point difference / starting value) x 100.
Using the flu example above, the percent increase in flu infections from one year to the next is 66.7 percent: (2/3) x 100 = 66.7.
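The distinction between these two measures is easy to verify yourself. A quick sketch in Python, using the 3 percent and 5 percent infection rates from the flu example:

```python
rate_year_1 = 3.0  # percent of the population infected in year 1
rate_year_2 = 5.0  # percent infected in year 2

# Percentage point change: simple subtraction of the two percentages
point_change = rate_year_2 - rate_year_1

# Percent change: the point difference relative to the starting value
percent_change = point_change / rate_year_1 * 100

print(point_change)              # 2.0 (a 2-percentage-point increase)
print(round(percent_change, 1))  # 66.7 (a 66.7 percent increase)
```

The same 2-point jump reads as a far more dramatic 66.7 percent increase, which is why it matters to know which statistic a study is reporting.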

10. P-value (p)
A number between 0 and 1 indicating the chance of getting the results a study did if there is no actual effect in the real world.
In many fields, p < .05 indicates that a finding is statistically significant, meaning the likelihood of the study’s finding having occurred by chance alone is less than 5 percent.

11. Effect size (Cohen’s d or r²)
A measure of the magnitude or strength of a finding, such as how different two experimental groups are.
Small effect sizes indicate weak or potentially less meaningful findings. A finding can be statistically significant but have a small effect size.

Understanding Risk

Risk is often a central component of science stories. For example, what’s the risk of contracting bird flu after exposure? How likely are wildfires in your area during the summer months? What’s the probability that a public safety policy will mitigate the risk of gun violence in a community? 

Covering risk with accuracy and nuance can be tricky—especially when studies report different kinds of risk or include odds instead of risk. Familiarizing yourself with the concepts below will help sharpen your reporting on risk.

 

Relative risk: A comparison of the risk (or likelihood) of an event between two groups.
Absolute risk: A group’s baseline risk (or likelihood) of an event happening.

Example: A study reports that the risk of developing brain cancer is 70 percent higher in people who live less than a mile from a cell phone tower than in people who don’t. The reported statistic here is relative risk.

If you take a closer look at the data, you might see that the absolute risk of developing brain cancer in the first group was 1.7 percent and the risk in the second was 1 percent. That’s not a sizable difference after all.

Takeaway: Be careful not to report on relative risk without first taking a look at absolute risk. You can typically find this baseline risk in the results section. Knowing the full context will help you explain the importance of a finding to your readers.
Risk: How likely an event is to occur, divided by all possible outcomes—often expressed as a “percent chance.”
Odds: How likely an event is to occur, divided by the likelihood that it won’t occur.

❗️Many scientific papers report their findings as odds.

Example: A climate report denotes the odds of a hurricane hitting the Texas coast in a given year as .33 (or 1:3). That doesn’t mean there’s a 33 percent chance. To get that statistic, convert odds to risk with this simple formula:

odds / (1 + odds) = risk
.33 / (1 + .33) = .25

This means there’s a 25 percent chance of a hurricane hitting the Texas coast.

Takeaway: Be careful not to accidentally report odds as risk. Translating odds into risk makes a finding more intuitive to your audience, who might tend to think of likelihood as risk, or percent chance.
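The odds-to-risk formula translates directly into a few lines of code. A minimal sketch in Python, using the 1:3 hurricane odds from the example above:

```python
def odds_to_risk(odds: float) -> float:
    """Convert odds of an event into risk (a probability)."""
    return odds / (1 + odds)

hurricane_odds = 1 / 3  # odds of .33, or 1:3

risk = odds_to_risk(hurricane_odds)
print(round(risk, 2))  # 0.25, i.e. a 25 percent chance
```

The function name `odds_to_risk` is just an illustrative label; the calculation is the same one shown in the formula above.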

Test Your Stats Knowledge

Practice identifying key statistical concepts in reporting scenarios. By understanding these terms in context, you’ll strengthen your ability to interpret them for your audience.

Question 1

A local election saw the incumbent candidate’s support drop from 52 percent of the vote in the last election to 48 percent in the current one. Which term reflects the 4-point difference between the two results?

Answer Choices:

Percent change
Percentage point change
Error
Correlation

Question 2

A study examined the relationship between exercise frequency and the risk of heart disease. The researchers noted that people who exercise more often tend to have lower heart risk. Which term reflects the researchers’ findings?

Answer Choices:

Correlation
Margin of error
Statistical significance
Outlier

Question 3

In a survey of 300 local residents, 60 percent expressed support for a new policy. The survey has a reported degree of uncertainty of plus or minus 4 percentage points. What are the researchers reporting here?

Answer Choices:

P-value
Effect size
Mean
Margin of error

Question 4

A researcher is analyzing the average income of residents in a particular neighborhood. They notice one individual with an exceptionally high income compared to everyone else. This unusually high income value could be considered an __________.

Answer Choices:

Mean
Outlier
Effect size
Correlation

Question 5

A new study on a medication for migraines reports that the drug led to a "meaningful reduction" in the frequency of headaches for participants. The term "meaningful reduction" suggests the finding has a large __________.

Answer Choices:

Statistical significance
P-value
Effect size
Confidence interval

Question 6

Researchers conducting an experiment calculate a value that indicates the likelihood of observing their results by chance, assuming the treatment had no real impact. This value is 0.03. This value represents the __________.

Answer Choices:

Margin of error
Statistical significance
P-value
Confidence interval

Question 7

A city press release announcing the installation of a new tornado early warning system claims to reduce the risk of tornado-related deaths by 60 percent. Which type of statistic is this claim reporting?

Answer Choices:

Absolute risk
Statistical significance
Relative risk
Odds

Misleading Statistics

With a few extra statistical tests or clicks in a data-analysis program, researchers can—intentionally or not—skew their results. Misleading statistics can crop up easily when researchers track many different variables as part of any one study—a common practice in science.

For example, a psychologist interested in the relationship between implicit gender bias and employment discrimination might track hiring managers’ scores on perceptions of candidates’ competence, time spent reading cover letters, whether they invite an applicant for an interview, and many other behaviors. 

When analyzing their results, however, researchers have to make thoughtful decisions about which combinations of variables to test and how to test them appropriately. Look for the questionable practices below as you comb through a study’s results section. Spotting a red flag doesn’t automatically mean a researcher has done something unethical or even incorrect, but it should prompt a question in the interview.

P-hacking is the practice of “mining” data for significant results.

This can involve researchers running a large number of statistical tests until they obtain the statistically significant result they prefer. The more statistical tests they run, the greater the chance a statistically significant finding will pop out just by chance. Many fields follow the threshold of p < .05 as significant, meaning there’s a less than 5 percent chance that their finding occurred randomly. So, running 20 or more statistical tests means it’s pretty likely at least one will come back significant.
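The arithmetic behind that warning is straightforward. A quick sketch in Python showing the chance of at least one false positive when running several independent tests at the p < .05 threshold, assuming no real effect exists:

```python
alpha = 0.05  # the conventional significance threshold

for n_tests in (1, 5, 20):
    # Chance that at least one test comes back "significant" by chance alone
    p_false_positive = 1 - (1 - alpha) ** n_tests
    print(n_tests, round(p_false_positive, 2))

# 1  -> 0.05
# 5  -> 0.23
# 20 -> 0.64
```

At 20 tests, the chance of at least one spurious "significant" result is roughly 64 percent—better than a coin flip.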

While trying out several different combinations of variables—what’s known as doing multiple comparisons—isn’t necessarily a dubious practice, a large number of tests could be a sign of p-hacking. If you spot a large number of tests run on the same data, ask the researchers about their methods to correct for this.

HARKing is an acronym for “hypothesizing after the results are known.” This practice involves researchers running multiple statistical tests, choosing the ones that end up significant, and then retroactively generating a hypothesis that aligns with that result. HARKing carries the same statistical shortcomings as p-hacking. Cherry-picking only significant results (or ignoring predicted results that didn’t pan out) creates a misleading impression, suggesting that a finding is more robust and replicable than it really is.

In some cases, researchers upload their research plans to pre-registration sites such as ClinicalTrials.gov or the Open Science Framework, where they indicate the endpoints (or outcome measures) they intend to examine before their study starts. If a pre-registered study’s published results don’t match these initial plans, ask the researchers why. Their answer will clue you in to whether there was a legitimate reason to switch endpoints (perhaps they ended up using a more valid test) or whether HARKing was at play.

Researchers might run some preliminary tests on their data before they’re done with data collection. But “peeking” at data in this way, known as interim analysis, can artificially increase the chance of getting a significant finding. This is because it invites p-hacking, and it could prompt a researcher to swap out a study’s endpoints or alter an intervention in progress.

If the methods of a paper suggest that researchers analyzed some data early, ask them to explain why and how they controlled for multiple comparisons.

Sometimes, researchers measure outcomes that are far removed from truly meaningful real-world consequences. Behavioral researchers might measure teens’ self-reported intention to use substances rather than measuring their actual substance use. And environmental researchers might measure concentrations of airborne particulates rather than measuring population health outcomes such as rates of asthma or premature death.

Such surrogate endpoints aren’t inherently flawed, but journalists should always ask researchers what a study’s favorable results actually mean for people’s daily lives.

Sometimes researchers separate out a portion of their data for analysis, examining their findings in just women or in certain age groups, for example. Doing so may make sense if the researchers have a reason in advance for expecting different results for different subgroups. But if not, subgrouping can be problematic. It’s sometimes a sign of p-hacking—a search for something, anything, that’s statistically significant. And in some studies, the total sample may not be large enough to divide into subgroups and still generate reliable results. Ideally, researchers would follow up on promising subgroup results by running a whole new study with only people who would fall into that subgroup. If you spot signs of subgrouping in a study, ask the researchers to explain their rationale for doing so.

Getting Expert Help

As you work through the statistics reported in a study, remember you don’t have to go it alone. Rely on your sources’ expertise by asking them to vet your understanding and explain tricky concepts. There are also several free resources designed to help journalists sharpen their math skills and interpret weedy stats sections. 

  • STATSCheck: A resource through which journalists can submit questions to statisticians.
  • American Statistical Association: A professional association that can help journalists connect with stats experts.
  • Math for Journalists: A resource from the Society of Professional Journalists with links to journalism-specific math guides and online calculators for different fields.
  • Math for Journalists Certificate: A self-directed short course offered by Poynter on writing about numbers and doing basic calculations when dealing with data.
  • “Statistical Terms Used in Research Studies: A Primer for Media”: A tip sheet from The Journalist’s Resource explaining stats terms commonly used in research.
  • Simple Learning Pro: A collection of short videos on statistical concepts for absolute beginners.
  • OpenLearn: A library of free courses ranging from 3–50 hours long in multiple math and statistics topics, including math for science and technology, medical statistics, and interpreting charts.
