*[Editors’ note:* The Open Notebook *is delighted to present an exclusive excerpt from* The Science Writers’ Handbook: Everything You Need to Know to Pitch, Publish, and Prosper in the Digital Age, *a valuable new guide written by members of the SciLance writing community. In this chapter, freelance science writer Stephen Ornes provides a primer for thinking about numbers.]*

**By the Numbers: Essential Statistics for Science Writers**

By Stephen Ornes

The practice of science almost always requires measurement, and measurement often means fitting precise tools to an imprecise, messy, and complex world. As a result, scientific research—and science writing—can involve an ongoing wrestling match with uncertainty: every measurement introduces the opportunity for statistical error, human error, and a misunderstanding of the data (often by the science writer). In this chapter I’ll offer some general guidelines for how to think about scientific uncertainty, and then some tips on how to assess how serious the first two issues are in a given study, and to avoid being the cause of the third.

**The Uncertainties of Uncertainty**

During interviews, scientists often implore science writers to take note of caveats. But for a writer, that can mean using part of your precious word count to dwell on nuances that are meaningful (and comprehensible) only to experts in the field. Too many caveats can result in a highly accurate piece that no one reads.

So how much uncertainty do you include in your article? There’s no easy answer, but here are some variables to consider.

**Story length.** If you’re writing a 300-word story on new research, you have enough room to hit only the high points. You can nod to the inherent uncertainty of the results with a word or two—by saying, for instance, that a correlation appears “very likely.” If you’re writing a 3,000-word feature, you may find that describing the reasons for the uncertainty—in accessible, clear terms—adds a level of depth to your reportage.

**Your audience.** News articles written for the general public don’t need to include every caveat and condition posited by the researcher. And the general public doesn’t need a behind-the-scenes peek at every new paper. However, that doesn’t mean science journalists have to be cheerleaders of the research: always consult at least one outside source, especially someone who can address the limitations of the new findings.

**The implications of the uncertainty.** The level of acceptable uncertainty varies wildly among fields. A clinical trial might gain notice for results that, to a particle physicist, have a laughably high level of uncertainty. Could a large degree of uncertainty undermine the study’s findings? Or—as is often the case in astronomy—does uncertainty suggest an interesting new direction for research? Ask the researchers.

**Seeing the Story in the Stats**

Can’t tell a confidence interval from a p-value? Don’t know the difference between absolute risk and relative risk? It’s time to do a little homework. Science writers often have to wade through papers packed with statistics, and the jargon is easy to misinterpret. Here are some tips to help make your story as accurate as possible, and a short glossary of common stats terms. (I’ll often use examples from biomedical research, but the concepts apply to everything from astronomy to zoology.)

**Percent vs. percentage points.** Let’s start off with an easy one. A percent, by definition, is the number of times something occurs in each hundred. For example: 12 out of 50 U.S. states start with a vowel, which means 24 percent of U.S. states start with a vowel—and therefore 76 percent start with a consonant. Percentage points are totally different. Percentage points are the difference between two percentages. The difference between the percent of states that start with a consonant and those that start with a vowel is 76 − 24 = 52 percentage points. The difference between a 6 percent mortgage and a 4 percent mortgage is 2 percentage points, even though 6 is 50 percent more than 4.
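The distinction is easy to check with a few lines of arithmetic. The Python sketch below reuses the numbers from the paragraph above; the function names are illustrative and are not part of the book.

```python
def percent_of(part, whole):
    """Return `part` as a percent of `whole`."""
    return 100.0 * part / whole


def percentage_point_gap(percent_a, percent_b):
    """Difference between two percentages, measured in percentage points."""
    return percent_a - percent_b


def relative_change_percent(new, old):
    """How much larger `new` is than `old`, as a percent of `old`."""
    return 100.0 * (new - old) / old


vowel = percent_of(12, 50)        # 24.0 percent of states
consonant = 100.0 - vowel         # 76.0 percent of states

print(percentage_point_gap(consonant, vowel))  # 52.0 percentage points
print(percentage_point_gap(6, 4))              # 2 percentage points
print(relative_change_percent(6, 4))           # 50.0 percent more
```

The last two lines describe the same pair of mortgage rates: the gap is 2 percentage points, and it is also a 50 percent relative increase. Which figure a story quotes changes how big the difference sounds.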

*Know that correlation does not imply causation.* Large observational studies have reported an association between higher consumption of alcoholic beverages and increased risk of breast cancer. However, that doesn’t mean we can use those studies to report—as many outlets do—that “drinking alcohol increases your risk of cancer.” Because observational, or epidemiological, studies compare what has already happened to one group to what has happened to another in the general population, they can identify only correlations, not causes. So if the scientists use an observational study and report an “increased risk,” that doesn’t mean they found the cause of the increase—the drinkers could all be doing something else that contributes to cancer, for example. (Causality is remarkably difficult to establish, but there are other types of studies in which medical researchers have more control over the variables and can therefore come closer to identifying a cause.) It bears repeating: if the scientists use an observational study to report an “increased risk,” that doesn’t mean they found the cause of the increase.

*Ask a statistician.* If you’re not sure whether you’re accurately reporting the findings from a particular study, look at the author list. For medical studies, find the biostatistician. Call or e-mail. Ask. If you’re unsure whether the statistical measurements justify the conclusions of the paper, find a disinterested statistician who did not work on the study. Ask.

**Pay attention to the tools being used.** Did the study report relative risk, odds ratios, or hazard ratios? Or something else? Make sure you report which populations are being compared. Do researchers claim a “reduced risk” in press releases when the study reports only odds ratios? Find out why. (And see the glossary below for definitions of those terms.)

**Look at the confidence interval.** Peer-reviewed studies that present a conclusion based on statistics almost always include the confidence interval, which is the numerical range that likely (usually with 95 percent probability) includes the true value. Large confidence intervals indicate high uncertainty, and may mean that the finding isn’t as strong as the headline you have in mind would imply.

*For health studies, compare the increase or decrease in risk to the risk itself.* A study that connects some genetic quirk to a 50 percent increased risk for some disease seems a lot less important if the likelihood of developing that disease is, say, 0.5 percent—in which case that genetic quirk is associated with an overall risk of 0.75 percent. See absolute risk below.
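To see why the baseline matters, it can help to run the numbers. This is a minimal sketch using the 0.5 percent baseline and 50 percent relative increase from the paragraph above; the function name and the “per 10,000 people” framing are my own additions for illustration.

```python
def risk_after_relative_increase(baseline_percent, relative_increase_percent):
    """Apply a relative increase to a baseline risk; both values are in percent."""
    return baseline_percent * (1 + relative_increase_percent / 100.0)


baseline = 0.5                                  # baseline risk, in percent
raised = risk_after_relative_increase(baseline, 50)
extra_points = raised - baseline                # change in percentage points

print(f"risk with the quirk: {raised:.2f} percent")                  # 0.75 percent
print(f"absolute increase: {extra_points:.2f} percentage points")    # 0.25 points
print(f"extra cases per 10,000 people: {extra_points / 100 * 10_000:.0f}")  # 25
```

A “50 percent increased risk” and “25 extra cases per 10,000 people” describe the same finding. The second phrasing usually gives readers a better sense of scale.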

**Find sources you trust.** If you’re working on a story and come across a stats expert who can explain things really well, keep that person’s contact info handy. Next time you’re in a bind, that person may be able to help you out. (And remember, the next time you meet that source at a conference, the beer’s on you.)

**Use appropriate language to describe the evidence.** If you’re reporting on a study that tested a human medical treatment on mice, be sure to point out that the subjects were mice, not humans. Specify how many mice, and what the next level of testing will measure.

**A Science Writer’s Statistical Phrasebook**

**Statistical Significance**

*What it means*: Statistical significance gives researchers a way to distinguish between events that happen at random and those that may happen for a reason. Results are usually said to be “statistically significant” if there is a less than 5 percent chance that the measured outcome would have occurred at random.

*What to watch for*: Statisticians have pointed out that the 5 percent cutoff is arbitrary, and some researchers go so far as to say that studies that rely on statistical significance may not themselves be reliable. Small sample sizes and large confidence intervals may indicate that the findings have weak support from the evidence. Watch for follow-up studies that verify or discredit the original.

*P-value*

*What it means*: The p-value tells you the likelihood that the observed test result happened by chance. A low p-value means the results were significant and unlikely to have occurred by chance. “Statistical significance” usually requires a p-value of less than 0.05, which means that there is at most a 5 percent chance that the outcome occurred at random.

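*What to watch for:* A p-value larger than 0.05 means the result falls short of the conventional cutoff for statistical significance. By itself, it does not tell you how strong or weak a correlation is.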
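One way to make the p-value concrete is to compute one for a toy experiment. The sketch below is not from the book: it assumes a hypothetical coin that lands heads 60 times in 100 flips, and it calculates the exact probability of seeing 60 or more heads if the coin were fair, that is, if only chance were at work. The function name is illustrative, and the test is one-sided for simplicity.

```python
from math import comb


def binomial_tail_p_value(successes, trials, chance=0.5):
    """One-sided p-value: the probability of at least `successes` hits in
    `trials` tries, assuming each try succeeds with probability `chance`."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical data: 60 heads in 100 flips of a coin assumed to be fair.
p = binomial_tail_p_value(60, 100)
print(f"p = {p:.3f}")
print("below the 0.05 cutoff" if p < 0.05 else "not below the 0.05 cutoff")
```

Note what the calculation takes as given: it starts from the assumption that the coin is fair and asks how surprising the data would be. It does not compute the probability that the coin is fair.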

*Confidence Interval*

*What it means*: The range of values that likely includes the true value of the quantity being measured, at a stated confidence level (usually 95 percent).

*What to watch for*: Does it seem like a large range of possible values? Ask the researchers why it seems so big. Do the possible values of the measurement include zero? That may be a red flag.
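The “does it include zero?” check can be sketched for a simple case: the difference between two groups’ event rates. The counts below are invented for illustration, and the interval uses the textbook normal approximation (estimate plus or minus 1.96 standard errors for a 95 percent interval), which is only one of several methods a real study might use.

```python
from math import sqrt


def diff_in_proportions_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Approximate 95% confidence interval for (rate A - rate B),
    using the normal approximation."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    diff = p_a - p_b
    std_err = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff - z * std_err, diff + z * std_err


# Hypothetical small study: 30 of 200 events in group A, 20 of 200 in group B.
low, high = diff_in_proportions_ci(30, 200, 20, 200)
print(f"small study: {100 * low:.1f} to {100 * high:.1f} percentage points")
print("interval includes zero:", low <= 0 <= high)

# Same rates with ten times as many participants.
low_big, high_big = diff_in_proportions_ci(300, 2000, 200, 2000)
print(f"large study: {100 * low_big:.1f} to {100 * high_big:.1f} percentage points")
print("interval includes zero:", low_big <= 0 <= high_big)
```

With these made-up counts, the two groups differ by the same 5 percentage points in both studies, but only the larger study yields an interval that excludes zero. A wide interval often reflects a small sample.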

**Odds Ratio**

*What it means*: This is a common tool used in studies that compare people with a particular condition—such as having a particular disease or taking a particular drug—to people who do not have that condition. Odds ratios compare the likelihood of an event’s occurrence—such as death—in two groups in a study. Specifically, an odds ratio compares the odds of the event in one group to the odds of the event in the other.

*What to watch for*: Be wary of reporting odds ratios as risk. If a study reports an odds ratio of 1.35, that doesn’t automatically mean they found an increased risk of 35 percent. Talk to the study authors or a statistician to get a good handle on what the number means—and how to report it.
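The gap between an odds ratio and a change in risk is easiest to see with counts. In this sketch the group sizes and event counts are hypothetical, chosen only to show how the two measures can diverge; the function names are illustrative.

```python
def odds_ratio(events_a, n_a, events_b, n_b):
    """Odds of the event in group A divided by the odds in group B."""
    odds_a = events_a / (n_a - events_a)
    odds_b = events_b / (n_b - events_b)
    return odds_a / odds_b


def relative_risk(events_a, n_a, events_b, n_b):
    """Risk (probability) of the event in group A divided by the risk in group B."""
    return (events_a / n_a) / (events_b / n_b)


# Common outcome (hypothetical): 50 of 100 in group A vs. 40 of 100 in group B.
print(f"odds ratio:    {odds_ratio(50, 100, 40, 100):.2f}")     # 1.50
print(f"relative risk: {relative_risk(50, 100, 40, 100):.2f}")  # 1.25

# Rare outcome (hypothetical): 15 of 1,000 in group A vs. 10 of 1,000 in group B.
print(f"odds ratio:    {odds_ratio(15, 1000, 10, 1000):.2f}")     # 1.51
print(f"relative risk: {relative_risk(15, 1000, 10, 1000):.2f}")  # 1.50
```

When the outcome is common, an odds ratio of 1.5 here corresponds to only a 25 percent higher risk. When the outcome is rare, the two measures nearly coincide, which is one reason to ask the authors how common the outcome was.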

*Relative Risk*

*What it means:* Another common tool used to compare risk, or probability, in two different groups.

*What to watch for*: Be careful not to report relative risk as absolute risk. That can lead to overstating the importance of a result (see below). For example, studies have found an association between aspirin and significantly reduced relative risk of cancer, but that corresponded to only a small drop in an average person’s risk.
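A small calculation shows how a sizable relative reduction can sit alongside a small absolute one. The counts below are invented and are not taken from any aspirin study; the function name and dictionary keys are illustrative.

```python
def risk_summary(events_treated, n_treated, events_control, n_control):
    """Summarize one comparison in both relative and absolute terms."""
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    ratio = risk_treated / risk_control
    return {
        "relative_risk": ratio,
        "relative_reduction_percent": 100 * (1 - ratio),
        "absolute_reduction_points": 100 * (risk_control - risk_treated),
    }


# Hypothetical trial: 15 cases per 1,000 treated vs. 20 per 1,000 untreated.
summary = risk_summary(15, 1000, 20, 1000)
print(f"relative risk: {summary['relative_risk']:.2f}")                          # 0.75
print(f"relative reduction: {summary['relative_reduction_percent']:.0f} percent")  # 25
print(f"absolute reduction: {summary['absolute_reduction_points']:.1f} points")    # 0.5
```

Both “25 percent lower risk” and “half a percentage point lower risk” are accurate descriptions of these numbers. Reporting the two together keeps the result in proportion.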

*Absolute Risk*

*What it means*: In disease studies, this is the average lifetime risk of a person developing the disease.

*What to watch for*: In studies that compare risk between different groups, be sure to know how they report their results. Say the absolute risk of developing Disease X is 10 percent, and researchers find a 50 percent increase in relative risk associated with a rare genetic mutation compared to people without the mutation. Then the absolute risk of developing Disease X, for people with that mutation, is 10 × 1.5, or 15 percent.

**SciLance Says …**

- Use simple but scientifically appropriate language in your reporting. Make sure what you write accurately conveys the evidence from the scientific study.
- Get to know your stats. Become friends with confidence intervals, p-values, relative risk, and their friends. When in doubt, don’t fudge—ask a statistician.
- Determine the sources of uncertainty in the fields you cover. Do they arise from the tools themselves? What do the experts worry about? What do they criticize each other for?
- Understand the implications of the study you’re covering. What does it mean if it’s accurate? How will it affect people? What about if it’s false?
- Find sources you trust to give you perspective on new research. Pamper them.
- Think critically about the research you’re covering. Seek outside opinions on the findings to put them in perspective.

Stephen Ornes writes about math, physics, space, and cancer research from an office shed in his backyard in Nashville, Tennessee. He’s written about tilting exoplanets for *Discover*, the mathematics of pizza slicing for *New Scientist*, and tumor banking for *CR*. His first book was a young adult biography of mathematician Sophie Germain, and he teaches a science communication class at Vanderbilt University. Follow Stephen on Twitter @stephenornes.

*Excerpted from the book* The Science Writers’ Handbook, *edited by Thomas Hayden and Michelle Nijhuis. Reprinted by arrangement with Da Capo Lifelong, a member of the Perseus Books Group. Copyright © 2013.*

Great point, Richard. And it speaks to something I asked myself many times while writing the chapter: How can we convey functional but accurate information without sacrificing accuracy? And what do we leave out? As De Gruttola points out emphatically in the SN interview with Jon Cohen, the p-value doesn’t tell you anything about the truth of the hypothesis being tested. It doesn’t reveal whether or not the effect is real.

Statistician Andrew Gelman at Columbia has a good paper on this very subject, too. http://www.stat.columbia.edu/~gelman/research/published/pvalues3.pdf

‘The p-value tells you the likelihood that the observed test result happened by chance. … a p-value of less than 0.05 … means that there is at most a 5 percent chance that the outcome occurred at random.’

Statisticians will grit their teeth at this – it’s strictly speaking not the correct way to describe the meaning of a p-value. (Which is: it’s the chance of seeing at least this extreme a test result if outcomes were produced at random). I sometimes wonder if the simplified, intuitive version that Stephen chose here is good enough for science writing. What do people think?

Jon Cohen has written well about this in the context of AIDS vaccines http://news.sciencemag.org/sciencenow/2009/10/30-01.html?etoc . Also see http://understandinguncertainty.org/why-it%E2%80%99s-important-be-pedantic-about-sigmas-and-commas.

I think there are enough differences between the definition of the p-value used here and what the p-value is (“the likelihood that an observed test result happened if the null hypothesis is true”) for it to be important to use the precise definition.

Calling it chance is certainly more intuitive, but I think there is too much wiggle room to take that in the wrong direction. Also, the null hypothesis is not always that a result is happening by chance, which takes away from the usefulness of linking the p-value with chance.

I can understand the desire to give an intuitive description of significance testing, but I think the definition given is too incorrect to communicate useful information. This bit in particular is problematic:

““Statistical significance” usually requires a p-value of less than 0.05, which means that there is at most a 5 percent chance that the outcome occurred at random.”

This can be read as implying that the p value tells you the chance that the null hypothesis is true. But that interpretation is false (and importantly false).

Of course, it’s easy to criticise without offering solutions… I’d trepidatiously suggest that a better layperson’s-definition of a p value could be:

“The p value tells the researchers the chance of finding the data that they did, if no relationship ACTUALLY exists between the variables”

That definition intentionally leaves out two technical points (that a p value actually refers to the probability of a test statistic as or more extreme than that observed; and that null hypotheses are not *always* of no relationship, and occasionally may not refer to relationships at all). But it gets across the really important conceptual point that p values tell you the probability of seeing the result that we have, given that the null is true – but *not* the probability that the null is true, given the data observed.