Momentous changes in science often appear only gradually, over many decades. But recent years have seen an abrupt shift, in many fields, in the way science is done. With rapid expansions in computing power came the era of “big data”: science involving massive datasets that could no longer be analyzed using traditional methods. At the same time, the increased computing power also gave rise to now-commonplace analysis methods such as artificial intelligence (AI) and computational modeling. These advances have led to groundbreaking insights in many disciplines, including genomics, particle physics, astronomy, neuroscience, and environmental science.
As powerful as they are, these analysis techniques are highly susceptible to misunderstanding and overblown claims. For that reason, science journalists have a particular responsibility to avoid falling prey to hype when communicating about studies that use AI and computational modeling. This means navigating the considerable complexity etched into many of the models by design, as well as resisting the “glitz factor” and the assumption that AI results are superior to or more objective than non-AI results. In reality, research that uses AI can suffer from an alarming lack of rigor. In the worst case, the flaws can lead to racist and sexist algorithms. That means science journalists need to act as filters to identify the studies most worthy of coverage.
These newer analysis methods rely on complex modeling techniques to make sense of the world. With a bit of background knowledge, a sense of when it’s worth a deep dive into the details, and a critical eye for red flags, science journalists can accurately bring them to life for readers.
Master the Machine: Learn the Basics
The first step in reporting on artificial intelligence is learning the key terms. That’s not as straightforward as it sounds. As the field has developed and expanded, even longtime AI researchers don’t have a simple answer to the question “What is AI?” says Karen Hao, the senior artificial intelligence reporter at MIT Technology Review. In 2018, she drew a helpful flowchart to determine whether something is AI or not.
Today, Hao defines AI as software that can autonomously make decisions. That can refer to a chess-playing computer, a self-driving car, or an algorithm that suggests new shows to Netflix viewers. In research, AI is increasingly used in predictive-modeling scenarios where the algorithm learns from data to predict a future outcome. Much of this work relies on machine learning, a branch of AI that, as Hao puts it, trains “algorithm[s] to understand often massive amounts of data, find the meaningful patterns in that data, and then apply those patterns to future data.”
The acceleration of computing power in the past decade has allowed researchers in more and more fields to apply machine learning to extract patterns and make predictions from huge datasets that would otherwise be impossible to analyze. But like AI, machine learning itself is a broad term that encompasses multiple subfields. For example, neural networks are inspired by how brains learn, with computation units called “neurons” linked via differently weighted connections. Then there’s deep learning, which involves very large neural networks that learn even the subtlest patterns using many layers of “neural” computation. And natural language processing involves algorithms that can learn from and respond to human language, like those in Amazon’s Alexa.
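To make those weighted “neurons” concrete, here is a minimal sketch of a single artificial neuron in Python. The input, weight, and bias values are arbitrary, chosen only for illustration; real networks chain thousands or millions of these units together, and training is what adjusts the weights.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum passed through an activation."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))  # sigmoid activation squashes output to (0, 1)

# Arbitrary example values, purely for illustration.
activation = neuron(inputs=[0.5, 0.8], weights=[0.9, -0.4], bias=0.1)
print(f"{activation:.2f}")
```

Deep learning simply stacks many layers of units like this one, which is why its internal patterns can be so hard for humans to inspect.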
One of the most common setups you’ll come across in machine-learning studies is supervised learning, where algorithms are told the right answer as they learn. Here, algorithms are first trained to learn patterns from a training dataset. Then they are set loose on a test dataset to assess how accurate their predictions are. In other setups, an algorithm finds patterns without being given the right answer (unsupervised learning) or a mix somewhere in between (semi-supervised learning). For more common AI terms, check out the glossary created by Matthew Hutson, a New York City–based freelance science writer (see below).
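The supervised train-then-test routine can be sketched in a few lines. The tiny 1-nearest-neighbor classifier and made-up data below are purely illustrative: the model “learns” only from the labeled training pairs, and its accuracy is then measured on held-out test pairs it never saw.

```python
# Supervised learning in miniature: train on labeled examples,
# then score predictions on a separate, unseen test set.

def predict(train, point):
    """Label a point with the label of its closest training example."""
    nearest = min(train, key=lambda ex: abs(ex[0] - point))
    return nearest[1]

# Training set: (feature, label) pairs the algorithm learns from.
train = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]

# Test set: unseen examples used only to measure accuracy.
test = [(1.5, "low"), (8.5, "high"), (7.0, "high")]

correct = sum(predict(train, x) == y for x, y in test)
accuracy = correct / len(test)
print(f"test accuracy: {accuracy:.0%}")
```

Real studies use far more complex models, but the logic is the same, which is why the quality of both datasets matters so much.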
Should You Cover an AI Story? Watch for These Red Flags
These days, headlines glorifying the latest AI research abound—and many contribute to public perception of AI as a magical, mysterious thing. Jeremy Hsu, a freelance science and technology journalist in New York City, has been covering AI and machine learning for a decade. “My approach has evolved over the years,” he says. Early on, an AI story was often newsworthy just because it was the first study in an area to use AI. Now, he says, “it’s being used so widely, I do have to try to be a bit more selective in what is worth covering.”
Often, that means weeding out subpar studies. In 2019, Hsu highlighted some common red flags in “3 Easy Ways to Evaluate AI Claims” for IEEE Spectrum. He emphasizes watching for what experts call “hype salad” (strings of buzzwords like “AI” and “Internet of Things” without thorough explanations), low-quality data input into an algorithm, and excessive secrecy from the researchers about the design or limitations of their study.
Any AI technology is only as good as the information its algorithms are given, so it’s important to understand as much as possible what the algorithms driving any AI system do, and how. For many studies, that can involve taking a deep dive into the data used to train the algorithm as well as the test data used to assess its performance, both of which are important for understanding how valid the results are, Hsu says. Ask yourself: Do the training and test datasets for, say, an algorithm trained to understand human speech include the full scope of dialects in the language? If an algorithm is trained to diagnose Alzheimer’s disease from brain scans, do the scans represent the best data available, such as well-labeled, high-resolution images? Does an algorithm designed for use in schools across the country include a representative sample of students?
The size of the sample matters, too; Hsu notes that an algorithm trained on data from only a few dozen people cannot generalize to an entire population.
A close look at the datasets can also reveal troubling ethical issues, such as possible bias built into the algorithm. Rod McCullom, a science writer and columnist for Undark based in Chicago, Illinois, points to a lack of demographic diversity in data as a major red flag. Depending on how the technology is being used, the effects can be disastrous. McCullom has reported for Undark on how racially biased algorithms may be exacerbating the disproportionate toll COVID-19 has taken on Black Americans and the understudied problem of racial bias in AI facial recognition software.
One challenge in evaluating AI studies is their “black box” nature: researchers see only what goes into the algorithm and what comes out. As Hutson and others have reported, the mystery of what the algorithm does in between has contributed to a reproducibility crisis in AI and highlights the peril of overgeneralizing the results of any single AI study. For journalists, it’s essential to stringently assess the results and how they compare to what’s been done before.
A crucial element of this assessment is to take a critical look at the reported accuracy of the algorithm, paying special attention to the error rate and its context. Hsu says even if the overall accuracy is good, a high false positive or false negative rate could still be a red flag—especially in medical settings, where unnecessary treatments or missed diagnoses can be life-threatening.
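Hsu’s point about error rates can be shown with some made-up screening numbers: a model can post an impressive overall accuracy while still missing most of the patients who are actually sick, because the sick patients make up a small slice of the data.

```python
# Why overall accuracy can hide a dangerous error rate.
# Hypothetical screening results for 1,000 patients, 50 of whom
# actually have the disease (all numbers are invented for illustration).
true_positives  = 10   # sick patients correctly flagged
false_negatives = 40   # sick patients the model missed
true_negatives  = 940  # healthy patients correctly cleared
false_positives = 10   # healthy patients wrongly flagged

total = true_positives + false_negatives + true_negatives + false_positives
accuracy = (true_positives + true_negatives) / total
false_negative_rate = false_negatives / (true_positives + false_negatives)

print(f"accuracy: {accuracy:.0%}")                        # 95% looks impressive...
print(f"false negative rate: {false_negative_rate:.0%}")  # ...but 80% of sick patients are missed
```

This is why asking for the false positive and false negative rates, not just the headline accuracy figure, is worth a reporter’s time.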
Finally, check whether a new AI model offers a significant and realistic advance before covering it. It’s useful to ask whether the AI model is being compared to a valid benchmark, such as the current state-of-the-art in the field, Hutson says. Hao also talks to experts in the field who aren’t using AI themselves. For a study claiming that its deep learning algorithm could help improve education outcomes, for example, Hao might ask an education specialist whether they think the AI model could actually be implemented in a real-world setting.
“Oftentimes, as science writers, we don’t question the usefulness of technology. Do we need this? That’s always something that’s worth asking,” McCullom says.
Not an Automated Task: Crafting the AI Story
Once you’ve thoroughly vetted a study or story angle and you’re ready to write about it, it’s important to consider what role AI should play in the story. Try not to make a flashy AI model the focus of a story at the expense of other crucial pieces. “One of the most common problems that I see in covering this space is people will center the story on the machine learning when that might not actually be what you should be centering the story on,” says Hao. McCullom often focuses on how people interact with the technology, positioning AI as a tool to help solve a problem. In a Nature piece about AI approaches using Twitter to identify teenagers at risk of gun violence, McCullom set the scene with the tragic murder of a teenage girl in Chicago, then traced the journey of researchers at Columbia University who were inspired by her story to create the algorithms.
McCullom suggests providing readers with a few basics: what the algorithm can potentially do now, how it was designed, and how it could be deployed in the real world. Hsu orients readers by identifying the algorithm as machine learning or deep learning, then focuses on what the algorithm does in simpler, jargon-free terms.
From there, you might add as much detail as your word count allows on the makeup of the testing and training datasets and how the algorithm was trained. When describing an algorithm that learns to generate a new image, for example, Hao walked readers through how it was trained on text captions and images with missing pieces, and how it could potentially help robots better understand their visual environment.
When describing the implications of a study, be clear on how much the results can—or can’t—be generalized to other contexts. If a study found the algorithm was 98 percent accurate, that means “it’s 98 percent accurate based on this one test run on this one particular test dataset,” Hsu says. His Undark story on whether AI-driven medical tools will help or hurt patients emphasized that results from one hospital’s dataset may not carry over to another.
Hao likes to make AI models more accessible for readers by describing the people who worked on them. She may point out that someone collected the data, another person labeled it, and a third put that data into the algorithm and trained it on a server. “Sometimes, we write about machine learning in a very abstract, vacuous way where the machine learning algorithm just exists,” she notes. If “the reader can imagine themselves as one of those people … then it becomes just a little less magical.”
It’s Not All about AI
Computational modeling is another data-analysis technique that may rely on extensive computing power. Rather than automatically learning from data as AI does, computational models use mathematics to study large dynamic systems, often through simulations.
Bruce Y. Lee, a senior contributor at Forbes and a professor at the CUNY School of Public Health and Health Policy, is an expert in developing and applying computational models that assist decision making in health and health care. He says they offer scientists a powerful way to represent scenarios that can’t be easily observed or ethically tested, such as how pandemics spread. Many COVID-19 transmission models, for example, rely on simulations from computational modeling, including work from Lee’s team.
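As an illustration of the kind of simulation involved, here is a textbook SIR (susceptible-infected-recovered) model, the simplest style of epidemic transmission model. This is a generic sketch, not any research team’s actual model, and the parameter values below are arbitrary.

```python
# Textbook SIR epidemic model: step the equations forward one day at a time.
# beta is the daily transmission rate, gamma the daily recovery rate.
def sir(population, infected, beta, gamma, days):
    s, i, r = population - infected, float(infected), 0.0
    history = []
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append(i)
    return history

# Arbitrary illustrative parameters: 10 initial cases in a city of 1 million.
curve = sir(population=1_000_000, infected=10, beta=0.3, gamma=0.1, days=200)
print(f"peak infections: {max(curve):,.0f}")
```

Changing beta or gamma reshapes the whole epidemic curve, which is exactly why the assumptions baked into such models matter so much to the predictions they produce.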
Another field where computational models reign supreme is climate change—and these are especially complicated. But complexity doesn’t mean a model is correct. “The biggest thing to remember is they don’t present inevitable conclusions,” says Chelsea Harvey, a climate science reporter for E&E News. She suggests describing results as possible outcomes, along with any limitations or uncertainties of the model.
Lee adds that it’s a red flag if the researchers act like their model is a crystal ball. “Nothing can predict the future with certainty,” he says. “Absolutely nothing.”
To better understand the predictions, Harvey recommends understanding the assumptions that went into each model and how they affect the outcomes. Many studies present models on a range of scenarios, and it can be tempting to choose to report only the most dramatic results, she says. But she cautions that often the assumptions for those scenarios mean that those results are not the most likely outcome. And any reluctance by the researchers to describe the details or limitations of a model should be a warning sign, Lee says.
Some studies may use both computational-modeling and machine-learning methods. “These boundaries are actually more gray areas, and there’s tremendous overlap,” says Lee. For IEEE Spectrum, Hutson reported on a COVID-19 spread model that uses machine learning to find the parameters that lead a computational-modeling simulation to make the most accurate predictions.
But regardless of the label, “it’s much more important to really explain what the model actually does,” Lee says. Covering science using AI and computational modeling is really no different from other highly technical topics that science journalists are already familiar with. Once you’ve spent some time learning the terminology, Hao says, “it will start unpeeling all the jargon, and it starts to become really graspable.”