Hidden in Plain Sight: Using Public Documents to Report on Elusive Stories

Lauren Weber’s subjects don’t always return her calls. She reports on anti-vaccine advocates, doctors prescribing unapproved treatments, and other purveyors of medical misinformation, many of whom are skeptical of the media.

But Weber, a health and science accountability reporter at The Washington Post, has nonetheless managed to piece together their fundraising, politicking, and harmful misdeeds by digging deep into the public realm. To produce detailed investigations into her subjects, she’s streamed videos of local government meetings, combed through nonprofit tax filings, and analyzed piles of physician disciplinary records from state medical boards—all of which were available to anyone who cared to (and knew where to) look.

As far as information sources go, whistleblowers and leaked files will always have their appeal. But the public domain also contains revelations about hard-to-reach people, companies, organizations, and government agencies that are just waiting to be unearthed. This intel can bolster your reporting or be the basis of entire stories, especially when those at the center of controversy are unavailable or unwilling to talk.

At a time of historically high media mistrust, open documentation has the added bonus of boosting credibility. “Especially in an era where no one believes anything they see or read,” Weber says, “being able to quote from public documentation or from speeches or from meetings or from other legal resources is worth its weight in gold.”

Such information exists in a multitude of forms—from lawsuits, corporate filings, and government databases to payment app histories and social media posts—so it takes imagination and persistence to connect the dots. Reporters must also stay organized as they amass information to hone a cohesive and compelling angle. And, perhaps most importantly, they have to critically evaluate each document they dig up with the help of trusted experts.

 

Following Paper Trails

When you’re reporting on entities that are politically influential, wealthy, or potentially putting lives at risk, basic information—such as the people in charge or their funding sources—can be surprisingly hard to come by. Start by researching which laws and regulations a person or company is required to comply with, whether they be financial, environmental, medical, or some other category. “I would advise sitting down and trying to mentally map out all the different agencies—local-level, city-level, state-level, federal-level—that may affect [what] you’re looking into,” says Naveena Sadasivam, a senior staff writer and editor at Grist. “Think really broadly about what they may be required to report to various regulators …” she says, “and what kind of paper trails might exist as a result.”

Sadasivam took this expansive approach when investigating medical-supply warehouses across the U.S. These warehouses stash supplies sterilized with ethylene oxide—a chemical that, according to the U.S. Environmental Protection Agency, can elevate the risk of cancer and other conditions with prolonged exposure. Sadasivam and her colleague, Lylla Younes, wondered about the harm it could be inflicting on people, such as warehouse workers or neighbors, who breathe it in day after day. But, as they discovered early on in their reporting, the facilities’ locations are often kept secret by manufacturers and aren’t closely tracked by regulators or advocates.

Sadasivam and Younes did learn from sources that some of these storage centers had been inspected by federal and state agencies in Georgia, including the Occupational Safety and Health Administration and Georgia’s Environmental Protection Division. The reporters found some of the state regulators’ findings online. They also filed public-records requests for emails about how those inspections were done, as well as for national regulators’ inspection reports. And an Atlanta News First/WANF reporter who published a joint investigation with them requested 911 call logs from local emergency medical services, which showed that ambulances were dispatched nearly two dozen times to treat warehouse workers with symptoms consistent with high-dose exposure to ethylene oxide.

Diving further, Sadasivam tracked down online records of permitting documents submitted to state environmental agencies—a common requirement for entities that emit air pollution—and struck gold in Texas: a company had turned in paperwork to expand ethylene oxide–emitting warehouses in El Paso. To flesh out a list of such warehouses nationwide, Sadasivam and Younes drew in part on other records. The EPA and Southern California’s air-pollution regulator had both asked companies to volunteer the locations of any such warehouses; in yet another round of public-records requests, the reporters asked the agencies to hand over the responses.

Paper trails might also yield the names of those harmed by an entity or person’s misdeeds. On a hunch that some of the medical-supply warehouse employees may have sued their employers, Sadasivam hunted for lawsuits in “as many databases as I could,” she says: PACER, which charges small fees to download filings for federal cases; CourtListener, where users can post such PDFs for open viewing; and LexisNexis. (LexisNexis requires a subscription; Sadasivam signed up for a free trial of a subsidiary service to browse local and state cases.) After numerous searches for “ethylene oxide,” “workers,” medical-device company names, and other keywords, Sadasivam stumbled upon a plaintiff who became the lead anecdote of one of her investigations.

The documents amassed in legal proceedings might even make available information that’s typically sealed. “You can’t assume that you can know or predict or expect all of the information that will be public,” says freelance journalist Dan Garisto. “Go looking for it.” In 2024, Garisto investigated a scientific scandal for Nature about Ranga Dias, a University of Rochester physics professor accused of research misconduct. The university had commissioned a series of preliminary inquiries into concerns that had been raised about the physicist’s work, and each time determined that there was no need for a full investigation to determine misconduct. But eventually, under an order from the National Science Foundation, the University of Rochester did launch such an investigation—and this time, found evidence of data fabrication, falsification, and plagiarism. Later, after Dias sued his employer, that investigation’s confidential report got entered into the court record, which Garisto and his editor tracked down in the court’s database with the help of a newsroom attorney. The report became a linchpin for Garisto’s story, allowing him to chronicle Dias’s extensive alleged misconduct and reveal weaknesses in the university’s earlier inquiries.

 

Tracing Concealed Connections

Often, investigations into elusive sources center on conflicts of interest or other problematic ties that might call into question the soundness of a researcher’s science or an organization’s claims. Even if the parties at play remain silent, reporters can still trace the connections between them.

For a 2024 story, Susie Neilson, an investigative reporter at the San Francisco Chronicle, scoured the internet to unpack an unusual scientific partnership between Michael Snyder, a renowned geneticist at Stanford University, and Tony Robbins, the world-famous motivational speaker. Snyder had published research extolling the mental-health benefits of Robbins’s seminars, but Neilson noticed that some of the studies disclosed financial support from Robbins’s company. To untangle the ties between Robbins and Snyder, who had co-founded several biotech startups, Neilson looked up the companies’ business filings with the secretaries of state where they’d incorporated, which listed when they’d done so. To understand who was funding and running the companies, she also consulted LinkedIn profiles, business databases like PitchBook and OpenCorporates, and news articles about them. (The firms Neilson was researching were privately held, but publicly traded companies have to report extensive—and openly searchable—financial information to the Securities and Exchange Commission.)

“The more connections you find and document, the more then you’re able to tailor your searches to uncover additional layers of findings and connections,” Neilson says.

Tax filings can also reveal relationships between your subjects. For example, while reporting on notorious nonprofits spreading misinformation about COVID-19 vaccines and treatments, Weber wanted to know whether they’d raked in a bunch of cash and, if so, from whom. So, she looked up their annual tax forms on ProPublica’s Nonprofit Explorer. The groups shared their revenue figures, which did indeed grow during the pandemic, as well as their executives’ salaries. They didn’t disclose their donors, however—nonprofits generally aren’t required to do so. But because organizations do often have to disclose when they themselves donate, Weber changed her search filters to review filings from groups that gave to those she was tracking. She found, as her 2024 story reported, that some of the funders were dedicated to advancing biblical, libertarian, or conservative values. “It was a back-end way to find out who was shelling out some of this money,” she says.

Don’t forget to search the social web for evidence of connections. You might discover your subjects photographing each other on Tumblr and Instagram, for example, or chatting in podcasts and YouTube videos, as Neilson did when reporting on the Stanford scientist’s lab. Reporters have also mined activity on Venmo, the peer-to-peer payment app, to map the networks of public officials. “If you learn a couple of basic tricks for digging through the vast troves of online information there is at our fingertips,” Neilson says, “you can find out a lot about people.”

 

Unearthing an Angle

The more evidence you gather, the easier it is to lose your sense of direction. It’s critical to stay organized to make sure you’re building a cohesive story rather than falling down unnecessary rabbit holes. Consider tracking which information belongs to which document in a spreadsheet or loading documents onto Google Pinpoint, a research tool for journalists and academics, which makes documents machine-readable and can locate single words in thousands of PDFs.

Since many public documents exist online, remember that a link that works today may be dead tomorrow. Neilson prepared accordingly by capturing websites and social-media accounts in screenshots, downloading them as PDFs, and saving links to the Wayback Machine (which she in turn used to look at old versions of sites she was browsing). She also ripped the audio of videos and podcasts into MP3s and uploaded them to Trint, an artificial intelligence–powered tool that generates time-stamped transcripts, which users can search for keywords and quotes. As useful as file-sharing and AI services can be, however, reporters should consider keeping extremely sensitive materials offline altogether, so they won’t risk being subpoenaed through a third-party company.
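
For reporters comfortable with a little scripting, the archiving habit described above can be partly automated. What follows is a minimal sketch in Python, not a description of Neilson's own workflow. It assumes the Internet Archive's public availability endpoint and "save" URL prefix behave as they are commonly documented, and the function names are made up for illustration. The code only builds the URLs and reads a response, so it never touches the network itself.

```python
import json
from urllib.parse import urlencode

# Public Wayback Machine endpoints (assumed here to behave as commonly documented).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
SAVE_PREFIX = "https://web.archive.org/save/"


def availability_query(page_url, timestamp=None):
    """Build the lookup URL for the snapshot closest to `timestamp` (YYYYMMDD)."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def save_request_url(page_url):
    """Build the URL that asks the Wayback Machine to capture a page now."""
    return SAVE_PREFIX + page_url


def closest_snapshot(response_text):
    """Return the archived URL from an availability response, or None if absent."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

Fetching `availability_query(...)` in a browser or with any HTTP client should return a small JSON document that `closest_snapshot` can read. A result of `None` is a cue to capture the page yourself before it disappears.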

As you organize your notes and pursue new leads, be thoughtful, not just thorough. Reporters should examine each finding in light of what else they’ve learned, asking themselves whether it’s getting them closer to the story they’re pursuing—or to a completely different one. “With any use of data, it comes down to: What is the question that you’re asking?” says Emily Alpert Reyes, a former health reporter for the Los Angeles Times. And don’t be afraid to let your question evolve during the course of reporting.

When Reyes was working on a 2025 story on allegedly harmful caregiving professionals, she says she “kept poking” until she found an interesting angle. She’d learned that California’s Department of Social Services bans some people from running assisted-living facilities due to concerns that residents were being neglected under their watch. While researching some of these misdeeds, Reyes requested a list of the banned operators’ names. But when she read a lawsuit against one of the operators, she learned that they’d gone on to form ties with another type of caregiving facility governed by a different regulator, the state’s Department of Public Health. In that agency’s online database, Reyes, along with her colleague Ben Poston, found several names of people who appeared to have been previously banned by the other regulator—an alarming and intriguing overlap. The resulting story investigated how inadequate regulations were apparently allowing these people to slip through the cracks.

Digging deeper also led Joaquín Rosado Lebrón, a health reporter for Metro Puerto Rico, to a more interesting story. While researching ethylene oxide, Rosado Lebrón came across an EPA press release with a nationwide list of sterilization factories that emit dangerous doses of the chemical. An unusually high number were in Puerto Rico—a finding that could have been a story in and of itself.

But Rosado Lebrón and his editor had the idea to compare the list to another one from Puerto Rico’s commerce department. The second listed companies getting tax breaks to operate on the island, and it included all the EPA’s companies of concern. Rosado Lebrón knew he was onto something. His 2024 story, which he co-published with Sadasivam and Younes at Grist as well as with the Centro de Periodismo Investigativo, showed the power of “being able to cross-reference and being able to come up with a story that is strong enough that it won’t be debunked by officials,” Rosado Lebrón says.
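
Comparing two lists like these by eye gets unwieldy quickly, because agencies rarely spell company names the same way. Below is a rough Python sketch of one way to line them up. It is an illustration only, not the method Rosado Lebrón used: the suffix list is a guess, and the company names in the usage example are invented.

```python
import re

# Corporate suffixes to ignore when comparing; illustrative, not exhaustive.
SUFFIXES = {"inc", "llc", "corp", "corporation", "co", "ltd", "company"}


def normalize_company(name):
    """Lowercase, drop punctuation, and strip trailing corporate suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while words and words[-1] in SUFFIXES:
        words.pop()
    return " ".join(words)


def cross_reference(list_a, list_b):
    """Return (name_in_a, name_in_b) pairs whose normalized forms are identical."""
    index_b = {}
    for name in list_b:
        key = normalize_company(name)
        if key:  # skip names that normalize to nothing
            index_b.setdefault(key, []).append(name)
    matches = []
    for name in list_a:
        for other in index_b.get(normalize_company(name), []):
            matches.append((name, other))
    return matches


# Hypothetical entries standing in for a regulator's list and a tax-incentive list.
emitters = ["Acme Sterilization, Inc.", "Borinquen Devices Corp."]
tax_breaks = ["ACME STERILIZATION INC", "Other Co"]
overlap = cross_reference(emitters, tax_breaks)
```

Exact matching on cleaned-up names will miss subsidiaries and renamed firms, so anything left unmatched still deserves a manual look.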

 

Vetting and Confirming Your Findings

Once files are in hand, don’t just take them at face value or dump them online WikiLeaks-style. Consulting outside experts, cross-referencing different data sources, and making an effort to speak with the people named in the documents are all crucial to solidifying your story.

Subject-matter experts can help interpret technical terms, point out limitations or omissions that may not be clear to you, and vet your conclusions. Garisto, for example, shared the University of Rochester’s investigation report with a few trusted physicists, asking them to weigh in on how thorough and technically sound the inquiry into Dias’s data appeared to be (and to keep it under wraps until his scoop ran).

Outside guidance can also help narrow down a massive amount of material. When Weber was reporting a 2023 story about health-care providers spreading COVID-19 misinformation, she and her colleagues obtained disciplinary and investigation records from all 50 states’ medical boards—requests that generated a mountain of paperwork. Weber asked medical, legal, and medical-board experts to help set criteria for including cases of doctors who’d been punished for spreading falsehoods about the virus. “It’s just important, throughout any story like this, to have a cadre of people that you can call upon to make sure that you can parse some of the terminology and legal pieces of it, to best explain it to readers,” Weber says.

For stories based on public information that might be sensitive or lead to legal ramifications, journalists should double down on fact-checking and ethical processes. Before publication, for example, make sure to seek comment from people named in documents, particularly when they’re the subjects of allegations based on said documents. Subjects may have a very different opinion of how to read the documents, or offer information that contradicts or contextualizes. The timing of when to reach out depends on the situation: You may prefer to wait until you’ve gotten feedback from other sources and feel like you have a solid grasp on the story’s direction. Whether or not a subject talks to you, it’s good practice to send a “no-surprises” letter detailing your findings and offer a final chance to comment.

It’s also crucial to check each document that your reporting rests on for accuracy. Cross-referencing names and details across multiple documents is a key tool for ensuring your story is ironclad. But verifying information and reconciling inconsistencies can take a staggering amount of work. While reporting their caregiving story, for example, Reyes and Poston discovered that some people banned from operating one kind of caregiving facility had the same names as people who were currently operating another—but they couldn’t get California state agencies to confirm that these were the same individuals. So, the journalists set out to cross-reference names in business filings, lawsuits, bankruptcy records, and facility licensing applications. They even knocked on numerous doors.
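
A short script can help surface candidate overlaps like these, though it cannot confirm them. The Python sketch below compares two lists of names with the standard library's `difflib`. The 0.9 threshold and the example names are assumptions made for illustration, and this is not how Reyes and Poston did their matching. As their experience shows, a shared name is a lead to verify through other records, not proof of identity.

```python
from difflib import SequenceMatcher


def name_key(name):
    """Lowercase, strip punctuation, and turn 'Last, First' into 'first last'."""
    if "," in name:
        last, first = name.split(",", 1)
        name = first + " " + last
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower()
    )
    return " ".join(cleaned.split())


def candidate_matches(banned, current, threshold=0.9):
    """List (banned_name, current_name, score) pairs that look alike.

    Scores come from difflib's similarity ratio (0 to 1). The threshold is an
    arbitrary starting point and should be tuned against real data.
    """
    candidates = []
    for b in banned:
        for c in current:
            score = SequenceMatcher(None, name_key(b), name_key(c)).ratio()
            if score >= threshold:
                candidates.append((b, c, round(score, 2)))
    return sorted(candidates, key=lambda row: row[2], reverse=True)


# Invented names: one list from each regulator's records.
leads = candidate_matches(["Doe, Jane"], ["Jane Doe", "John Smith"])
```

Each row in `leads` still has to be checked against business filings, court records, or licensing applications before it can be reported as the same person.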

As daunting as that months-long process was, an editor pointed out that it was also revealing: If they couldn’t easily prove these people’s identities, then neither could the public. “It’s not just an obstacle—this speaks to a consumer issue,” Reyes remembers him saying. So, the reporters decided to address the matter head-on—by writing a sidebar that shined a light on the bureaucratic barrier. In the end, the process of searching for answers in documents was a revelation in and of itself.

 

Stephanie M. Lee is a senior writer at The Chronicle of Higher Education, where she writes about the intersection of scholarship and society. She was previously a science reporter at BuzzFeed News. She won the 2022 Victor Cohn Prize for Excellence in Medical Science Reporting, and her stories have been anthologized in The Best American Food Writing and noted in The Best American Science and Nature Writing. Follow her @stephaniemlee.bsky.social on Bluesky and at stephaniemlee.com.