Guide to Tracking Source Diversity

A group of faces made of tissue paper, with homogeneous gray faces on the left transitioning to a diversity of colors on the right.


Tracking the diversity of sources included in media stories is one key tool in journalists’ work to make sure that their stories reflect the communities they cover, particularly with respect to including communities that have historically been underrepresented and marginalized. At its core, source diversity tracking can be quite simple; it’s the process of collecting information about sources’ race, ethnicity, gender identity, sexuality, disability status, or other identities. This information, when aggregated, gives newsrooms or individual journalists a fuller picture of who they are including in their stories.

The prospect of beginning to track demographic information about one’s sources, whether as an individual reporter or as part of a whole-newsroom effort, can seem daunting. What aspects of diversity should you track, and what is the goal? How should you gather the information? Will sources be offended if they are asked to disclose information about their race, gender, or other aspects of their identity? Does asking for such information violate any privacy laws? How do you avoid tokenism in setting goals for source diversity?

In many cases, there is no single answer to such questions. However, there is much to be learned from the many newsrooms that already track their sources as a means of making their reporting more inclusive. That includes media outlets such as NPR, Science, Nature, STAT, Science News for Students, Wisconsin Public Radio, North Carolina Health News, the Gastropod podcast, KUT Austin, The Kansas City Beacon, the BBC, and others. It also includes many individual reporters, both staff and freelance, who independently track their own sources.

We’ve assembled this source-diversity tracking guide to answer frequently asked questions and provide a sample script and survey that can be adapted to journalists’ needs. In developing this guide, The Open Notebook asked Rochita Ghosh and Fairriona Magee at the Missouri School of Journalism, with supervision by their professor, Sara Shipley Hiles, to speak with numerous reporters, editors, and others who track demographic information about their sources. We’ve also closely studied published case studies and industry analyses (see resource list below). And we’ve drawn on our experience in creating our own source-tracking system, which we use to inform our work at The Open Notebook.

One key lesson from this process is that any system of tracking source diversity is bound to be imperfect. Practically speaking, it is nearly impossible to fully capture the true diversity of intersectional identities represented in a publication’s pages. But another key lesson is that any thoughtful and intentional effort to better understand who is included in your stories is better than doing nothing at all.

Including diverse sources enriches stories by reflecting the authentic experiences and expertise of a wide range of people, including perspectives that are distinct from more mainstream voices. Furthermore, including diverse sources is critical to help recognize and counteract the implicit bias we all absorb when only a narrow range of people are presented as authorities.

But finding sources from diverse communities who can speak to a particular story can be challenging, especially when societal biases make it harder for those in the margins to acquire the credentials and connections likely to bring them to the attention of a journalist. Tracking source diversity helps focus the attention of journalists and newsrooms on this task and illuminates where progress has been made and where improvement is needed. Making the results of source-tracking efforts public offers a further opportunity to establish greater accountability.


Journalists interviewed for this guide reported that source diversity tracking strengthened their storytelling and made them more cognizant of their editorial choices. It also offered a tangible way to measure progress. Some journalists found that they were not as good at diverse sourcing as they thought; for others, the data confirmed that their efforts over time paid off and clued them into ways they could further improve. Either way, source diversity tracking holds a mirror up to our practices so we can see them clearly and act accordingly.

There are countless dimensions of diversity that a source-tracking system could encompass, including race, ethnicity, gender identity, sexual orientation, disability, age, geographic location, country of origin, socioeconomic status, religious background, veteran status, and more. In most cases, a system that tracks all these dimensions of diversity would be too cumbersome—both for the journalists collecting and analyzing the data and for sources who are asked to provide the information—to be practical. Most journalists and newsrooms that track source diversity limit their tracking to just a handful of dimensions, with race/ethnicity and gender identity being the most common.

We recommend starting small, choosing about three dimensions of diversity to track. You can always expand your efforts later.

It’s impossible to track progress toward a goal if you don’t have a goal in the first place, but setting the appropriate goals isn’t easy. You may not be able to set reasonable goals until you’ve gathered data on the current and past diversity of sources in your reporting. Still, it’s helpful to begin considering your goals right away. Here’s what we recommend:

First, consider the makeup of the population you cover or of your outlet’s audience, or some combination of the two. For example, if 40 percent of your readers are people of color but 80 percent of your sources are white, then you are not (yet) accurately representing your audience. The same is true if a quarter of your readers have some type of disability but only 5 percent of your sources are disabled. (Of course, you may or may not have data on the demographics of your readership.)
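As a rough illustration, the comparison described above can be expressed as a gap in percentage points. This is a minimal sketch: the function name `representation_gap` is hypothetical, and the 20 percent figure is an assumption that follows from "80 percent of your sources are white" (at most 20 percent could be people of color).

```python
def representation_gap(audience_share: float, source_share: float) -> float:
    """Return source share minus audience share, in percentage points.

    A negative value means the group is underrepresented among sources
    relative to the audience.
    """
    return source_share - audience_share


# 40% of readers are people of color; 80% of sources are white,
# so at most 20% of sources are people of color (assumed here).
gap_people_of_color = representation_gap(audience_share=40.0, source_share=20.0)

# A quarter of readers have a disability, but only 5% of sources do.
gap_disabled = representation_gap(audience_share=25.0, source_share=5.0)

print(gap_people_of_color)  # -20.0
print(gap_disabled)         # -20.0
```

In both examples the group falls 20 percentage points short of its share of the audience.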

Keep in mind that sometimes trying to match the diversity of your sources to the diversity of your audience may not be a realistic goal. Some fields of science, for example, are so heavily white and/or male that it would be hard to find sources who are female or non-binary or are people of color in numbers proportionate to those of the general public. That said, too closely mirroring the diversity (or lack thereof) of the scientific workforce you are covering could inadvertently perpetuate inequities in that field. For example, if 21 percent of tenured mathematics faculty are women and, correspondingly, only 21 percent of sources in articles about mathematics are women, it fuels the impression that mathematics is for men. In such cases, you may decide to work extra hard to increase that group’s representation in your stories.

For example, the creators of the Gastropod podcast, Nicola Twilley and Cynthia Graber, set a goal that at least 40 percent of speaking time about science on their show should be women’s voices and that at least 40 percent should be the voices of Black, Indigenous, or people of color. In setting these goals, Twilley and Graber took into account both the demographic makeup of the U.S. population and the makeup of the scientific community.
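A goal based on speaking time is weighted by duration rather than by a simple count of sources. The sketch below shows one way such a share could be computed. The segment data and the function name `share_of_time` are hypothetical and do not describe Gastropod's actual method.

```python
# Hypothetical episode segments: (seconds of speaking time, is_woman, is_bipoc).
segments = [
    (300, True, False),
    (240, False, True),
    (180, True, True),
    (280, False, False),
]


def share_of_time(rows, field_index: int) -> float:
    """Return the percent of total speaking time where the given flag is True."""
    total = sum(row[0] for row in rows)
    if total == 0:
        raise ValueError("no speaking time recorded")
    matching = sum(row[0] for row in rows if row[field_index])
    return 100.0 * matching / total


women_share = share_of_time(segments, field_index=1)
bipoc_share = share_of_time(segments, field_index=2)

print(women_share)  # 48.0
print(bipoc_share)  # 42.0
```

With these made-up numbers, both shares clear a 40 percent goal. A speaker who belongs to both groups counts toward both shares.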

Second, consider how your source diversity has evolved over time. Numerous individual journalists and media outlets have conducted source audits, gathering information about sources included in past coverage as a way of establishing a baseline for goal-setting. (See below for discussion of source-tracking methods.)

Third, set an ambitious but achievable goal, recognizing that your goals will probably shift over time. For example, if you have found that only 20 percent of your sources in the past three years have been people of color, while 50 percent of your readers are from communities of color, perhaps aim to have 35 percent of your sources be people of color over the next year.
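One way to arrive at an interim figure like the 35 percent in this example is to aim partway between the baseline and the longer-term benchmark. This is a sketch under that assumption. The guide does not prescribe a halfway rule, and `interim_target` is a hypothetical helper.

```python
def interim_target(baseline: float, long_term: float, fraction: float = 0.5) -> float:
    """Return a goal that closes `fraction` of the gap between baseline and long_term."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return baseline + fraction * (long_term - baseline)


# Baseline: 20% of sources over the past three years were people of color.
# Benchmark: 50% of readers are from communities of color.
target = interim_target(baseline=20.0, long_term=50.0)

print(target)  # 35.0
```

Changing `fraction` makes the one-year goal more or less ambitious without changing the benchmark.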

As you make progress toward meeting your initial goals, reevaluate them, as you may be ready to set a more nuanced goal (and build a more elaborate tracking system to go with it). For example, are experts of color quoted more often in stories that have to do with diversity, equity, and inclusion in science than in other stories (a more subtle form of exclusion that can further marginalize them)? Are disabled sources included mostly in talking about their own lived experience, or are they also consulted as experts? Are there additional dimensions of diversity that you should consider beginning to track?
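A question such as whether experts of color appear mostly in diversity-related stories can be checked by breaking the data down by story topic. The sketch below assumes each record carries a topic label. The records and the topic labels "dei" and "other" are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (story_topic, source_is_person_of_color).
records = [
    ("dei", True), ("dei", True), ("dei", False),
    ("other", True), ("other", False), ("other", False), ("other", False),
]


def share_by_topic(rows):
    """Return {topic: percent of sources who are people of color}."""
    totals = defaultdict(int)
    matches = defaultdict(int)
    for topic, is_person_of_color in rows:
        totals[topic] += 1
        if is_person_of_color:
            matches[topic] += 1
    return {topic: 100.0 * matches[topic] / totals[topic] for topic in totals}


shares = share_by_topic(records)
print(round(shares["dei"], 1))  # 66.7
print(shares["other"])          # 25.0
```

A large difference between the two shares, as in this made-up data, would suggest that sources of color are being consulted mainly for diversity-related stories.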

Whatever goals you set, make sure to put a plan in place to analyze your progress regularly and adjust your goals as needed. The process of seeking greater equity and inclusion in journalism is never finished.

It’s not always easy to talk about race, gender, sexuality, and other identities, and some journalists may feel uncomfortable asking their sources such questions or may worry that doing so is an inappropriate invasion of sources’ privacy. In our own experience in tracking source diversity at The Open Notebook, and based on what we’ve learned from other journalists and newsrooms that have extensive experience in tracking source diversity, sources are seldom bothered by efforts to gather such information, as long as they understand the reasons for doing so and trust that the data will be handled responsibly.

In collecting personal information from sources, it’s crucial to be transparent about your intentions. Let sources know that their responses are optional and will be kept confidential, and that you are collecting this information for the sole purpose of improving the diversity of your reporting.

It may be tempting to avoid the entire topic of how to ask sources about their personal identities by not doing so, instead inferring people’s identities by scanning their websites, social media profiles, or other information online, or by relying on your own personal knowledge or perceptions. We do not recommend this “guessing and Googling” method of tracking source diversity, for several reasons. First, trying to discern information in this way is likely to result in inaccurate data, since any assumptions journalists make about aspects of sources’ identities such as their race or gender identity can easily be wrong. Second, in some cases, collecting and storing demographic information without people’s knowledge may be a violation of certain privacy laws (more on this below).

It’s also important to be thoughtful about the language you use to explain what you’re doing and to collect demographic information. Specifying one’s identity is often more complicated than just applying a label, and the same label may not mean the same thing to everyone. Be aware, too, that U.S. Census categories do not capture the full range of possible racial/ethnic identities around the world. Depending on the geographic diversity of your sources, you may want to develop a more extensive list of response options.

Also be aware of the general preferences of people in certain communities for terms used to describe them. For example, disabled people tend to prefer the use of the term “disability” as opposed to euphemisms such as “special needs.” (To learn more about language issues that often come up in discussions of identity and diversity, take a look at our collection of diversity style guides for journalists.)

After studying numerous approaches to source tracking (see full resource list below), The Open Notebook has created a sample script and survey for tracking source diversity that reporters can use or adapt to their needs. This survey asks sources about their geographic location, race/ethnicity, gender identity, LGBTQ+ identity, and disability. Each question offers a “Prefer not to disclose” option and an “Other or prefer to self-describe” option.

When drafting diversity-survey language, newsrooms may wish to consult with an advisory group of knowledgeable journalists or diversity advocates. As we have written elsewhere, though, be sensitive to the fact that people from historically marginalized groups should not be expected to take on the emotional and time-consuming labor of helping their colleagues solve the problem of lack of diversity in their sourcing; that’s especially true when such efforts are unpaid.

Finally, if you’d like to seek feedback from your sources about their perceptions and suggestions regarding your source-tracking methods, consider adding a survey item to solicit feedback.

There is no one-size-fits-all method for tracking source diversity, and no process is perfect. But amidst endless options, remember that any earnest and thoughtful attempt to be inclusive and to listen to people from marginalized communities is better than doing nothing.

In setting up a source-tracking system, there are a few core logistical issues to decide:

  • Who will be responsible for collecting source information? (Individual reporters, editors, or some designated person in the newsroom? If your newsroom uses freelancers, who will be responsible for tracking the sources for their stories?)
  • When will the information be collected? (During interviews? During fact-checking? After stories are published?)
  • How will you collect the information? (By phone? By email? Through a tool such as Google Forms?)

One advantage of asking sources to provide demographic information by email or via a tool such as Google Forms is that it provides a greater sense of privacy and tends to be faster than doing so by phone, where explaining the survey and its purpose and reading all the response options aloud can be time-consuming.

To ensure the best possible response rate, it’s a good idea for reporters to tell sources that they will be sending a follow-up survey, explain the reasons for it, and emphasize that participating in the survey is voluntary. We recommend developing standard language that can be used with all sources, so that reporters don’t have to think through what to say each time they discuss the tracking process with sources.

In addition to collecting sources’ demographic information, you may want to also record additional information such as the story type (for example, news vs. features), the reporter’s name, the date of publication, and so on. Be sure to collect all the information that will enable you to analyze your data in a way that’s meaningful for you.
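To make this concrete, here is a minimal sketch of what one tracking record might look like. The class name `SourceRecord`, its field names, and the sample values are all hypothetical, so adapt them to the dimensions you track. Leaving out any field for the source's name is a deliberate privacy choice in this sketch, not a requirement.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class SourceRecord:
    """One survey response plus story context, with no source-name field."""
    # Story-level context used later for analysis.
    story_id: str
    story_type: str                  # e.g. "news" or "feature"
    reporter: str
    publication_date: date
    # Self-reported answers; "Prefer not to disclose" is always a valid value.
    race_ethnicity: tuple[str, ...]  # a tuple, so multiple selections are allowed
    gender_identity: str
    disability: str


record = SourceRecord(
    story_id="story-001",
    story_type="feature",
    reporter="A. Reporter",
    publication_date=date(2022, 10, 4),
    race_ethnicity=("Prefer not to disclose",),
    gender_identity="Woman",
    disability="Prefer not to disclose",
)

# Convert to a plain dict, e.g. for writing a spreadsheet row.
row = asdict(record)
print(row["story_type"])  # feature
```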

Here are a few more logistical questions you may want to think about:

  • Whose information will you collect? (All sources who are named in a story? Only sources who are directly quoted? What about spokespeople or statements from public figures that are included in your story?)
  • Will source tracking be a requirement in your newsroom, or will it be voluntary?
  • Who will have access to sources’ demographic information?
  • Will the information be part of a source database used for reporting purposes?
  • How will you train the people responsible for conducting source tracking?
  • How will you keep track of whether source-tracking has been completed for a specific story and what your response rates are?
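For the last question in this list, a simple per-story tally may be enough. The sketch below assumes you record how many surveys were sent and returned for each story. The story IDs and counts are hypothetical, and treating a story as "complete" only when every survey comes back is one possible definition among several.

```python
def response_rate(surveys_sent: int, surveys_returned: int) -> float:
    """Return the percent of sent surveys that were returned."""
    if surveys_sent <= 0:
        raise ValueError("surveys_sent must be positive")
    if not 0 <= surveys_returned <= surveys_sent:
        raise ValueError("surveys_returned must be between 0 and surveys_sent")
    return 100.0 * surveys_returned / surveys_sent


# Hypothetical tally: story ID -> (surveys sent, surveys returned).
stories = {
    "story-001": (4, 3),
    "story-002": (5, 5),
    "story-003": (2, 0),
}

rates = {story_id: response_rate(sent, returned)
         for story_id, (sent, returned) in stories.items()}

# Stories where at least one source has not yet responded.
incomplete = sorted(story_id for story_id, (sent, returned) in stories.items()
                    if returned < sent)

print(rates["story-001"])  # 75.0
print(incomplete)          # ['story-001', 'story-003']
```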

How We Track Sources at The Open Notebook

Here at TON, we began tracking the diversity of all our sources in January 2022. (We are also in the midst of doing a three-year historical source audit.) After considering many possible methods, here is the system we settled on:

  • Our reporters provide sources with a brief survey to gather demographic information, then enter that information, along with other details about the story, into a Google Form that yields an associated spreadsheet. In this way, reporters have only information that pertains to their own stories, but information for all stories goes into one place.
  • All questions (except for sources’ geographic location) can be answered with “Prefer not to disclose,” and where it makes sense, we make it possible for people to select more than one response option or to provide an open-ended self-description.
  • We do not put sources’ names or other personally identifying information into our tracking form. This provides an added measure of privacy protection for our sources.
  • The only person who has access to our tracking spreadsheet is TON’s editor-in-chief. This document is stored securely and protected by multi-factor authentication.
  • TON’s editor-in-chief analyzes the results of our source tracking at least twice a year and will make the results of this analysis available to the Board of Directors. (The same will be true for our historical source audit, which is underway.) We will publish the results of our source tracking on our website once a year. Watch for it!

You can learn more about the TON source-tracking system here and adapt it for yourself if you like.

Regardless of whether data are collected by phone, by email, or through a survey form that sources fill out themselves, they need to be entered into a spreadsheet or database. This can be as simple as a Google Sheet or as sophisticated as a specialized tool built into the newsroom’s content management system. For example, NPR’s “Dex” (as in “Rolodex”) tool allows reporters to enter sources’ demographic information into the outlet’s CMS, which then functions as a source database for reporters.

Some newsrooms use their source-tracking data only for internal accountability purposes, while others publish the results of their source-diversity tracking (see resource list below). Whether you make the results public or not, it’s essential to actually do something with your data. For individual journalists, that may mean redoubling efforts to diversify their networks of sources, or creating intentional strategies for keeping source diversity top of mind while in the heat of reporting. For editors and newsrooms, it could mean analyzing data to identify reporters who may need more support to further diversify their sources. Or it might mean building internal systems for doing so. It could also inspire newsrooms to increase staff diversity, to more deeply consider the types of stories they cover, and to work to improve their relationships with historically undercovered communities.

In researching many different source-tracking efforts, we have learned of almost no cases in which a source has objected to diversity-tracking questions or taken offense at the process. Reporters who have undertaken this process have reported overwhelmingly positive experiences, and most sources are willing to share demographic information.

That said, journalists can make sources feel more comfortable by explaining the purpose, assuring them that all responses are voluntary, and emphasizing that their individual information will not be shared publicly and that data will be stored securely. If a source declines to share certain information, that’s fine. Source-tracking surveys should make it clear that declining to respond is a viable option for any question.

First, a disclaimer: We at The Open Notebook are not equipped to provide legal advice. Below, we share what we have learned about how privacy laws relate to source-tracking, with the caveat that it is every media organization’s responsibility to know whether and how they may collect, store, and share information about sources in their stories (even in aggregate), and to comply with all relevant laws and regulations.

Collecting and storing personal demographic information is not inherently a violation of privacy laws, as long as you take certain basic steps. The data privacy law that most people are familiar with, and which is among the strictest, is the European Union’s General Data Protection Regulation, or GDPR. These regulations apply to any organization that operates within the EU or offers goods or services to customers in the EU. Other countries, including the U.S. (as well as individual U.S. states), the U.K., Canada, Japan, Norway, and Australia, have their own data privacy laws.

The key consideration underlying these laws is that organizations may collect or store people’s personal information only if:

  • They have explicit permission from the person to do so (hence our advice not to assemble demographic information by “guessing and Googling”).
  • They take appropriate steps to ensure that the data will be stored and handled securely and that it will be used lawfully and only for specified purposes.

Regardless of any legal obligations, it’s good practice to make sure the data collected from sources is kept confidential, including by:

  • Restricting who in the newsroom is able to access the individual data.
  • Using multi-factor authentication to further protect the data.
  • Aggregating data before sharing results with the full newsroom or publicly.
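As an illustration of the last point, the sketch below reduces individual responses to category counts before anything is shared. Folding very small groups into a combined bucket is an added assumption, not something this guide or any particular law prescribes. The threshold `MIN_CELL_SIZE`, the bucket label, and the sample responses are hypothetical.

```python
from collections import Counter

MIN_CELL_SIZE = 5  # assumed threshold; choose one appropriate to your newsroom


def aggregate(responses, min_cell_size: int = MIN_CELL_SIZE) -> dict:
    """Count responses per category, folding small groups into one bucket."""
    counts = Counter(responses)
    report = {}
    combined_small = 0
    for category, n in counts.items():
        if n >= min_cell_size:
            report[category] = n
        else:
            combined_small += n
    if combined_small:
        report["Combined small groups"] = combined_small
    return report


# Hypothetical individual answers to a gender-identity question.
responses = (["Woman"] * 12 + ["Man"] * 9
             + ["Non-binary"] * 2 + ["Prefer not to disclose"] * 1)

report = aggregate(responses)
print(report)  # {'Woman': 12, 'Man': 9, 'Combined small groups': 3}
```

Only the `report` dictionary would be shared. The individual `responses` stay with whoever is authorized to see them.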

Source diversity tracking can help journalists and newsrooms be more intentional in whom they include as sources. But focusing on increasing inclusion of diverse voices can quickly turn into tokenism. Quoting a member of an underrepresented community as a source just so that you can fulfill an obligation or meet an artificial quota is not the goal. Be careful to avoid taking advantage of sources or trivializing their involvement simply for the sake of ticking off a “diversity” box. On the other hand, it’s important to recognize that people have different types of expertise they can bring to a story, including their own experiences. It’s also worth including fresh voices in your stories rather than quoting the same experts time and again.

The best way to avoid tokenism is to make sure every person you include in a story has expertise, experience, or perspective that adds something meaningful and relevant to your story. You might consider asking yourself: Does this source advance my story? Or am I only including them to increase my reported source diversity?

It’s also important not to rely on the same few sources from underrepresented groups over and over again. Instead, work to build a truly expansive network of diverse sources relevant to your beat.

Industry Analyses

Case Studies

Related Readings

(Editors’ Note: Thanks to Rochita Ghosh, Nina Goswami, Cynthia Graber, Lila Guterman, Lillian Steenblik Hwang, Jess Mackie, Fairriona Magee, Sara Mupo, Rodrigo Pérez Ortega, Julie Rehmeyer, Sara Shipley Hiles, Nicola Twilley, Carolyn Wilke, Alexandra Witze, Keith Woods, Katherine Wu, Rachel Zamzow, and Sarah Zielinski for contributing ideas and feedback for this resource page. Any errors or omissions remain the responsibility of The Open Notebook.)

Last updated October 4, 2022
