The Social Media Data We Need to Answer Key Research Questions

May 4, 2022  ·   Policy

Ahead of a Senate Judiciary Subcommittee hearing on platform transparency, we submitted a letter outlining the types of research questions we want to answer and the social media data we need to answer them.

A phone displaying social media apps.

Credit: Jason Howie

We submitted the following letter to the Senate Judiciary Subcommittee on Privacy, Technology, and the Law ahead of its hearing on "Platform Transparency: Understanding the Impact of Social Media."


Dear Chair Coons and Ranking Member Sasse:

Thank you and the entire Senate Judiciary Subcommittee on Privacy, Technology, and the Law for holding this important hearing today on “Platform Transparency: Understanding the Impact of Social Media.”

Over the past two decades, social media and other digital technologies have transformed our society. They’ve made it easier than ever to find information, engage with politics, and connect with people across the globe. But they’ve also helped fuel misinformation, enable harassment, and foment polarization, presenting urgent challenges to democratic governance.

As Co-Directors of NYU’s Center for Social Media and Politics (CSMaP), we lead a team of academic researchers studying how this ever-shifting online environment impacts politics, policy, and democracy. It’s hard to overstate how critical data access and platform transparency are to our research.

Since starting this work more than a decade ago, we’ve made great progress in understanding the impact of social media on our society. We’ve quantified how protest movements like the Revolution of Dignity in Ukraine, those horrified by the Charlie Hebdo terrorist attacks in France, and groups like Black Lives Matter, the Women’s March, and opponents of the Common Core Curriculum in the United States used social media to organize. We’ve analyzed the impact of Russian foreign interference in the 2016 elections and hate speech both in the United States and abroad. And we’ve found that many pieces of conventional wisdom about social media (that everyone shares fake news, or that we all live in echo chambers) aren’t actually true.

But from the beginning, we’ve been conducting our research with one hand tied behind our backs. That’s because, as you know, social media companies tightly control the data necessary to study the platforms’ impact. As a result, academic researchers are limited in our efforts to get a handle on the scale, character, and causes of the various phenomena attributed to the rise of social media. Instead, we’ve turned to alternative methods — surveys, experiments, browser plugins, scraping, and others — to try to glimpse from the outside what internal analysts can easily see on the inside.

That’s why we are grateful to you for holding this important hearing on platform transparency. Over the last several years, policymakers have introduced a number of bills to regulate social media, focusing on topics ranging from antitrust to algorithms to children’s safety. But unless researchers have access to more social media data, legislators risk operating in the dark.

We are heartened by Senator Coons’ bipartisan Platform Accountability and Transparency Act, introduced last year with Senators Amy Klobuchar and Rob Portman, and drafted with assistance from one of the witnesses at this hearing, Stanford Law Professor Nate Persily. This measure would go a long way in securing the data we need to better study social media platforms.

The conversation about what social media data we need should start with what questions we want to answer with those data. As you hold today’s hearing, we respectfully suggest you focus on the following areas:

  • How prevalent is misinformation (e.g., “fake news”) on social media, how often is it viewed, and who shares it? In order to answer these questions, we need access to exposure data — i.e., what posts appear in people’s timelines when they are actually using the platforms — as well as engagement data, i.e., what posts people have clicked, shared, liked, etc.

  • Do recommendation algorithms lead to people being exposed to extremist content or misinformation? In order to answer this question, we need access to recommendation data, i.e., what content the platforms recommend that people view and which accounts, pages, and/or groups they recommend users follow. Protecting researchers’ ability to scrape data from publicly available websites would be an important first step in this regard, but only a first step: a full answer would also require platform data on recommendation systems.

  • When a user views social media content, what impact does that exposure have on their beliefs about topics such as the legitimacy of democratic institutions or scientific information about COVID-19? To answer these types of causal questions, i.e., “what is the effect of X on Y?”, we need to be able to run randomized controlled trials, where some participants in a study are assigned to view content in one way, while others are assigned to view it another way. (This could include some participants viewing content using the existing algorithm, and others viewing it in an alternative way, e.g., chronologically.) In many cases, this type of study would require cooperation from the platform.

The online information environment is massive and complex. Studying it requires a robust data infrastructure, diverse substantive expertise, and cutting-edge methodological innovation. Doing this work for the past decade has been rewarding but challenging. More access to social media data is critical to help us better understand how our new online ecosystem shapes society. We thank you for your leadership, and we are here as a resource as Congress continues to tackle this important issue.


Jonathan Nagler
NYU’s Center for Social Media and Politics

Joshua A. Tucker
NYU’s Center for Social Media and Politics