Academic Research
CSMaP faculty, postdoctoral fellows, and students publish rigorous, peer-reviewed research in top academic journals and post working papers sharing ongoing work.
Search or Filter
-
Journal Article
Quantifying Narrative Similarity Across Languages
Sociological Methods & Research, 2025
How can one understand the spread of ideas across text data? This is a key measurement problem in sociological inquiry, from the study of how interest groups shape media discourse, to the spread of policy across institutions, to the diffusion of organizational structures and institution themselves. To study how ideas and narratives diffuse across text, we must first develop a method to identify whether texts share the same information and narratives, rather than the same broad themes or exact features. We propose a novel approach to measure this quantity of interest, which we call “narrative similarity,” by using large language models to distill texts to their core ideas and then compare the similarity of claims rather than of words, phrases, or sentences. The result is an estimand much closer to narrative similarity than what is possible with past relevant alternatives, including exact text reuse, which returns lexically similar documents; topic modeling, which returns topically similar documents; or an array of alternative approaches. We devise an approach to providing out-of-sample measures of performance (precision, recall, F1) and show that our approach outperforms relevant alternatives by a large margin. We apply our approach to an important case study: The spread of Russian claims about the development of a Ukrainian bioweapons program in U.S. mainstream and fringe news websites. While we focus on news in this application, our approach can be applied more broadly to the study of propaganda, misinformation, diffusion of policy and cultural objects, among other topics.
-
Journal Article
Labeling Social Media Posts: Does Showing Coders Multimodal Content Produce Better Human Annotation, and a Better Machine Classifier?
Political Science Research and Methods, 2025
-
Working Paper
The Effect of Deactivating Facebook and Instagram on Users’ Emotional State
Working Paper, April 2025
We estimate the effect of social media deactivation on users’ emotional state in two large randomized experiments before the 2020 U.S. election. People who deactivated Facebook for the six weeks before the election reported a 0.060 standard deviation improvement in an index of happiness, depression, and anxiety, relative to controls who deactivated for just the first of those six weeks. People who deactivated Instagram for those six weeks reported a 0.041 standard deviation improvement relative to controls. Exploratory analysis suggests the Facebook effect is driven by people over 35, while the Instagram effect is driven by women under 25.
-
Working Paper
The Effects of Political Advertising on Facebook and Instagram Before the 2020 US Election
Working Paper, May 2025
We study the effects of social media political advertising by randomizing subsets of 36,906 Facebook users and 25,925 Instagram users to have political ads removed from their news feeds for six weeks before the 2020 US presidential election. We show that most presidential ads were targeted toward parties’ own supporters and that fundraising ads were most common. On both Facebook and Instagram, we found no detectable effects of removing political ads on political knowledge, polarization, perceived legitimacy of the election, political participation (including campaign contributions), candidate favorability, and turnout. This was true overall and for both Democrats and Republicans separately.
-
Journal Article
Misinformation Beyond Traditional Feeds: Evidence from a WhatsApp Deactivation Experiment in Brazil
The Journal of Politics, 2025
In most advanced democracies, concerns about the spread of misinformation are typically associated with feed-based social media platforms like Twitter and Facebook. These platforms also account for the vast majority of research on the topic. However, in most of the world, particularly in Global South countries, misinformation often reaches citizens through social media messaging apps, particularly WhatsApp. To fill the resulting gap in the literature, we conducted a multimedia deactivation experiment to test the impact of reducing exposure to potential sources of misinformation on WhatsApp during the weeks leading up to the 2022 Presidential election in Brazil. We find that this intervention significantly reduced participants’ recall of false rumors circulating widely during the election. However, consistent with theories of mass media minimal effects, a short-term change in the information environment did not lead to significant changes in belief accuracy, political polarization, or well-being.
-
Journal Article
Bottom Up? Top Down? Determinants of Issue-Attention in State Politics
The Journal of Politics, 2025
Who shapes the issue-attention cycle of state legislators? Although state governments make critical policy decisions, data and methodological constraints have limited researchers’ ability to study state-level agenda setting. For this paper, we collect more than 122 million Twitter messages sent by state and national actors in 2018 and 2021. We then employ supervised machine learning and time series techniques to study how the issue-attention of state lawmakers evolves vis-à-vis various local- and national-level actors. Our findings suggest that state legislators operate at the confluence of national and local influences. In line with arguments highlighting the nationalization of state politics, we find that state legislators are consistently responsive to policy debates among members of Congress. However, despite growing nationalization concerns, we also find strong evidence of issue responsiveness by legislators to members of the public in their states and moderate responsiveness to regional media sources.
-
Journal Article
To Moderate, or Not to Moderate: Strategic Domain Sharing by Congressional Campaigns
Electoral Studies, 2025
We test whether candidates move to the extremes before a primary but then return to the center for the general election to appeal to the different preferences of each electorate. Incumbents are now more vulnerable to primary challenges than ever as social media offers a viable pathway for fundraising and messaging for challengers, while homogeneity of districts has reduced general election competitiveness. To assess candidates’ ideological trajectories, we estimate the messaging ideology of 2020 congressional campaigns before and after their primaries using a homophily-based measure of domains shared on Twitter. This method provides temporally granular data to observe changes in communication within a single election campaign cycle. We find suggestive evidence that incumbents in safe seats moved towards the extreme before their primaries and back towards the center for the general election, but only when threatened by a well-funded primary challenge.
-
Working Paper
Do Age-Verification Bills Change Search Behavior? A Pre-Registered Synthetic Control Multiverse
Working Paper, March 2025
In January 2023, Louisiana enacted Act 440, requiring websites containing substantial adult content to verify users’ ages through government-issued identification or commercial verification services. Since the passing of this legislation, 17 additional states have adopted similar laws. Using Google Trends data and a preregistered synthetic control design, this paper examines the impact of these age verification requirements on digital behavior across four key dimensions: searches for the largest compliant website, the largest non-compliant website, VPN services, and adult content generally.Three months after the laws were passed, Our analysis reveals a 51% reduction in searches for the dominant compliant platform, accompanied by significant increases in searches for both the dominant non-compliant platform (48.1%) and VPN services (23.6%). Through multiverse analyses that incorporate multiple specifications and control group constructions, we demonstrate the robustness of these behavioral changes. Our point estimates remain consistent with our pre-registered hypotheses across 3,200 point estimates. Our findings highlight that while these regulation efforts reduce traffic to compliant firms and likely a net reduction overall to this type of content, individuals adapt primarily by moving to content providers that do not require age verification. Our methodological approach offers a framework for real-time policy evaluation in contexts with staggered treatment adoption.
-
Journal Article
Understanding Latino Political Engagement and Activity on Social Media
Political Research Quarterly, 2025
-
Working Paper
Web Scraping for Research: Legal, Ethical, Institutional, and Scientific Considerations
Working Paper, December 2024
Scientists across disciplines often use data from the internet to conduct research, generating valuable insights about human behavior. However, as generative AI relying on massive text corpora becomes increasingly valuable, platforms have greatly restricted access to data through official channels. As a result, researchers will likely engage in more web scraping to collect data, introducing new challenges and concerns for researchers. This paper proposes a comprehensive framework for web scraping in social science research for U.S.-based researchers, examining the legal, ethical, institutional, and scientific factors that researchers should consider when scraping the web. We present an overview of the current regulatory environment impacting when and how researchers can access, collect, store, and share data via scraping. We then provide researchers with recommendations to conduct scraping in a scientifically legitimate and ethical manner. We aim to equip researchers with the relevant information to mitigate risks and maximize the impact of their research amidst this evolving data access landscape.
-
Journal Article
Concept-Guided Chain-of-Thought Prompting for Pairwise Comparison Scoring of Texts with Large Language Models
IEEE International Conference on Big Data, 2024
Existing text scoring methods require a large corpus, struggle with short texts, or require hand-labeled data. We develop a text scoring framework that leverages generative large language models (LLMs) to (1) set texts against the backdrop of information from the near-totality of the web and digitized media, and (2) effectively transform pairwise text comparisons from a reasoning problem to a pattern recognition task. Our approach, concept-guided chain-of-thought (CGCoT), utilizes a chain of researcher-designed prompts with an LLM to generate a concept-specific breakdown for each text, akin to guidance provided to human coders. We then pairwise compare breakdowns using an LLM and aggregate answers into a score using a probability model. We apply this approach to better understand speech reflecting aversion to specific political parties on Twitter, a topic that has commanded increasing interest because of its potential contributions to democratic backsliding. We achieve stronger correlations with human judgments than widely used unsupervised text scoring methods like Wordfish. In a supervised setting, besides a small pilot dataset to develop CGCoT prompts, our measures require no additional hand-labeled data and produce predictions on par with RoBERTa-Large fine-tuned on thousands of hand-labeled tweets. This project showcases the potential of combining human expertise and LLMs for scoring tasks.
-
Journal Article
The Diffusion and Reach of (Mis)Information on Facebook During the U.S. 2020 Election
Sociological Science, 2024
Social media creates the possibility for rapid, viral spread of content, but how many posts actually reach millions? And is misinformation special in how it propagates? We answer these questions by analyzing the virality of and exposure to information on Facebook during the U.S. 2020 presidential election. We examine the diffusion trees of the approximately 1 B posts that were re-shared at least once by U.S.-based adults from July 1, 2020, to February 1, 2021. We differentiate misinformation from non-misinformation posts to show that (1) misinformation diffused more slowly, relying on a small number of active users that spread misinformation via long chains of peer-to-peer diffusion that reached millions; non-misinformation spread primarily through one-to-many affordances (mainly, Pages); (2) the relative importance of peer-to-peer spread for misinformation was likely due to an enforcement gap in content moderation policies designed to target mostly Pages and Groups; and (3) periods of aggressive content moderation proximate to the election coincide with dramatic drops in the spread and reach of misinformation and (to a lesser extent) political content.
-
Journal Article
-
Journal Article
News Sharing on Social Media: Mapping the Ideology of News Media, Politicians, and the Mass Public
Political Analysis, 2024
-
Journal Article
The Trump Advantage in Policy Recall Among Voters
American Politics Research, 2024
Research in political science suggests campaigns have a minimal effect on voters’ attitudes and vote choice. We evaluate the effectiveness of the 2016 Trump and Clinton campaigns at informing voters by giving respondents an opportunity to name policy positions of candidates that they felt would make them better off. The relatively high rates of respondents’ ability to name a Trump policy that would make them better off suggests that the success of his campaign can be partly attributed to its ability to communicate memorable information. Our evidence also suggests that cable television informed voters: respondents exposed to higher levels of liberal news were more likely to be able to name Clinton policies, and voters exposed to higher levels of conservative news were more likely to name Trump policies; these effects hold even conditioning on respondents’ ideology and exposure to mainstream media. Our results demonstrate the advantages of using novel survey questions and provide additional insights into the 2016 campaign that challenge one part of the conventional narrative about the presumed non-importance of operational ideology.
-
Journal Article
-
Journal Article
A Multi-Stakeholder Approach for Leveraging Data Portability to Support Research on the Digital Information Environment
Journal of Online Trust and Safety, 2024
In this paper, we aim to situate data portability within the evolving discussions of how to support data access for researchers studying the digital information environment. We explore how data donations, enabled by existing data access rights and data portability requirements, provide promising opportunities for supporting research on critical trust and safety topics. Evaluating other data access mechanisms that are more central to policy debates about platform transparency, we argue that data donations are a powerful additional mechanism that offer key legal, ethical, and scientific benefits. We then assess current challenges with using data donations for research and offer recommendations for various stakeholders to better align portability mechanisms with the needs of research. Taken together, we argue that although portability is often considered within a context of competition and user agency, regulators, industry actors, and researchers should understand and leverage portability’s potential impact to empower critical research on the societal impacts of digital platforms and services.
-
Working Paper
Survey Professionalism: New Evidence from Web Browsing Data
Working Paper, August 2024
Online panels have become an important resource for research in political science, but the financial compensation involved incentivizes respondents to become “survey professionals”, which raises concerns about data quality. We provide evidence on survey professionalism using behavioral web browsing data from three U.S. samples, recruited via Lucid, YouGov, and Facebook (total n = 3,886). Survey professionalism is common but varies across samples: By our most conservative measure, we identify 1.7% of respondents on Facebook, 7.9% of respondents on YouGov, and 34.3% of respondents on Lucid as survey professionals. However, evidence that professionals lower data quality is limited: they do not systematically differ demographically or politically from non-professionals and do not respond more randomly—although they are somewhat more likely to speed, to straightline, and to take questionnaires repeatedly. While concerns are warranted, we conclude that survey professionals do not, by and large, distort inferences of research based on online panels.
-
Working Paper
-
Journal Article
Digital Town Square? Nextdoor's Offline Contexts and Online Discourse
Journal of Quantitative Description: Digital Media, 2024
There is scant quantitative research describing Nextdoor, the world's largest and most important hyperlocal social media network. Due to its localized structure, Nextdoor data are notoriously difficult to collect and work with. We build multiple datasets that allow us to generate descriptive analyses of the platform's offline contexts and online content. We first create a comprehensive dataset of all Nextdoor neighborhoods joined with U.S. Census data, which we analyze at the community-level (block-group). Our findings suggests that Nextdoor is primarily used in communities where the populations are whiter, more educated, more likely to own a home, and with higher levels of average income, potentially impacting the platform's ability to create new opportunities for social capital formation and citizen engagement. At the same time, Nextdoor neighborhoods are more likely to have active government agency accounts---and law enforcement agencies in particular---where offline communities are more urban, have larger nonwhite populations, greater income inequality, and higher average home values. We then build a convenience sample of 30 Nextdoor neighborhoods, for which we collect daily posts and comments appearing in the feed (115,716 posts and 163,903 comments), as well as associated metadata. Among the accounts for which we collected posts and comments, posts seeking or offering services were the most frequent, while those reporting potentially suspicious people or activities received the highest average number of comments. Taken together, our study describes the ecosystem of and discussion on Nextdoor, as well as introduces data for quantitatively studying the platform.
-
Working Paper
Misinformation Exposure Beyond Traditional Feeds: Evidence from a WhatsApp Deactivation Experiment in Brazil
Working Paper, May 2024
In most advanced democracies, concerns about the spread of misinformation are typically associated with feed-based social media platforms like Twitter and Facebook. These platforms also account for the vast majority of research on the topic. However, in most of the world, particularly in Global South countries, misinformation often reaches citizens through social media messaging apps, particularly WhatsApp. To fill the resulting gap in the literature, we conducted a multimedia deactivation experiment to test the impact of reducing exposure to potential sources of misinformation on WhatsApp during the weeks leading up to the 2022 Presidential election in Brazil. We find that this intervention significantly reduced participants’ exposure to false rumors circulating widely during the election. However, consistent with theories of mass media minimal effects, a short-term reduction in exposure to misinformation ahead of the election did not lead to significant changes in belief accuracy, political polarization, or well-being.
-
Journal Article
The Effects of Facebook and Instagram on the 2020 Election: A Deactivation Experiment
Proceedings of the National Academy of Sciences, 2024
We study the effect of Facebook and Instagram access on political beliefs, attitudes, and behavior by randomizing a subset of 19,857 Facebook users and 15,585 Instagram users to deactivate their accounts for 6 wk before the 2020 U.S. election. We report four key findings. First, both Facebook and Instagram deactivation reduced an index of political participation (driven mainly by reduced participation online). Second, Facebook deactivation had no significant effect on an index of knowledge, but secondary analyses suggest that it reduced knowledge of general news while possibly also decreasing belief in misinformation circulating online. Third, Facebook deactivation may have reduced self-reported net votes for Trump, though this effect does not meet our preregistered significance threshold. Finally, the effects of both Facebook and Instagram deactivation on affective and issue polarization, perceived legitimacy of the election, candidate favorability, and voter turnout were all precisely estimated and close to zero.
-
Book
Online Data and the Insurrection
Media and January 6th, 2024
Online data is key to understanding the leadup to the January 6 insurrection, including how and why election fraud conspiracies spread online, how conspiracy groups organized online to participate in the insurrection, and other factors of online life that led to the insurrection. However, there are significant challenges in accessing data for this research. First, platforms restrict which researchers get access to data, as well as what researchers can do with the data they access. Second, this data is ephemeral; that is, once users or the platform remove the data, researchers can no longer access it. These factors affect what research questions can ever be asked and answered.
-
Journal Article
Estimating the Ideology of Political YouTube Videos
Political Analysis, 2024
We present a method for estimating the ideology of political YouTube videos. As online media increasingly influences how people engage with politics, so does the importance of quantifying the ideology of such media for research. The subfield of estimating ideology as a latent variable has often focused on traditional actors such as legislators, while more recent work has used social media data to estimate the ideology of ordinary users, political elites, and media sources. We build on this work by developing a method to estimate the ideologies of YouTube videos, an important subset of media, based on their accompanying text metadata. First, we take Reddit posts linking to YouTube videos and use correspondence analysis to place those videos in an ideological space. We then train a text-based model with those estimated ideologies as training labels, enabling us to estimate the ideologies of videos not posted on Reddit. These predicted ideologies are then validated against human labels. Finally, we demonstrate the utility of this method by applying it to the watch histories of survey respondents with self-identified ideologies to evaluate the prevalence of echo chambers on YouTube. Our approach gives video-level scores based only on supplied text metadata, is scalable, and can be easily adjusted to account for changes in the ideological climate. This method could also be generalized to estimate the ideology of other items referenced or posted on Reddit.
-
Journal Article
Online Searches to Evaluate Misinformation Can Increase its Perceived Veracity
Nature, 2024
Considerable scholarly attention has been paid to understanding belief in online misinformation, with a particular focus on social networks. However, the dominant role of search engines in the information environment remains underexplored, even though the use of online search to evaluate the veracity of information is a central component of media literacy interventions. Although conventional wisdom suggests that searching online when evaluating misinformation would reduce belief in it, there is little empirical evidence to evaluate this claim. Here, across five experiments, we present consistent evidence that online search to evaluate the truthfulness of false news articles actually increases the probability of believing them. To shed light on this relationship, we combine survey data with digital trace data collected using a custom browser extension. We find that the search effect is concentrated among individuals for whom search engines return lower-quality information. Our results indicate that those who search online to evaluate misinformation risk falling into data voids, or informational spaces in which there is corroborating evidence from low-quality sources. We also find consistent evidence that searching online to evaluate news increases belief in true news from low-quality sources, but inconsistent evidence that it increases belief in true news from mainstream sources. Our findings highlight the need for media literacy programmes to ground their recommendations in empirically tested strategies and for search engines to invest in solutions to the challenges identified here.