Academic Research

CSMAP faculty, postdoctoral fellows, and students publish rigorous, peer-reviewed research in top academic journals and post working papers sharing ongoing work.

  • Journal Article

    State Media Control Influences Large Language Models

    Nature, 2026

    Millions of people around the world query large language models (LLMs) for information. Although several studies have compellingly documented the persuasive potential of these models, there is limited evidence of who or what influences the models themselves, leading to a flurry of concerns about which companies and governments build and regulate the models. Here we show through six studies that government control of the media across the world already influences the output of LLMs via their training data. We use a cross-national audit to show that LLMs exhibit a stronger pro-government valence when prompted in the languages of countries with lower media freedom than in those with higher media freedom. This result is correlational, so to triangulate the specific mechanism of how state media control can influence LLMs, we develop a multi-part case study on China’s media. We demonstrate that media scripted and curated by the Chinese state appears in LLM training datasets. To evaluate the plausible effect of this inclusion, we use an open-weight model to show that additional pretraining on Chinese state-coordinated media generates more positive answers to prompts about Chinese political institutions and leaders. We link this phenomenon to commercial models through two audit studies demonstrating that prompting models in Chinese generates more positive responses about China’s institutions and leaders than do the same queries in English. The combination of influence and persuasive potential across languages suggests the troubling conclusion that states and powerful institutions have increased strategic incentives to leverage media control in the hopes of shaping LLM output.

  • Working Paper

    Artificial Intelligence, Politics, and Political Science

    Working Paper, 2026

    This forthcoming edited volume (Cambridge University Press) examines the transformative impact of artificial intelligence on democratic institutions, political behavior, governance, and the discipline of political science itself. The volume represents the report of the American Political Science Association’s Presidential Task Force on AI, Politics, and Political Science, co-chaired by Joshua Tucker and Nathaniel Persily. 

    Across twelve chapters produced by close to 60 scholars, the report evaluates how generative AI and machine learning systems are reshaping public opinion formation, political communication, labor markets, electoral processes, state capacity, and regulatory frameworks. The authors analyze both the opportunities and risks posed by AI technologies, including concerns surrounding information integrity, ideological personalization, surveillance, democratic accountability, and concentrated technological power. Themes that cut across multiple chapters include: the unprecedented power of a small number of AI corporations; the opacity and non-replicability of model outputs; bias in AI systems; and the absence of agreed-upon benchmarks for evaluation. The volume also addresses methodological and ethical implications for political science research, emphasizing transparency, reproducibility, and the responsible integration of AI tools into scholarly inquiry. Ultimately, the volume argues that AI will not only alter political institutions and citizen-state relations, but also may fundamentally reshape how political knowledge is produced and interpreted. It calls for sustained interdisciplinary collaboration and evidence-based governance to ensure that AI development supports democratic resilience rather than undermining it.

  • Working Paper

    Polarization by Default: Auditing Recommendation Bias in LLM-Based Content Curation

    Working Paper, 2026

    Large Language Models (LLMs) are increasingly deployed to curate and rank human-created content, yet the nature and structure of their biases in these tasks remain poorly understood: which biases are robust across providers and platforms, and which can be mitigated through prompt design. We present a controlled simulation study mapping content selection biases across three major LLM providers (OpenAI, Anthropic, Google) on real social media datasets from Twitter/X, Bluesky, and Reddit, using six prompting strategies (general, popular, engaging, informative, controversial, neutral). Through 540,000 simulated top-10 selections from pools of 100 posts across 54 experimental conditions, we find that biases differ substantially in how structural and how prompt-sensitive they are. Polarization is amplified across all configurations, toxicity handling shows a strong inversion between engagement- and information-focused prompts, and sentiment biases are predominantly negative. Provider comparisons reveal distinct trade-offs: GPT-4o Mini shows the most consistent behavior across prompts; Claude and Gemini exhibit high adaptivity in toxicity handling; Gemini shows the strongest negative sentiment preference. On Twitter/X, where author demographics can be inferred from profile bios, political leaning bias is the clearest demographic signal: left-leaning authors are systematically over-represented despite right-leaning authors forming the pool plurality in the dataset, and this pattern largely persists across prompts.

  • Journal Article

    How Deceptive Online Networks Reached Millions in the US 2020 elections

    • Ruth E. Appel
    • Young Mie Kim
    • Jennifer Pan
    • Yiqing Xu
    • Ben Nimmo
    • Daniel Robert Thomas
    • Hunt Allcott
    • Pablo Barberá
    • Taylor Brown
    • Adriana Crespo-Tenorio
    • Drew Dimmery
    • Deen Freelon
    • Matthew Gentzkow
    • Sandra González-Bailón
    • Andrew M. Guess
    • Shanto Iyengar
    • David Lazer
    • Neil Malhotra
    • Devra Moehler
    • Brendan Nyhan
    • Jaime Settle
    • Emily Thorson
    • Rebekah Tromble
    • Carlos Velasco Rivera
    • Arjun Wilkins
    • Magdalena Wojcieszak
    • Beixian Xiong
    • Chad Kiewiet de Jonge
    • Annie Franco
    • Winter Mason
    • Natalie Jomini Stroud
    • Joshua A. Tucker

    Nature Human Behaviour, 2026

    Deceptive online networks are coordinated efforts that use identity deception to pursue strategic political or financial goals. During the US 2020 elections, these networks reached at least 37 million Facebook and 3 million Instagram users, representing 15% and 2% of the platforms’ active US adult users, respectively. Only 3 networks out of 49—1 network with explicitly political aims and 2 that appeared to use politics as a lure for profit—were responsible for over 70% of users reached. Notably, accounts unaffiliated with the networks played an important role in facilitating this reach by resharing content the three networks produced. Deceptive networks, regardless of whether their goals were political or financial, reached users who were older, more conservative, more frequently exposed to content from untrustworthy sources, and spent more time on Facebook.

  • Working Paper

    AI summaries in social media improve dialogue but reduce engagement

    • Michael Heseltine
    • Christopher A. Bail
    • Petter Tornberg
    • Michelle Schimmel
    • Christopher Barrie

    Working Paper, 2026

    Generative artificial intelligence agents are becoming increasingly active participants in conversations on social media platforms, yet little is known about how they shape public discussion of social problems. We present two preregistered online experiments testing AI-generated summaries in simulated, interactive social media environments. AI summaries increased the quality of user comments, without systematically increasing toxicity or negative affect. At the same time, AI exposure reduced engagement with conversation threads. AI summaries also increased the semantic similarity between user comments and the AI-generated summaries, suggesting that these systems function as informational anchors that shape discussion. Together, the findings reveal a tradeoff: AI-generated summaries can improve conversation quality while narrowing conversational engagement and channeling how users articulate political arguments. These results speak to growing concerns about how embedded AI systems fundamentally alter platform dynamics and shape public discourse.

  • Working Paper

    The Partisan Effects of Social Media Bans

    Working Paper, March 2026

    What happens to information environments when democracies ban social media platforms? While a large literature examines information control under authoritarianism, democratic governments have increasingly intervened in major online platforms. We study a prominent case: Brazil’s 2024 national ban on the social media platform X. Using an event-study design, we estimate the causal effects of the ban and examine how partisan identity shaped responses. Drawing on a large sample of politically engaged users and ideal-point estimates of ideology, we find strong partisan asymmetries. Conservative users not aligned with the government were more likely to circumvent the ban, and right-leaning news domains became markedly more prevalent on the platform. We describe this dynamic as a “sorting ratchet”: the ban segmented the digital public sphere along partisan lines, with effects that persisted even after restrictions were lifted. Platform bans in democratic settings may therefore deepen polarization and durably reshape information environments.

  • Working Paper

    Synthetic personas distort the structure of human belief systems

    Working Paper, 2026

    Large language models (LLMs) are increasingly used as synthetic survey respondents, yet it is unclear whether their belief-system structure matches that of real publics. We compare 28 LLMs to the 2024 General Social Survey (GSS) using 52 attitude items and demographic persona traits. We estimate polychoric correlation matrices and propagate uncertainty in the GSS via bootstrap resampling with multiple imputation. Constraint is measured by the variance share explained by the first principal component and by effective dependence, a determinant-based measure of global linear dependence. Across models, LLM personas exhibit substantially higher constraint than humans; conditioning on persona traits reduces constraint far more for LLMs, indicating greater demographic mediation. Projection onto a shared GSS basis further shows overemphasis of the leading dimension and missing secondary structure. These results caution against treating LLM personas as a reliable foundation for synthetic survey data generation.

  • Journal Article

    Age Verification and Public Adaptation: A Pre-Registered Synthetic Control Multiverse

    • David Lang
    • Benjamin Listyg
    • Brennah V. Ross
    • Anna Vinals Musquera
    • Zeve Sanderson

    Journal of Law and Empirical Analysis, 2026

    Starting in January 2023, Louisiana and more than 20 other states passed laws requiring age verification for websites with substantial adult content. Using Google Trends data and a synthetic control design, we examine how these laws affect the public’s digital behavior across four dimensions: searches for compliant websites, non-compliant websites, VPNs, and adult content. Three months after the laws were passed, results show a 51% decrease in searches for the main compliant platform, while searches increased for both the main non-compliant platform (48.1%) and VPN services (23.6%). Through multiverse analyses, we demonstrate the robustness of these findings to numerous model specifications. Our findings reveal that while regulations reduce traffic to compliant sites and likely decrease overall consumption, users adapt by shifting to providers without verification requirements. This approach provides valuable insights for policymakers around the world considering similar legislative measures of digital content regulation. Our methodology also offers a framework for real-time policy evaluation in contexts with staggered implementation.

    Date Posted

    Jan 13, 2026

  • Working Paper

    Testing the Causal Impact of Social Media Reduction Around the Globe

    Working Paper, December 2025

    More than half of the world’s population uses social media. There is widespread debate among the public, politicians, and academics about social media’s impact on important outcomes, such as intergroup conflict and well-being. However, most prior research on the impact of social media relies on samples from the United States and Western Europe, despite emerging evidence suggesting that the impact of social media is likely to differ across the globe. Building on the results of pilot experiments from three countries (n = 894), we plan to conduct a global field experiment to measure the causal impact of reducing social media usage for two weeks across 23 countries (projected n > 8,000). We will then test how social media reduction influences four main outcomes: news knowledge, exposure to online hostility, intergroup attitudes, and well-being. We will also explore how the effects of social media reduction vary across world regions, focusing on three theoretically-informed country-level moderators: levels of income, inequality, and democracy. This large-scale, high-powered field experiment, and the global dataset resulting from it, will offer rare causal evidence to inform ongoing debates about the impact of social media and how it varies around the world.

  • Journal Article

    An enriched, multimodal social media dataset of a UK General Election campaign

    Journal of Quantitative Description: Digital Media, 2025

    This article introduces a dataset of all posts by candidates during the 2024 General Election in the United Kingdom with a presence on the X (formerly Twitter) platform. The article relies on a crowd-sourcing innovation in the United Kingdom that, for the first time, provided researchers with early access to a regularly updated candidate list prior to the start of the election. This made it possible to collect real-time data on candidate posts for 1,604 candidates across 53 separate political parties. Additionally, we download and store 53,327 images and 15,982 videos posted within tweets. We enrich the data with the realized vote count and vote share for each candidate as well as text transcripts extracted from the audio of video posts. Overall, the dataset provides a uniquely comprehensive collection of online campaigning material for an election campaign and will be of considerable value to scholars of political communication, elections, and democratic responsiveness. We also analyze the topics and tone — focusing on negativity — across different media formats to identify patterns in the content and style of candidate communication across parties.

  • Journal Article

    Survey Professionalism: New Evidence from Web Browsing Data

    Political Analysis, 2025

    Online panels have become an important resource for research in political science, but the compensation offered to panelists incentivizes them to become “survey professionals,” raising concerns about data quality. We provide evidence on survey professionalism exploring three US samples of subjects who donated their browsing data, recruited via Lucid, YouGov, and Facebook (total n = 3,886). Survey professionalism is common, but varies across samples: by our most conservative estimate, we find 1.7% of respondents on Facebook, 7.6% on YouGov, and 34.7% on Lucid to be professionals (under the assumption that professionals are as likely as non-professionals to donate data after conditioning on observable demographics available from all online survey takers). However, evidence that professionals lower data quality is limited: they do not systematically differ demographically or politically from non-professionals and do not exhibit more response instability. They are, however, somewhat more likely to speed, straightline, and attempt to take questionnaires repeatedly. To address potential selection issues in the donation of browsing data, we present sensitivity analyses with lower bounds for survey professionalism. While concerns about professionalism are warranted, we conclude that survey professionals do not, by and large, distort inferences of research based on online panels.

    Date Posted

    Oct 06, 2025

  • Journal Article

    How Language Shapes Belief in Misinformation: A Study Among Multilinguals in Ukraine

    Journal of Experimental Political Science, 2025

    Scholarship has identified key determinants of people’s belief in misinformation predominantly from English-language contexts. However, multilingual citizens often consume news media in multiple languages. We study how the language of consumption affects belief in misinformation and true news articles in multilingual environments. We suggest that language may pass on specific cues affecting how bilinguals evaluate information. In a ten-week survey experiment with bilingual adults in Ukraine, we measured if subjects evaluating information in their less-preferred language were less likely to believe it. We find those who prefer Ukrainian are less likely to believe both false and true stories written in Russian by approximately 0.2 standard deviation units. Conversely, those who prefer Russian show increased belief in false stories in Ukrainian, though this effect is less robust. A secondary digital media literacy intervention does not increase discernment as it reduces belief in both true and false stories equally.

    Date Posted

    Aug 26, 2025

  • Journal Article

    Quantifying Narrative Similarity Across Languages

    Sociological Methods & Research, 2025

    How can one understand the spread of ideas across text data? This is a key measurement problem in sociological inquiry, from the study of how interest groups shape media discourse, to the spread of policy across institutions, to the diffusion of organizational structures and institutions themselves. To study how ideas and narratives diffuse across text, we must first develop a method to identify whether texts share the same information and narratives, rather than the same broad themes or exact features. We propose a novel approach to measure this quantity of interest, which we call “narrative similarity,” by using large language models to distill texts to their core ideas and then compare the similarity of claims rather than of words, phrases, or sentences. The result is an estimand much closer to narrative similarity than what is possible with past relevant alternatives, including exact text reuse, which returns lexically similar documents; topic modeling, which returns topically similar documents; or an array of alternative approaches. We devise an approach to providing out-of-sample measures of performance (precision, recall, F1) and show that our approach outperforms relevant alternatives by a large margin. We apply our approach to an important case study: the spread of Russian claims about the development of a Ukrainian bioweapons program in U.S. mainstream and fringe news websites. While we focus on news in this application, our approach can be applied more broadly to the study of propaganda, misinformation, diffusion of policy and cultural objects, among other topics.

  • Journal Article

    Labeling Social Media Posts: Does Showing Coders Multimodal Content Produce Better Human Annotation, and a Better Machine Classifier?

    Political Science Research and Methods, 2025

    The increasing multimodality (e.g., images, videos, links) of social media data presents opportunities and challenges. But text-as-data methods continue to dominate as modes of classification, as multimodal social media data are costly to collect and label. Researchers who face a budget constraint may need to make informed decisions regarding whether to collect and label only the textual content of social media data or their full multimodal content. In this article, we develop five measures and an experimental framework to assist with these decisions. We propose five performance metrics to measure the costs and benefits of multimodal labeling: average time per post, average time per valid response, valid response rate, intercoder agreement, and classifier’s predictive power. To estimate these measures, we introduce an experimental framework to evaluate coders’ performance under text-only and multimodal labeling conditions. We illustrate the method with a tweet labeling experiment.

  • Working Paper

    The Effect of Deactivating Facebook and Instagram on Users’ Emotional State

    • Hunt Allcott
    • Matthew Gentzkow
    • Benjamin Wittenbrink
    • Juan Carlos Cisneros
    • Adriana Crespo-Tenorio
    • Drew Dimmery
    • Deen Freelon
    • Sandra González-Bailón
    • Andrew M. Guess
    • Young Mie Kim
    • David Lazer
    • Neil Malhotra
    • Devra Moehler
    • Sameer Nair-Desai
    • Brendan Nyhan
    • Jennifer Pan
    • Jaime Settle
    • Emily Thorson
    • Rebekah Tromble
    • Carlos Velasco Rivera
    • Arjun Wilkins
    • Magdalena Wojcieszak
    • Annie Franco
    • Chad Kiewiet de Jonge
    • Winter Mason
    • Natalie Jomini Stroud
    • Joshua A. Tucker

    Working Paper, April 2025

    We estimate the effect of social media deactivation on users’ emotional state in two large randomized experiments before the 2020 U.S. election. People who deactivated Facebook for the six weeks before the election reported a 0.060 standard deviation improvement in an index of happiness, depression, and anxiety, relative to controls who deactivated for just the first of those six weeks. People who deactivated Instagram for those six weeks reported a 0.041 standard deviation improvement relative to controls. Exploratory analysis suggests the Facebook effect is driven by people over 35, while the Instagram effect is driven by women under 25.

  • Working Paper

    The Effects of Political Advertising on Facebook and Instagram Before the 2020 US Election

    • Hunt Allcott
    • Matthew Gentzkow
    • Ro’ee Levy
    • Adriana Crespo-Tenorio
    • Natasha Dumas
    • Winter Mason
    • Devra Moehler
    • Pablo Barberá
    • Taylor Brown
    • Juan Carlos Cisneros
    • Drew Dimmery
    • Deen Freelon
    • Sandra González-Bailón
    • Andrew M. Guess
    • Young Mie Kim
    • David Lazer
    • Neil Malhotra
    • Sameer Nair-Desai
    • Brendan Nyhan
    • Ana Carolina Paixao de Queiroz
    • Jennifer Pan
    • Jaime Settle
    • Emily Thorson
    • Rebekah Tromble
    • Carlos Velasco Rivera
    • Benjamin Wittenbrink
    • Magdalena Wojcieszak
    • Shiqi Yang
    • Saam Zahedian
    • Annie Franco
    • Chad Kiewiet de Jonge
    • Natalie Jomini Stroud
    • Joshua A. Tucker

    Working Paper, May 2025

    We study the effects of social media political advertising by randomizing subsets of 36,906 Facebook users and 25,925 Instagram users to have political ads removed from their news feeds for six weeks before the 2020 US presidential election. We show that most presidential ads were targeted toward parties’ own supporters and that fundraising ads were most common. On both Facebook and Instagram, we found no detectable effects of removing political ads on political knowledge, polarization, perceived legitimacy of the election, political participation (including campaign contributions), candidate favorability, and turnout. This was true overall and for both Democrats and Republicans separately.

  • Journal Article

    Misinformation Beyond Traditional Feeds: Evidence from a WhatsApp Deactivation Experiment in Brazil

    The Journal of Politics, 2025

    In most advanced democracies, concerns about the spread of misinformation are typically associated with feed-based social media platforms like Twitter and Facebook. These platforms also account for the vast majority of research on the topic. However, in most of the world, particularly in Global South countries, misinformation often reaches citizens through social media messaging apps, particularly WhatsApp. To fill the resulting gap in the literature, we conducted a multimedia deactivation experiment to test the impact of reducing exposure to potential sources of misinformation on WhatsApp during the weeks leading up to the 2022 Presidential election in Brazil. We find that this intervention significantly reduced participants’ recall of false rumors circulating widely during the election. However, consistent with theories of mass media minimal effects, a short-term change in the information environment did not lead to significant changes in belief accuracy, political polarization, or well-being.

  • Working Paper

    Emergent LLM Behaviors are Observationally Equivalent to Data Leakage

    Working Paper, 2025

    Ashery et al. recently argued that large language models (LLMs), when paired to play a classic "naming game," spontaneously develop linguistic conventions reminiscent of human social norms. Here, we show that their results are better explained by data leakage: the models simply reproduce conventions they already encountered during pre-training. Despite the authors' mitigation measures, we provide multiple analyses demonstrating that the LLMs recognize the structure of the coordination game and recall its outcomes, rather than exhibit "emergent" conventions. Consequently, the observed behaviors are indistinguishable from memorization of the training corpus. We conclude by pointing to potential alternative strategies and reflecting more generally on the place of LLMs for social science models.

    Date Posted

    May 25, 2025

  • Journal Article

    Bottom Up? Top Down? Determinants of Issue-Attention in State Politics

    The Journal of Politics, 2025

    Who shapes the issue-attention cycle of state legislators? Although state governments make critical policy decisions, data and methodological constraints have limited researchers’ ability to study state-level agenda setting. For this paper, we collect more than 122 million Twitter messages sent by state and national actors in 2018 and 2021. We then employ supervised machine learning and time series techniques to study how the issue-attention of state lawmakers evolves vis-à-vis various local- and national-level actors. Our findings suggest that state legislators operate at the confluence of national and local influences. In line with arguments highlighting the nationalization of state politics, we find that state legislators are consistently responsive to policy debates among members of Congress. However, despite growing nationalization concerns, we also find strong evidence of issue responsiveness by legislators to members of the public in their states and moderate responsiveness to regional media sources.

    Date Posted

    Mar 28, 2025

  • Working Paper

    Information Diets are More Diverse in Attention than in Engagement

    Working Paper, 2025

    What political content do we pay attention to online? Diverse political information is essential for democratic competence, yet online media raises concerns about fragmented information diets. Research on selective exposure highlights how social media can foster ideological echo chambers, while other studies emphasize incidental exposure to diverse viewpoints. A critical limitation is measurement: existing research primarily uses engagement metrics (e.g., likes or shares), neglecting passive exposure or attention—what users notice but do not interact with. In this study, we address this gap through an experimental platform that separately records attention and engagement. Our findings indicate that the ideology-engagement association is about seven times the magnitude of the ideology-attention association. This underscores the importance of measuring attention, rather than solely engagement, to accurately assess the diversity of online information diets.

    Date Posted

    Mar 22, 2025

  • Journal Article

    To Moderate, or Not to Moderate: Strategic Domain Sharing by Congressional Campaigns

    Electoral Studies, 2025

    We test whether candidates move to the extremes before a primary but then return to the center for the general election to appeal to the different preferences of each electorate. Incumbents are now more vulnerable to primary challenges than ever as social media offers a viable pathway for fundraising and messaging for challengers, while homogeneity of districts has reduced general election competitiveness. To assess candidates’ ideological trajectories, we estimate the messaging ideology of 2020 congressional campaigns before and after their primaries using a homophily-based measure of domains shared on Twitter. This method provides temporally granular data to observe changes in communication within a single election campaign cycle. We find suggestive evidence that incumbents in safe seats moved towards the extreme before their primaries and back towards the center for the general election, but only when threatened by a well-funded primary challenge.

    Date Posted

    Mar 17, 2025

  • Working Paper

    Do Age-Verification Bills Change Search Behavior? A Pre-Registered Synthetic Control Multiverse

    • David Lang
    • Benjamin Listyg
    • Brennah V. Ross
    • Anna Vinals Musquera
    • Zeve Sanderson

    Working Paper, March 2025

    In January 2023, Louisiana enacted Act 440, requiring websites containing substantial adult content to verify users’ ages through government-issued identification or commercial verification services. Since the passing of this legislation, 17 additional states have adopted similar laws. Using Google Trends data and a preregistered synthetic control design, this paper examines the impact of these age verification requirements on digital behavior across four key dimensions: searches for the largest compliant website, the largest non-compliant website, VPN services, and adult content generally. Three months after the laws were passed, our analysis reveals a 51% reduction in searches for the dominant compliant platform, accompanied by significant increases in searches for both the dominant non-compliant platform (48.1%) and VPN services (23.6%). Through multiverse analyses that incorporate multiple specifications and control group constructions, we demonstrate the robustness of these behavioral changes. Our point estimates remain consistent with our pre-registered hypotheses across 3,200 point estimates. Our findings highlight that while these regulation efforts reduce traffic to compliant firms and likely produce a net reduction in overall consumption of this type of content, individuals adapt primarily by moving to content providers that do not require age verification. Our methodological approach offers a framework for real-time policy evaluation in contexts with staggered treatment adoption.

    Date Posted

    Mar 03, 2025

  • Journal Article

    Understanding Latino Political Engagement and Activity on Social Media

    Political Research Quarterly, 2025

    Social media is used by millions of Americans to access news and politics. Yet there are no studies, to date, examining whether these behaviors systematically vary for those whose political incorporation process is distinct from those in the majority. We fill this void by examining how Latino online political activity compares to that of white Americans and the role of language in Latinos’ online political engagement. We hypothesize that Latino online political activity is comparable to that of whites. Moreover, given media reports suggesting that greater quantities of political misinformation are circulating on Spanish- versus English-language social media, we expect that reliance on Spanish-language social media for news predicts belief in inaccurate political narratives. Our survey findings, which we believe to be the largest original survey of the online political activity of Latinos and whites, reveal support for these expectations. Latino social media political activity, as measured by sharing/viewing news, talking about politics, and following politicians, is comparable to that of whites, both in self-reported and digital trace data. Latinos also turned to social media for news about COVID-19 more often than did whites. Finally, among Latinos, reliance on Spanish-language social media for news predicts belief in election fraud in the 2020 U.S. Presidential election.

  • Working Paper

    Web Scraping for Research: Legal, Ethical, Institutional, and Scientific Considerations

    Working Paper, December 2024

    Date Posted

    Dec 19, 2024

  • Journal Article

    Concept-Guided Chain-of-Thought Prompting for Pairwise Comparison Scoring of Texts with Large Language Models

    IEEE International Conference on Big Data, 2024

    Existing text scoring methods require a large corpus, struggle with short texts, or require hand-labeled data. We develop a text scoring framework that leverages generative large language models (LLMs) to (1) set texts against the backdrop of information from the near-totality of the web and digitized media, and (2) effectively transform pairwise text comparisons from a reasoning problem to a pattern recognition task. Our approach, concept-guided chain-of-thought (CGCoT), utilizes a chain of researcher-designed prompts with an LLM to generate a concept-specific breakdown for each text, akin to guidance provided to human coders. We then pairwise compare breakdowns using an LLM and aggregate answers into a score using a probability model. We apply this approach to better understand speech reflecting aversion to specific political parties on Twitter, a topic that has commanded increasing interest because of its potential contributions to democratic backsliding. We achieve stronger correlations with human judgments than widely used unsupervised text scoring methods like Wordfish. In a supervised setting, besides a small pilot dataset to develop CGCoT prompts, our measures require no additional hand-labeled data and produce predictions on par with RoBERTa-Large fine-tuned on thousands of hand-labeled tweets. This project showcases the potential of combining human expertise and LLMs for scoring tasks.

    Date Posted

    Dec 15, 2024