Research
CSMaP is a leading academic research institute studying the ever-shifting online environment at scale. We publish peer-reviewed research in top academic journals, produce rigorous reports and analyses on policy-relevant topics, and develop open-source tools and methods to support the broader scholarly community.
Academic Research
-
Working Paper
Polarization by Default: Auditing Recommendation Bias in LLM-Based Content Curation
Working Paper, 2026
Large Language Models (LLMs) are increasingly deployed to curate and rank human-created content, yet the nature and structure of their biases in these tasks remain poorly understood: which biases are robust across providers and platforms, and which can be mitigated through prompt design? We present a controlled simulation study mapping content selection biases across three major LLM providers (OpenAI, Anthropic, Google) on real social media datasets from Twitter/X, Bluesky, and Reddit, using six prompting strategies (general, popular, engaging, informative, controversial, neutral). Through 540,000 simulated top-10 selections from pools of 100 posts across 54 experimental conditions, we find that biases differ substantially in how structural and how prompt-sensitive they are. Polarization is amplified across all configurations, toxicity handling shows a strong inversion between engagement- and information-focused prompts, and sentiment biases are predominantly negative. Provider comparisons reveal distinct trade-offs: GPT-4o Mini shows the most consistent behavior across prompts; Claude and Gemini exhibit high adaptivity in toxicity handling; and Gemini shows the strongest negative sentiment preference. On Twitter/X, where author demographics can be inferred from profile bios, political leaning bias is the clearest demographic signal: left-leaning authors are systematically over-represented even though right-leaning authors form the plurality of the pool, and this pattern largely persists across prompts.
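The over-representation finding described above can be quantified by comparing a group's share of the model's top-k selections to its share of the candidate pool. A minimal sketch of that audit metric follows; the labels and counts are hypothetical illustrations, not data from the paper.

```python
# Sketch of a representation-ratio audit: how often a group appears in an
# LLM's top-k selections relative to its share of the candidate pool.
# All labels and counts below are hypothetical, for illustration only.

from collections import Counter

def representation_ratio(pool_labels, selected_labels, group):
    """Ratio of a group's share among selections to its share of the pool.
    Values > 1 indicate over-representation; < 1, under-representation."""
    pool_share = Counter(pool_labels)[group] / len(pool_labels)
    selected_share = Counter(selected_labels)[group] / len(selected_labels)
    return selected_share / pool_share

# Hypothetical pool of 100 posts with inferred political leaning.
pool = ["right"] * 45 + ["left"] * 40 + ["unknown"] * 15
# Hypothetical top-10 selection in which left-leaning posts dominate.
top10 = ["left"] * 6 + ["right"] * 3 + ["unknown"] * 1

print(round(representation_ratio(pool, top10, "left"), 2))   # over-represented
print(round(representation_ratio(pool, top10, "right"), 2))  # under-represented
```

In this toy example, left-leaning posts make up 40% of the pool but 60% of the selection (ratio 1.5), while right-leaning posts fall from a 45% pool plurality to 30% of the selection (ratio about 0.67), mirroring the directional pattern the abstract reports.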
-
Journal Article
How deceptive online networks reached millions in the US 2020 elections
Nature Human Behaviour (2026)
Deceptive online networks are coordinated efforts that use identity deception to pursue strategic political or financial goals. During the US 2020 elections, these networks reached at least 37 million Facebook and 3 million Instagram users, representing 15% and 2% of the platforms’ active US adult users, respectively. Only 3 networks out of 49—1 network with explicitly political aims and 2 that appeared to use politics as a lure for profit—were responsible for over 70% of users reached. Notably, accounts unaffiliated with the networks played an important role in facilitating this reach by resharing content the three networks produced. Deceptive networks, regardless of whether their goals were political or financial, reached users who were older, more conservative, more frequently exposed to content from untrustworthy sources, and who spent more time on Facebook.
Reports & Analysis
-
Report
Research Coordination Network: Democracy in the Networked Era
The Digital Information Environment & Global Elections
September 23, 2025
-
Analysis
Who Has a Policy that Would Benefit You? More Voters Say Trump.
National survey data from the 2016, 2020, and 2024 elections shed light on how candidates' campaign strategies impact voter policy recall.
November 2, 2024
Data Collections & Tools
As part of our project to construct comprehensive data sets and to empirically test hypotheses related to social media and politics, we have developed a suite of open-source tools and modeling processes.