Academic Research
CSMaP is a leading academic research institute studying the ever-shifting online environment at scale. We publish peer-reviewed research in top academic journals and produce rigorous data reports on policy-relevant topics.
-
Working Paper
WhatsApp Increases Exposure to False Rumors but Has Limited Effects on Beliefs and Polarization: Evidence from a Multimedia-Constrained Deactivation
Working Paper, May 2023
For years, WhatsApp has been the primary social media application in many countries of the Global South. Numerous journalistic and scholarly accounts suggest that the platform has become fertile ground for spreading misinformation and partisan content, with some going so far as to assert that WhatsApp could seriously affect electoral outcomes, episodes of violence, and vaccine hesitancy around the world. However, no study has yet shown a causal link between WhatsApp usage and these alleged changes in citizens' attitudes and behaviors. To fill this gap, we conducted a field experiment that reduced users' WhatsApp activity in the weeks ahead of the 2022 Brazilian presidential election. Our experiment randomly assigns users to a multimedia deactivation, in which participants turn off the automatic download of all multimedia (images, video, and audio) on WhatsApp and are incentivized not to access any multimedia content during the weeks leading up to the election on October 2, 2022. We find that the deactivation significantly reduced subjects' exposure to false rumors that circulated widely during the weeks before the election. However, consistent with the minimal-effects tradition, the direct consequences of reducing exposure to misinformation on WhatsApp in the weeks before the election are limited: the deactivation does not lead to significant changes in belief accuracy or political polarization. Our study expands the growing literature on the causal effects of reducing social media usage on political attitudes by focusing on the role of exposure to misinformation in the Global South.
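The core comparison in a randomized deactivation like this one can be sketched as a difference in means between the deactivation and control arms. This is a minimal, hypothetical illustration: the outcome (for example, how many false rumors a participant reports having seen), the 50/50 split, and the function names `assign_treatment` and `difference_in_means` are assumptions, and the paper's actual estimator (covariate adjustment, handling of non-compliance) is not described in the abstract.

```python
import random
import statistics
from math import sqrt


def assign_treatment(participant_ids, seed=0):
    """Randomly assign half of the participants to the (hypothetical) deactivation arm.

    Returns a dict mapping participant id -> True (deactivation) or False (control).
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    n_treated = len(ids) // 2
    return {pid: i < n_treated for i, pid in enumerate(ids)}


def difference_in_means(outcomes, treated):
    """Estimate the average treatment effect as mean(treated) - mean(control),
    with a conservative (Neyman) standard error. Needs >= 2 people per arm."""
    t = [y for pid, y in outcomes.items() if treated[pid]]
    c = [y for pid, y in outcomes.items() if not treated[pid]]
    ate = statistics.mean(t) - statistics.mean(c)
    se = sqrt(statistics.variance(t) / len(t) + statistics.variance(c) / len(c))
    return ate, se


# Hypothetical outcome: number of false rumors each participant recalls seeing.
arms = {1: True, 2: True, 3: False, 4: False}
seen = {1: 1.0, 2: 3.0, 3: 4.0, 4: 6.0}
print(difference_in_means(seen, arms))
```

A negative estimate in this toy setup would mean the deactivation arm reported seeing fewer rumors than the control arm.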
-
Working Paper
Large Language Models Can Be Used to Scale the Ideologies of Politicians in a Zero-Shot Learning Setting
Working Paper, March 2023
The aggregation of knowledge embedded in large language models (LLMs) holds the promise of new solutions to problems of observability and measurement in the social sciences. We examine this potential in a challenging setting: measuring latent ideology, which is crucial for better understanding core political functions such as democratic representation. We scale pairwise liberal-conservative comparisons between members of the 116th U.S. Senate, obtained by prompting ChatGPT. Our measure strongly correlates with widely used liberal-conservative scales such as DW-NOMINATE. Our scale also has interpretive advantages, such as not placing senators who vote against their party for ideologically extreme reasons toward the middle. Our measure is more strongly associated with political activists' perceptions of senators than other measures are, consistent with LLMs synthesizing vast amounts of politically relevant data from internet and book corpora rather than memorizing existing measures. LLMs will likely open new avenues for measuring latent constructs using information modeled from massive text corpora.
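Pairwise judgments like these are commonly turned into a one-dimensional scale with a Bradley-Terry model, where the probability that senator i is judged more liberal than senator j is p_i / (p_i + p_j). The sketch below assumes that model (the abstract does not name the scaling method) and fits it with the classic minorization-maximization update; the senators A, B, and C and their judgments are invented, and the step of prompting ChatGPT for each pair is not shown.

```python
import math
from collections import defaultdict


def bradley_terry(comparisons, n_iter=500):
    """Fit Bradley-Terry scores from (winner, loser) pairs.

    Here a "win" means the model judged that senator to be more liberal.
    Returns log-strengths centered at zero (higher = more liberal).
    """
    items = sorted({name for pair in comparisons for name in pair})
    wins = defaultdict(int)
    losses = defaultdict(int)
    n_pair = defaultdict(int)  # number of comparisons per unordered pair
    for winner, loser in comparisons:
        wins[winner] += 1
        losses[loser] += 1
        n_pair[frozenset((winner, loser))] += 1
    # Necessary (not sufficient) condition for the maximum-likelihood estimate to exist.
    if any(wins[i] == 0 or losses[i] == 0 for i in items):
        raise ValueError("every item needs at least one win and one loss")

    p = {i: 1.0 for i in items}  # positive strengths
    for _ in range(n_iter):
        new_p = {}
        for i in items:
            denom = sum(
                n_pair[frozenset((i, j))] / (p[i] + p[j])
                for j in items
                if j != i and frozenset((i, j)) in n_pair
            )
            new_p[i] = wins[i] / denom  # minorization-maximization update
        total = sum(new_p.values())  # strengths are identified only up to scale
        p = {i: v * len(items) / total for i, v in new_p.items()}

    log_p = {i: math.log(v) for i, v in p.items()}
    center = sum(log_p.values()) / len(log_p)
    return {i: v - center for i, v in log_p.items()}


# Hypothetical "which senator is more liberal?" judgments for senators A, B, C.
judgments = (
    [("A", "B")] * 3 + [("B", "A")]
    + [("B", "C")] * 3 + [("C", "B")]
    + [("A", "C")] * 3 + [("C", "A")]
)
scores = bradley_terry(judgments)
print({name: round(s, 3) for name, s in scores.items()})
```

Because every senator is compared with every other, the scale is pinned down by the judgments alone; with sparser comparison designs, a regularized fit would typically be needed.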
-
Working Paper
Social Media, Information, and Politics: Insights on Latinos in the U.S.
Working Paper, November 2022
-
Working Paper
Evaluating Expectations from Social and Behavioral Science about COVID-19 and Lessons for the Next Pandemic
Working Paper, October 2022
-
Working Paper
Echo Chambers, Rabbit Holes, and Algorithmic Bias: How YouTube Recommends Content to Real Users
Working Paper, May 2022
To what extent does the YouTube recommendation algorithm push users into echo chambers, ideologically biased content, or rabbit holes? Despite growing popular concern, recent work suggests that the recommendation algorithm is not pushing users into these echo chambers. However, existing research relies heavily on anonymous data collection that does not account for the personalized nature of the recommendation algorithm. We asked a sample of real users to install a browser extension that downloaded the list of videos they were recommended. We instructed these users to start on an assigned video and then click through 20 sets of recommendations, capturing what they were shown in real time as they used the platform while logged into their real accounts. Using a novel method to estimate the ideology of a YouTube video, we demonstrate that the YouTube recommendation algorithm does, in fact, push real users into mild ideological echo chambers: by the end of the data collection task, liberals and conservatives received different distributions of recommendations, though this difference is small. While we find evidence that this difference grows the longer users follow the recommendation algorithm, we do not find evidence that many go down "rabbit holes" that lead them to ideologically extreme content. Finally, we find that YouTube pushes all users, regardless of ideology, toward moderately conservative content and an increasingly narrow range of ideological content the longer they follow its recommendations.
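The descriptive comparison in this abstract (how far apart liberals' and conservatives' recommendations are at each click-through step, and how dispersed the recommendations become) can be sketched as follows. The ideology scale (higher = more conservative), the group labels, the function name `summarize_by_depth`, and the toy data are hypothetical, and the snippet omits the statistical tests a real analysis would need.

```python
from statistics import mean, pstdev


def summarize_by_depth(recs):
    """Summarize recommendation ideology at each step of the click-through task.

    recs: iterable of (user_group, depth, video_ideology) tuples, where user_group
    is "liberal" or "conservative" (both assumed present at every depth).
    Returns {depth: (gap, spread)}: gap is the conservative-minus-liberal difference
    in mean recommended ideology, spread is the population standard deviation of
    all recommendations shown at that depth.
    """
    by_depth = {}
    for group, depth, score in recs:
        by_depth.setdefault(depth, {}).setdefault(group, []).append(score)
    summary = {}
    for depth in sorted(by_depth):
        groups = by_depth[depth]
        everything = [s for scores in groups.values() for s in scores]
        gap = mean(groups["conservative"]) - mean(groups["liberal"])
        summary[depth] = (gap, pstdev(everything))
    return summary


# Hypothetical recommendations at the first and last (20th) step.
example = [
    ("liberal", 1, -0.5), ("liberal", 1, 0.5),
    ("conservative", 1, -0.5), ("conservative", 1, 0.5),
    ("liberal", 20, 0.0), ("liberal", 20, 0.2),
    ("conservative", 20, 0.2), ("conservative", 20, 0.4),
]
for depth, (gap, spread) in summarize_by_depth(example).items():
    print(depth, round(gap, 3), round(spread, 3))
```

In this invented example the gap opens slightly while the overall spread shrinks, mirroring the two patterns the abstract describes.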
-
Working Paper
Estimating the Ideology of Political YouTube Videos
Working Paper, May 2022
We present a method for estimating the ideology of political YouTube videos. As online media increasingly influence how people engage with politics, quantifying the ideology of such media becomes more important for research. The subfield of estimating ideology as a latent variable has often focused on traditional actors such as legislators, while more recent work has used social media data to estimate the ideology of ordinary users, political elites, and media sources. We build on this work by developing a method to estimate the ideologies of YouTube videos, an important subset of media, from their accompanying text metadata. First, we take Reddit posts linking to YouTube videos and use correspondence analysis to place those videos in an ideological space. We then train a text-based model with those estimated ideologies as training labels, enabling us to estimate the ideologies of videos not posted on Reddit. We validate these predicted ideologies against human labels. Finally, we demonstrate the method's utility by applying it to the watch histories of survey respondents with self-identified ideologies to evaluate the prevalence of echo chambers on YouTube. Our approach yields video-level scores based only on supplied text metadata, is scalable, and can easily be adjusted to account for changes in the ideological climate. The method could also be generalized to estimate the ideology of other items referenced or posted on Reddit.
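The correspondence-analysis step can be sketched in pure Python. Treating subreddits as the rows of the count matrix is an assumption (the abstract says only that Reddit posts linking to videos are used), the counts and the function name `ca_column_scores` are invented, and a real analysis would use a linear-algebra library's SVD rather than hand-rolled power iteration.

```python
import math


def ca_column_scores(counts, n_iter=200):
    """First-dimension correspondence-analysis scores for the columns of a count matrix.

    counts[i][j]: how often row i (a hypothetical subreddit) links to column j
    (a YouTube video); every row and column must have a nonzero total.
    Builds the standardized residuals S = D_r^{-1/2} (P - r c^T) D_c^{-1/2} and finds
    the leading right singular vector of S by power iteration on S^T S.
    Returns column standard coordinates; the sign of the dimension is arbitrary.
    """
    n = sum(sum(row) for row in counts)
    n_rows, n_cols = len(counts), len(counts[0])
    P = [[x / n for x in row] for row in counts]  # correspondence matrix
    r = [sum(row) for row in P]  # row masses
    c = [sum(P[i][j] for i in range(n_rows)) for j in range(n_cols)]  # column masses
    S = [
        [(P[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j]) for j in range(n_cols)]
        for i in range(n_rows)
    ]
    v = [1.0 + 0.1 * j for j in range(n_cols)]  # arbitrary non-degenerate start
    for _ in range(n_iter):
        u = [sum(S[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]  # S v
        w = [sum(S[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]  # S^T S v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [v[j] / math.sqrt(c[j]) for j in range(n_cols)]


# Two left-leaning and two right-leaning communities; videos 0-1 vs. videos 2-3.
toy = [
    [5, 4, 1, 0],
    [4, 5, 0, 1],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
]
print([round(s, 3) for s in ca_column_scores(toy)])
```

Videos shared by the same communities land on the same side of the first dimension; which side counts as "liberal" has to be fixed afterward, for example by checking a few known videos.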
-
Working Paper
To Moderate, Or Not to Moderate: Strategic Domain Sharing by Congressional Campaigns
Working Paper, April 2022
We test whether candidates move to the extremes before a primary and then return to the center for the general election to appeal to the different preferences of each electorate. Incumbents are now more vulnerable to primary challenges than ever, as social media offers challengers a viable pathway for fundraising and messaging, while the homogeneity of districts has reduced general-election competitiveness. To assess candidates' ideological trajectories, we estimate the revealed ideology of 2020 congressional candidates (incumbents, their primary challengers, and open-seat candidates) before and after their primaries, using a homophily-based measure of domains shared on Twitter. This method provides temporally granular data for observing changes in communication within a single campaign cycle. We find that incumbents did move toward the extremes for their primaries and back toward the center for the general election, but only when threatened by a well-funded primary challenge; non-incumbents did not.
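A homophily-based measure rests on a simple idea: a domain is scored by the ideology of the users who share it, and a candidate is scored by the domains they share. The sketch below assumes that user ideology scores are already available and that a candidate's position in each window is the plain mean over their shares; the function names, domains, dates, and scores are hypothetical, and the paper's actual estimator may differ.

```python
from datetime import date
from statistics import mean


def domain_scores(user_shares, user_ideology):
    """Score each domain by the mean ideology of the accounts that shared it.

    user_shares: iterable of (user, domain) pairs, one per share, so frequent
    sharers count more. user_ideology: dict of user -> ideology score (assumed
    known in advance, e.g. from an existing Twitter ideology estimate).
    """
    sharer_scores = {}
    for user, domain in user_shares:
        sharer_scores.setdefault(domain, []).append(user_ideology[user])
    return {d: mean(vals) for d, vals in sharer_scores.items()}


def pre_post_primary(candidate_shares, scores, primary_date):
    """Mean domain score of a candidate's shares before vs. on/after the primary.

    Shares of unscored domains are ignored; each window needs at least one share.
    """
    before = [scores[d] for day, d in candidate_shares if day < primary_date and d in scores]
    after = [scores[d] for day, d in candidate_shares if day >= primary_date and d in scores]
    return mean(before), mean(after)


# Hypothetical users (negative = liberal) and the domains they shared.
ideology = {"u1": -1.0, "u2": -0.5, "u3": 0.5, "u4": 1.0}
shares = [("u1", "a.com"), ("u2", "a.com"), ("u3", "b.com"),
          ("u4", "b.com"), ("u2", "c.com"), ("u3", "c.com")]
scores = domain_scores(shares, ideology)

# A hypothetical campaign account's shares around a March 15 primary.
campaign = [(date(2020, 3, 1), "b.com"), (date(2020, 3, 2), "b.com"),
            (date(2020, 4, 1), "c.com"), (date(2020, 4, 2), "b.com")]
print(pre_post_primary(campaign, scores, date(2020, 3, 15)))
```

Here the toy candidate's score falls from 0.75 to 0.375 after the primary, the kind of move back toward the center that the paper tests for.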
-
Working Paper
Network Embedding Methods for Large Networks in Political Science
Working Paper, November 2021
Social networks play an important role in many political science studies. With the rise of social media, these networks have grown in both size and complexity. Analyzing such large networks requires feature representations that can be used in machine learning models. One way to generate these representations is to use network embedding methods, which learn low-dimensional feature representations of the nodes and edges in a network. While some literature compares the advantages and shortcomings of these models, to our knowledge there has been no analysis of the applicability of network embedding models to classification tasks in political science. In this paper, we compare the performance of five prominent network embedding methods at predicting the ideology of Twitter users and of Internet domains. We find that LINE provides the best feature representation across all four datasets we use, yielding the highest accuracy. Finally, we provide guidelines for researchers on using these models in their own research.
-
Working Paper
Measuring the Ideology of Audiences for Web Links and Domains Using Differentially Private Engagement Data
Working Paper, January 2021
This paper demonstrates the use of differentially private hyperlink-level engagement data for measuring the ideologies of audiences for web domains, individual links, or aggregations thereof. We examine a simple metric for measuring this ideological position and assess the conditions under which the metric is robust to injected, privacy-preserving noise. This assessment provides insights into, and constraints on, the level of activity one should observe when applying this metric to privacy-protected data. The work is grounded in a massive dataset of social media engagement activity, provided by Facebook and the Social Science One (SS1) consortium, into which privacy-preserving noise has been injected. Using this dataset, we validate our ideology measures against similar published sharing-based, homophily-based, and content-oriented measures, showing consistently high correlations (>0.87). We then apply this metric to individual links from several popular news domains and demonstrate how one can assess link-level distributions of ideological audiences. We further show that this estimator is robust to the choice of engagement type: domain-level audience-ideology assessments based on views and likes show no significant difference from sharing-based estimates. Estimates of partisanship, however, suggest that the viewing audience is more moderate than the audiences who share and like these domains. Beyond providing thresholds on the activity sufficient for measuring audience ideology and comparing three types of engagement, this analysis provides a blueprint for making future work robust to differential privacy protections.
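The tension between a ratio-style audience metric and injected noise can be illustrated with a small simulation. This is a sketch under stated assumptions: the five ideology buckets, the engagement-weighted mean used as the metric, independent Gaussian noise on each bucket count, and the function names are illustrative choices, not claimed to match the paper's metric or the exact SS1 noise mechanism.

```python
import random

# Hypothetical ideology buckets for engaging users, from liberal (-2) to conservative (+2).
BUCKETS = (-2, -1, 0, 1, 2)


def audience_score(counts):
    """Engagement-weighted mean bucket of a link's audience (None if total <= 0)."""
    total = sum(counts)
    if total <= 0:
        return None  # noise can drive the total to zero or below
    return sum(k * n for k, n in zip(BUCKETS, counts)) / total


def noisy(counts, sigma, rng):
    """Add independent Gaussian noise with standard deviation sigma to each bucket count."""
    return [n + rng.gauss(0.0, sigma) for n in counts]


def mean_abs_error(true_counts, sigma, trials=2000, seed=0):
    """Average absolute error of the score computed from noisy counts."""
    rng = random.Random(seed)
    truth = audience_score(true_counts)
    errors = []
    for _ in range(trials):
        est = audience_score(noisy(true_counts, sigma, rng))
        if est is not None:
            errors.append(abs(est - truth))
    return sum(errors) / len(errors) if errors else float("inf")


low_activity = [2, 3, 1, 0, 0]
high_activity = [2000, 3000, 1000, 0, 0]
print(mean_abs_error(low_activity, sigma=10.0), mean_abs_error(high_activity, sigma=10.0))
```

With the same noise level, the low-activity link's score is far less reliable than the high-activity one, which is the intuition behind requiring a minimum level of activity before trusting a link-level estimate.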
-
Working Paper
News Sharing on Social Media: Mapping the Ideology of News Media Content, Citizens, and Politicians
Working Paper, November 2020
-
Working Paper
A Comparison of Methods in Political Science Text Classification: Transfer Learning Language Models for Politics
Working Paper, October 2020
Automated text classification has rapidly become an important tool for political analysis. Recent advances in NLP, driven by deep learning, now achieve state-of-the-art results on many standard tasks in the field. However, these methods require large amounts of both computing power and text data to learn the characteristics of the language, resources that are not always accessible to political scientists. One solution is a transfer learning approach, in which knowledge learned in one area or source task is transferred to another area or target task. A class of models that embodies this approach is language models, which demonstrate extremely high levels of performance. We investigate the performance of these models in political science by comparing multiple text classification methods. We find that RoBERTa and XLNet, language models that rely on the Transformer, require fewer computing resources and less training data to perform on par with, or outperform, several political science text classification methods. Moreover, we find that the increase in accuracy is especially large with small labeled datasets, highlighting the potential of pretrained language models to reduce the data-labeling cost of supervised methods for political scientists.
-
Working Paper
Millions of Americans use social media to acquire political news and information. Most research on this topic has focused on how social media consumption affects the political behavior and preferences of White Americans. Much less is known about the political activity on social media of Latinos, who are not only the largest racial/ethnic minority group in the U.S. but also continue to exhibit diverse political preferences. Moreover, about 30% of Latinos rely primarily on Spanish-language news sources (Spanish-dominant Latinos), and another 30% are bilingual. Given that Spanish-language social media is not as heavily monitored for misinformation as its English-language counterparts (Valencia, 2021; Paul, 2021), Spanish-dominant Latinos who rely on social media for news may be more susceptible to political misinformation than Latinos who are exposed to English-language social media. We address this contention by fielding an original study that sampled a large number of Latino and White respondents. Consistent with our expectations, Latinos who rely on Spanish-language social media are more likely to believe in election fraud than those who use both English- and Spanish-language social media news sources. We also find that Latinos engage in more political activities on social media than White Americans, particularly on their platform of choice, WhatsApp.