Academic Research
CSMaP faculty, postdoctoral fellows, and students publish rigorous, peer-reviewed research in top academic journals and post working papers sharing ongoing work.
-
Journal Article
Asymmetric Ideological Segregation In Exposure To Political News on Facebook
Science, 2023
Does Facebook enable ideological segregation in political news consumption? We analyzed exposure to news during the US 2020 election using aggregated data for 208 million US Facebook users. We compared the inventory of all political news that users could have seen in their feeds with the information that they saw (after algorithmic curation) and the information with which they engaged. We show that (i) ideological segregation is high and increases as we shift from potential exposure to actual exposure to engagement; (ii) there is an asymmetry between conservative and liberal audiences, with a substantial corner of the news ecosystem consumed exclusively by conservatives; and (iii) most misinformation, as identified by Meta’s Third-Party Fact-Checking Program, exists within this homogeneously conservative corner, which has no equivalent on the liberal side. Sources favored by conservative audiences were more prevalent on Facebook’s news ecosystem than those favored by liberals.
-
Journal Article
Measuring the Ideology of Audiences for Web Links and Domains Using Differentially Private Engagement Data
Proceedings of the International AAAI Conference on Web and Social Media, 2023
This paper demonstrates the use of differentially private hyperlink-level engagement data for measuring ideologies of audiences for web domains, individual links, or aggregations thereof. We examine a simple metric for measuring this ideological position and assess the conditions under which the metric is robust to injected, privacy-preserving noise. This assessment provides insights into and constraints on the level of activity one should observe when applying this metric to privacy-protected data. Grounding this work is a massive dataset of social media engagement activity where privacy-preserving noise has been injected into the activity data, provided by Facebook and the Social Science One (SS1) consortium. Using this dataset, we validate our ideology measures by comparing to similar, published work on sharing-based, homophily- and content-oriented measures, where we show consistently high correlation (>0.87). We then apply this metric to individual links from several popular news domains and demonstrate how one can assess link-level distributions of ideological audiences. We further show this estimator is robust to selection of engagement types besides sharing, where domain-level audience-ideology assessments based on views and likes show no significant difference compared to sharing-based estimates. Estimates of partisanship, however, suggest the viewing audience is more moderate than the audiences who share and like these domains. Beyond providing thresholds on sufficient activity for measuring audience ideology and comparing three types of engagement, this analysis provides a blueprint for ensuring robustness of future work to differential privacy protections.
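The audience-ideology metric described above can be illustrated with a minimal sketch: an engagement-weighted mean of ideology-bucket scores, with optional Laplace noise injected into the counts to mimic a privacy-preserving mechanism. The bucket scores, counts, and noise mechanism here are illustrative assumptions, not the paper's exact specification.

```python
import math
import random

def audience_ideology(counts, bucket_scores, noise_scale=0.0, rng=None):
    """Engagement-weighted mean ideology for a link or domain.

    counts: engagement counts per ideology bucket (illustrative).
    bucket_scores: ideology score of each bucket.
    noise_scale: if > 0, add Laplace noise to each count to mimic
    a differential-privacy mechanism (sketch, not the SS1 mechanism).
    """
    rng = rng or random.Random(0)
    noisy = []
    for c in counts:
        if noise_scale > 0:
            # Laplace draw via inverse CDF
            u = rng.random() - 0.5
            c = c - noise_scale * (1 if u >= 0 else -1) * math.log(1 - 2 * abs(u))
        noisy.append(max(float(c), 0.0))  # noisy counts can go negative; clip
    total = sum(noisy)
    if total == 0:
        return float("nan")
    return sum(c * s for c, s in zip(noisy, bucket_scores)) / total

# Five hypothetical buckets from -2 (very liberal) to +2 (very conservative)
scores = [-2, -1, 0, 1, 2]
shares = [120, 300, 450, 280, 90]  # hypothetical engagement counts
print(audience_ideology(shares, scores))                  # ≈ -0.0645
print(audience_ideology(shares, scores, noise_scale=10))  # noisy estimate
```

With large counts the noisy estimate stays close to the exact one, which mirrors the paper's point that the metric is robust once activity exceeds some threshold.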
-
Book
Computational Social Science for Policy and Quality of Democracy: Public Opinion, Hate Speech, Misinformation, and Foreign Influence Campaigns
Handbook of Computational Social Science for Policy, 2023
-
Journal Article
Exposure to the Russian Internet Research Agency Foreign Influence Campaign on Twitter in the 2016 US Election and Its Relationship to Attitudes and Voting Behavior
Nature Communications, 2023
There is widespread concern that foreign actors are using social media to interfere in elections worldwide. Yet data have been unavailable to investigate links between exposure to foreign influence campaigns and political behavior. Using longitudinal survey data from US respondents linked to their Twitter feeds, we quantify the relationship between exposure to the Russian foreign influence campaign and attitudes and voting behavior in the 2016 US election. We demonstrate, first, that exposure to Russian disinformation accounts was heavily concentrated: only 1% of users accounted for 70% of exposures. Second, exposure was concentrated among users who strongly identified as Republicans. Third, exposure to the Russian influence campaign was eclipsed by content from domestic news media and politicians. Finally, we find no evidence of a meaningful relationship between exposure to the Russian foreign influence campaign and changes in attitudes, polarization, or voting behavior. The results have implications for understanding the limits of election interference campaigns on social media.
-
Journal Article
Dictionary-Assisted Supervised Contrastive Learning
Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2022
Text analysis in the social sciences often involves using specialized dictionaries to reason with abstract concepts, such as perceptions about the economy or abuse on social media. These dictionaries allow researchers to impart domain knowledge and note subtle usages of words relating to a concept(s) of interest. We introduce the dictionary-assisted supervised contrastive learning (DASCL) objective, allowing researchers to leverage specialized dictionaries when fine-tuning pretrained language models. The text is first keyword simplified: a common, fixed token replaces any word in the corpus that appears in the dictionary(ies) relevant to the concept of interest. During fine-tuning, a supervised contrastive objective draws closer the embeddings of the original and keyword-simplified texts of the same class while pushing further apart the embeddings of different classes. The keyword-simplified texts of the same class are more textually similar than their original text counterparts, which additionally draws the embeddings of the same class closer together. Combining DASCL and cross-entropy improves classification performance metrics in few-shot learning settings and social science applications compared to using cross-entropy alone and alternative contrastive and data augmentation methods.
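The keyword-simplification step described above can be sketched in a few lines: any word found in the concept dictionary is replaced by a single fixed token. The token string, the dictionary contents, and the punctuation handling here are illustrative assumptions, not DASCL's exact preprocessing.

```python
def keyword_simplify(text, dictionary, token="[KEY]"):
    """Replace every word that appears in a concept dictionary with a
    fixed token, producing the keyword-simplified text used as the
    paired input in the contrastive objective (illustrative sketch)."""
    out = []
    for word in text.split():
        # Case-insensitive match, ignoring trailing punctuation (assumption)
        if word.lower().strip(".,!?;:") in dictionary:
            out.append(token)
        else:
            out.append(word)
    return " ".join(out)

econ_dict = {"inflation", "unemployment", "recession"}  # hypothetical dictionary
print(keyword_simplify("Fears of recession and rising inflation persist.", econ_dict))
# → Fears of [KEY] and rising [KEY] persist.
```

During fine-tuning, the original and simplified versions of the same text share a class label, so the contrastive loss pulls their embeddings together.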
-
Journal Article
Using Social Media Data to Reveal Patterns of Policy Engagement in State Legislatures
State Politics & Policy Quarterly, 2022
-
Journal Article
Most Users Do Not Follow Political Elites on Twitter; Those Who Do, Show Overwhelming Preferences for Ideological Congruity
Science Advances, 2022
We offer comprehensive evidence of preferences for ideological congruity when people engage with politicians, pundits, and news organizations on social media. Using four years of data (2016-2019) from a random sample of 1.5 million Twitter users, we examine three behaviors studied separately to date: (a) following of in-group vs. out-group elites, (b) sharing in-group vs. out-group information (retweeting), and (c) commenting on the shared information (quote tweeting). We find the majority of users (60%) do not follow any political elites. Those who do, follow in-group elite accounts at much higher rates than out-group accounts (90% vs. 10%), share information from in-group elites 13 times more frequently than from out-group elites, and often add negative comments to the shared out-group information. Conservatives are twice as likely as liberals to share in-group vs. out-group content. These patterns are robust, emerge across issues and political elites, and regardless of users' ideological extremity.
-
Journal Article
Election Fraud, YouTube, and Public Perception of the Legitimacy of President Biden
Journal of Online Trust and Safety, 2022
Skepticism about the outcome of the 2020 presidential election in the United States led to a historic attack on the Capitol on January 6th, 2021, and represents one of the greatest challenges to America's democratic institutions in over a century. Narratives of fraud and conspiracy theories proliferated over the fall of 2020, finding fertile ground across online social networks, although little is known about the extent and drivers of this spread. In this article, we show that users who were more skeptical of the election's legitimacy were more likely to be recommended content that featured narratives about the legitimacy of the election. Our findings underscore the tension between an "effective" recommendation system that provides users with the content they want, and a dangerous mechanism by which misinformation, disinformation, and conspiracies can find their way to those most likely to believe them.
-
Journal Article
What We Learned About The Gateway Pundit from its Own Web Traffic Data
Workshop Proceedings of the 16th International AAAI Conference on Web and Social Media, 2022
To mitigate the spread of false news, researchers need to understand who visits low-quality news sites, what brings people to those sites, and what content they prefer to consume. Due to challenges in observing most direct website traffic, existing research primarily relies on alternative data sources, such as engagement signals from social media posts. However, such signals are at best only proxies for actual website visits. During an audit of far-right news websites, we discovered that The Gateway Pundit (TGP) has made its web traffic data publicly available, giving us a rare opportunity to understand what news pages people actually visit. We collected 68 million web traffic visits to the site over a one-month period and analyzed how people consume news via multiple features. Our referral analysis shows that search engines and social media platforms are the main drivers of traffic; our geo-location analysis reveals that TGP is more popular in counties where more people voted for Trump in 2020. In terms of content, topics related to the 2020 US presidential election and the 2021 US Capitol riot have the highest average number of visits. We also use these data to quantify to what degree social media engagement signals correlate with actual web visit counts. To do so, we collect Facebook and Twitter posts with URLs from TGP during the same time period. We show that all engagement signals positively correlate with web visit counts, but with varying correlation strengths. For example, total interaction on Facebook correlates better than Twitter retweet count. Our insights can also help researchers choose the right metrics when they measure the impact of news URLs on social media.
-
Working Paper
Echo Chambers, Rabbit Holes, and Algorithmic Bias: How YouTube Recommends Content to Real Users
Working Paper, May 2022
To what extent does the YouTube recommendation algorithm push users into echo chambers, ideologically biased content, or rabbit holes? Despite growing popular concern, recent work suggests that the recommendation algorithm is not pushing users into these echo chambers. However, existing research relies heavily on the use of anonymous data collection that does not account for the personalized nature of the recommendation algorithm. We asked a sample of real users to install a browser extension that downloaded the list of videos they were recommended. We instructed these users to start on an assigned video and then click through 20 sets of recommendations, capturing what they were being shown in real time as they used the platform logged into their real accounts. Using a novel method to estimate the ideology of a YouTube video, we demonstrate that the YouTube recommendation algorithm does, in fact, push real users into mild ideological echo chambers where, by the end of the data collection task, liberals and conservatives received different distributions of recommendations from each other, though this difference is small. While we find evidence that this difference increases the longer the user followed the recommendation algorithm, we do not find evidence that many go down "rabbit holes" that lead them to ideologically extreme content. Finally, we find that YouTube pushes all users, regardless of ideology, towards moderately conservative content and an increasingly narrow ideological range the longer they follow its recommendations.
-
Journal Article
News Credibility Labels Have Limited Average Effects on News Diet Quality and Fail to Reduce Misperceptions
Science Advances, 2022
As the primary arena for viral misinformation shifts toward transnational threats, the search continues for scalable countermeasures compatible with principles of transparency and free expression. We conducted a randomized field experiment evaluating the impact of source credibility labels embedded in users’ social feeds and search results pages. By combining representative surveys (n = 3337) and digital trace data (n = 968) from a subset of respondents, we provide a rare ecologically valid test of such an intervention on both attitudes and behavior. On average across the sample, we are unable to detect changes in real-world consumption of news from low-quality sources after 3 weeks. We can also rule out small effects on perceived accuracy of popular misinformation spread about the Black Lives Matter movement and coronavirus disease 2019. However, we present suggestive evidence of a substantively meaningful increase in news diet quality among the heaviest consumers of misinformation. We discuss the implications of our findings for scholars and practitioners.
-
Working Paper
To Moderate, Or Not to Moderate: Strategic Domain Sharing by Congressional Campaigns
Working Paper, April 2022
We test whether candidates move to the extremes before a primary but then return to the center for the general election to appeal to the different preferences of each electorate. Incumbents are now more vulnerable to primary challenges than ever, as social media offers challengers a viable pathway for fundraising and messaging, while the homogeneity of districts has reduced general election competitiveness. To assess candidates' ideological trajectories, we estimate the revealed ideology of 2020 congressional candidates (incumbents, their primary challengers, and open seat candidates) before and after their primaries, using a homophily-based measure of domains shared on Twitter. This method provides temporally granular data to observe changes in communication within a single election campaign cycle. We find that incumbents did move towards the extremes for their primaries and back towards the center for the general election, but only when threatened by a well-funded primary challenge; non-incumbents did not.
-
Journal Article
Why Botter: How Pro-Government Bots Fight Opposition in Russia
American Political Science Review, 2022
There is abundant anecdotal evidence that nondemocratic regimes are harnessing new digital technologies known as social media bots to facilitate policy goals. However, few previous attempts have been made to systematically analyze the use of bots that are aimed at a domestic audience in autocratic regimes. We develop two alternative theoretical frameworks for predicting the use of pro-regime bots: one which focuses on bot deployment in response to offline protest and the other in response to online protest. We then test the empirical implications of these frameworks with an original collection of Twitter data generated by Russian pro-government bots. We find that online opposition activities produce stronger reactions from bots than offline protests. Our results provide a lower bound on the effects of bots on the Russian Twittersphere and highlight the importance of bot detection for the study of political communication on social media in nondemocratic regimes.
-
Journal Article
What’s Not to Like? Facebook Page Likes Reveal Limited Polarization in Lifestyle Preferences
Political Communication, 2021
Increasing levels of political animosity in the United States invite speculation about whether polarization extends to aspects of daily life. However, empirical study about the relationship between political ideologies and lifestyle choices is limited by a lack of comprehensive data. In this research, we combine survey and Facebook Page “likes” data from more than 1,200 respondents to investigate the extent of polarization in lifestyle domains. Our results indicate that polarization is present in page categories that are somewhat related to politics – such as opinion leaders, partisan news sources, and topics related to identity and religion – but, perhaps surprisingly, it is mostly not evident in other domains, including sports, food, and music. On the individual level, we find that people who are higher in political news interest and have stronger ideological predispositions have a greater tendency to “like” ideologically homogeneous pages across categories. Our evidence, drawn from rare digital trace data covering more than 5,000 pages, adds nuance to the narrative of widespread polarization across lifestyle sectors and it suggests domains in which cross-cutting preferences are still observed in American life.
-
Journal Article
Short of Suspension: How Suspension Warnings Can Reduce Hate Speech on Twitter
Perspectives on Politics, 2023
Debates around the effectiveness of high-profile Twitter account suspensions and similar bans on abusive users across social media platforms abound. Yet we know little about the effectiveness of warning a user about the possibility of suspending their account as opposed to outright suspensions in reducing hate speech. With a pre-registered experiment, we provide causal evidence that a warning message can reduce the use of hateful language on Twitter, at least in the short term. We design our messages based on the literature on deterrence, and test versions that emphasize the legitimacy of the sender, the credibility of the message, and the costliness of being suspended. We find that the act of warning a user of the potential consequences of their behavior can significantly reduce their hateful language for one week. We also find that warning messages that aim to appear legitimate in the eyes of the target user seem to be the most effective. In light of these findings, we consider the policy implications of platforms adopting a more aggressive approach to warning users that their accounts may be suspended as a tool for reducing hateful speech online.
-
Working Paper
Network Embedding Methods for Large Networks in Political Science
Working Paper, November 2021
Social networks play an important role in many political science studies. With the rise of social media, these networks have grown in both size and complexity. Analysis of these large networks requires generation of feature representations that can be used in machine learning models. One way to generate these feature representations is to use network embedding methods for learning low-dimensional feature representations of nodes and edges in a network. While there is some literature comparing the advantages and shortcomings of these models, to our knowledge, there has not been any analysis of the applicability of network embedding models to classification tasks in political science. In this paper, we compare the performance of five prominent network embedding methods on predicting the ideology of Twitter users and the ideology of Internet domains. We find that LINE provides the best feature representation across all four datasets we use, resulting in the highest classification accuracy. Finally, we provide guidelines for researchers on the use of these models in their own research.
-
Journal Article
Moderating with the Mob: Evaluating the Efficacy of Real-Time Crowdsourced Fact-Checking
Journal of Online Trust and Safety, 2021
Reducing the spread of false news remains a challenge for social media platforms, as the current strategy of using third-party fact-checkers lacks the capacity to address both the scale and speed of misinformation diffusion. Research on the “wisdom of the crowds” suggests one possible solution: aggregating the evaluations of ordinary users to assess the veracity of information. In this study, we investigate the effectiveness of a scalable model for real-time crowdsourced fact-checking. We select 135 popular news stories and have them evaluated by both ordinary individuals and professional fact-checkers within 72 hours of publication, producing 12,883 individual evaluations. Although we find that machine learning-based models using the crowd perform better at identifying false news than simple aggregation rules, our results suggest that neither approach is able to perform at the level of professional fact-checkers. Additionally, both methods perform best when using evaluations only from survey respondents with high political knowledge, suggesting reason for caution for crowdsourced models that rely on a representative sample of the population. Overall, our analyses reveal that while crowd-based systems provide some information on news quality, they are nonetheless limited—and have significant variation—in their ability to identify false news.
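A "simple aggregation rule" of the kind the study benchmarks against can be sketched as a thresholded vote over crowd ratings. The threshold, rating encoding, and labels below are illustrative assumptions, not the paper's specification.

```python
def crowd_verdict(ratings, threshold=0.5):
    """Simple aggregation rule: flag a story as false when the share of
    crowd raters who rated it false exceeds a threshold (illustrative).

    ratings: list of 0/1 values, where 1 means the rater judged the
    story false (hypothetical encoding).
    """
    false_share = sum(ratings) / len(ratings)
    return "false" if false_share > threshold else "not false"

print(crowd_verdict([1, 1, 0, 1, 0]))  # 3 of 5 rate it false → false
print(crowd_verdict([0, 0, 1]))        # 1 of 3 → not false
```

The paper's finding is that even ML models trained on such crowd signals fall short of professional fact-checkers, so a rule like this is a baseline, not a solution.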
-
Journal Article
SARS-CoV-2 RNA Concentrations in Wastewater Foreshadow Dynamics and Clinical Presentation of New COVID-19 Cases
Science of the Total Environment, 2022
Current estimates of COVID-19 prevalence are largely based on symptomatic, clinically diagnosed cases. The existence of a large number of undiagnosed infections hampers population-wide investigation of viral circulation. Here, we quantify the SARS-CoV-2 concentration and track its dynamics in wastewater at a major urban wastewater treatment facility in Massachusetts, between early January and May 2020. SARS-CoV-2 was first detected in wastewater on March 3. SARS-CoV-2 RNA concentrations in wastewater correlated with clinically diagnosed new COVID-19 cases, with the trends appearing 4–10 days earlier in wastewater than in clinical data. We inferred viral shedding dynamics by modeling wastewater viral load as a convolution of back-dated new clinical cases with the average population-level viral shedding function. The inferred viral shedding function showed an early peak, likely before symptom onset and clinical diagnosis, consistent with emerging clinical and experimental evidence. This finding suggests that SARS-CoV-2 concentrations in wastewater may be primarily driven by viral shedding early in infection. This work shows that longitudinal wastewater analysis can be used to identify trends in disease transmission in advance of clinical case reporting, and infer early viral shedding dynamics for newly infected individuals, which are difficult to capture in clinical investigations.
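The modeling step above, expressing wastewater viral load as a convolution of back-dated new cases with an average shedding function, can be sketched directly. The shedding values and case counts below are invented for illustration; the paper infers the shedding function from data rather than assuming it.

```python
def convolve_shedding(new_cases, shedding):
    """Wastewater viral load on day t as the sum over lags k of
    new_cases[t - k] * shedding[k]: a discrete convolution of daily
    new clinical cases with a population-average shedding function."""
    n, m = len(new_cases), len(shedding)
    load = [0.0] * n
    for t in range(n):
        for k in range(m):
            if t - k >= 0:
                load[t] += new_cases[t - k] * shedding[k]
    return load

# Hypothetical early-peaking shedding function (arbitrary units per case)
shedding = [5.0, 3.0, 1.0]
cases = [0, 2, 4, 1, 0]  # hypothetical daily new clinical cases
print(convolve_shedding(cases, shedding))  # → [0.0, 10.0, 26.0, 19.0, 7.0]
```

Because the shedding function peaks at lag zero, the simulated wastewater signal rises with, rather than after, the case curve, which is the mechanism behind the 4-10 day lead the paper observes relative to clinical reporting.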
-
Journal Article
Twitter Flagged Donald Trump’s Tweets with Election Misinformation: They Continued to Spread Both On and Off the Platform
Harvard Kennedy School (HKS) Misinformation Review, 2021
We analyze the spread of Donald Trump’s tweets that were flagged by Twitter using two intervention strategies—attaching a warning label and blocking engagement with the tweet entirely. We find that while blocking engagement on certain tweets limited their diffusion, messages we examined with warning labels spread further on Twitter than those without labels. Additionally, the messages that had been blocked on Twitter remained popular on Facebook, Instagram, and Reddit, being posted more often and garnering more visibility than messages that had either been labeled by Twitter or received no intervention at all. Taken together, our results emphasize the importance of considering content moderation at the ecosystem level.
-
Journal Article
Testing the Effects of Facebook Usage in an Ethnically Polarized Setting
Proceedings of the National Academy of Sciences, 2021
Despite the belief that social media is altering intergroup dynamics—bringing people closer or further alienating them from one another—the impact of social media on interethnic attitudes has yet to be rigorously evaluated, especially within areas with tenuous interethnic relations. We report results from a randomized controlled trial in Bosnia and Herzegovina (BiH), exploring the effects of exposure to social media during one week around genocide remembrance in July 2019 on a set of interethnic attitudes of Facebook users. We find evidence that, counter to preregistered expectations, people who deactivated their Facebook profiles report lower regard for ethnic outgroups than those who remained active. Moreover, we present additional evidence suggesting that this effect is likely conditional on the level of ethnic heterogeneity of respondents’ residence. We also extend the analysis to include measures of subjective well-being and knowledge of news. Here, we find that Facebook deactivation leads to suggestive improvements in subjective well-being and a decrease in knowledge of current events, replicating results from recent research in the United States in a very different context, thus increasing our confidence in the generalizability of these effects.
-
Journal Article
Accessibility and Generalizability: Are Social Media Effects Moderated by Age or Digital Literacy?
Research & Politics, 2021
-
Journal Article
The Times They Are Rarely A-Changin': Circadian Regularities in Social Media Use
Journal of Quantitative Description: Digital Media, 2021
-
Journal Article
Cracking Open the News Feed: Exploring What U.S. Facebook Users See and Share with Large-Scale Platform Data
Journal of Quantitative Description: Digital Media, 2021
-
Journal Article
YouTube Recommendations and Effects on Sharing Across Online Social Platforms
Proceedings of the ACM on Human-Computer Interaction, 2021
-
Journal Article
Tweeting Beyond Tahrir: Ideological Diversity and Political Intolerance in Egyptian Twitter Networks
World Politics, 2021
Do online social networks affect political tolerance in the highly polarized climate of postcoup Egypt? Taking advantage of the real-time networked structure of Twitter data, the authors find that not only is greater network diversity associated with lower levels of intolerance, but also that longer exposure to a diverse network is linked to less expression of intolerance over time. The authors find that this relationship persists in both elite and non-elite diverse networks. Exploring the mechanisms by which network diversity might affect tolerance, the authors offer suggestive evidence that social norms in online networks may shape individuals’ propensity to publicly express intolerant attitudes. The findings contribute to the political tolerance literature and enrich the ongoing debate over the relationship between online echo chambers and political attitudes and behavior by providing new insights from a repressive authoritarian context.