Academic Research
CSMaP faculty, postdoctoral fellows, and students publish rigorous, peer-reviewed research in top academic journals and post working papers sharing ongoing work.
-
Journal Article
Measuring the Ideology of Audiences for Web Links and Domains Using Differentially Private Engagement Data
Proceedings of the International AAAI Conference on Web and Social Media, 2023
This paper demonstrates the use of differentially private hyperlink-level engagement data for measuring ideologies of audiences for web domains, individual links, or aggregations thereof. We examine a simple metric for measuring this ideological position and assess the conditions under which the metric is robust to injected, privacy-preserving noise. This assessment provides insights into and constraints on the level of activity one should observe when applying this metric to privacy-protected data. Grounding this work is a massive dataset of social media engagement activity where privacy-preserving noise has been injected into the activity data, provided by Facebook and the Social Science One (SS1) consortium. Using this dataset, we validate our ideology measures by comparing to similar, published work on sharing-based, homophily- and content-oriented measures, where we show consistently high correlation (>0.87). We then apply this metric to individual links from several popular news domains and demonstrate how one can assess link-level distributions of ideological audiences. We further show this estimator is robust to selection of engagement types besides sharing, where domain-level audience-ideology assessments based on views and likes show no significant difference compared to sharing-based estimates. Estimates of partisanship, however, suggest the viewing audience is more moderate than the audiences who share and like these domains. Beyond providing thresholds on sufficient activity for measuring audience ideology and comparing three types of engagement, this analysis provides a blueprint for ensuring robustness of future work to differential privacy protections.
-
Book
Computational Social Science for Policy and Quality of Democracy: Public Opinion, Hate Speech, Misinformation, and Foreign Influence Campaigns
Handbook of Computational Social Science for Policy, 2023
-
Journal Article
Exposure to the Russian Internet Research Agency Foreign Influence Campaign on Twitter in the 2016 US Election and Its Relationship to Attitudes and Voting Behavior
Nature Communications, 2023
There is widespread concern that foreign actors are using social media to interfere in elections worldwide. Yet data have been unavailable to investigate links between exposure to foreign influence campaigns and political behavior. Using longitudinal survey data from US respondents linked to their Twitter feeds, we quantify the relationship between exposure to the Russian foreign influence campaign and attitudes and voting behavior in the 2016 US election. We demonstrate, first, that exposure to Russian disinformation accounts was heavily concentrated: only 1% of users accounted for 70% of exposures. Second, exposure was concentrated among users who strongly identified as Republicans. Third, exposure to the Russian influence campaign was eclipsed by content from domestic news media and politicians. Finally, we find no evidence of a meaningful relationship between exposure to the Russian foreign influence campaign and changes in attitudes, polarization, or voting behavior. The results have implications for understanding the limits of election interference campaigns on social media.
-
Journal Article
Dictionary-Assisted Supervised Contrastive Learning
Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2022
Text analysis in the social sciences often involves using specialized dictionaries to reason with abstract concepts, such as perceptions about the economy or abuse on social media. These dictionaries allow researchers to impart domain knowledge and note subtle usages of words relating to a concept(s) of interest. We introduce the dictionary-assisted supervised contrastive learning (DASCL) objective, allowing researchers to leverage specialized dictionaries when fine-tuning pretrained language models. The text is first keyword simplified: a common, fixed token replaces any word in the corpus that appears in the dictionary(ies) relevant to the concept of interest. During fine-tuning, a supervised contrastive objective draws closer the embeddings of the original and keyword-simplified texts of the same class while pushing further apart the embeddings of different classes. The keyword-simplified texts of the same class are more textually similar than their original text counterparts, which additionally draws the embeddings of the same class closer together. Combining DASCL and cross-entropy improves classification performance metrics in few-shot learning settings and social science applications compared to using cross-entropy alone and alternative contrastive and data augmentation methods.
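The keyword-simplification step described in this abstract can be illustrated with a minimal sketch. The placeholder token, the toy dictionary, and the function name `keyword_simplify` are assumptions made for illustration and are not taken from the paper; the contrastive fine-tuning objective itself is not shown.

```python
import re

# Hypothetical placeholder token and toy dictionary (illustrative only,
# not the token or dictionaries used in the paper).
KEYWORD_TOKEN = "<keyword>"
ECONOMY_TERMS = {"inflation", "recession", "unemployment"}


def keyword_simplify(text, dictionary, token=KEYWORD_TOKEN):
    """Replace every word that appears in `dictionary` with one fixed token."""
    def swap(match):
        word = match.group(0)
        # Case-insensitive lookup; words outside the dictionary are kept as-is.
        return token if word.lower() in dictionary else word

    # Match runs of letters and apostrophes so punctuation is left untouched.
    return re.sub(r"[A-Za-z']+", swap, text)


original = "Inflation and unemployment are rising."
simplified = keyword_simplify(original, ECONOMY_TERMS)
print(simplified)
```

In this sketch, the original and simplified texts would form the pair whose embeddings the contrastive objective draws together.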
-
Journal Article
Using Social Media Data to Reveal Patterns of Policy Engagement in State Legislatures
State Politics & Policy Quarterly, 2022
-
Journal Article
Most Users Do Not Follow Political Elites on Twitter; Those Who Do, Show Overwhelming Preferences for Ideological Congruity.
Science Advances, 2022
We offer comprehensive evidence of preferences for ideological congruity when people engage with politicians, pundits, and news organizations on social media. Using four years of data (2016-2019) from a random sample of 1.5 million Twitter users, we examine three behaviors studied separately to date: (a) following of in-group vs. out-group elites, (b) sharing in-group vs. out-group information (retweeting), and (c) commenting on the shared information (quote tweeting). We find the majority of users (60%) do not follow any political elites. Those who do, follow in-group elite accounts at much higher rates than out-group accounts (90% vs. 10%), share information from in-group elites 13 times more frequently than from out-group elites, and often add negative comments to the shared out-group information. Conservatives are twice as likely as liberals to share in-group vs. out-group content. These patterns are robust: they emerge across issues and political elites, and regardless of users' ideological extremity.
-
Journal Article
Election Fraud, YouTube, and Public Perception of the Legitimacy of President Biden
Journal of Online Trust and Safety, 2022
Skepticism about the outcome of the 2020 presidential election in the United States led to a historic attack on the Capitol on January 6th, 2021 and represents one of the greatest challenges to America's democratic institutions in over a century. Narratives of fraud and conspiracy theories proliferated over the fall of 2020, finding fertile ground across online social networks, although little is known about the extent and drivers of this spread. In this article, we show that users who were more skeptical of the election's legitimacy were more likely to be recommended content that featured narratives about the legitimacy of the election. Our findings underscore the tension between an "effective" recommendation system that provides users with the content they want, and a dangerous mechanism by which misinformation, disinformation, and conspiracies can find their way to those most likely to believe them.
-
Journal Article
What We Learned About The Gateway Pundit from its Own Web Traffic Data
Workshop Proceedings of the 16th International AAAI Conference on Web and Social Media, 2022
To mitigate the spread of false news, researchers need to understand who visits low-quality news sites, what brings people to those sites, and what content they prefer to consume. Due to challenges in observing most direct website traffic, existing research primarily relies on alternative data sources, such as engagement signals from social media posts. However, such signals are at best only proxies for actual website visits. During an audit of far-right news websites, we discovered that The Gateway Pundit (TGP) has made its web traffic data publicly available, giving us a rare opportunity to understand what news pages people actually visit. We collected 68 million web traffic visits to the site over a one-month period and analyzed how people consume news via multiple features. Our referral analysis shows that search engines and social media platforms are the main drivers of traffic; our geo-location analysis reveals that TGP is more popular in counties where more people voted for Trump in 2020. In terms of content, topics related to the 2020 US presidential election and the 2021 US Capitol riot have the highest average number of visits. We also use these data to quantify to what degree social media engagement signals correlate with actual web visit counts. To do so, we collect Facebook and Twitter posts with URLs from TGP during the same time period. We show that all engagement signals positively correlate with web visit counts, but with varying correlation strengths. For example, total interaction on Facebook correlates better than Twitter retweet count. Our insights can also help researchers choose the right metrics when they measure the impact of news URLs on social media.
-
Working Paper
Echo Chambers, Rabbit Holes, and Algorithmic Bias: How YouTube Recommends Content to Real Users
Working Paper, May 2022
To what extent does the YouTube recommendation algorithm push users into echo chambers, ideologically biased content, or rabbit holes? Despite growing popular concern, recent work suggests that the recommendation algorithm is not pushing users into these echo chambers. However, existing research relies heavily on the use of anonymous data collection that does not account for the personalized nature of the recommendation algorithm. We asked a sample of real users to install a browser extension that downloaded the list of videos they were recommended. We instructed these users to start on an assigned video and then click through 20 sets of recommendations, capturing what they were being shown in real time as they used the platform logged into their real accounts. Using a novel method to estimate the ideology of a YouTube video, we demonstrate that the YouTube recommendation algorithm does, in fact, push real users into mild ideological echo chambers where, by the end of the data collection task, liberals and conservatives received different distributions of recommendations from each other, though this difference is small. While we find evidence that this difference increases the longer the user followed the recommendation algorithm, we do not find evidence that many go down "rabbit holes" that lead them to ideologically extreme content. Finally, we find that YouTube pushes all users, regardless of ideology, towards moderately conservative content and an increasingly narrow range of ideological content the longer they follow YouTube's recommendations.
-
Journal Article
News Credibility Labels Have Limited Average Effects on News Diet Quality and Fail to Reduce Misperceptions
Science Advances, 2022
As the primary arena for viral misinformation shifts toward transnational threats, the search continues for scalable countermeasures compatible with principles of transparency and free expression. We conducted a randomized field experiment evaluating the impact of source credibility labels embedded in users’ social feeds and search results pages. By combining representative surveys (n = 3337) and digital trace data (n = 968) from a subset of respondents, we provide a rare ecologically valid test of such an intervention on both attitudes and behavior. On average across the sample, we are unable to detect changes in real-world consumption of news from low-quality sources after 3 weeks. We can also rule out small effects on perceived accuracy of popular misinformation spread about the Black Lives Matter movement and coronavirus disease 2019. However, we present suggestive evidence of a substantively meaningful increase in news diet quality among the heaviest consumers of misinformation. We discuss the implications of our findings for scholars and practitioners.
-
Journal Article
What’s Not to Like? Facebook Page Likes Reveal Limited Polarization in Lifestyle Preferences
Political Communication, 2021
Increasing levels of political animosity in the United States invite speculation about whether polarization extends to aspects of daily life. However, empirical study about the relationship between political ideologies and lifestyle choices is limited by a lack of comprehensive data. In this research, we combine survey and Facebook Page “likes” data from more than 1,200 respondents to investigate the extent of polarization in lifestyle domains. Our results indicate that polarization is present in page categories that are somewhat related to politics – such as opinion leaders, partisan news sources, and topics related to identity and religion – but, perhaps surprisingly, it is mostly not evident in other domains, including sports, food, and music. On the individual level, we find that people who are higher in political news interest and have stronger ideological predispositions have a greater tendency to “like” ideologically homogeneous pages across categories. Our evidence, drawn from rare digital trace data covering more than 5,000 pages, adds nuance to the narrative of widespread polarization across lifestyle sectors and it suggests domains in which cross-cutting preferences are still observed in American life.
-
Journal Article
Short of Suspension: How Suspension Warnings Can Reduce Hate Speech on Twitter
Perspectives on Politics, 2023
Debates around the effectiveness of high-profile Twitter account suspensions and similar bans on abusive users across social media platforms abound. Yet we know little about the effectiveness of warning a user about the possibility of suspending their account as opposed to outright suspensions in reducing hate speech. With a pre-registered experiment, we provide causal evidence that a warning message can reduce the use of hateful language on Twitter, at least in the short term. We design our messages based on the literature on deterrence, and test versions that emphasize the legitimacy of the sender, the credibility of the message, and the costliness of being suspended. We find that the act of warning a user of the potential consequences of their behavior can significantly reduce their hateful language for one week. We also find that warning messages that aim to appear legitimate in the eyes of the target user seem to be the most effective. In light of these findings, we consider the policy implications of platforms adopting a more aggressive approach to warning users that their accounts may be suspended as a tool for reducing hateful speech online.
-
Journal Article
Moderating with the Mob: Evaluating the Efficacy of Real-Time Crowdsourced Fact-Checking
Journal of Online Trust and Safety, 2021
Reducing the spread of false news remains a challenge for social media platforms, as the current strategy of using third-party fact-checkers lacks the capacity to address both the scale and speed of misinformation diffusion. Research on the “wisdom of the crowds” suggests one possible solution: aggregating the evaluations of ordinary users to assess the veracity of information. In this study, we investigate the effectiveness of a scalable model for real-time crowdsourced fact-checking. We select 135 popular news stories and have them evaluated by both ordinary individuals and professional fact-checkers within 72 hours of publication, producing 12,883 individual evaluations. Although we find that machine learning-based models using the crowd perform better at identifying false news than simple aggregation rules, our results suggest that neither approach is able to perform at the level of professional fact-checkers. Additionally, both methods perform best when using evaluations only from survey respondents with high political knowledge, suggesting reason for caution for crowdsourced models that rely on a representative sample of the population. Overall, our analyses reveal that while crowd-based systems provide some information on news quality, they are nonetheless limited (and have significant variation) in their ability to identify false news.
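As a rough illustration of what a "simple aggregation rule" over crowd evaluations could look like, the sketch below flags a story as false when more than a chosen share of raters label it false. The function name, the label strings, and the 0.5 threshold are hypothetical and are not the rules evaluated in the study.

```python
def crowd_says_false(ratings, threshold=0.5):
    """Flag a story as false when the share of 'false' ratings exceeds `threshold`.

    `ratings` is a list of labels such as "true" or "false", one per rater.
    The label vocabulary and threshold are illustrative assumptions.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    share_false = sum(1 for r in ratings if r == "false") / len(ratings)
    return share_false > threshold


# Toy ratings for two hypothetical stories.
story_a = ["false", "false", "true"]
story_b = ["true", "false"]
print(crowd_says_false(story_a), crowd_says_false(story_b))
```

A rule like this treats every rater equally, which is one reason a model that weights raters (for example, by political knowledge) might behave differently.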
-
Journal Article
SARS-CoV-2 RNA Concentrations in Wastewater Foreshadow Dynamics and Clinical Presentation of New COVID-19 Cases
Science of the Total Environment, 2022
Current estimates of COVID-19 prevalence are largely based on symptomatic, clinically diagnosed cases. The existence of a large number of undiagnosed infections hampers population-wide investigation of viral circulation. Here, we quantify the SARS-CoV-2 concentration and track its dynamics in wastewater at a major urban wastewater treatment facility in Massachusetts, between early January and May 2020. SARS-CoV-2 was first detected in wastewater on March 3. SARS-CoV-2 RNA concentrations in wastewater correlated with clinically diagnosed new COVID-19 cases, with the trends appearing 4–10 days earlier in wastewater than in clinical data. We inferred viral shedding dynamics by modeling wastewater viral load as a convolution of back-dated new clinical cases with the average population-level viral shedding function. The inferred viral shedding function showed an early peak, likely before symptom onset and clinical diagnosis, consistent with emerging clinical and experimental evidence. This finding suggests that SARS-CoV-2 concentrations in wastewater may be primarily driven by viral shedding early in infection. This work shows that longitudinal wastewater analysis can be used to identify trends in disease transmission in advance of clinical case reporting, and infer early viral shedding dynamics for newly infected individuals, which are difficult to capture in clinical investigations.
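The convolution idea in this abstract can be sketched in a few lines. Here `new_cases` stands in for daily new cases already back-dated to an infection day, and `shedding` for a per-day shedding curve; both series are made-up numbers for illustration, and the function name is hypothetical. The paper infers the shedding function from data, which this sketch does not attempt.

```python
def expected_viral_load(new_cases, shedding):
    """Discrete convolution: load[t] = sum over k of new_cases[t - k] * shedding[k]."""
    load = [0.0] * len(new_cases)
    for t in range(len(new_cases)):
        for k, s in enumerate(shedding):
            # Only cases that occurred k days before day t contribute.
            if t - k >= 0:
                load[t] += new_cases[t - k] * s
    return load


# Made-up inputs: back-dated daily new cases and a shedding curve that
# peaks on the first day and then decays.
new_cases = [0, 10, 20, 40, 30]
shedding = [1.0, 0.5, 0.25]
print(expected_viral_load(new_cases, shedding))
```

With a shedding curve that peaks early, the modeled load tracks the most recent infections most closely, which is consistent with wastewater trends leading clinical reports.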
-
Journal Article
Twitter Flagged Donald Trump’s Tweets with Election Misinformation: They Continued to Spread Both On and Off the Platform
Harvard Kennedy School (HKS) Misinformation Review, 2021
We analyze the spread of Donald Trump’s tweets that were flagged by Twitter using two intervention strategies—attaching a warning label and blocking engagement with the tweet entirely. We find that while blocking engagement on certain tweets limited their diffusion, messages we examined with warning labels spread further on Twitter than those without labels. Additionally, the messages that had been blocked on Twitter remained popular on Facebook, Instagram, and Reddit, being posted more often and garnering more visibility than messages that had either been labeled by Twitter or received no intervention at all. Taken together, our results emphasize the importance of considering content moderation at the ecosystem level.
-
Journal Article
Accessibility and Generalizability: Are Social Media Effects Moderated by Age or Digital Literacy?
Research & Politics, 2021
-
Journal Article
The Times They Are Rarely A-Changin': Circadian Regularities in Social Media Use
Journal of Quantitative Description: Digital Media, 2021
-
Journal Article
Cracking Open the News Feed: Exploring What U.S. Facebook Users See and Share with Large-Scale Platform Data
Journal of Quantitative Description: Digital Media, 2021
-
Journal Article
YouTube Recommendations and Effects on Sharing Across Online Social Platforms
Proceedings of the ACM on Human-Computer Interaction, 2021
-
Journal Article
Political Psychology in the Digital (mis)Information Age: A Model of News Belief and Sharing
Social Issues and Policy Review, 2021
The spread of misinformation, including “fake news,” propaganda, and conspiracy theories, represents a serious threat to society, as it has the potential to alter beliefs, behavior, and policy. Research is beginning to disentangle how and why misinformation is spread and identify processes that contribute to this social problem. We propose an integrative model to understand the social, political, and cognitive psychology risk factors that underlie the spread of misinformation and highlight strategies that might be effective in mitigating this problem. However, the spread of misinformation is a rapidly growing and evolving problem; thus scholars need to identify and test novel solutions, and work with policymakers to evaluate and deploy these solutions. Hence, we provide a roadmap for future research to identify where scholars should invest their energy in order to have the greatest overall impact.
-
Journal Article
Trumping Hate on Twitter? Online Hate Speech in the 2016 U.S. Election Campaign and its Aftermath.
Quarterly Journal of Political Science, 2021
To what extent did online hate speech and white nationalist rhetoric on Twitter increase over the course of Donald Trump's 2016 presidential election campaign and its immediate aftermath? The prevailing narrative suggests that Trump's political rise — and his unexpected victory — lent legitimacy to and popularized bigoted rhetoric that was once relegated to the dark corners of the Internet. However, our analysis of over 750 million tweets related to the election, in addition to almost 400 million tweets from a random sample of American Twitter users, provides systematic evidence that hate speech did not increase on Twitter over this period. Using both machine-learning-augmented dictionary-based methods and a novel classification approach leveraging data from Reddit communities associated with the alt-right movement, we observe no persistent increase in hate speech or white nationalist language either over the course of the campaign or in the six months following Trump's election. While key campaign events and policy announcements produced brief spikes in hateful language, these bursts quickly dissipated. Overall we find no empirical support for the proposition that Trump's divisive campaign or election increased hate speech on Twitter.
-
Journal Article
Political Sectarianism in America
Science, 2020
Political polarization, a concern in many countries, is especially acrimonious in the United States. For decades, scholars have studied polarization as an ideological matter — how strongly Democrats and Republicans diverge vis-à-vis political ideals and policy goals. Such competition among groups in the marketplace of ideas is a hallmark of a healthy democracy. But more recently, researchers have identified a second type of polarization, one focusing less on triumphs of ideas than on dominating the abhorrent supporters of the opposing party. This literature has produced a proliferation of insights and constructs but few interdisciplinary efforts to integrate them. We offer such an integration, pinpointing the superordinate construct of political sectarianism and identifying its three core ingredients: othering, aversion, and moralization. We then consider the causes of political sectarianism and its consequences for U.S. society — especially the threat it poses to democracy. Finally, we propose interventions for minimizing its most corrosive aspects.
-
Working Paper
-
Journal Article
Content-Based Features Predict Social Media Influence Operations
Science Advances, 2020
-
Journal Article
Cross-Platform State Propaganda: Russian Trolls on Twitter and YouTube During the 2016 U.S. Presidential Election
The International Journal of Press/Politics, 2020
This paper investigates online propaganda strategies of the Internet Research Agency (IRA)—Russian “trolls”—during the 2016 U.S. presidential election. We assess claims that the IRA sought either to (1) support Donald Trump or (2) sow discord among the U.S. public by analyzing hyperlinks contained in 108,781 IRA tweets. Our results show that although IRA accounts promoted links to both sides of the ideological spectrum, “conservative” trolls were more active than “liberal” ones. The IRA also shared content across social media platforms, particularly YouTube—the second-most linked destination among IRA tweets. Although overall news content shared by trolls leaned moderate to conservative, we find troll accounts on both sides of the ideological spectrum, and these accounts maintain their political alignment. Links to YouTube videos were decidedly conservative, however. While mixed, this evidence is consistent with the IRA’s supporting the Republican campaign, but the IRA’s strategy was multifaceted, with an ideological division of labor among accounts. We contextualize these results as consistent with a pre-propaganda strategy. This work demonstrates the need to view political communication in the context of the broader media ecology, as governments exploit the interconnected information ecosystem to pursue covert propaganda strategies.