Academic Research

CSMaP faculty, postdoctoral fellows, and students publish rigorous, peer-reviewed research in top academic journals and post working papers sharing ongoing work.

  • Working Paper

    A Comparison of Methods in Political Science Text Classification: Transfer Learning Language Models for Politics

    Working Paper, October 2020

    Automated text classification has rapidly become an important tool for political analysis. Recent advancements in NLP enabled by advances in deep learning now achieve state-of-the-art results in many standard tasks for the field. However, these methods require large amounts of both computing power and text data to learn the characteristics of the language, resources which are not always accessible to political scientists. One solution is a transfer learning approach, where knowledge learned in one area or source task is transferred to another area or a target task. A class of models that embody this approach are language models, which demonstrate extremely high levels of performance. We investigate the performance of these models in political science by comparing multiple text classification methods. We find RoBERTa and XLNet, language models that rely on the Transformer, require fewer computing resources and less training data to perform on par with – or outperform – several political science text classification methods. Moreover, we find that the increase in accuracy is especially significant in the case of small labeled data, highlighting the potential for reducing the data-labeling cost of supervised methods for political scientists via the use of pretrained language models.

    Date Posted

    Oct 20, 2020

  • Working Paper

    Opinion Change and Learning in the 2016 U.S. Presidential Election: Evidence from a Panel Survey Combined with Direct Observation of Social Media Activity

    Working Paper, September 2020

    The role of the media in influencing people’s attitudes and opinions is difficult to demonstrate because media consumption by survey respondents is usually unobserved in datasets containing information on attitudes and vote choice. This paper leverages behavioral data combined with responses from a multi-wave panel to test whether Democrats who see more stories from liberal news sources on Twitter develop more liberal positions over time and, conversely, whether Republicans are more likely to revise their views in a conservative direction if they are exposed to more news on Twitter from conservative media sources. We find evidence that exposure to ideologically framed information and arguments changes voters’ own positions, but has a limited impact on perceptions of where the candidates stand on the issues.

    Date Posted

    Sep 24, 2020

  • Journal Article

    Content-Based Features Predict Social Media Influence Operations

    Science Advances, 2020

    We study how easy it is to distinguish influence operations from organic social media activity by assessing the performance of a platform-agnostic machine learning approach. Our method uses public activity to detect content that is part of coordinated influence operations based on human-interpretable features derived solely from content. We test this method on publicly available Twitter data on Chinese, Russian, and Venezuelan troll activity targeting the United States, as well as the Reddit dataset of Russian influence efforts. To assess how well content-based features distinguish these influence operations from random samples of general and political American users, we train and test classifiers on a monthly basis for each campaign across five prediction tasks. Content-based features perform well across period, country, platform, and prediction task. Industrialized production of influence campaign content leaves a distinctive signal in user-generated content that allows tracking of campaigns from month to month and across different accounts.

    Date Posted

    Jul 22, 2020

  • Journal Article

    Cross-Platform State Propaganda: Russian Trolls on Twitter and YouTube During the 2016 U.S. Presidential Election

    The International Journal of Press/Politics, 2020

    This paper investigates online propaganda strategies of the Internet Research Agency (IRA)—Russian “trolls”—during the 2016 U.S. presidential election. We assess claims that the IRA sought either to (1) support Donald Trump or (2) sow discord among the U.S. public by analyzing hyperlinks contained in 108,781 IRA tweets. Our results show that although IRA accounts promoted links to both sides of the ideological spectrum, “conservative” trolls were more active than “liberal” ones. The IRA also shared content across social media platforms, particularly YouTube—the second-most linked destination among IRA tweets. Although overall news content shared by trolls leaned moderate to conservative, we find troll accounts on both sides of the ideological spectrum, and these accounts maintain their political alignment. Links to YouTube videos were decidedly conservative, however. While mixed, this evidence is consistent with the IRA’s supporting the Republican campaign, but the IRA’s strategy was multifaceted, with an ideological division of labor among accounts. We contextualize these results as consistent with a pre-propaganda strategy. This work demonstrates the need to view political communication in the context of the broader media ecology, as governments exploit the interconnected information ecosystem to pursue covert propaganda strategies.

    Date Posted

    Jul 01, 2020

  • Journal Article

    Automated Text Classification of News Articles: A Practical Guide

    Political Analysis, 2021

    Automated text analysis methods have made possible the classification of large corpora of text by measures such as topic and tone. Here, we provide a guide to help researchers navigate the consequential decisions they need to make before any measure can be produced from the text. We consider, both theoretically and empirically, the effects of such choices using as a running example efforts to measure the tone of New York Times coverage of the economy. We show that two reasonable approaches to corpus selection yield radically different corpora and we advocate for the use of keyword searches rather than predefined subject categories provided by news archives. We demonstrate the benefits of coding using article segments instead of sentences as units of analysis. We show that, given a fixed number of codings, it is better to increase the number of unique documents coded rather than the number of coders for each document. Finally, we find that supervised machine learning algorithms outperform dictionaries on a number of criteria. Overall, we intend this guide to serve as a reminder to analysts that thoughtfulness and human validation are key to text-as-data methods, particularly in an age when it is all too easy to computationally classify texts without attending to the methodological choices therein.

    Date Posted

    Jun 09, 2020

  • Journal Article

    Using Social and Behavioral Science to Support COVID-19 Pandemic Response

    • Jay J. Van Bavel
    • Katherine Baicker
    • Paulo Boggio
    • Valerio Capraro
    • Aleksandra Cichocka
    • Mina Cikara
    • Molly J. Crockett
    • Alia Crum
    • Karen M. Douglas
    • James N. Druckman
    • John Drury
    • Oeindrila Dube
    • Naomi Ellemers
    • Eli J. Finkel
    • James H. Fowler
    • Michele Gelfand
    • Shihui Han
    • S. Alexander Haslam
    • Jolanda Jetten
    • Shinobu Kitayama
    • Dean Mobbs
    • Lucy Napper
    • Dominic Packer
    • Gordon Pennycook
    • Ellen Peters
    • Richard E. Petty
    • David G. Rand
    • Stephen D. Reicher
    • Simone Schnall
    • Azim Shariff
    • Linda Skitka
    • Sandra Susan Smith
    • Cass R. Sunstein
    • Nassim Tabri
    • Joshua A. Tucker
    • Sander van der Linden
    • Paul A. M. Van Lange
    • Kim A. Weeden
    • Michael J. A. Wohl
    • Jamil Zaki
    • Sean R. Zion
    • Robb Willer

    Nature Human Behaviour, 2020

    The COVID-19 pandemic represents a massive global health crisis. Because the crisis requires large-scale behaviour change and places significant psychological burdens on individuals, insights from the social and behavioural sciences can be used to help align human behaviour with the recommendations of epidemiologists and public health experts. Here we discuss evidence from a selection of research topics relevant to pandemics, including work on navigating threats, social and cultural influences on behaviour, science communication, moral decision-making, leadership, and stress and coping. In each section, we note the nature and quality of prior research, including uncertainty and unsettled issues. We identify several insights for effective response to the COVID-19 pandemic and highlight important gaps researchers should move quickly to fill in the coming weeks and months.

    Date Posted

    Apr 30, 2020

  • Journal Article

    The (Null) Effects of Clickbait Headlines on Polarization, Trust, and Learning

    Public Opinion Quarterly, 2020

    “Clickbait” headlines designed to entice people to click are frequently used by both legitimate and less-than-legitimate news sources. Contemporary clickbait headlines tend to use emotional partisan appeals, raising concerns about their impact on consumers of online news. This article reports the results of a pair of experiments with different sets of subject pools: one conducted using Facebook ads that explicitly target people with a high preference for clickbait, the other using a sample recruited from Amazon’s Mechanical Turk. We estimate subjects’ individual-level preference for clickbait, and randomly assign sets of subjects to read either clickbait or traditional headlines. Findings show that older people and non-Democrats have a higher “preference for clickbait,” but reading clickbait headlines does not drive affective polarization, information retention, or trust in media.

    Date Posted

    Apr 30, 2020

  • Journal Article

    Political Psycholinguistics: A Comprehensive Analysis of the Language Habits of Liberal and Conservative Social Media Users.

    Journal of Personality and Social Psychology, 2020

    For nearly a century social scientists have sought to understand left–right ideological differences in values, motives, and thinking styles. Much progress has been made, but — as in other areas of research — this work has been criticized for relying on small and statistically unrepresentative samples and the use of reactive, self-report measures that lack ecological validity. In an effort to overcome these limitations, we employed automated text analytic methods to investigate the spontaneous, naturally occurring use of language in nearly 25,000 Twitter users. We derived 27 hypotheses from the literature on political psychology and tested them using 32 individual dictionaries. In 23 cases, we observed significant differences in the linguistic styles of liberals and conservatives. For instance, liberals used more language that conveyed benevolence, whereas conservatives used more language pertaining to threat, power, tradition, resistance to change, certainty, security, anger, anxiety, and negative emotion in general. In 17 cases, there were also significant effects of ideological extremity. For instance, moderates used more benevolent language, whereas extremists used more language pertaining to inhibition, tentativeness, affiliation, resistance to change, certainty, security, anger, anxiety, negative affect, swear words, and death-related language. These research methods, which are easily adaptable, open up new and unprecedented opportunities for conducting unobtrusive research in psycholinguistics and political psychology with large and diverse samples.

    Date Posted

    Jan 09, 2020

  • Journal Article

    Don’t Republicans Tweet Too? Using Twitter to Assess the Consequences of Political Endorsements by Celebrities

    Perspectives on Politics, 2020

    Michael Jordan supposedly justified his decision to stay out of politics by noting that Republicans buy sneakers too. In the social media era, the name of the game for celebrities is engagement with fans. So why then do celebrities risk talking about politics on social media, which is likely to antagonize a portion of their fan base? With this question in mind, we analyze approximately 220,000 tweets from 83 celebrities who chose to endorse a presidential candidate in the 2016 U.S. presidential election campaign to assess whether there is a cost — defined in terms of engagement on Twitter — for celebrities who discuss presidential candidates. We also examine whether celebrities behave similarly to other campaign surrogates in being more likely to take on the “attack dog” role by going negative more often than going positive. More specifically, we document how often celebrities of distinct political preferences tweet about Donald Trump, Bernie Sanders, and Hillary Clinton, and we show that followers of opinionated celebrities do not withhold engagement when entertainers become politically mobilized and do indeed often go negative. Interestingly, in some cases political content from celebrities actually turns out to be more popular than typical lifestyle tweets.


    Date Posted

    Sep 06, 2019

  • Journal Article

    Who Leads? Who Follows? Measuring Issue Attention and Agenda Setting by Legislators and the Mass Public Using Social Media Data

    American Political Science Review, 2019

    Are legislators responsive to the priorities of the public? Research demonstrates a strong correspondence between the issues about which the public cares and the issues addressed by politicians, but conclusive evidence about who leads whom in setting the political agenda has yet to be uncovered. We answer this question with fine-grained temporal analyses of Twitter messages by legislators and the public during the 113th U.S. Congress. After employing an unsupervised method that classifies tweets sent by legislators and citizens into topics, we use vector autoregression models to explore whose priorities more strongly predict the relationship between citizens and politicians. We find that legislators are more likely to follow, than to lead, discussion of public issues, results that hold even after controlling for the agenda-setting effects of the media. We also find, however, that legislators are more likely to be responsive to their supporters than to the general public.

    Date Posted

    Jul 12, 2019

  • Journal Article

    Social Networks and Protest Participation: Evidence from 130 Million Twitter Users

    American Journal of Political Science, 2019

    Pinning down the role of social ties in the decision to protest has been notoriously elusive largely due to data limitations. Social media and their global use by protesters offer an unprecedented opportunity to observe real-time social ties and online behavior, though often without an attendant measure of real-world behavior. We collect data on Twitter activity during the 2015 Charlie Hebdo protest in Paris, which, unusually, record real-world protest attendance and network structure measured beyond egocentric networks. We devise a test of social theories of protest that hold that participation depends on exposure to others' intentions and network position determines exposure. Our findings are strongly consistent with these theories, showing that protesters are significantly more connected to one another via direct, indirect, triadic, and reciprocated ties than comparable nonprotesters. These results offer the first large-scale empirical support for the claim that social network structure has consequences for protest participation.

    Date Posted

    Jul 01, 2019

  • Journal Article

    For Whom the Bot Tolls: A Neural Networks Approach to Measuring Political Orientation of Twitter Bots in Russia

    SAGE Open, 2019

    Computational propaganda and the use of automated accounts in social media have recently become the focus of public attention, with alleged Russian government activities abroad provoking particularly widespread interest. However, even in the Russian domestic context, where anecdotal evidence of state activity online goes back almost a decade, no public systematic attempt has been made to dissect the population of Russian social media bots by their political orientation. We address this gap by developing a deep neural network classifier that separates pro-regime, anti-regime, and neutral Russian Twitter bots. Our method relies on supervised machine learning and a new large set of labeled accounts, rather than externally obtained account affiliations or orientation of elites. We also illustrate the use of our method by applying it to bots operating in Russian political Twitter from 2015 to 2017 and show that both pro- and anti-Kremlin bots had a substantial presence on Twitter.

    Date Posted

    Apr 12, 2019

  • Journal Article

    How Many People Live in Political Bubbles on Social Media? Evidence From Linked Survey and Twitter Data

    SAGE Open, 2019

    A major point of debate in the study of the Internet and politics is the extent to which social media platforms encourage citizens to inhabit online “bubbles” or “echo chambers,” exposed primarily to ideologically congenial political information. To investigate this question, we link a representative survey of Americans with data from respondents’ public Twitter accounts (N = 1,496). We then quantify the ideological distributions of users’ online political and media environments by merging validated estimates of user ideology with the full set of accounts followed by our survey respondents (N = 642,345) and the available tweets posted by those accounts (N ~ 1.2 billion). We study the extent to which liberals and conservatives encounter counter-attitudinal messages in two distinct ways: (a) by the accounts they follow and (b) by the tweets they receive from those accounts, either directly or indirectly (via retweets). More than a third of respondents do not follow any media sources, but among those who do, we find a substantial amount of overlap (51%) in the ideological distributions of accounts followed by users on opposite ends of the political spectrum. At the same time, however, we find asymmetries in individuals’ willingness to venture into cross-cutting spaces, with conservatives more likely to follow media and political accounts classified as left-leaning than the reverse. Finally, we argue that such choices are likely tempered by online news watching behavior.

    Date Posted

    Feb 28, 2019

  • Journal Article

    Digital Dissent: An Analysis of the Motivational Contents of Tweets From an Occupy Wall Street Demonstration

    Motivation Science, 2019

    Social scientific models of protest activity emphasize instrumental motives associated with rational self-interest and beliefs about group efficacy and symbolic motives associated with social identification and anger at perceived injustice. Ideological processes are typically neglected, despite the fact that protest movements occur in a sociopolitical context in which some people are motivated to maintain the status quo, whereas others are motivated to challenge it. To investigate the role of ideology and other social psychological processes in protest participation, we used manual and machine-learning methods to analyze the contents of 23,810 tweets sent on the day of the May Day 2012 Occupy Wall Street demonstration along with an additional 664,937 tweets (sent by 8,244 unique users) during the 2-week lead-up to the demonstration. Results revealed that social identification and liberal ideology were significant independent predictors of protest participation. The effect of social identification was mediated by the expression of collective efficacy, justice concerns, ideological themes, and positive emotion. The effect of liberalism was mediated by the expression of ideological themes, but conservatives were more likely to express ideological backlash against Occupy Wall Street than liberals were to express ideological support for the movement or demonstration. The expression of self-interest and anger was either negatively related or unrelated to protest participation. This work illustrates the promise (and challenge) of using automated methods to analyze new, ecologically valid data sources for studying protest activity and its motivational underpinnings — thereby informing strategic campaigns that employ collective action tactics. 

    Date Posted

    Feb 27, 2019

  • Journal Article

    Less Than You Think: Prevalence and Predictors of Fake News Dissemination on Facebook

    Science Advances, 2019

    So-called “fake news” has renewed concerns about the prevalence and effects of misinformation in political campaigns. Given the potential for widespread dissemination of this material, we examine the individual-level characteristics associated with sharing false articles during the 2016 U.S. presidential campaign. To do so, we uniquely link an original survey with respondents’ sharing activity as recorded in Facebook profile data. First and foremost, we find that sharing this content was a relatively rare activity. Conservatives were more likely to share articles from fake news domains, which in 2016 were largely pro-Trump in orientation, than liberals or moderates. We also find a strong age effect, which persists after controlling for partisanship and ideology: On average, users over 65 shared nearly seven times as many articles from fake news domains as the youngest age group.

    Date Posted

    Jan 09, 2019

  • Journal Article

    How Accurate Are Survey Responses on Social Media and Politics?

    Political Communication, 2019

    How accurate are survey-based measures of social media use, in particular about political topics? We answer this question by linking original survey data collected during the U.S. 2016 election campaign with respondents’ observed social media activity. We use supervised machine learning to classify whether these Twitter and Facebook account data are content related to politics. We then benchmark our survey measures on frequency of posting about politics and the number of political figures followed. We find that, on average, our self-reported survey measures tend to correlate with observed social media activity. At the same time, we also find a worrying amount of individual-level discrepancy and problems related to extreme outliers. Our recommendations are twofold. The first is for survey questions about social media use to provide respondents with options covering a wider range of activity, especially in the long tail. The second is for survey questions to include specific content and anchors defining what it means for a post to be “about politics.”

    Date Posted

    Nov 05, 2018

  • Journal Article

    Turning the Virtual Tables: Government Strategies for Addressing Online Opposition with an Application to Russia

    Comparative Politics, 2018

    We introduce a novel classification of strategies employed by autocrats to combat online opposition generally, and opposition on social media in particular. Our classification distinguishes both online from offline responses and censorship from engaging in opinion formation. For each of the three options — offline action, technical restrictions on access to content, and online engagement — we provide a detailed account for the evolution of Russian government strategy since 2000. To illustrate the feasibility of researching online engagement, we construct and assess tools for detecting the activity of political "bots," or algorithmically controlled accounts, on Russian political Twitter, and test these methods on a large dataset of politically relevant Twitter data from Russia gathered over a year and a half.

    Date Posted

    Apr 01, 2018

  • Journal Article

    Elites Tweet to Get Feet Off the Streets: Measuring Regime Social Media Strategies During Protest

    Political Science Research and Methods, 2019

    As non-democratic regimes have adapted to the proliferation of social media, they have begun actively engaging with Twitter to enhance regime resilience. Using data taken from the Twitter accounts of Venezuelan legislators during the 2014 anti-Maduro protests in Venezuela, we fit a topic model on the text of the tweets and analyze patterns in hashtag use by the two coalitions. We argue that the regime’s best strategy in the face of an existential threat like the narrative developed by La Salida and promoted on Twitter was to advance many competing narratives that addressed issues unrelated to the opposition’s criticism. Our results show that the two coalitions pursued different rhetorical strategies in keeping with our predictions about managing the conflict advanced by the protesters. This article extends the literature on social media use during protests by focusing on active engagement with social media on the part of the regime. This approach corroborates and expands on recent research on inferring regime strategies from propaganda and censorship.

    Date Posted

    Mar 21, 2018

  • Working Paper

    Social Media, Political Polarization, and Political Disinformation: A Review of the Scientific Literature

    Hewlett Foundation, 2018

    The following report is intended to provide an overview of the current state of the literature on the relationship between social media; political polarization; and political “disinformation,” a term used to encompass a wide range of types of information about politics found online, including “fake news,” rumors, deliberately factually incorrect information, inadvertently factually incorrect information, politically slanted information, and “hyperpartisan” news. The review of the literature is provided in six separate sections, each of which can be read individually but that cumulatively are intended to provide an overview of what is known—and unknown—about the relationship between social media, political polarization, and disinformation. The report concludes by identifying key gaps in our understanding of these phenomena and the data that are needed to address them.

    Date Posted

    Mar 19, 2018

  • Journal Article

    How Social Media Facilitates Political Protest: Information, Motivation, and Social Networks

    Advances in Political Psychology, 2018

    It is often claimed that social media platforms such as Facebook and Twitter are profoundly shaping political participation, especially when it comes to protest behavior. Whether or not this is the case, the analysis of “Big Data” generated by social media usage offers unprecedented opportunities to observe complex, dynamic effects associated with large-scale collective action and social movements. In this article, we summarize evidence from studies of protest movements in the United States, Spain, Turkey, and Ukraine demonstrating that: (1) Social media platforms facilitate the exchange of information that is vital to the coordination of protest activities, such as news about transportation, turnout, police presence, violence, medical services, and legal support; (2) in addition, social media platforms facilitate the exchange of emotional and motivational contents in support of and opposition to protest activity, including messages emphasizing anger, social identification, group efficacy, and concerns about fairness, justice, and deprivation as well as explicitly ideological themes; and (3) structural characteristics of online social networks, which may differ as a function of political ideology, have important implications for information exposure and the success or failure of organizational efforts. Next, we issue a brief call for future research on a topic that is understudied but fundamental to appreciating the role of social media in facilitating political participation, namely friendship. In closing, we liken the situation confronted by researchers who are harvesting vast quantities of social media data to that of systems biologists in the early days of genome sequencing.

    Date Posted

    Feb 13, 2018

  • Book

    Twitter Wars: Sunni-Shia Conflict and Cooperation in the Digital Age

    Beyond Sunni and Shia: The Roots of Sectarianism in a Changing Middle East, 2018

    Date Posted

    Feb 01, 2018

  • Journal Article

    Detecting Bots on Russian Political Twitter

    Big Data, 2017

    Automated and semiautomated Twitter accounts, bots, have recently gained significant public attention due to their potential interference in the political realm. In this study, we develop a methodology for detecting bots on Twitter using an ensemble of classifiers and apply it to study bot activity within political discussions in the Russian Twittersphere. We focus on the interval from February 2014 to December 2015, an especially consequential period in Russian politics. Among accounts actively Tweeting about Russian politics, we find that on the majority of days, the proportion of Tweets produced by bots exceeds 50%. We reveal bot characteristics that distinguish them from humans in this corpus, and find that the software platform used for Tweeting is among the best predictors of bots. Finally, we find suggestive evidence that one prominent activity that bots were involved in on Russian political Twitter is the spread of news stories and promotion of media who produce them.

    Date Posted

    Dec 01, 2017

  • Journal Article

    Moral Discourse in the Twitterverse: Effects of Ideology and Political Sophistication on Language Use Among U.S. Citizens and Members of Congress

    Journal of Language and Politics, 2018

    We analyzed Twitter language to explore hypotheses derived from moral foundations theory, which suggests that liberals and conservatives prioritize different values. In Study 1, we captured 11 million tweets from nearly 25,000 U.S. residents and observed that liberals expressed fairness concerns more often than conservatives, whereas conservatives were more likely to express concerns about group loyalty, authority, and purity. Increasing political sophistication exacerbated ideological differences in authority and group loyalty. At low levels of sophistication, liberals used more harm language, but at high levels of sophistication conservatives referenced harm more often. In Study 2, we analyzed 59,000 tweets from 388 members of the U.S. Congress. Liberal legislators used more fairness- and harm-related words, whereas conservative legislators used more authority-related words. Unexpectedly, liberal legislators used more language pertaining to group loyalty and purity. Follow-up analyses suggest that liberals and conservatives in Congress use similar words to emphasize different policy priorities.