Academic Research
CSMaP faculty, postdoctoral fellows, and students publish rigorous, peer-reviewed research in top academic journals and post working papers sharing ongoing work.
-
Working Paper
A Comparison of Methods in Political Science Text Classification: Transfer Learning Language Models for Politics
Working Paper, October 2020
Automated text classification has rapidly become an important tool for political analysis. Recent advances in NLP, enabled by deep learning, now achieve state-of-the-art results on many standard tasks in the field. However, these methods require large amounts of both computing power and text data to learn the characteristics of the language, resources that are not always accessible to political scientists. One solution is a transfer learning approach, in which knowledge learned in one area or source task is transferred to another area or target task. One class of models that embodies this approach is language models, which demonstrate extremely high levels of performance. We investigate the performance of these models in political science by comparing multiple text classification methods. We find that RoBERTa and XLNet, language models that rely on the Transformer, require fewer computing resources and less training data to perform on par with, or outperform, several political science text classification methods. Moreover, we find that the increase in accuracy is especially significant when labeled data are scarce, highlighting the potential for reducing the data-labeling cost of supervised methods for political scientists via the use of pretrained language models.
-
Book
Social Media and Democracy: The State of the Field, Prospects for Reform
Cambridge University Press, 2020
-
Journal Article
Content-Based Features Predict Social Media Influence Operations
Science Advances, 2020
-
Journal Article
Cross-Platform State Propaganda: Russian Trolls on Twitter and YouTube During the 2016 U.S. Presidential Election
The International Journal of Press/Politics, 2020
This paper investigates online propaganda strategies of the Internet Research Agency (IRA)—Russian “trolls”—during the 2016 U.S. presidential election. We assess claims that the IRA sought either to (1) support Donald Trump or (2) sow discord among the U.S. public by analyzing hyperlinks contained in 108,781 IRA tweets. Our results show that although IRA accounts promoted links to both sides of the ideological spectrum, “conservative” trolls were more active than “liberal” ones. The IRA also shared content across social media platforms, particularly YouTube—the second-most linked destination among IRA tweets. Although overall news content shared by trolls leaned moderate to conservative, we find troll accounts on both sides of the ideological spectrum, and these accounts maintain their political alignment. Links to YouTube videos were decidedly conservative, however. While mixed, this evidence is consistent with the IRA’s supporting the Republican campaign, but the IRA’s strategy was multifaceted, with an ideological division of labor among accounts. We contextualize these results as consistent with a pre-propaganda strategy. This work demonstrates the need to view political communication in the context of the broader media ecology, as governments exploit the interconnected information ecosystem to pursue covert propaganda strategies.
-
Journal Article
Automated Text Classification of News Articles: A Practical Guide
Political Analysis, 2021
Automated text analysis methods have made possible the classification of large corpora of text by measures such as topic and tone. Here, we provide a guide to help researchers navigate the consequential decisions they need to make before any measure can be produced from the text. We consider, both theoretically and empirically, the effects of such choices using as a running example efforts to measure the tone of New York Times coverage of the economy. We show that two reasonable approaches to corpus selection yield radically different corpora and we advocate for the use of keyword searches rather than predefined subject categories provided by news archives. We demonstrate the benefits of coding using article segments instead of sentences as units of analysis. We show that, given a fixed number of codings, it is better to increase the number of unique documents coded rather than the number of coders for each document. Finally, we find that supervised machine learning algorithms outperform dictionaries on a number of criteria. Overall, we intend this guide to serve as a reminder to analysts that thoughtfulness and human validation are key to text-as-data methods, particularly in an age when it is all too easy to computationally classify texts without attending to the methodological choices therein.
-
Journal Article
Using Social and Behavioral Science to Support COVID-19 Pandemic Response
Nature Human Behaviour, 2020
-
Journal Article
The (Null) Effects of Clickbait Headlines on Polarization, Trust, and Learning
Public Opinion Quarterly, 2020
“Clickbait” headlines designed to entice people to click are frequently used by both legitimate and less-than-legitimate news sources. Contemporary clickbait headlines tend to use emotional partisan appeals, raising concerns about their impact on consumers of online news. This article reports the results of a pair of experiments with different sets of subject pools: one conducted using Facebook ads that explicitly target people with a high preference for clickbait, the other using a sample recruited from Amazon’s Mechanical Turk. We estimate subjects’ individual-level preference for clickbait, and randomly assign sets of subjects to read either clickbait or traditional headlines. Findings show that older people and non-Democrats have a higher “preference for clickbait,” but reading clickbait headlines does not drive affective polarization, information retention, or trust in media.
-
Journal Article
Political Psycholinguistics: A Comprehensive Analysis of the Language Habits of Liberal and Conservative Social Media Users
Journal of Personality and Social Psychology, 2020
-
Journal Article
Don’t Republicans Tweet Too? Using Twitter to Assess the Consequences of Political Endorsements by Celebrities
Perspectives on Politics, 2020
Michael Jordan supposedly justified his decision to stay out of politics by noting that Republicans buy sneakers too. In the social media era, the name of the game for celebrities is engagement with fans. So why then do celebrities risk talking about politics on social media, which is likely to antagonize a portion of their fan base? With this question in mind, we analyze approximately 220,000 tweets from 83 celebrities who chose to endorse a presidential candidate in the 2016 U.S. presidential election campaign to assess whether there is a cost — defined in terms of engagement on Twitter — for celebrities who discuss presidential candidates. We also examine whether celebrities behave similarly to other campaign surrogates in being more likely to take on the “attack dog” role by going negative more often than going positive. More specifically, we document how often celebrities of distinct political preferences tweet about Donald Trump, Bernie Sanders, and Hillary Clinton, and we show that followers of opinionated celebrities do not withhold engagement when entertainers become politically mobilized and do indeed often go negative. Interestingly, in some cases political content from celebrities actually turns out to be more popular than typical lifestyle tweets.
-
Journal Article
Who Leads? Who Follows? Measuring Issue Attention and Agenda Setting by Legislators and the Mass Public Using Social Media Data
American Political Science Review, 2019
-
Journal Article
Social Networks and Protest Participation: Evidence from 130 Million Twitter Users
American Journal of Political Science, 2019
-
Journal Article
For Whom the Bot Tolls: A Neural Networks Approach to Measuring Political Orientation of Twitter Bots in Russia
SAGE Open, 2019
Computational propaganda and the use of automated accounts in social media have recently become the focus of public attention, with alleged Russian government activities abroad provoking particularly widespread interest. However, even in the Russian domestic context, where anecdotal evidence of state activity online goes back almost a decade, no public systematic attempt has been made to dissect the population of Russian social media bots by their political orientation. We address this gap by developing a deep neural network classifier that separates pro-regime, anti-regime, and neutral Russian Twitter bots. Our method relies on supervised machine learning and a new large set of labeled accounts, rather than externally obtained account affiliations or orientation of elites. We also illustrate the use of our method by applying it to bots operating in Russian political Twitter from 2015 to 2017 and show that both pro- and anti-Kremlin bots had a substantial presence on Twitter.
-
Journal Article
Digital Dissent: An Analysis of the Motivational Contents of Tweets From an Occupy Wall Street Demonstration
Motivation Science, 2019
Social scientific models of protest activity emphasize instrumental motives associated with rational self-interest and beliefs about group efficacy and symbolic motives associated with social identification and anger at perceived injustice. Ideological processes are typically neglected, despite the fact that protest movements occur in a sociopolitical context in which some people are motivated to maintain the status quo, whereas others are motivated to challenge it. To investigate the role of ideology and other social psychological processes in protest participation, we used manual and machine-learning methods to analyze the contents of 23,810 tweets sent on the day of the May Day 2012 Occupy Wall Street demonstration along with an additional 664,937 tweets (sent by 8,244 unique users) during the 2-week lead-up to the demonstration. Results revealed that social identification and liberal ideology were significant independent predictors of protest participation. The effect of social identification was mediated by the expression of collective efficacy, justice concerns, ideological themes, and positive emotion. The effect of liberalism was mediated by the expression of ideological themes, but conservatives were more likely to express ideological backlash against Occupy Wall Street than liberals were to express ideological support for the movement or demonstration. The expression of self-interest and anger was either negatively related or unrelated to protest participation. This work illustrates the promise (and challenge) of using automated methods to analyze new, ecologically valid data sources for studying protest activity and its motivational underpinnings — thereby informing strategic campaigns that employ collective action tactics.
-
Journal Article
Less Than You Think: Prevalence and Predictors of Fake News Dissemination on Facebook
Science Advances, 2019
So-called “fake news” has renewed concerns about the prevalence and effects of misinformation in political campaigns. Given the potential for widespread dissemination of this material, we examine the individual-level characteristics associated with sharing false articles during the 2016 U.S. presidential campaign. To do so, we uniquely link an original survey with respondents’ sharing activity as recorded in Facebook profile data. First and foremost, we find that sharing this content was a relatively rare activity. Conservatives were more likely to share articles from fake news domains, which in 2016 were largely pro-Trump in orientation, than liberals or moderates. We also find a strong age effect, which persists after controlling for partisanship and ideology: On average, users over 65 shared nearly seven times as many articles from fake news domains as the youngest age group.
-
Journal Article
The Use of Twitter Bots in Russian Political Communication Online
PONARS Eurasia Policy Memo No. 564, 2019
-
Journal Article
How Accurate Are Survey Responses on Social Media and Politics?
Political Communication, 2019
-
Journal Article
Turning the Virtual Tables: Government Strategies for Addressing Online Opposition with an Application to Russia
Comparative Politics, 2018
We introduce a novel classification of strategies employed by autocrats to combat online opposition generally, and opposition on social media in particular. Our classification distinguishes both online from offline responses and censorship from engaging in opinion formation. For each of the three options (offline action, technical restrictions on access to content, and online engagement), we provide a detailed account of the evolution of Russian government strategy since 2000. To illustrate the feasibility of researching online engagement, we construct and assess tools for detecting the activity of political "bots," or algorithmically controlled accounts, on Russian political Twitter, and test these methods on a large dataset of politically relevant Twitter data from Russia gathered over a year and a half.
-
Journal Article
Elites Tweet to Get Feet Off the Streets: Measuring Regime Social Media Strategies During Protest
Political Science Research and Methods, 2019
-
Journal Article
How Social Media Facilitates Political Protest: Information, Motivation, and Social Networks
Advances in Political Psychology, 2018
-
Book
Twitter Wars: Sunni-Shia Conflict and Cooperation in the Digital Age
Beyond Sunni and Shia: The Roots of Sectarianism in a Changing Middle East, 2018
-
Journal Article
Detecting Bots on Russian Political Twitter
Big Data, 2017
Automated and semiautomated Twitter accounts (bots) have recently gained significant public attention due to their potential interference in the political realm. In this study, we develop a methodology for detecting bots on Twitter using an ensemble of classifiers and apply it to study bot activity within political discussions in the Russian Twittersphere. We focus on the interval from February 2014 to December 2015, an especially consequential period in Russian politics. Among accounts actively tweeting about Russian politics, we find that on the majority of days, the proportion of tweets produced by bots exceeds 50%. We reveal bot characteristics that distinguish them from humans in this corpus, and find that the software platform used for tweeting is among the best predictors of bots. Finally, we find suggestive evidence that one prominent activity that bots were involved in on Russian political Twitter is the spread of news stories and the promotion of the media outlets that produce them.
-
Journal Article
Moral Discourse in the Twitterverse: Effects of Ideology and Political Sophistication on Language Use Among U.S. Citizens and Members of Congress
Journal of Language and Politics, 2018
We analyzed Twitter language to explore hypotheses derived from moral foundations theory, which suggests that liberals and conservatives prioritize different values. In Study 1, we captured 11 million tweets from nearly 25,000 U.S. residents and observed that liberals expressed fairness concerns more often than conservatives, whereas conservatives were more likely to express concerns about group loyalty, authority, and purity. Increasing political sophistication exacerbated ideological differences in authority and group loyalty. At low levels of sophistication, liberals used more harm language, but at high levels of sophistication conservatives referenced harm more often. In Study 2, we analyzed 59,000 tweets from 388 members of the U.S. Congress. Liberal legislators used more fairness- and harm-related words, whereas conservative legislators used more authority-related words. Unexpectedly, liberal legislators used more language pertaining to group loyalty and purity. Follow-up analyses suggest that liberals and conservatives in Congress use similar words to emphasize different policy priorities.
Pinning down the role of social ties in the decision to protest has been notoriously elusive largely due to data limitations. Social media and their global use by protesters offer an unprecedented opportunity to observe real-time social ties and online behavior, though often without an attendant measure of real-world behavior. We collect data on Twitter activity during the 2015 Charlie Hebdo protest in Paris, which, unusually, record real-world protest attendance and network structure measured beyond egocentric networks. We devise a test of social theories of protest that hold that participation depends on exposure to others' intentions and network position determines exposure. Our findings are strongly consistent with these theories, showing that protesters are significantly more connected to one another via direct, indirect, triadic, and reciprocated ties than comparable nonprotesters. These results offer the first large-scale empirical support for the claim that social network structure has consequences for protest participation.