Academic Research - NYU’s Center for Social Media and Politics

CSMaP faculty, postdoctoral fellows, and students publish rigorous, peer-reviewed research in top academic journals and post working papers sharing ongoing work.

Working Paper
Web Scraping for Research: Legal, Ethical, Institutional, and Scientific Considerations
Megan A. Brown,

Andrew Gruen,

Gabe Maldoff,

Sol Messing,

Zeve Sanderson,

Michael Zimmer
Working Paper, December 2024
View Article View abstract

Scientists across disciplines often use data from the internet to conduct research, generating valuable insights about human behavior. However, as generative AI relying on massive text corpora becomes increasingly valuable, platforms have greatly restricted access to data through official channels. As a result, researchers will likely engage in more web scraping to collect data, introducing new challenges and concerns for researchers. This paper proposes a comprehensive framework for web scraping in social science research for U.S.-based researchers, examining the legal, ethical, institutional, and scientific factors that researchers should consider when scraping the web. We present an overview of the current regulatory environment impacting when and how researchers can access, collect, store, and share data via scraping. We then provide researchers with recommendations to conduct scraping in a scientifically legitimate and ethical manner. We aim to equip researchers with the relevant information to mitigate risks and maximize the impact of their research amidst this evolving data access landscape.
Area of Study

Data Science Methodology
Date Posted

Dec 19, 2024
Tags

Data Access,

United States
Journal Article
Concept-Guided Chain-of-Thought Prompting for Pairwise Comparison Scoring of Texts with Large Language Models
Patrick Y. Wu,

Jonathan Nagler,

Joshua A. Tucker,

Sol Messing
IEEE International Conference on Big Data, 2024
View Article View abstract

Existing text scoring methods require a large corpus, struggle with short texts, or require hand-labeled data. We develop a text scoring framework that leverages generative large language models (LLMs) to (1) set texts against the backdrop of information from the near-totality of the web and digitized media, and (2) effectively transform pairwise text comparisons from a reasoning problem to a pattern recognition task. Our approach, concept-guided chain-of-thought (CGCoT), utilizes a chain of researcher-designed prompts with an LLM to generate a concept-specific breakdown for each text, akin to guidance provided to human coders. We then pairwise compare breakdowns using an LLM and aggregate answers into a score using a probability model. We apply this approach to better understand speech reflecting aversion to specific political parties on Twitter, a topic that has commanded increasing interest because of its potential contributions to democratic backsliding. We achieve stronger correlations with human judgments than widely used unsupervised text scoring methods like Wordfish. In a supervised setting, besides a small pilot dataset to develop CGCoT prompts, our measures require no additional hand-labeled data and produce predictions on par with RoBERTa-Large fine-tuned on thousands of hand-labeled tweets. This project showcases the potential of combining human expertise and LLMs for scoring tasks.
Area of Study

Data Science Methodology

Political Polarization
Date Posted

Dec 15, 2024
Tags

Generative AI,

Twitter/X,

United States
Journal Article
News Sharing on Social Media: Mapping the Ideology of News Media, Politicians, and the Mass Public
Gregory Eady,

Richard Bonneau,

Joshua A. Tucker,

Jonathan Nagler
Political Analysis, 2024
View Article View abstract

This article examines the information sharing behavior of U.S. politicians and the mass public by mapping the ideological sharing space of political news on social media. As data, we use the near-universal currency of online information exchange: web links. We introduce a methodological approach and software to unify the measurement of ideology across social media platforms by using sharing data to jointly estimate the ideology of news media organizations, politicians, and the mass public. Empirically, we show that (1) politicians who share ideologically polarized content share, by far, the most political news and commentary and (2) that the less competitive elections are, the more likely politicians are to share polarized information. These results demonstrate that news and commentary shared by politicians come from a highly unrepresentative set of ideologically extreme legislators and that decreases in election pressures (e.g., by gerrymandering) may encourage polarized sharing behavior.
Area of Study

Elite & Mass Political Behavior

Media Consumption

Political Polarization

Data Science Methodology
Date Posted

Nov 19, 2024
Tags

Twitter/X,

United States
Journal Article
Measuring Receptivity to Misinformation at Scale on a Social Media Platform
Christopher K. Tokita,

Kevin Aslett,

William Godel,

Zeve Sanderson,

Joshua A. Tucker,

Jonathan Nagler,

Nathaniel Persily,

Richard Bonneau
PNAS Nexus, 2024
View Article View abstract

Measuring the impact of online misinformation is challenging. Traditional measures, such as user views or shares on social media, are incomplete because not everyone who is exposed to misinformation is equally likely to believe it. To address this issue, we developed a method that combines survey data with observational Twitter data to probabilistically estimate the number of users both exposed to and likely to believe a specific news story. As a proof of concept, we applied this method to 139 viral news articles and find that although false news reaches an audience with diverse political views, users who are both exposed and receptive to believing false news tend to have more extreme ideologies. These receptive users are also more likely to encounter misinformation earlier than those who are unlikely to believe it. This mismatch between overall user exposure and receptive user exposure underscores the limitation of relying solely on exposure or interaction data to measure the impact of misinformation, as well as the challenge of implementing effective interventions. To demonstrate how our approach can address this challenge, we then conducted data-driven simulations of common interventions used by social media platforms. We find that these interventions are only modestly effective at reducing exposure among users likely to believe misinformation, and their effectiveness quickly diminishes unless implemented soon after misinformation’s initial spread. Our paper provides a more precise estimate of misinformation’s impact by focusing on the exposure of users likely to believe it, offering insights for effective mitigation strategies on social media.
Area of Study

Data Science Methodology

Media Consumption

Online Information Environment

Political Polarization
Date Posted

Oct 08, 2024
Tags

2024 Election,

2020 Election,

Twitter/X,

United States
Journal Article
A Multi-Stakeholder Approach for Leveraging Data Portability to Support Research on the Digital Information Environment
Zeve Sanderson,

Lama Mohammed
Journal of Online Trust and Safety, 2024
View Article View abstract

In this paper, we aim to situate data portability within the evolving discussions of how to support data access for researchers studying the digital information environment. We explore how data donations, enabled by existing data access rights and data portability requirements, provide promising opportunities for supporting research on critical trust and safety topics. Evaluating other data access mechanisms that are more central to policy debates about platform transparency, we argue that data donations are a powerful additional mechanism that offer key legal, ethical, and scientific benefits. We then assess current challenges with using data donations for research and offer recommendations for various stakeholders to better align portability mechanisms with the needs of research. Taken together, we argue that although portability is often considered within a context of competition and user agency, regulators, industry actors, and researchers should understand and leverage portability’s potential impact to empower critical research on the societal impacts of digital platforms and services.
Area of Study

Data Science Methodology
Date Posted

Sep 18, 2024
Tags

Data Access
Working Paper
Survey Professionalism: New Evidence from Web Browsing Data
Bernhard Clemm von Hohenberg,

Tiago Ventura,

Jonathan Nagler,

Ericka Menchen-Trevino,

Magdalena Wojcieszak
Working Paper, August 2024
View Article View abstract

Online panels have become an important resource for research in political science, but the financial compensation involved incentivizes respondents to become “survey professionals”, which raises concerns about data quality. We provide evidence on survey professionalism using behavioral web browsing data from three U.S. samples, recruited via Lucid, YouGov, and Facebook (total n = 3,886). Survey professionalism is common but varies across samples: By our most conservative measure, we identify 1.7% of respondents on Facebook, 7.9% of respondents on YouGov, and 34.3% of respondents on Lucid as survey professionals. However, evidence that professionals lower data quality is limited: they do not systematically differ demographically or politically from non-professionals and do not respond more randomly—although they are somewhat more likely to speed, to straightline, and to take questionnaires repeatedly. While concerns are warranted, we conclude that survey professionals do not, by and large, distort inferences of research based on online panels.
Area of Study

Data Science Methodology

Public Opinion
Date Posted

Aug 30, 2024
Tags

United States
Working Paper
Reaching Across the Political Aisle: Overcoming Challenges in Using Social Media for Recruiting Politically Diverse Respondents
Maggie Macdonald,

Megan A. Brown,

Nejla Ašimović,

Rajeshwari Majumdar,

Lena Song,

Laura Huber,

Sarah Graham,

Abby Budiman,

Joshua A. Tucker,

Jonathan Nagler
Working Paper, August 2024
View Article View abstract

A challenge for public opinion surveys is achieving representativeness of respondents across demographic groups. We test the extent to which ideological alignment with a survey’s sponsor shapes differential partisan response and users’ choice of whether to participate in a research study on Facebook. While the use of Facebook advertisements for recruitment has increased in recent years and offers potential benefits, it can yield difficulties in recruiting politically representative samples. We recruit respondents for a short survey through two otherwise identical advertisements associated with either New York University (from a liberal state) or the University of Mississippi (from a conservative state). Contrary to our expectations, we don’t find an asymmetry in completion rates between self-reported Democrats and Republicans based on the survey sponsor. Nor do we find statistically significant differences in attitudes of respondents across the two survey sponsors when we control for observables.
Area of Study

Public Opinion

Data Science Methodology
Date Posted

Aug 13, 2024
Tags

Facebook,

United States
Journal Article
Estimating the Ideology of Political YouTube Videos
Angela Lai,

Megan A. Brown,

James Bisbee,

Richard Bonneau,

Joshua A. Tucker,

Jonathan Nagler
Political Analysis, 2024
View Article View abstract

We present a method for estimating the ideology of political YouTube videos. As online media increasingly influences how people engage with politics, the importance of quantifying the ideology of such media for research grows as well. The subfield of estimating ideology as a latent variable has often focused on traditional actors such as legislators, while more recent work has used social media data to estimate the ideology of ordinary users, political elites, and media sources. We build on this work by developing a method to estimate the ideologies of YouTube videos, an important subset of media, based on their accompanying text metadata. First, we take Reddit posts linking to YouTube videos and use correspondence analysis to place those videos in an ideological space. We then train a text-based model with those estimated ideologies as training labels, enabling us to estimate the ideologies of videos not posted on Reddit. These predicted ideologies are then validated against human labels. Finally, we demonstrate the utility of this method by applying it to the watch histories of survey respondents with self-identified ideologies to evaluate the prevalence of echo chambers on YouTube. Our approach gives video-level scores based only on supplied text metadata, is scalable, and can be easily adjusted to account for changes in the ideological climate. This method could also be generalized to estimate the ideology of other items referenced or posted on Reddit.
Area of Study

Data Science Methodology

Media Consumption
Date Posted

Feb 13, 2024
Tags

Reddit,

YouTube,

United States
Working Paper
Large Language Models Can Be Used to Estimate the Latent Positions of Politicians
Patrick Y. Wu,

Jonathan Nagler,

Joshua A. Tucker,

Sol Messing
Working Paper, September 2023
View Article View abstract

Existing approaches to estimating politicians' latent positions along specific dimensions often fail when relevant data is limited. We leverage the embedded knowledge in generative large language models (LLMs) to address this challenge and measure lawmakers' positions along specific political or policy dimensions. We prompt an instruction/dialogue-tuned LLM to pairwise compare lawmakers and then scale the resulting graph using the Bradley-Terry model. We estimate novel measures of U.S. senators' positions on liberal-conservative ideology, gun control, and abortion. Our liberal-conservative scale, used to validate LLM-driven scaling, strongly correlates with existing measures and offsets interpretive gaps, suggesting LLMs synthesize relevant data from internet and digitized media rather than memorizing existing measures. Our gun control and abortion measures -- the first of their kind -- differ from the liberal-conservative scale in face-valid ways and predict interest group ratings and legislator votes better than ideology alone. Our findings suggest LLMs hold promise for solving complex social science measurement problems.
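As a rough illustration of the scaling step mentioned in the abstract above, the sketch below fits a Bradley-Terry model to a list of pairwise outcomes using the standard minorization-maximization updates. It is not the authors' code: the function name `bradley_terry`, the toy labels "A", "B", "C", and the comparison counts are all hypothetical, and in the paper's setting the outcomes would come from LLM answers to pairwise prompts rather than from a hand-written list.

```python
from collections import defaultdict


def bradley_terry(comparisons, items, iters=500, tol=1e-10):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Hypothetical helper for illustration only. Uses the classic MM update
    p_i <- wins_i / sum_j n_ij / (p_i + p_j), then rescales so the
    strengths sum to the number of items.
    """
    wins = defaultdict(float)         # total wins per item
    pair_counts = defaultdict(float)  # number of comparisons per unordered pair
    for winner, loser in comparisons:
        wins[winner] += 1.0
        pair_counts[frozenset((winner, loser))] += 1.0

    strength = {i: 1.0 for i in items}
    for _ in range(iters):
        updated = {}
        for i in items:
            denom = 0.0
            for j in items:
                if j == i:
                    continue
                n_ij = pair_counts.get(frozenset((i, j)), 0.0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            # Items that were never compared keep their current strength.
            updated[i] = wins[i] / denom if denom > 0 else strength[i]
        total = sum(updated.values())
        updated = {i: v * len(items) / total for i, v in updated.items()}
        delta = max(abs(updated[i] - strength[i]) for i in items)
        strength = updated
        if delta < tol:
            break
    return strength


# Toy data (made up): each pair is compared four times.
toy = (
    [("A", "B")] * 3 + [("B", "A")] * 1
    + [("B", "C")] * 3 + [("C", "B")] * 1
    + [("A", "C")] * 3 + [("C", "A")] * 1
)
scores = bradley_terry(toy, ["A", "B", "C"])
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

The fitted strengths are only identified up to scale, which is why the sketch rescales them on every pass. An item with no wins would be driven toward zero, so real applications typically add regularization or use an established statistical package.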
Area of Study

Data Science Methodology

Elite & Mass Political Behavior
Date Posted

Sep 26, 2023
Tags

United States,

Generative AI
Journal Article
Measuring the Ideology of Audiences for Web Links and Domains Using Differentially Private Engagement Data
Cody L. Buntain,

Richard Bonneau,

Jonathan Nagler,

Joshua A. Tucker
Proceedings of the International AAAI Conference on Web and Social Media, 2023
View Article View abstract

This paper demonstrates the use of differentially private hyperlink-level engagement data for measuring ideologies of audiences for web domains, individual links, or aggregations thereof. We examine a simple metric for measuring this ideological position and assess the conditions under which the metric is robust to injected, privacy-preserving noise. This assessment provides insights into and constraints on the level of activity one should observe when applying this metric to privacy-protected data. Grounding this work is a massive dataset of social media engagement activity where privacy-preserving noise has been injected into the activity data, provided by Facebook and the Social Science One (SS1) consortium. Using this dataset, we validate our ideology measures by comparing to similar, published work on sharing-based, homophily- and content-oriented measures, where we show consistently high correlation (>0.87). We then apply this metric to individual links from several popular news domains and demonstrate how one can assess link-level distributions of ideological audiences. We further show this estimator is robust to selection of engagement types besides sharing, where domain-level audience-ideology assessments based on views and likes show no significant difference compared to sharing-based estimates. Estimates of partisanship, however, suggest the viewing audience is more moderate than the audiences who share and like these domains. Beyond providing thresholds on sufficient activity for measuring audience ideology and comparing three types of engagement, this analysis provides a blueprint for ensuring robustness of future work to differential privacy protections.
Area of Study

Data Science Methodology
Date Posted

Jun 02, 2023
Tags

Facebook,

United States
Journal Article
Dictionary-Assisted Supervised Contrastive Learning
Patrick Y. Wu,

Richard Bonneau,

Joshua A. Tucker,

Jonathan Nagler
Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2022
View Article View abstract

Text analysis in the social sciences often involves using specialized dictionaries to reason with abstract concepts, such as perceptions about the economy or abuse on social media. These dictionaries allow researchers to impart domain knowledge and note subtle usages of words relating to a concept(s) of interest. We introduce the dictionary-assisted supervised contrastive learning (DASCL) objective, allowing researchers to leverage specialized dictionaries when fine-tuning pretrained language models. The text is first keyword simplified: a common, fixed token replaces any word in the corpus that appears in the dictionary(ies) relevant to the concept of interest. During fine-tuning, a supervised contrastive objective draws closer the embeddings of the original and keyword-simplified texts of the same class while pushing further apart the embeddings of different classes. The keyword-simplified texts of the same class are more textually similar than their original text counterparts, which additionally draws the embeddings of the same class closer together. Combining DASCL and cross-entropy improves classification performance metrics in few-shot learning settings and social science applications compared to using cross-entropy alone and alternative contrastive and data augmentation methods.
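The keyword-simplification step described in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the DASCL implementation: the function name `keyword_simplify`, the placeholder token `<economy>`, and the two-word dictionary are hypothetical, and the supervised contrastive fine-tuning objective itself is not shown.

```python
import re


def keyword_simplify(text, dictionary, token="<concept>"):
    """Replace every dictionary word in `text` with one fixed token.

    Hypothetical helper for illustration only. Matching is case-insensitive
    and works on whole words; a real pipeline might also handle stemming or
    multi-word dictionary entries.
    """
    vocab = {word.lower() for word in dictionary}

    def swap(match):
        word = match.group(0)
        return token if word.lower() in vocab else word

    # Substitution runs over the original text, so inserted tokens are
    # never re-examined.
    return re.sub(r"[A-Za-z']+", swap, text)


# Made-up dictionary and sentence for a concept such as "the economy".
economy_terms = {"jobs", "inflation"}
simplified = keyword_simplify(
    "Jobs are scarce and inflation is rising", economy_terms, token="<economy>"
)
print(simplified)
```

In the approach the abstract describes, the original text and its simplified counterpart would then both be embedded during fine-tuning, with the contrastive objective pulling same-class pairs together.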
Area of Study

Data Science Methodology
Date Posted

Oct 27, 2022
Tags

United States
Working Paper
Network Embedding Methods for Large Networks in Political Science
Megan A. Brown,

Zhanna Terechshenko,

Rachel Connolly,

Angela Lai,

Charlotte Ji,

Jonathan Nagler,

Joshua A. Tucker,

Richard Bonneau
Working Paper, November 2021
View Article View abstract

Social networks play an important role in many political science studies. With the rise of social media, these networks have grown in both size and complexity. Analysis of these large networks requires generation of feature representations that can be used in machine learning models. One way to generate these feature representations is to use network embedding methods for learning low-dimensional feature representations of nodes and edges in a network. While there is some literature comparing the advantages and shortcomings of these models, to our knowledge, there has not been any analysis on the applicability of network embedding models to classification tasks in political science. In this paper, we compare the performance of five prominent network embedding methods on prediction of ideology of Twitter users and ideology of Internet domains. We find that LINE provides the best feature representation across all four datasets that we use, resulting in the highest performance accuracy. Finally, we provide guidelines for researchers on the use of these models for their own research.
Area of Study

Data Science Methodology
Date Posted

Nov 12, 2021
Tags

Twitter/X
Journal Article
Accessibility and Generalizability: Are Social Media Effects Moderated by Age or Digital Literacy?
Kevin Munger,

Ishita Gopal,

Jonathan Nagler,

Joshua A. Tucker
Research & Politics, 2021
View Article View abstract

An emerging empirical regularity suggests that older people use and respond to social media very differently than younger people. Older people are the fastest-growing population of Internet and social media users in the U.S., and this heterogeneity will soon become central to online politics. However, many important experiments in this field have been conducted on online samples that do not contain enough older people to be useful to generalize to the current population of Internet users; this issue is more pronounced for studies that are even a few years old. In this paper, we report the results of replicating two experiments involving social media (specifically, Facebook) conducted on one such sample lacking older users (Amazon’s Mechanical Turk) using a source of online subjects which does contain sufficient variation in subject age. We add a standard battery of questions designed to explicitly measure digital literacy. We find evidence of significant treatment effect heterogeneity in subject age and digital literacy in the replication of one of the two experiments. This result is an example of limitations to generalizability of research conducted on samples where selection is related to treatment effect heterogeneity; specifically, this result indicates that Mechanical Turk should not be used to recruit subjects when researchers suspect treatment effect heterogeneity in age or digital literacy, as we argue should be the case for research on digital media effects.
Area of Study

Data Science Methodology
Date Posted

Jun 09, 2021
Tags

United States
Journal Article
YouTube Recommendations and Effects on Sharing Across Online Social Platforms
Cody L. Buntain,

Richard Bonneau,

Jonathan Nagler,

Joshua A. Tucker
Proceedings of the ACM on Human-Computer Interaction, 2021
View Article View abstract

In January 2019, YouTube announced it would exclude potentially harmful content from video recommendations but allow such videos to remain on the platform. While this step intends to reduce YouTube's role in propagating such content, continued availability of these videos in other online spaces makes it unclear whether this compromise actually reduces their spread. To assess this impact, we apply interrupted time series models to measure whether different types of YouTube sharing in Twitter and Reddit changed significantly in the eight months around YouTube's announcement. We evaluate video sharing across three curated sets of potentially harmful, anti-social content: a set of conspiracy videos that have been shown to experience reduced recommendations in YouTube, a larger set of videos posted by conspiracy-oriented channels, and a set of videos posted by alternative influence network (AIN) channels. As a control, we also evaluate effects on video sharing in a dataset of videos from mainstream news channels. Results show conspiracy-labeled and AIN videos that have evidence of YouTube's de-recommendation experience a significant decreasing trend in sharing on both Twitter and Reddit. For videos from conspiracy-oriented channels, however, we see no significant effect in Twitter but find a significant increase in the level of conspiracy-channel sharing in Reddit. For mainstream news sharing, we actually see an increase in trend on both platforms, suggesting YouTube's suppressing particular content types has a targeted effect. This work finds evidence that reducing exposure to anti-social videos within YouTube, without deletion, has potential pro-social, cross-platform effects. At the same time, increases in the level of conspiracy-channel sharing raise concerns about content producers' responses to these changes, and platform transparency is needed to evaluate these effects further.
Area of Study

Data Science Methodology

Media Consumption
Date Posted

Apr 22, 2021
Tags

Reddit,

YouTube,

Twitter/X,

United States
Working Paper
A Comparison of Methods in Political Science Text Classification: Transfer Learning Language Models for Politics
Zhanna Terechshenko,

Fridolin Linder,

Vishakh Padmakumar,

Fengyuan Liu,

Jonathan Nagler,

Joshua A. Tucker,

Richard Bonneau
Working Paper, October 2020
View Article View abstract

Automated text classification has rapidly become an important tool for political analysis. Recent advancements in NLP enabled by advances in deep learning now achieve state-of-the-art results in many standard tasks for the field. However, these methods require large amounts of both computing power and text data to learn the characteristics of the language, resources which are not always accessible to political scientists. One solution is a transfer learning approach, where knowledge learned in one area or source task is transferred to another area or a target task. A class of models that embody this approach are language models, which demonstrate extremely high levels of performance. We investigate the performance of these models in political science by comparing multiple text classification methods. We find RoBERTa and XLNet, language models that rely on the Transformer, require fewer computing resources and less training data to perform on par with – or outperform – several political science text classification methods. Moreover, we find that the increase in accuracy is especially significant in the case of small labeled data, highlighting the potential for reducing the data-labeling cost of supervised methods for political scientists via the use of pretrained language models.
Area of Study

Data Science Methodology
Date Posted

Oct 20, 2020
Book
Social Media and Democracy: The State of the Field, Prospects for Reform
Nathaniel Persily,

Joshua A. Tucker
Cambridge University Press, 2020
View Book
Area of Study

Politics of Authoritarianism

Data Science Methodology

Elite & Mass Political Behavior

Foreign Influence Campaigns

Media Consumption

Online Information Environment

Political Polarization

Public Opinion
Date Posted

Aug 01, 2020
Tags

Data Access,

Facebook,

Twitter/X,

YouTube,

2012 Election,

2016 Election,

2020 Election
Journal Article
Content-Based Features Predict Social Media Influence Operations
Meysam Alizadeh,

Jacob N. Shapiro,

Cody L. Buntain,

Joshua A. Tucker
Science Advances, 2020
View Article View abstract

We study how easy it is to distinguish influence operations from organic social media activity by assessing the performance of a platform-agnostic machine learning approach. Our method uses public activity to detect content that is part of coordinated influence operations based on human-interpretable features derived solely from content. We test this method on publicly available Twitter data on Chinese, Russian, and Venezuelan troll activity targeting the United States, as well as the Reddit dataset of Russian influence efforts. To assess how well content-based features distinguish these influence operations from random samples of general and political American users, we train and test classifiers on a monthly basis for each campaign across five prediction tasks. Content-based features perform well across period, country, platform, and prediction task. Industrialized production of influence campaign content leaves a distinctive signal in user-generated content that allows tracking of campaigns from month to month and across different accounts.
Area of Study

Data Science Methodology

Foreign Influence Campaigns
Date Posted

Jul 22, 2020
Tags

Twitter/X,

Reddit,

China,

Russia,

Venezuela,

United States
Journal Article
Automated Text Classification of News Articles: A Practical Guide
Pablo Barberá,

Amber E. Boydstun,

Suzanna Linn,

Ryan McMahon,

Jonathan Nagler
Political Analysis, 2021
View Article View abstract

Automated text analysis methods have made possible the classification of large corpora of text by measures such as topic and tone. Here, we provide a guide to help researchers navigate the consequential decisions they need to make before any measure can be produced from the text. We consider, both theoretically and empirically, the effects of such choices using as a running example efforts to measure the tone of New York Times coverage of the economy. We show that two reasonable approaches to corpus selection yield radically different corpora and we advocate for the use of keyword searches rather than predefined subject categories provided by news archives. We demonstrate the benefits of coding using article segments instead of sentences as units of analysis. We show that, given a fixed number of codings, it is better to increase the number of unique documents coded rather than the number of coders for each document. Finally, we find that supervised machine learning algorithms outperform dictionaries on a number of criteria. Overall, we intend this guide to serve as a reminder to analysts that thoughtfulness and human validation are key to text-as-data methods, particularly in an age when it is all too easy to computationally classify texts without attending to the methodological choices therein.
Area of Study

Data Science Methodology
Date Posted

Jun 09, 2020
Journal Article
For Whom the Bot Tolls: A Neural Networks Approach to Measuring Political Orientation of Twitter Bots in Russia
Denis Stukal,

Sergey Sanovich,

Joshua A. Tucker,

Richard Bonneau
SAGE Open, 2019
View Article View abstract

Computational propaganda and the use of automated accounts in social media have recently become the focus of public attention, with alleged Russian government activities abroad provoking particularly widespread interest. However, even in the Russian domestic context, where anecdotal evidence of state activity online goes back almost a decade, no public systematic attempt has been made to dissect the population of Russian social media bots by their political orientation. We address this gap by developing a deep neural network classifier that separates pro-regime, anti-regime, and neutral Russian Twitter bots. Our method relies on supervised machine learning and a new large set of labeled accounts, rather than externally obtained account affiliations or orientation of elites. We also illustrate the use of our method by applying it to bots operating in Russian political Twitter from 2015 to 2017 and show that both pro- and anti-Kremlin bots had a substantial presence on Twitter.
Area of Study

Data Science Methodology

Politics of Authoritarianism
Date Posted

Apr 12, 2019
Tags

Twitter/X,

Russia
Journal Article
The Use of Twitter Bots in Russian Political Communication Online
Denis Stukal,

Joshua A. Tucker,

Sergey Sanovich,

Richard Bonneau
PONARS Eurasia Policy Memo No. 564, 2019
View Article
Area of Study

Politics of Authoritarianism

Data Science Methodology
Date Posted

Jan 01, 2019
Tags

Twitter/X,

Russia
Journal Article
Detecting Bots on Russian Political Twitter
Denis Stukal,

Sergey Sanovich,

Richard Bonneau,

Joshua A. Tucker
Big Data, 2017
View Article View abstract

Automated and semiautomated Twitter accounts, bots, have recently gained significant public attention due to their potential interference in the political realm. In this study, we develop a methodology for detecting bots on Twitter using an ensemble of classifiers and apply it to study bot activity within political discussions in the Russian Twittersphere. We focus on the interval from February 2014 to December 2015, an especially consequential period in Russian politics. Among accounts actively Tweeting about Russian politics, we find that on the majority of days, the proportion of Tweets produced by bots exceeds 50%. We reveal bot characteristics that distinguish them from humans in this corpus, and find that the software platform used for Tweeting is among the best predictors of bots. Finally, we find suggestive evidence that one prominent activity that bots were involved in on Russian political Twitter is the spread of news stories and promotion of media who produce them.
Area of Study

Data Science Methodology

Politics of Authoritarianism
Date Posted

Dec 01, 2017
Tags

Twitter/X,

Russia
Book
Measuring Public Opinion with Social Media Data
Marko Klašnja,

Pablo Barberá,

Nick Beauchamp,

Jonathan Nagler,

Joshua A. Tucker
The Oxford Handbook of Polling and Survey Methods, 2018
View Book View abstract

This chapter examines the use of social networking sites such as Twitter in measuring public opinion. It first considers the opportunities and challenges that are involved in conducting public opinion surveys using social media data. Three challenges are discussed: identifying political opinion, representativeness of social media users, and aggregating from individual responses to public opinion. The chapter outlines some of the strategies for overcoming these challenges and proceeds by highlighting some of the novel uses for social media that have fewer direct analogs in traditional survey work. Finally, it suggests new directions for a research agenda in using social media for public opinion work.
Area of Study

Public Opinion

Data Science Methodology
Date Posted

Oct 01, 2017
Tags

Twitter/X,

United States
