United States
Academic Research
-
Working Paper
Web Scraping for Research: Legal, Ethical, Institutional, and Scientific Considerations
Working Paper, December 2024
Scientists across disciplines often use data from the internet to conduct research, generating valuable insights about human behavior. However, as generative AI relying on massive text corpora becomes increasingly valuable, platforms have greatly restricted access to data through official channels. As a result, researchers will likely rely more heavily on web scraping to collect data, which introduces new challenges and concerns. This paper proposes a comprehensive framework for web scraping in social science research for U.S.-based researchers, examining the legal, ethical, institutional, and scientific factors that researchers should consider when scraping the web. We present an overview of the current regulatory environment impacting when and how researchers can access, collect, store, and share data via scraping. We then provide researchers with recommendations to conduct scraping in a scientifically legitimate and ethical manner. We aim to equip researchers with the relevant information to mitigate risks and maximize the impact of their research amidst this evolving data access landscape.
-
Journal Article
The Diffusion and Reach of (Mis)Information on Facebook During the U.S. 2020 Election
Sociological Science, 2024
Social media creates the possibility for rapid, viral spread of content, but how many posts actually reach millions? And is misinformation special in how it propagates? We answer these questions by analyzing the virality of and exposure to information on Facebook during the U.S. 2020 presidential election. We examine the diffusion trees of the approximately 1 billion posts that were re-shared at least once by U.S.-based adults from July 1, 2020, to February 1, 2021. We differentiate misinformation from non-misinformation posts to show that (1) misinformation diffused more slowly, relying on a small number of active users that spread misinformation via long chains of peer-to-peer diffusion that reached millions; non-misinformation spread primarily through one-to-many affordances (mainly, Pages); (2) the relative importance of peer-to-peer spread for misinformation was likely due to an enforcement gap in content moderation policies designed to target mostly Pages and Groups; and (3) periods of aggressive content moderation proximate to the election coincide with dramatic drops in the spread and reach of misinformation and (to a lesser extent) political content.
Reports & Analysis
-
Analysis
Who Has a Policy that Would Benefit You? More Voters Say Trump.
National survey data from the 2016, 2020, and 2024 elections shed light on how candidates' campaign strategies impact voter policy recall.
November 2, 2024
-
Analysis
How Americans’ Confidence in Technology Firms has Dropped
Results from the American Institutional Confidence poll's second wave show that the public's confidence in technology, and tech companies, has markedly decreased over the past five years.
June 14, 2023
News & Commentary
-
News
2024 Year in Review: Our Research & Impact
A look at our top articles, events, and more from the past year.
December 18, 2024
-
Policy
The Case for Open Data Access to Aid Tech Regulation
To really understand the potential risks and harms of social media, platforms and policymakers need to ensure accessible pathways for empirical research.
December 17, 2024