Research

CSMaP is a leading academic research institute studying the ever-shifting online environment at scale. We publish peer-reviewed research in top academic journals, produce rigorous reports and analyses on policy-relevant topics, and develop open-source tools and methods to support the broader scholarly community.

Academic Research

  • Book

    Online Data and the Insurrection

    Media and January 6th, 2024

    Online data is key to understanding the lead-up to the January 6 insurrection, including how and why election fraud conspiracies spread online, how conspiracy groups organized online to participate in the insurrection, and other factors of online life that led to the insurrection. However, there are significant challenges in accessing data for this research. First, platforms restrict which researchers get access to data, as well as what researchers can do with the data they access. Second, this data is ephemeral; that is, once users or the platform remove the data, researchers can no longer access it. These factors limit which research questions can be asked and answered.

  • Journal Article

    Estimating the Ideology of Political YouTube Videos

    Political Analysis, 2024

    We present a method for estimating the ideology of political YouTube videos. As online media increasingly influences how people engage with politics, quantifying the ideology of such media becomes increasingly important for research. The subfield of estimating ideology as a latent variable has often focused on traditional actors such as legislators, while more recent work has used social media data to estimate the ideology of ordinary users, political elites, and media sources. We build on this work by developing a method to estimate the ideologies of YouTube videos, an important subset of media, based on their accompanying text metadata. First, we take Reddit posts linking to YouTube videos and use correspondence analysis to place those videos in an ideological space. We then train a text-based model with those estimated ideologies as training labels, enabling us to estimate the ideologies of videos not posted on Reddit. These predicted ideologies are then validated against human labels. Finally, we demonstrate the utility of this method by applying it to the watch histories of survey respondents with self-identified ideologies to evaluate the prevalence of echo chambers on YouTube. Our approach gives video-level scores based only on supplied text metadata, is scalable, and can be easily adjusted to account for changes in the ideological climate. This method could also be generalized to estimate the ideology of other items referenced or posted on Reddit.

    Date Posted

    Feb 13, 2024
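
The first step described in the abstract above, placing videos in an ideological space from Reddit link data, can be illustrated with a small sketch. This is not the authors' implementation. It assumes a hypothetical table of counts whose rows are Reddit communities and whose columns are videos, and it uses reciprocal averaging, a classic iterative procedure that converges to the first dimension of correspondence analysis. The function name, the toy data, and the choice of communities as rows are all assumptions made for illustration; the later text-model and validation steps are not shown.

```python
def reciprocal_averaging(counts, iterations=200):
    """Approximate the first correspondence-analysis dimension.

    counts[i][j] is a hypothetical number of posts in community i
    that link to video j. Every row and column must have a nonzero total.
    Returns (community_scores, video_scores); the sign is arbitrary.
    """
    n_rows, n_cols = len(counts), len(counts[0])
    row_tot = [sum(row) for row in counts]
    col_tot = [sum(counts[i][j] for i in range(n_rows)) for j in range(n_cols)]
    if min(row_tot) == 0 or min(col_tot) == 0:
        raise ValueError("every community and video needs at least one link")
    total = sum(row_tot)

    # Arbitrary non-constant starting guess for the video scores.
    col_scores = [float(j) for j in range(n_cols)]
    row_scores = [0.0] * n_rows
    for _ in range(iterations):
        # Center with column-mass weights to drop the trivial constant solution.
        mean = sum(col_tot[j] * col_scores[j] for j in range(n_cols)) / total
        col_scores = [s - mean for s in col_scores]
        # Rescale to unit weighted variance so the scores do not shrink to zero.
        var = sum(col_tot[j] * s * s for j, s in enumerate(col_scores)) / total
        col_scores = [s / var ** 0.5 for s in col_scores]
        # Each community's score is the weighted average of its videos' scores.
        row_scores = [
            sum(counts[i][j] * col_scores[j] for j in range(n_cols)) / row_tot[i]
            for i in range(n_rows)
        ]
        # Each video's score is the weighted average of its communities' scores.
        col_scores = [
            sum(counts[i][j] * row_scores[i] for i in range(n_rows)) / col_tot[j]
            for j in range(n_cols)
        ]
    return row_scores, col_scores


# Toy data (invented): two communities mostly share videos 0 and 1,
# two others mostly share videos 2 and 3.
toy_counts = [
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
]

community_scores, video_scores = reciprocal_averaging(toy_counts)
for j, score in enumerate(video_scores):
    print(f"video {j}: {score:+.3f}")
```

In this toy example the two groups of videos land on opposite sides of zero. Which side counts as "left" or "right" is not determined by the procedure and would have to be fixed by reference to known sources.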

Reports & Analysis

Data Collections & Tools

As part of our project to construct comprehensive data sets and to empirically test hypotheses related to social media and politics, we have developed a suite of open-source tools and modeling processes.