Data Collections & Tools
As part of our project to construct comprehensive data sets and to empirically test hypotheses related to social media and politics, we have developed a suite of open-source tools and modeling processes. Collectively, these tools and data collections have been downloaded roughly 1.5 million times.
-
Python
smapp_text_classifier
smapp_text_classifier contains helpful abstractions for training state-of-the-art supervised text classification models. It removes repetitive and time-intensive tasks such as setting up cross-validation over various ML classifiers, transforming text into various feature spaces (word n-grams, character n-grams, and word embeddings), and setting reasonable defaults for tuning parameters.
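To make the feature spaces named above concrete, here is a minimal standard-library sketch of word and character n-gram extraction. It is not the smapp_text_classifier API; the function names (word_ngrams, char_ngrams, featurize) and the "w:" and "c:" key prefixes are assumptions chosen for illustration.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams in text as space-joined, lowercased strings."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return the character n-grams in text (lowercased, spaces included)."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def featurize(text, word_n=2, char_n=3):
    """Count word and character n-grams in one sparse feature dictionary.

    The "w:" and "c:" prefixes (hypothetical) keep the two feature
    spaces from colliding on identical strings.
    """
    features = Counter()
    features.update("w:" + g for g in word_ngrams(text, word_n))
    features.update("c:" + g for g in char_ngrams(text, char_n))
    return features
```

Counts like these could then be fed to any classifier that accepts sparse features; a package of this kind would typically also handle the cross-validation and parameter defaults described above.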
-
Datasets
Twitter Public Interest Exception Interventions
Dataset of public interest interventions on Twitter for politicians and candidates during the 2020 U.S. general elections.
-
Datasets
Local News Dataset
A directory of local news outlets by state and medium. Includes social media accounts for local news outlets.
-
Python
youtube-data-api
youtube-data-api is a Python client to download public YouTube data about channels, videos, and searches.
-
Python
smappdragon
smappdragon is a low-level data parser designed for parsing Twitter data.
-
Python
urlExpander
urlExpander is a Python module to quickly and thoroughly expand shortened URLs.
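The general idea of expanding a shortened URL can be sketched with the standard library alone: follow redirects one hop at a time until none remain. This is not urlExpander's interface; fetch_location, expand, the resolve parameter, and max_hops are hypothetical names, and the sketch assumes the shortener answers HEAD requests with a Location header.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so each hop can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch_location(url, timeout=5.0):
    """Return the redirect target of url, or None if it does not redirect."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            location = response.headers.get("Location")
    except urllib.error.HTTPError as err:
        # With redirects disabled, 3xx responses surface as HTTPError.
        location = err.headers.get("Location") if 300 <= err.code < 400 else None
    # Location may be relative, so resolve it against the current URL.
    return urljoin(url, location) if location else None


def expand(url, resolve=fetch_location, max_hops=10):
    """Follow redirects hop by hop and return the last URL reached."""
    seen = {url}
    for _ in range(max_hops):
        target = resolve(url)
        if target is None or target in seen:
            break  # no further redirect, or a redirect loop
        seen.add(target)
        url = target
    return url
```

Passing the resolver as a parameter lets the hop-following logic be exercised offline, for example with resolve=some_dict.get over a dictionary of known redirects.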
-
Python
pysmap
pysmap is a high-level JSON parser that extends smappdragon for non-coders.
-
Python
SMaBERTa
smaberta is a wrapper for the Hugging Face Transformers library.
-
R
Rfacebook package
The Rfacebook package was written by Pablo Barberá, Michael Piccirilli, Andrew Geisler, and Wouter van Atteveldt and published on CRAN. It allows for easy interaction with the Facebook API from R, providing functions to obtain information about users and posts and to collect public status updates that mention specific keywords. The tooling includes functions to create an OAuth token; get posts, likes, reactions, shares, and users; get page insights for a specific time period; build an adjacency matrix representing a user's mutual friend network; and run general Facebook searches and Facebook group searches. The package is fully built out and includes unit tests.
-
R
instaR
The instaR package provides a series of functions that allow R users to access Instagram’s API to search for media that mention specific hashtags or were sent from a certain location, to download recent pictures from a given user, and to capture lists of followers.
-
R
streamR package
streamR is a package written by Pablo Barberá and published on CRAN that provides access to Twitter’s Streaming API from R. The package allows for easy creation of filter, sample, and user streams. It includes tooling to create an OAuth token; to open a connection with Twitter that returns real-time public statuses matching given terms or parameters, statuses for a given user or users, or a random sample of all statuses (up to 1% of all tweets at any given time, which is the Twitter limit); and to read and parse the results into a data frame.
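The last step described above, reading captured statuses into tabular rows, can be sketched in Python for readers working outside R. This is not streamR's code; parse_statuses and matches_any are hypothetical helpers, and the sketch assumes the stream was saved as newline-delimited JSON with the standard status fields id_str, created_at, user.screen_name, and text.

```python
import json


def parse_statuses(lines):
    """Turn newline-delimited status JSON into a list of flat row dicts."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # streams send blank keep-alive lines
        try:
            status = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed records
        if not isinstance(status, dict) or "text" not in status:
            continue  # skip non-status messages such as delete notices
        rows.append({
            "id_str": status.get("id_str"),
            "created_at": status.get("created_at"),
            "screen_name": (status.get("user") or {}).get("screen_name"),
            "text": status["text"],
        })
    return rows


def matches_any(row, terms):
    """Return True if the row's text contains any term, ignoring case."""
    text = row["text"].lower()
    return any(term.lower() in text for term in terms)
```

Each row has the same four keys, so the list can be written to CSV or loaded into a data frame library without further reshaping.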