DATA COLLECTIONS AND ANALYSIS TOOLS
SMaPP Open Source Software: As part of our project to construct comprehensive data sets and to empirically test hypotheses related to social media and politics, we have developed a suite of open-source tools and modeling processes. Collectively, these tools and data collections have been downloaded roughly 500,000 times.
Below, we list the tools, describe their functionality, and outline how interested parties can best use them.
pysmapp is a high-level JSON parser that extends smappdragon for non-coders.
smaberta is a wrapper for the huggingface transformer libraries.
First release: 2020-10
Lead Developers: Vishakh Padmakumar, Zhanna Terechshenko
smappdragon is a low-level data parser designed for parsing Twitter data.
smapp_text_classifier contains helpful abstractions that allow for easy training of state-of-the-art supervised text classification models. It removes some repetitive and time-intensive tasks, such as setting up cross-validation over various ML classifiers, transforming text into various feature spaces (word n-grams, character n-grams, and word embeddings), and setting reasonable defaults for tuning parameters.
First release: 2019-08
Lead Developers: Fridolin Linder, Fengyuan Michael Liu
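To make the feature spaces mentioned for smapp_text_classifier concrete, here is a minimal sketch of word and character n-gram extraction using only the Python standard library. It is an illustration, not the package's own API: the function names word_ngrams, char_ngrams, and featurize are hypothetical, and the tokenization (lowercasing and splitting on whitespace) is an assumption made for brevity.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return contiguous word n-grams from lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return contiguous character n-grams (spaces included) from lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def featurize(text, word_n=2, char_n=3):
    """Count both kinds of n-gram in one sparse feature vector.

    The "w:" and "c:" prefixes are an illustrative convention that keeps
    the two feature spaces from colliding on identical strings.
    """
    counts = Counter()
    counts.update("w:" + gram for gram in word_ngrams(text, word_n))
    counts.update("c:" + gram for gram in char_ngrams(text, char_n))
    return counts
```

A classifier would then be trained on these counts. Character n-grams tend to be more robust to misspellings and hashtags than word n-grams, which is one reason to try several feature spaces under cross-validation.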
urlExpander is a Python module to quickly and thoroughly expand shortened URLs.
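Expanding a shortened URL generally means following its chain of redirects until a URL no longer redirects. The sketch below shows that idea under stated assumptions. It does not use or reproduce urlExpander's API: expand_url and its get_location parameter are hypothetical names, and the lookup is passed in as a function so that the logic can be run without network access.

```python
from urllib.parse import urljoin


def expand_url(url, get_location, max_hops=10):
    """Follow a redirect chain and return (final_url, hops_followed).

    get_location is a caller-supplied (hypothetical) function that returns
    the redirect target for a URL, or None when the URL does not redirect.
    """
    seen = {url}
    hops = 0
    while hops < max_hops:
        target = get_location(url)
        if target is None:
            break  # no further redirect: this is the expanded URL
        target = urljoin(url, target)  # resolve relative redirect targets
        if target in seen:
            break  # redirect loop: stop at the last new URL
        seen.add(target)
        url = target
        hops += 1
    return url, hops


# Offline usage with a made-up redirect table standing in for HTTP lookups.
redirects = {
    "https://sho.rt/a": "https://sho.rt/b",
    "https://sho.rt/b": "https://example.org/story",
}
final_url, hops = expand_url("https://sho.rt/a", redirects.get)
```

In a real setting, get_location might issue an HTTP HEAD request and read the Location header of a 3xx response. The hop limit and the loop check guard against links that never resolve.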
youtube-data-api is a Python client to download public YouTube data about channels, videos, and searches.
streamR is an R package written by Pablo Barbera and published on CRAN that provides access to Twitter’s Streaming API. The package allows for easy creation of filter, sample, and user streams. It includes tooling to create an OAuth token; to open a connection to Twitter that returns real-time public statuses matching given terms or parameters, statuses for a given user or users, or a random sample of statuses (up to 1% of all tweets at any given time, which is the Twitter limit); and to read and parse the results into a data frame.
First release: 2018-07
Lead Developers: Pablo Barbera
The Rfacebook package was written by Pablo Barbera, Michael Piccirilli, Andrew Geisler, and Wouter van Atteveldt and published on CRAN. It allows for easy interaction with the Facebook API via R and provides functions to obtain information about users and posts and to collect public status updates that mention specific keywords. Examples of the tooling provided include functions to create an OAuth token; to get posts, likes, reactions, shares, and users; to get page insights for a specific time period; to get an adjacency matrix representing a user's mutual friend network; and to run general Facebook searches and Facebook group searches. The package is built out and includes unit tests.
First release: 2017-05
Lead Developers: Pablo Barbera, Michael Piccirilli, Andrew Geisler, Wouter van Atteveldt
The instaR package provides a series of functions that allow R users to access Instagram’s API to search for media that mention specific hashtags or were sent from a certain location, to download recent pictures from a given user, and to capture lists of followers.
First release: 2016-08
Lead Developers: Pablo Barbera, Tiago Dantas, Jonne Guyt
Data Sets / Collectors
A directory of local news outlets by state and medium. Includes social media accounts for local news outlets.
First release: 2018-08
Dataset of public interest interventions on Twitter for politicians and candidates during the 2020 US General Elections.
First release: 2020-11
Lead Developer: Megan Brown