DATA COLLECTIONS AND ANALYSIS TOOLS
SMaPP Open Source Software: As part of our project to construct comprehensive data sets and to empirically test hypotheses related to social media and politics, we have developed a suite of open-source tools and modeling processes. Collectively, these tools and data collections have been downloaded roughly 500,000 times.
Below, we list the tools, describe their functionality, and outline how interested parties can best use them.
Python
smappdragon is a low-level data parser designed for parsing Twitter data.
GitHub: https://github.com/SMAPPNYU/smappdragon
Downloads: 83,509
PyPi: https://pypi.org/project/smappdragon/
First release: 2016-02
ReadTheDocs: NA
Lead Developers: Yvan Scher, Leon Yin
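smappdragon's own classes are not reproduced here. The sketch below is a rough, standard-library-only illustration of the kind of low-level parsing it performs. It assumes tweets are stored one JSON object per line, the format Twitter's streaming endpoints deliver, and pulls nested fields out with a dotted path such as user.screen_name. The names iter_tweets and get_field are hypothetical.

```python
import io
import json
from typing import Any, Iterator, TextIO


def iter_tweets(handle: TextIO) -> Iterator[dict]:
    """Yield one tweet dict per non-empty line, skipping lines that are not valid JSON."""
    for line in handle:
        line = line.strip()
        if not line:
            continue  # blank lines (e.g. stream keep-alives)
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or corrupted record


def get_field(tweet: dict, path: str, default: Any = None) -> Any:
    """Follow a dotted path like 'user.screen_name' through nested dicts (hypothetical helper)."""
    value: Any = tweet
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return default
        value = value[key]
    return value


sample = io.StringIO(
    '{"id_str": "1", "text": "hello", "user": {"screen_name": "alice"}}\n'
    "\n"
    '{"id_str": "2", "text": "trunc\n'
    '{"id_str": "3", "text": "bye", "user": {"screen_name": "bob"}}\n'
)
for tweet in iter_tweets(sample):
    print(get_field(tweet, "id_str"), get_field(tweet, "user.screen_name"))
```

Skipping malformed lines rather than raising is a design choice: long-running collections often end with a partially written record, and one bad line should not abort a pass over millions of tweets.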
pysmap is a high-level JSON parser that extends smappdragon for non-coders.
GitHub: https://github.com/SMAPPNYU/pysmap
Downloads: 67,747
PyPi: https://pypi.org/project/pysmap/
First release: 2016-06
ReadTheDocs: NA
Lead Developers: Yvan Scher
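pysmap's own interface is not shown here. As a hypothetical illustration of the kind of one-call, high-level summary such a wrapper aims to give non-programmers, the sketch below counts the most common hashtags in a file of line-delimited tweets. It assumes the entities.hashtags layout of Twitter v1.1 tweet objects; top_hashtags is a hypothetical name.

```python
import io
import json
from collections import Counter
from typing import List, TextIO, Tuple


def top_hashtags(handle: TextIO, n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most common hashtags (lower-cased) across line-delimited tweets."""
    counts: Counter = Counter()
    for line in handle:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed records
        if not isinstance(tweet, dict):
            continue
        for tag in tweet.get("entities", {}).get("hashtags", []):
            counts[tag["text"].lower()] += 1  # treat #Vote and #vote as one tag
    return counts.most_common(n)


sample = io.StringIO(
    '{"entities": {"hashtags": [{"text": "Vote"}, {"text": "NYC"}]}}\n'
    '{"entities": {"hashtags": [{"text": "vote"}]}}\n'
    '{"text": "no entities here"}\n'
)
print(top_hashtags(sample, n=2))
```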
urlExpander is a Python module for quickly and thoroughly expanding shortened URLs.
GitHub: https://github.com/SMAPPNYU/urlExpander
Downloads: 8,413
PyPi: https://pypi.org/project/urlExpander/
First release: 2018-07
ReadTheDocs: NA
Lead Developers: Leon Yin
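The sketch below is not urlExpander's API. It shows the core idea behind URL expansion under simple assumptions. It issues lightweight HEAD requests and follows Location headers until a non-redirect response, capping the number of hops and stopping on redirect loops. The resolver is injectable so the logic can be exercised offline. The names expand_url and head_location, and the short links in the example, are hypothetical.

```python
import http.client
from typing import Callable, Optional
from urllib.parse import urljoin, urlsplit


def head_location(url: str, timeout: float = 5.0) -> Optional[str]:
    """Send a HEAD request; return the Location header if the response is a redirect."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        if 300 <= response.status < 400:
            return response.getheader("Location")
        return None
    finally:
        conn.close()


def expand_url(url: str,
               resolve: Callable[[str], Optional[str]] = head_location,
               max_hops: int = 10) -> str:
    """Follow redirects starting from url and return the final destination."""
    seen = set()
    for _ in range(max_hops):
        if url in seen:
            break  # redirect loop: stop at the repeated URL
        seen.add(url)
        location = resolve(url)
        if not location:
            break  # not a redirect, so this is the final URL
        url = urljoin(url, location)  # Location headers may be relative
    return url


# Offline usage with a stubbed resolver (hypothetical short links).
fake_redirects = {
    "https://bit.ly/abc": "https://t.co/xyz",
    "https://t.co/xyz": "https://example.com/story?id=7",
}
print(expand_url("https://bit.ly/abc", resolve=fake_redirects.get))
```

Some servers reject HEAD requests, so a production tool would likely need a GET fallback and request throttling; this sketch handles neither.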
youtube-data-api is a Python client to download public YouTube data about channels, videos, and searches.
GitHub: https://github.com/SMAPPNYU/youtube-data-api
Downloads: 7,012
PyPi: https://pypi.org/project/youtube-data-api/
First release: 2018-09
ReadTheDocs: https://youtube-data-api.readthedocs.io
Lead Developers: Leon Yin, Megan Brown
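youtube-data-api wraps Google's YouTube Data API v3. The sketch below does not use the package. It calls the public videos endpoint directly with the standard library to show what such a client handles: building a request with a part list and an API key, then flattening the JSON response into records. API_KEY is a placeholder and the helper names are hypothetical. The parser is demonstrated on a hand-written sample payload rather than live data.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import urlopen

VIDEOS_ENDPOINT = "https://www.googleapis.com/youtube/v3/videos"


def build_videos_url(video_ids: List[str], api_key: str) -> str:
    """Build a videos.list request for the snippet and statistics of the given IDs."""
    params = {
        "part": "snippet,statistics",
        "id": ",".join(video_ids),  # the endpoint accepts a comma-separated ID list
        "key": api_key,
    }
    return VIDEOS_ENDPOINT + "?" + urlencode(params)


def parse_videos(payload: Dict) -> List[Dict]:
    """Flatten each item in a videos.list response into a simple record."""
    records = []
    for item in payload.get("items", []):
        snippet = item.get("snippet", {})
        stats = item.get("statistics", {})
        records.append({
            "video_id": item.get("id"),
            "title": snippet.get("title"),
            "channel": snippet.get("channelTitle"),
            "published_at": snippet.get("publishedAt"),
            # counts arrive as strings in the JSON response
            "views": int(stats["viewCount"]) if "viewCount" in stats else None,
        })
    return records


def fetch_videos(video_ids: List[str], api_key: str) -> List[Dict]:
    """Perform the live request (requires network access and a valid key)."""
    with urlopen(build_videos_url(video_ids, api_key), timeout=10) as response:
        return parse_videos(json.load(response))


sample_payload = {
    "items": [{
        "id": "abc123",
        "snippet": {"title": "Example", "channelTitle": "Some Channel",
                    "publishedAt": "2018-09-01T00:00:00Z"},
        "statistics": {"viewCount": "1500"},
    }]
}
print(build_videos_url(["abc123", "def456"], "API_KEY"))
print(parse_videos(sample_payload))
```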
R
streamR is a package written by Pablo Barbera and published on CRAN that provides access to Twitter’s Streaming API from R. The package makes it easy to create filter, sample, and user streams. It includes tools for creating an OAuth token; for opening a filter connection with Twitter that returns real-time public statuses matching given terms or parameters, or posted by a given user or users; for obtaining a random sample of public statuses (up to 1% of all tweets at any given time, which is Twitter’s limit); and for reading and parsing the results into a data frame.
GitHub: https://github.com/pablobarbera/streamR
Downloads: 102,000 (CRAN grand total: https://cranlogs.r-pkg.org/badges/grand-total/streamR)
CRAN: https://www.rdocumentation.org/packages/streamR/versions/0.4.4
First release: 2018-07
ReadTheDocs: NA
Lead Developers: Pablo Barbera
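With streamR's filterStream, the actual matching of statuses against the track terms happens on Twitter's servers. Twitter's documentation for the track parameter describes a simple rule. Commas separate alternative phrases, which are ORed. Spaces separate words within a phrase, all of which must appear, in any order and ignoring case. The sketch below is a simplified local version of that rule. It assumes plain alphanumeric terms, tokenizes tweets only on non-word characters, and ignores the URL and mention matching the real service performs. parse_track and matches_track are hypothetical names.

```python
import re
from typing import List, Set


def parse_track(track: str) -> List[Set[str]]:
    """Split a track string into phrases (ORed) made of required terms (ANDed)."""
    phrases = []
    for phrase in track.split(","):
        terms = {t.lower() for t in phrase.split()}
        if terms:
            phrases.append(terms)
    return phrases


def matches_track(text: str, track: str) -> bool:
    """True if every term of at least one phrase appears as a word in text."""
    words = set(re.findall(r"\w+", text.lower()))
    return any(terms <= words for terms in parse_track(track))


track = "climate change,election"
for tweet in ["Change the CLIMATE debate now", "Election night!", "Climate news"]:
    print(tweet, "->", matches_track(tweet, track))
```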
The Rfacebook package was written by Pablo Barbera, Michael Piccirilli, Andrew Geisler, and Wouter van Atteveldt and published on CRAN. It allows for easy interaction with Facebook’s API via R and provides many functions for obtaining information about users and posts and for collecting public status updates that mention specific keywords. Examples of the tooling provided include functions to create an OAuth token; get posts, likes, reactions, shares, and users; get page insights for a specific time period; get an adjacency matrix representing a user’s mutual-friend network; and run general Facebook searches and Facebook group searches. The package is fully developed and includes unit tests.
GitHub: https://github.com/pablobarbera/Rfacebook
Downloads: 133,000 (CRAN grand total: https://cranlogs.r-pkg.org/badges/grand-total/Rfacebook)
CRAN: https://cran.r-project.org/web/packages/Rfacebook/
First release: 2017-05
ReadTheDocs: https://cran.r-project.org/web/packages/Rfacebook/Rfacebook.pdf
Lead Developers: Pablo Barbera, Michael Piccirilli, Andrew Geisler, Wouter van Atteveldt
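Rfacebook's own function for this is not reproduced here. As a small illustration of what an adjacency matrix of a mutual-friend network contains, the sketch below builds a symmetric 0/1 matrix from a list of friendship pairs among a user's friends. The input format (a list of names plus name pairs) and the adjacency_matrix name are assumptions.

```python
from typing import Iterable, List, Tuple


def adjacency_matrix(people: List[str], edges: Iterable[Tuple[str, str]]) -> List[List[int]]:
    """Symmetric 0/1 matrix: cell [i][j] is 1 if people[i] and people[j] are friends."""
    index = {name: i for i, name in enumerate(people)}
    matrix = [[0] * len(people) for _ in people]
    for a, b in edges:
        if a == b or a not in index or b not in index:
            continue  # ignore self-loops and people outside the network
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = 1  # friendship is mutual, so mirror the entry
    return matrix


friends = ["ana", "ben", "cai"]
pairs = [("ana", "ben"), ("ben", "cai")]
for name, row in zip(friends, adjacency_matrix(friends, pairs)):
    print(name, row)
```

Because friendship is mutual, the matrix is symmetric with a zero diagonal, which is the form most network-analysis tools expect for an undirected graph.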
The instaR package provides a series of functions that allow R users to access Instagram’s API to search for media that mention specific hashtags or were sent from a certain location, to download recent pictures from a given user, and to capture lists of followers.
GitHub: https://github.com/pablobarbera/instaR
Downloads: 24,000 (CRAN grand total: https://cranlogs.r-pkg.org/badges/grand-total/instaR)
CRAN: https://cran.r-project.org/web/packages/instaR/index.html
First release: 2016-08
ReadTheDocs: NA
Lead Developers: Pablo Barbera, Tiago Dantas, Jonne Guyt
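Location search against Instagram's API was a server-side query, and instaR's arguments for it are not reproduced here. The sketch below illustrates, under assumptions, the geometry such a query relies on. It keeps only media whose coordinates fall within a radius of a point, using the haversine great-circle distance with a mean Earth radius of 6,371 km. The record layout (lat and lng keys), the sample posts, and the function names are hypothetical.

```python
import math
from typing import Dict, List

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lng) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(media: List[Dict], lat: float, lng: float, radius_km: float) -> List[Dict]:
    """Keep media items posted within radius_km of (lat, lng)."""
    return [m for m in media if haversine_km(lat, lng, m["lat"], m["lng"]) <= radius_km]


posts = [
    {"id": "a", "lat": 40.7295, "lng": -73.9965},  # near Washington Square Park
    {"id": "b", "lat": 40.7580, "lng": -73.9855},  # Times Square, roughly 3 km away
]
print([p["id"] for p in within_radius(posts, 40.7308, -73.9973, radius_km=1.0)])
```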
Data Sets / Collectors
LocalNewsDataset is a directory of local news outlets by state and medium that includes the social media accounts of those outlets.
GitHub: https://github.com/yinleon/LocalNewsDataset
First release: 2018-08
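The repository's file names and column headers are not reproduced here. The sketch below assumes the directory is exported as a CSV with one row per outlet and columns for state, medium, and a Twitter handle. These column names and the sample rows are hypothetical. It shows a standard-library way to tally outlets by state and medium and to collect their social media accounts.

```python
import csv
import io
from collections import Counter, defaultdict
from typing import Dict, List, TextIO, Tuple


def summarize(handle: TextIO) -> Tuple[Counter, Dict[str, List[str]]]:
    """Count outlets per (state, medium) and list Twitter handles per state."""
    counts: Counter = Counter()
    handles: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(handle):
        counts[(row["state"], row["medium"])] += 1
        if row.get("twitter"):  # some outlets may have no account listed
            handles[row["state"]].append(row["twitter"])
    return counts, dict(handles)


# Invented sample rows using the assumed column names.
sample_csv = io.StringIO(
    "name,state,medium,twitter\n"
    "Example Gazette,NY,Newspaper,examplegazette\n"
    "Channel 9 News,NY,TV station,\n"
    "Valley Radio,CA,Radio station,valleyradio\n"
)
counts, handles = summarize(sample_csv)
print(counts[("NY", "Newspaper")], handles["NY"])
```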