Beyond Competition: Designing Data Portability to Support Research on the Digital Information Environment

February 26, 2024  ·   Policy

Although portability is often considered through a competition lens, policymakers and companies should understand its potential impact on policy-relevant research efforts and ensure that portability can support research on the impacts of digital platforms and services.

View a full version of this paper, which was presented at the Data Transfer Summit hosted by the Data Transfer Initiative.

Introduction

A key concern for policymakers, journalists, civil society organizations, and academics alike is understanding the myriad impacts of digital platforms, which have come to play a central role in social interactions, economic activities, and the dissemination of information. However, a recurring challenge has been that the digital trace data necessary to produce rigorous evidence on platform effects are stored in proprietary databases, often accessible only to the platforms themselves and used for commercial applications. This dynamic enables platforms to act as gatekeepers for both academic research agendas and evidence-based policy evaluations, leaving key questions of societal import unanswered and unanswerable given a lack of data. Alarmingly, several platforms—such as Facebook, Twitter, and Reddit—have shut down public application programming interfaces (APIs) in recent years, erecting significant barriers for independent researchers to collect requisite data.

Policymakers have made data access a central concern for efforts to increase platform transparency, oversight, and accountability. In the European context, the Digital Services Act (DSA), which is primarily concerned with platform transparency and user protection, includes provisions to grant vetted researchers access to data from very large online platforms (VLOPs) and very large online search engines (VLOSEs). In the United States, the Platform Accountability and Transparency Act (PATA) has been introduced, which includes similar mechanisms for requiring independent data access. While promising, these approaches to data access have key limitations, most notably their narrow application to VLOPs and VLOSEs. This limitation is especially important given recent developments in the digital information environment, such as the rise of smaller platforms that do not reach DSA or PATA usage thresholds but nonetheless have potential social or political significance (e.g., Discord, Twitch, Nextdoor). In addition, the timeline for full DSA implementation, including comprehensive data access for vetted researchers under Article 40, is not fully known, and there have been reports of rejected requests for data under the DSA.

Researchers have developed a number of other mechanisms for collecting data, such as web scraping and web tracking. A key challenge is that collecting data without user or platform consent introduces potential legal risks for researchers and ethical risks for users. Within this context, one promising approach is data donation, in which users consent to donate their digital trace data for research. In addition to establishing user consent, data donations fall within legal data portability provisions, such as those in the European Union General Data Protection Regulation (GDPR) and the proposed ACCESS Act in the U.S., and thus provide legal protections for researchers studying digital platforms. However, data portability, or the right of users to transfer their data from one digital service to themselves and/or to another digital service, has generally been considered through the lens of competition. This has led to a mismatch between data portability as a mechanism to promote competition in the digital marketplace and data portability as a mechanism to collect user data to facilitate research on the digital information environment. On the one hand, policymakers and platforms have approached the design, implementation, and evaluation of data portability through the lens of competition. On the other, researchers have leveraged data portability provisions for research, but often face challenges because the needs of competition and the needs of research are misaligned.

In this paper, I aim to situate data portability within the evolving discussions of how to support data access for researchers. More specifically, I explore how, given changes in the digital information environment, data donations enabled by portability requirements provide promising opportunities for facilitating research that is aligned with ethical and legal frameworks. I use generative AI as a case study for how data portability can support both platform competition and transparency. I then discuss current challenges for using data donations for research and provide recommendations for better aligning portability mechanisms with research. Taken together, these points support my argument that, although portability is often considered through a competition lens, policymakers should understand its potential impact on policy-relevant research efforts and ensure that portability can support research on the societal impacts of digital platforms and services.