Mosaics of Insight: Auditing TikTok Through Independent Data Access

February 21, 2025  ·   Policy

Even if TikTok is sold to a non-Chinese buyer, the threat of foreign influence will remain. That’s why researchers need independent data access.

[Image: Person holding an iPhone running TikTok. Credit: Solen Feyissa]

This article was originally published on Lawfare.

It’s been a whirlwind few weeks for TikTok in the United States. After going dark on the evening of Jan. 18 following the Supreme Court ruling that upheld the divestment order for the video-sharing app, TikTok quickly came back online when President Trump indicated he would sign an executive order delaying enforcement of the law. Trump signed the order on Jan. 20, giving the administration 75 more days to pursue a resolution that saves the app for American users.

Proponents of the divestment order argue that TikTok poses a threat to U.S. national security. More specifically, lawmakers fear that the Chinese government could use TikTok to access and then exploit U.S. user data and that Chinese officials could pressure TikTok to push certain types of information—such as foreign propaganda or pro-China content—into American users’ feeds. In fact, a 2019 leak to the Guardian revealed content moderation policies crafted to downrank or take down content related to anti-government protests, which the Chinese government often tries to censor.

However, even if TikTok is sold to a non-Chinese buyer, the threat of foreign influence will remain. A few months ago, Romania’s Constitutional Court annulled the ongoing presidential election after a Russian-sponsored propaganda campaign on TikTok led to a surprise first-round victory for far-right candidate Călin Georgescu. Notably, there was no evidence that China had acted on Russia’s behalf. Instead, the EU is investigating potential “automated exploitation” of the algorithmic feed by Russian actors.

This case highlights the novel dynamics TikTok introduces as a platform with a primarily algorithmic content feed. In social graph-based networks (such as Facebook and Instagram), foreign adversarial accounts need to build a following of real users in order to reach an audience, which has become increasingly difficult. But with an algorithmic feed like TikTok’s, content is served to users based on signals interpreted from other users’ behavior. While growing a follower base is still an important reach strategy on TikTok, the platform opens up new avenues for foreign influence. In the case of Romania, coordinated accounts and paid promotion may have influenced the recommendation algorithm, pushing Georgescu’s content to Romanian users. Or, much like Russia’s perceived influence on the 2016 U.S. presidential election, the electoral impact of a social media platform could once again be over-interpreted.

In both the U.S. court cases and the EU investigation of the Romanian election, there is a core concern: How do lawmakers or the platform’s users know if TikTok’s algorithm has been influenced, either directly by China or indirectly by another adversarial actor? Based on a large academic literature studying foreign influence campaigns on other platforms, it’s clear that the current transparency regime—algorithmic audits through Oracle and a limited application programming interface (API) for external researchers—is insufficient for identifying various direct and indirect mechanisms for influence. This risks leaving both policymakers and the public in the dark about the actual harms of the platform. Regardless of what happens in the first 75 days of Trump’s new term, independent data access for researchers will be required to fully understand the platform’s democratic effects in the U.S. and abroad.

The Shortcomings of Project Texas

For roughly half a decade, TikTok negotiated with U.S. officials to address national security concerns stemming from its ownership by ByteDance, its Chinese parent company. TikTok’s primary attempt to assuage these concerns was Project Texas, the company’s plan “to address the U.S. government’s concerns about national security risks associated with the product.” Through that initiative, TikTok created a separate U.S. subsidiary to manage all protected American user data and house the platform’s content moderation systems. Under Project Texas, TikTok also partnered with Oracle to provide independent oversight of the platform’s recommendation algorithm and moderation protocols.

This ultimately wasn’t enough for Congress, which passed a bill last spring requiring ByteDance to sell TikTok or face a shutdown. TikTok immediately challenged the law as unconstitutional, and the case has been working its way through the courts ever since. A key question for the U.S. Court of Appeals for the D.C. Circuit, which heard arguments in September, was whether TikTok’s source code was auditable to detect potential foreign influence and other risk areas, especially given the size of the code base. But another, potentially more damning question remained: Is analysis of the source code sufficient to identify foreign influence?

Recommendation algorithms are complex, dynamic systems that learn from user data, and their behavior plays out in ways that may not be apparent from inspecting the code itself. To measure how recommendation systems actually feed content to users, it’s necessary to observe users’ feeds.
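To make this concrete, consider a minimal sketch of the kind of observational “sock puppet” audit researchers run on other platforms: scripted accounts scroll through their feeds while everything they are served gets logged. TikTok exposes no public feed endpoint, so the `fetch_next_video` function below is a hypothetical stand-in for whatever instrumented client a vetted researcher would use; nothing here is TikTok’s actual interface.

```python
import time
from dataclasses import dataclass

# Illustrative sketch only: fetch_next_video is a hypothetical stand-in
# for an instrumented client, since TikTok has no public feed endpoint.

@dataclass
class FeedObservation:
    persona_id: str     # which scripted account saw the item
    video_id: str
    author_id: str
    position: int       # rank of the item within the session
    observed_at: float  # Unix timestamp of the observation

def run_session(persona_id, fetch_next_video, n_items=50):
    """Scroll one synthetic persona through its feed, logging every
    recommended item for later aggregate analysis."""
    observations = []
    for position in range(n_items):
        video = fetch_next_video(persona_id)  # hypothetical client call
        observations.append(FeedObservation(
            persona_id=persona_id,
            video_id=video["id"],
            author_id=video["author_id"],
            position=position,
            observed_at=time.time(),
        ))
    return observations
```

Comparing what different personas are served over time is what lets auditors detect skews, such as coordinated amplification of particular accounts, that inspection of source code alone cannot reveal.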

In this regard, Project Texas fails on two fronts. First, the type of large-scale observational data needed to study foreign influence is not part of the project’s auditing mechanisms. Second, leveraging and interpreting such data requires expertise from across disciplines, along with scientific, transparent analytic methods. Here, independent data access is key to facilitating research that is rigorous and in the public domain, supporting evidence-based policymaking.

Indeed, external researchers have been instrumental in deepening policymakers’ and the public’s understanding of the effects of other social media platforms: uncovering foreign interference campaigns, identifying new risk areas, and even providing evidence that some assumptions about platform effects were unfounded. While TikTok does maintain a researcher API, its guidelines run counter to academic norms around data retention, publication, and replication. And a recent paper in Information, Communication & Society provides evidence that the API is missing both videos and engagement data that can be found on TikTok’s platform.
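For illustration, here is a minimal sketch of what querying that researcher API looks like. The endpoint path and request shape follow TikTok’s publicly documented Research API video-query endpoint as of this writing, but treat the details (field names, date format, token flow) as assumptions that may have drifted from the current documentation.

```python
import requests

# Sketch of a TikTok Research API video query. Endpoint, fields, and
# request shape are based on TikTok's public docs and may have changed;
# ACCESS_TOKEN must come from an approved researcher application.
ACCESS_TOKEN = "..."  # obtained via TikTok's client-credentials flow

resp = requests.post(
    "https://open.tiktokapis.com/v2/research/video/query/",
    params={"fields": "id,create_time,region_code,view_count,like_count"},
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json={
        "query": {
            "and": [
                {"operation": "EQ", "field_name": "region_code",
                 "field_values": ["RO"]},
            ]
        },
        "start_date": "20241101",  # YYYYMMDD
        "end_date": "20241124",
        "max_count": 100,
    },
    timeout=30,
)
resp.raise_for_status()
videos = resp.json().get("data", {}).get("videos", [])
```

The cited audit found that results like these can omit videos and engagement counts that are visible on the platform itself, which is exactly why independent replication against the live site matters.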

So, how could TikTok enable public auditing and transparency to detect foreign influence? Twitter, now X, offers a useful case study in how different data sources can provide insight into a platform.

Audit Through Analogy

Before Elon Musk’s takeover, Twitter was the best-understood major platform because of its robust data sharing: at large scale, in near real time, and at low cost. To that end, the platform maintained structured APIs of public data (that is, tweets or accounts that were publicly visible) and offered access to independent researchers. These APIs gave researchers a large random sample of public posts, follower networks, and keyword-based collections. There were well-known limitations (such as no data on exposure), but these APIs enabled research agendas on a range of topics, including foreign influence campaigns. Throughout those years, even as Twitter provided access to these APIs, the company offered no direct information about the algorithms that structured the on-platform experience.
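As a historical sketch, this is roughly what tapping Twitter’s random-sample stream (the so-called “spritzer”) looked like with the tweepy library’s old streaming interface. The code no longer works against X; it is included only to illustrate the low-cost, large-scale, near real-time access described above, and the credentials are placeholders.

```python
import tweepy

# Historical illustration: consuming Twitter's ~1% random sample of all
# public tweets via tweepy's pre-2023 Stream interface. Defunct today.
class SampleLogger(tweepy.Stream):
    def on_status(self, status):
        # Each status is one tweet drawn at random from the public
        # firehose; aggregating these let researchers track narratives
        # and coordinated campaigns with no view of Twitter's ranking code.
        print(status.id, status.user.screen_name, status.text[:80])

stream = SampleLogger(
    "CONSUMER_KEY", "CONSUMER_SECRET",       # placeholder credentials
    "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET",
)
stream.sample()  # blocks, delivering the random sample in real time
```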

In early 2023, two developments occurred in tandem, essentially reversing the company’s transparency regime. First, Twitter shut off access to its API endpoints and created a data access program that provided far less data than before while introducing a prohibitive pricing structure for it. As a result, few (if any) external researchers have access to the requisite data to study user experiences on the platform. Second, Twitter publicly released the code for its recommendation algorithm on GitHub. Still, the code alone provided little insight into the inner workings of the platform.

What was the net effect? With access to the algorithm but without access to large-scale platform data, analysts know far less about Twitter/X, including the presence and activity of foreign influence campaigns, than they did previously. This underscores that transparency is important for identifying foreign influence, but that the form transparency takes is equally important.

What This Means for TikTok

Social media platforms enable a complex interplay between user behaviors, algorithmic systems, and technological affordances. No single data source can provide full insight into the harms (or benefits) introduced on or by a platform. Instead, robust independent data access enables public auditing to create mosaics of insight: each piece a necessary but insufficient part of a broad understanding of the myriad risks and benefits associated with platforms.

It may be too late for TikTok to make meaningful transparency changes to Project Texas to allay the national security concerns surrounding the app. Indeed, most of the discussion about saving TikTok over the next few months will center on finding a suitable American buyer, which TikTok has now indicated it is open to. But as the Trump administration considers ways to save TikTok for American users, and as other governments potentially consider how to protect against American ownership of social media, mechanisms to create data access for independent researchers should be a crucial part of the discussion.

While significant attention has been (and should be) paid to how policymakers can improve transparency—such as through the Digital Services Act in the EU and the proposed Platform Accountability and Transparency Act in the U.S.—TikTok itself can make significant changes. In the short term, it can make improvements to the researcher API: extending the short data refresh periods, developing user-friendly tools for nontechnical researchers, creating streamlined processes for independently vetting researchers, and facilitating better protocols for replication. In the long term, TikTok can commit to more robust data-sharing approaches, including establishing data clean rooms for higher sensitivity data, engaging in academic collaborations that allow for on-platform experimentation (for example, the Facebook and Instagram 2020 project), and committing to not taking legal action against researchers who engage in web scraping for public interest research.

If TikTok takes these measures seriously, it may not only ameliorate concerns from policymakers in the U.S. and abroad but also establish a voluntary transparency model that supports more rigorous research, better policymaking, and healthier platforms.