6 principles for independent research in a digital world
In this essay, we have laid out a set of six core principles that we believe can help ensure that the enormous power of modern technology can be harnessed for the good of society in addition to the good of the firms that are profiting from developing that technology.
Authors
- Filippo Lancieri,
- Christian Peukert,
- Joshua A. Tucker
This article was originally published by Brookings.
Introduction
Digital platforms have tremendous economic and societal power. Initially, digitization promised to make information freely available to all, create transparent markets, connect people, and democratize communication. The last decade or so saw rising awareness of privacy concerns, the spread of information of varying quality, increased polarization amidst a public health crisis, and worldwide abuses of dominant market positions, which impacted our public ethos and democracy in ways that we have yet to fully comprehend (Jiravuttipong 2026; Goldfarb and Tucker 2012; Garcia 2017; Tucker et al. 2018; Coyle 2019; Finkel et al. 2020; Parker et al. 2020; Lancieri and Sakowski 2021; Farronato et al. 2023; Budak et al. 2024; Persily and Tucker 2020; Ruggeri et al. 2024; Pape and Rossi forthcoming; Van Angeren et al. forthcoming).
Growing concerns about the potential harms associated with an increasingly digital economy create a need for external access to internal firm data so that policymakers and the broader public can understand and regulate these markets effectively. However, data access is often restricted by strategic interests and legal barriers. This represents a fundamental challenge to empirical research that studies digital markets in general—and digital platforms in particular—to map their behaviors and the corresponding societal effects.
While the interests of firms and researchers in promoting research are sometimes aligned (e.g., when firms stand to benefit from learning the findings of research that they lack the resources or ability to carry out themselves), and firms now share more data than a few decades ago, data is often controlled by those with the least interest in disclosure: large digital platforms. This can lead to selective data sharing that allows these parties to influence the direction of research. As a result, those outside these organizations may remain oblivious to important problems associated with increased digitization or receive a skewed understanding of the issues being studied, undermining the ability of academic research to inform the public and public policy (Barrios et al. 2025). It is not surprising that platforms interested in shielding potentially harmful practices from scrutiny might not voluntarily provide their data to external researchers, but such data can sometimes be accessed by other means, such as scraping publicly available webpages. Such data can be entirely legal to access, and research using it can be highly valuable to society. Yet researchers may be chilled from pursuing such research by the cost of threatened litigation.
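To make the scraping practice mentioned above concrete: a good-faith research crawler can, at a minimum, honor a site's published robots.txt policy and rate-limit its requests. The sketch below, which uses only the Python standard library, illustrates this norm; the user-agent string, paths, and policy lines are hypothetical examples, not taken from any real platform.

```python
import time
from urllib import robotparser

def allowed_to_fetch(robots_txt_lines, user_agent, path):
    """Parse a robots.txt policy (given as a list of lines) and ask
    whether the given user agent may fetch the given path."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt_lines)
    return parser.can_fetch(user_agent, path)

def polite_crawl(paths, is_allowed, fetch, delay_seconds=1.0):
    """Fetch only the permitted paths, pausing between requests so the
    crawler does not burden the site's servers."""
    results = {}
    for path in paths:
        if not is_allowed(path):
            continue  # skip anything the policy disallows
        results[path] = fetch(path)
        time.sleep(delay_seconds)
    return results

# Hypothetical policy: everything under /private/ is off limits.
rules = ["User-agent: *", "Disallow: /private/"]
print(allowed_to_fetch(rules, "ResearchBot/1.0", "/public/page"))   # permitted
print(allowed_to_fetch(rules, "ResearchBot/1.0", "/private/data"))  # not permitted
```

Of course, robots.txt is only one of the overlapping regimes discussed in this essay (terms of service, copyright, privacy law); respecting it is a baseline norm, not a legal safe harbor.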
Legal barriers, including the fear of lawsuits, significantly hinder researchers. A variety of laws govern, enable, or restrict data access and sharing: contract law (e.g., platforms’ terms of service), intellectual property law (e.g., copyright and neighboring rights), and even criminal law can be invoked to prohibit automated data collection. The same obstacles stand in the way of effectively sharing data among academic researchers, which is essential for reproducible science.1 Privacy laws can further restrict data accessibility. In all of the above, international differences in the relevant law further complicate the legal landscape for researchers.
Journals and editors are often risk-averse, which impacts the acceptance of papers involving contentious data sources. This caution stems from concerns about legal liability, ethical considerations, and potential backlash from publishing research based on disputed or sensitive data. As a result, groundbreaking studies that rely on unconventional or controversial data sources may struggle to find an outlet, hindering the advancement of knowledge in certain areas. Journal policies that allow research only if the firms being studied have authorized the use of certain types of firm-related data make intimidation a far more potent tool for preventing research. This perversely redirects research away from socially useful work that might interrogate the harmful effects of, say, dominant platforms and toward consulting-style reports that platform sponsors approve.
Certain data deserves to be protected by privacy law, contract law, and intellectual property law. However, as discussed in detail in Persily and Tucker (2020), there are significant differences between private entities that analyze data to support for-profit businesses, often with no obligation to share their findings publicly and sometimes even bound by duties to shareholders to keep this information private, and other societal actors whose aim is to analyze data to advance scientific knowledge and share their discoveries publicly or develop tools for nonprofit social good. Additionally, there are substantial economic, political, and social benefits that can result from the public sharing of insights gained from analyzing data from online platforms. Conversely, there are risks associated with formulating public policy without the insights derived from rigorous and independent data analysis.
This creates an urgent need to promote access to the data needed for independent academic research on digital markets. At the conference “Mapping and Governing the Online World,” held in Ascona, Switzerland, an international and interdisciplinary group of researchers from economics, law, computer science, and political science discussed the trade-offs and challenges associated with data access for empirical research on the digital economy and developed six core principles to guide data access in an increasingly digitalized world.
In the rest of this article, we seek to accomplish three goals. First, we lay out these six principles clearly and concisely, in the hope that this will help multiple stakeholders recognize their importance. Second, we suggest concrete legal reforms that would help implement these principles. Finally, we provide a roadmap for the steps the external researcher/academic community should take to ensure that research using public digital data is carried out responsibly. To be very clear, we are not suggesting that these reforms and recommendations are novel, or that they are not already being pursued elsewhere—see especially the herculean work of the European Digital Media Observatory as well as centers such as the Knight-Georgetown Institute2 (among many others). Rather, our aim is to bring the different components of this discussion together in one concise, summarized place.
Six principles for empirical research of digital markets and online platforms
Public information should be available for research: Researchers rely on access to public information to conduct studies that inform policy and enhance societal understanding. The definition of what qualifies as public information will vary across contexts and time, but societies should protect those working with such data.
Independent research is needed to monitor and evaluate firm operations: Independent monitoring is crucial to detecting and addressing potential abuses, biases, or harmful practices within digital platforms. Evaluating firm operations transparently helps maintain accountability and trust among users and stakeholders.
Firms must facilitate the sharing of research-relevant data with external researchers: Data sharing between firms and researchers increases transparency and accountability, enables a deeper understanding of digital ecosystems, and can foster innovation. Data sharing and collaboration on equal footing can lead to the development of tools and insights that benefit society, such as improved algorithms for detecting illegal content.
Firms must not be able to block or vet research findings: Corporate blocking or vetting of academic research undermines the scientific process and prevents the dissemination of critical findings that could inform public debate and policy.3 Ensuring that findings are published, regardless of corporate interests, supports the integrity and progress of scientific inquiry.
Institutions, in particular universities and journals, need to protect good-faith research and researchers from legal risks associated with their work: Universities and journals must establish robust legal support systems to defend researchers against potential lawsuits and other legal challenges. By doing so, they create a safe environment for academic freedom and the pursuit of knowledge without fear of retribution.
The academic community must safeguard research ethics and integrity: The community must develop clear, enforceable guidelines covering many areas: from the protection of the data and subjects involved in the research to ensuring the information collected is archivable and reproducible, allowing longitudinal studies that can track changes and trends over time (among others). To preserve integrity, journals/editors should not accept articles when firms and other third parties have the power to engage in pre-publication review or to exercise editorial control unless such arrangements are explicitly disclosed and explained. Researchers and journals must also properly and holistically disclose potential conflicts of interest.
These principles, individually or in combination, apply broadly to a variety of data sources, including web-scraped data, data gathered through collaborations with companies, data obtained from the government through freedom of information laws, data obtained from companies through mandatory data sharing laws, or primary data collected through other means (e.g. sensors, apps, etc.). With these principles in mind, we can now turn to the matter of what practical steps can be taken to ensure their realization.
How governments and regulators can help better promote and safeguard independent research in digital markets
Clearer guidelines and legal frameworks are necessary to support research activities. Access to data is a key issue for both researchers and firms in the digital economy. The redistribution of third-party content has sparked debates and legal battles since the early days of the internet.4 With generative AI, data access concerns have resurfaced, as commercial AI developers often use scraped data to train models, operating in the same legal gray areas as independent researchers. This has led to stricter terms of use and more walled gardens, lawsuits, and calls to modernize copyright and neighboring rights to support innovation (Peukert and Windisch 2025).
The regulatory debate around commercial data access (including the modernization of copyright and other laws) is important and necessary, but it needs to be broadened to consider society’s overall interests in promoting independent academic investigations. In particular, when we change data access rules for, say, AI developers to enable AI innovation, we also need to consider changing and/or clarifying data access rules for researchers to increase transparency and accountability of technology and the firms that develop and commercialize it.
However, piecemeal changes such as modernizing copyright alone cannot solve the issue of data access for the public good. New legislation around the world should require large platforms to share data with vetted researchers under strict privacy protections, enhancing transparency and improving accountability. Further, determining what constitutes public and private information is vital. Information on public websites should be treated as public, aligning with the public interest in archiving and analyzing the information. Of course, the extent to which this data should be accessible for research needs careful consideration. While private information should remain confidential by default, companies have abused general “privacy-protection” claims to prevent data sharing and avoid public scrutiny, even when the social benefits of analyzing such data are clear (Van Loo 2022; Richards 2022; Lancieri 2019).
There are ways to develop different levels of controlled access to sensitive data to balance out public interests in data sharing with private interests in privacy protection. For example, the European Digital Media Observatory (2022) published an extensive report on how researchers can engage with personal data in a way that complies with European data privacy laws, which are the strictest in the world (see also Vogus 2023). Indeed, regulatory frameworks in other societally important domains demonstrate the possibility of achieving a balance between the necessity for data transparency in achieving greater societal goals and the imperative to protect individual privacy and sensitive business information.
For example, in the U.S. and elsewhere, researchers hoping to engage with non-anonymized Census or equivalent data (e.g., French CASD data) must sometimes undergo background checks, receive specific training, gain approval from universities’ Institutional Review Boards, be approved by the Census Bureau itself, and sign a data handling agreement. Some types of data can only be accessed in secure rooms, and the Census conducts reviews to ensure that confidential data is not inadvertently disclosed. That said, there is only limited vetting of the research content—it must have statistical merit and not pose a threat to the mission and reputation of the Census. The focus is more on methods than on outcomes—as it should be. Ausloos et al. (2020) discuss the example of access to Finnish health data, where national law provides specific safeguards for researchers as long as they meet similar requirements and can demonstrate that a similar scientific research question could not be adequately answered through the exclusive use of aggregate data.
Similar approaches should be implemented for research that requires access to online platform data, where data sharing for scientific research purposes must also navigate the delicate balance between public interest and privacy or business protection. For the digital economy, the European Union, with its Digital Markets Act (DMA) and Digital Services Act (DSA), is breaking new ground in this direction (see Edelson et al. 2023). For example, policymakers can compel data sharing for questions with high societal relevance, as exemplified in Article 40 of the DSA. However, at a global scale, we need to go further. EU regulation applies only to one jurisdiction and only to some firms (designated as “gatekeepers” or “very large online platforms”). Broader access is needed.
More specifically, governments around the world should consider making at least three important changes to laws and regulations:
Pass laws that guarantee independent researcher access to internal digital platform data: The European DSA offers a promising starting point since it establishes a formal framework for researcher access. Yet, its scope remains limited: It applies only to the very largest platforms and restricts access to investigations tied to systemic risks within the EU. The older, bipartisan Platform Accountability and Transparency Act (PATA)—which was just reintroduced in the U.S. Senate on January 7, 2026—proposed researcher access contingent on National Science Foundation approval and empowered the Federal Trade Commission to require public disclosure of selected datasets. These efforts, taken together, show that both Europe and the United States have already developed workable models. Policymakers should use them as building blocks for more comprehensive legislation with stronger mandates.
Protect researchers who independently collect public platform data: Laws should explicitly safeguard external researchers who obtain public data through methods such as web scraping or automated collection tools. Existing frameworks point in the right direction but remain fragmented. The European DSA provides some protection for certain forms of research access, yet it does not fully cover independent data collection, nor does it resolve how researchers should navigate overlapping legal regimes. Although exceptions like fair use in the United States and research exceptions in European copyright and neighboring rights law exist, their scope is often uncertain, especially when research activities intersect with contract terms, privacy rules, or cybersecurity restrictions. Clear statutory guidance is needed to ensure that good-faith research does not expose scholars to legal risk and that independent data collection can complement formal access channels.
Protect academics against strategic lawsuits: Researchers who study digital markets can face litigation tactics aimed at discouraging scrutiny. One important tool to address this problem is anti-SLAPP legislation, where SLAPP stands for strategic lawsuits against public participation.5 Most U.S. states have adopted such rules,6 and Congress has considered federal versions, but the protection these rules offer in general—and their coverage of academics in particular—remains uneven. The European Union does not yet offer a direct analogue, although some member states have begun debating protections for journalists, civil society groups, and potentially academics.7 Comparable safeguards should extend to academic researchers, since the risk of costly or prolonged legal action can chill independent inquiry. Legislation that enables courts to dismiss weak or strategic claims at an early stage would help ensure that good-faith, well-documented research can proceed without undue pressure.
Having identified concrete steps that can be taken by governments and regulators globally to ensure access to platform data for external academic researchers, we turn next to what the academic community can do to ensure that this data is used in a responsible manner, balancing the potential benefits of better informed public policy against the potential threats to the integrity of that research and the privacy of platform users.
How the academic community can better promote and safeguard independent research in digital markets
Data sharing between firms and researchers has great potential to lead to insights that are valuable for both society and firms. However, often societal goals are not aligned with firms’ goals (Barrios et al. 2025). While changes to the law are crucial to enable data access for research and protect academics from direct and indirect pressure, the academic community can and must also do its part to ensure the independence and integrity of research. In particular, to address the power imbalance between small academic teams and the large corporations behind online platforms, the academic community must adopt collaborative solutions and establish best practices for rigorous empirical research on these platforms. Some areas we see potential in this regard include:
Developing ethical standards for work with public data: The academic community should set ethical standards for navigating legal uncertainties surrounding public data (e.g., obtained through web scraping) and private/personal data (for example, codes of ethics and codes of necessary practices for research involving digital data).8 The academic community must seize the opportunity to generate knowledge that policymakers and other stakeholders need to hold private actors accountable, mitigate societal harm, and achieve greater societal goals.
Updating journals’ policies to facilitate the publication of studies that used independently collected public data: Academic journals should immediately update their policies to permit the publication of studies using web-scraped public data, despite legal uncertainties. Beyond correcting methodological errors, peer review inherently involves making qualitative judgments about contributions to scientific advancement. Why not also take responsibility for assessing whether the societal benefits of revealing a new fact outweigh the strategic interests of those who prefer that fact to remain hidden?
Updating journals’ policies to require the publication of the data agreement between researchers and data providers: While access to data is an increasingly essential input and currency for academic papers, journals’ policies with regard to data access remain outdated. For example, while many journals require the disclosure of whether a firm had the right to review the paper before publication, they do not require the authors to disclose exactly under what conditions academics obtained access to the data. Journals should require that academics share the underlying terms of access to databases, with the potential exception of the price paid for access when the database was acquired from a third party. This policy will also have the upside of increasing academics’ bargaining power with data providers, as they can point out that the terms of the agreement will ultimately be disclosed to the public.
Better recognizing the value of shared data as a key contribution to academia: Having access to certain data is increasingly pivotal for an academic career, so researchers are sometimes skeptical about sharing their own data under a general fear that others can then engage in similar research. To align incentives, the academic community should treat data sharing as a substantive contribution and reflect this in hiring, promotion, and funding decisions. One concrete step is for journals to create or expand dedicated article formats that focus on describing and documenting datasets. Some fields already publish such data papers in leading outlets, providing clear credit to the scholars who generate and curate the underlying resources (e.g. Strategic Management Journal, Marketing Science). Wider adoption of this model would help make data sharing incentive-compatible from an individual career perspective and support a more open and cumulative research ecosystem.
Updating journals’ policies to require better disclosure of potential conflicts of interest: Journals should ensure that their disclosure policies are comprehensive and up to date, requiring disclosure of all relationships that could lead to third-party influence on research (Barrios et al. 2025). This includes, as mentioned above, the conditions under which academics obtained access to private datasets.
Creating legal defense funds to protect exposed researchers: Finally, universities, foundations, and professional associations should consider creating legal defense funds and leveraging support from law school clinics to assist researchers facing legal challenges.9 Consortia of journals should retain lawyers, and perhaps a funder could sponsor a substantial retainer for legal disputes. Such initiatives can counterbalance the chilling effects of current legal uncertainties regarding the use of independently collected public data or other forms of data access.
Conclusion
Data access for external researchers is crucial for ensuring that society understands the impact of the rapidly evolving digital landscape and its new tech titans. It is also crucial for informing a wide range of public policy issues, from children’s health to market competitiveness to safeguarding the information environment. In this essay, we have laid out a set of six core principles that we believe can help ensure that the enormous power of modern technology can be harnessed for the good of society in addition to the good of the firms that are profiting from developing that technology. We have also identified concrete steps that can be taken by policymakers and the academic research community to help bring about this more optimistic vision of a future where the impact of digital platforms on society is more transparent than it is today. With the fast growth of new AI platforms such as OpenAI and Anthropic, as well as the moves of existing tech giants such as Google, Meta, and Amazon into the world of AI, the need for such action could not be more urgent.