State media control impacts the output of U.S.-based LLMs

May 13, 2026 · Commentary

Training data for LLMs does not just fall from the sky, our research finds.

Ask an AI model the same political question in two different languages, and you may get two very different responses. Our team’s new research, published today in Nature, explains part of why this is the case.

The training data that powers today’s large language models (LLMs) like ChatGPT or Claude does not simply appear out of thin air. Instead, it is produced in the context of social and political institutions. These institutions shape the information environment, which in turn shapes the training data that actually exists in the world.

Put another way, when a government seeks to control its citizens’ information environment, it may also inadvertently shape the LLM training data available in that country’s language. For example, when a country prohibits local media from publishing information critical of the government, the media environment, and thus the training data drawn from it, may be less critical of the regime. And, as it turns out, this has consequences for the outputs of LLMs.
