State media control impacts the output of U.S.-based LLMs
Training data for LLMs does not just fall from the sky, our research finds.
Authors
- Hannah Waight
- Eddie Yang
- Yin Yuan
- Sol Messing
- Brandon M. Stewart
- Margaret E. Roberts
- Joshua A. Tucker
Ask an AI model the same political question in two different languages, and you may get two very different responses. Our team’s new research, published today in Nature, explains part of why this is the case.
The training data that powers today’s large language models (LLMs) like ChatGPT or Claude does not simply appear out of thin air. Instead, it is produced in the context of social and political institutions. These institutions shape the information environment, which in turn shapes the training data that actually exists in the world.
Put another way, when a government seeks to control its citizens’ information environment, it may also inadvertently shape the training data for LLMs in that country’s language. For example, when a country prohibits local media from publishing information critical of the government, the media environment, and thus the training data drawn from it, may be less critical of the regime. And, as it turns out, this has consequences for the outputs of LLMs.