Now, there are ways to find out. New data shows that machine learning can identify content created by online political influence operations.
This article was originally published at The Washington Post.
Late last week, Twitter removed more than 1,600 accounts associated with influence efforts linked to Cuba, Russia, Saudi Arabia and Thailand. Earlier in the week, the Department of Homeland Security’s latest Homeland Threat Assessment found that “Russian influence actors will continue using overt and covert methods to aggravate social and racial tensions, undermine trust in U.S. authorities, stoke political resentment, and criticize politicians who Moscow views as anti-Russia.”
This was nothing new. Since early August — when U.S. intelligence officials warned that Russia was working via social media to boost President Trump’s chances in the 2020 presidential race — we’ve seen a steady stream of evidence about foreign interference in the election.
And the threat involves domestic influencers, as well. Last week, Facebook announced it had banned an Arizona-based marketing firm that was using fake social media accounts in its work for Turning Point Action, a conservative advocacy group.
Influence campaigns have long been infiltrating social media
Twitter’s first data release of 3,841 Russian-backed accounts had content dating back to 2014. In June of this year, Twitter disclosed 32,242 accounts linked to long-running influence operations by China, Russia and Turkey.
And those efforts are just the tip of the iceberg. From 2013 to 2019, at least 30 countries were targeted by 96 distinct online political influence campaigns. In Honduras, a country not known for launching sophisticated disinformation campaigns, Twitter removed 3,104 accounts associated with the Honduran president.
Despite these efforts to curtail this type of disinformation, we still do not know the extent to which foreign powers are trying to shape political discussion, what topics they are working on and how their efforts change over time.
That’s now changing. Content created by industrialized online political influence operations is very different from what typical social media users post. Machine learning tools can help identify those differences — and theoretically make it possible to produce the social media equivalent of a weather report, something that tells citizens who is trying to influence them, and in what direction.
We can use existing campaigns to find new ones
So how do we know this? In a recently published article in Science Advances, we demonstrated that it is possible to identify foreign influence campaigns on Twitter if you know what prior influence campaigns looked like. The idea is to use prior campaigns to train machines to pick out features of these campaigns that no human could spot across millions of posts, but that machine learning (sometimes also referred to as artificial intelligence) can learn to detect.
We set out to prove that this is the case, and built a database of millions of posts from 2015 to 2018 that Twitter and Reddit report were part of Chinese, Russian and Venezuelan influence campaigns. We combined that database with tens of millions of posts in the same period by politically engaged, average Americans.
We then measured the characteristics of each piece of content, including factors such as timing, word count, emotional tone, and links to news websites, as well as how a given post related to what everyone else was posting.
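To make this concrete, a per-post feature extractor along these lines might look like the sketch below. The post format, the field names, and the specific features are assumptions for the example; the actual study measured a far richer set of characteristics, including how each post related to everything else being posted, which we omit here.

```python
import re
from datetime import datetime

def extract_features(post):
    """Turn one raw post into simple per-post features.

    `post` is a hypothetical dict with "text" and "timestamp" keys;
    the real study's feature set was much richer than this.
    """
    text = post["text"]
    ts = datetime.fromisoformat(post["timestamp"])
    return {
        "word_count": len(text.split()),
        "hour_of_day": ts.hour,  # posting-time signal
        "has_url": int(bool(re.search(r"https?://\S+", text))),
        "num_mentions": len(re.findall(r"@\w+", text)),
        "num_hashtags": len(re.findall(r"#\w+", text)),
    }

example = {
    "text": "Big story via @localnews https://example.com #politics",
    "timestamp": "2016-10-03T14:22:00",
}
features = extract_features(example)
```

Each post becomes a small numeric vector, so millions of posts can be compared on the same axes.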
Using these data, we trained a machine-learning algorithm to replicate the human intuition for when a post looks out of place. For example, both online trolls and normal social media users often include news URLs in their posts. But because the trolls are trying to draw attention in a new direction, their posts mention different people alongside local news than you'd expect to see in a normal user's post. In 2016, for example, they might have called out Hillary Clinton instead of the local mayor.
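The classification step can be sketched as follows, on synthetic stand-in data and with a simple nearest-centroid rule in place of the study's actual models. The feature means, sample sizes, and classifier here are invented for illustration only; the point is just that troll and normal posts, once reduced to feature vectors, can be separated by a learned rule.

```python
import random

random.seed(0)

# Toy data: each "post" is a feature vector (word count, URL flag,
# mention count). The class means are invented; the real study
# trained on millions of labeled posts with richer features.
def make_post(means):
    return [random.gauss(m, 1.0) for m in means]

normal = [make_post([12, 0.3, 1.0]) for _ in range(300)]
troll = [make_post([15, 0.8, 2.5]) for _ in range(300)]

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

c_normal, c_troll = centroid(normal), centroid(troll)

def classify(post):
    """Nearest-centroid rule: label a post by the closer class mean."""
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(post, c))
    return 1 if dist(c_troll) < dist(c_normal) else 0

correct = sum(classify(p) == 0 for p in normal) + \
          sum(classify(p) == 1 for p in troll)
accuracy = correct / 600
```

Because the two classes differ systematically on these features, even this crude rule separates them well on the toy data.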
We tested how this approach would have performed historically (much like the approach Wall Street firms use to manage the risk from trading models) at five prediction tasks (e.g., find new activity by a given campaign next month based on its activity this month) on four influence campaigns (e.g., Russian activity on Reddit in 2016 and 2017). After completing 463 experiments in all, we found that this approach consistently distinguished posts that were part of troll campaigns from those by normal users.
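The historical testing described above can be sketched as a walk-forward loop: fit on one month's labeled posts, score the next month, and repeat. The `fit` and `score` callables below are hypothetical stand-ins for a real train/evaluate pair, and the data layout is assumed for the example.

```python
def walk_forward(posts_by_month, fit, score):
    """For each consecutive pair of months, train on the earlier
    month and evaluate on the later one, mimicking how the approach
    would have performed historically."""
    results = {}
    months = sorted(posts_by_month)
    for train_month, test_month in zip(months, months[1:]):
        model = fit(posts_by_month[train_month])
        results[test_month] = score(model, posts_by_month[test_month])
    return results

# Dummy usage: "fit" counts training rows, "score" adds test rows,
# just to show the control flow.
data = {"2016-01": [1, 2], "2016-02": [3], "2016-03": [4, 5, 6]}
results = walk_forward(
    data,
    fit=lambda rows: len(rows),
    score=lambda model, rows: model + len(rows),
)
```

Running many such train-one-period, test-the-next splits across campaigns and platforms is what produces an experiment count in the hundreds.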
Yes, trolls are (sort of) predictable
We think the reason this worked stems from underlying social processes. To have real influence, coordinated operations need to either say something new, or they need to say a lot of something that users are already saying. It’s easy to find the first: New content is readily identifiable.
But when foreign influence campaigns try to shift a conversation with large amounts of content, they rely on relatively low-skilled workers producing a lot of posts. These workers are often not citizens of the target country, and therefore need to be trained on what “normal” looks like. Moreover, their managers probably assess a worker’s performance against some standardized set of performance metrics, just like in any other organization. Both processes lead to consistency.
Exactly what distinguished influence campaign content was different for different campaigns, and it changed over time. Russian posts shared unusual URLs, mentioned people in different ways than normal users, chose odd hashtags and combined mentions and URLs in slightly strange ways (e.g., mentioning national politicians in posts with links to local news sites). Venezuelan trolls, on the other hand, only retweeted certain people and topics, which is not how normal users behave.
There’s no single solution, but there is a path forward
Unfortunately, this means there is no single model for finding foreign influence campaigns. Social media use is dynamic: normal users are always responding to current events, and trolls are continually adapting and trying new tactics.
While we did not find a stable set of characteristics that allow us to detect all campaigns, we did find a method for detecting these campaigns based on the fact that troll content is almost always different in detectable ways. And machine learning allows us to find those differences at scale. Other scholars have developed promising techniques, as well.
The day when we can have a "daily report" of online influence campaigns to inform citizens may not be as far away as it might seem.
Meysam Alizadeh is a data science fellow at the Shorenstein Center on Media, Politics and Public Policy at Harvard Kennedy School, Harvard University.
Cody L.J. Buntain is an assistant professor in informatics, director of the Information Ecosystems Lab at New Jersey Institute of Technology’s Ying Wu College of Computing, and a faculty research associate at the NYU Center for Social Media and Politics.
Jacob N. Shapiro is a professor of politics and international affairs and director of the Empirical Studies of Conflict Project at Princeton University.