Short of Suspension: How Suspension Warnings Can Reduce Hate Speech on Twitter

There has been ongoing debate about the effectiveness of suspending or banning abusive social media users. But we know little about whether warning a user that they might be suspended can reduce hate speech. In this experiment, we find that warning users of the potential consequences of their behavior can temporarily reduce their use of hateful language.

Abstract

Debates abound about the effectiveness of high-profile Twitter account suspensions and similar bans on abusive users across social media platforms. Yet we know little about whether warning a user that their account may be suspended, rather than suspending it outright, can reduce hate speech. With a pre-registered experiment, we provide causal evidence that a warning message can reduce the use of hateful language on Twitter, at least in the short term. We design our messages based on the deterrence literature and test versions that emphasize the legitimacy of the sender, the credibility of the message, and the costliness of being suspended. We find that warning a user of the potential consequences of their behavior can significantly reduce their use of hateful language for one week. We also find that warning messages designed to appear legitimate in the eyes of the target user seem to be the most effective. In light of these findings, we consider the policy implications of platforms adopting a more aggressive approach to warning users that their accounts may be suspended as a tool for reducing hateful speech online.

Background

On January 8, 2021, Twitter permanently suspended Donald Trump’s account following tweets that the platform believed risked inciting further violence after the Capitol insurrection. This event, among many others, demonstrates how social media platforms can be used to increase polarization and mobilize inter-group conflict. To curb such behavior, social media platforms routinely suspend or ban users. This measure, however, can have unforeseen consequences, such as pushing banned users toward more radicalized platforms where their language is acceptable. More broadly, the literature on how these online interventions affect users and whether they actually curb hate speech is still emerging. Ultimately, we want to know whether there are measures, short of banning or suspending accounts, that social media platforms can take to discourage hateful behavior online.

Study

To answer this question, we design and run a pre-registered experiment on Twitter that tests whether warning users of their potential suspension reduces their use of online hate speech. First, we identify users who are candidates for suspension (based on their past behavior) and download their follower lists. Once a user is suspended, we randomly assign a portion of their followers, specifically those who have used hateful language online, to receive a warning that they, too, may be suspended for a similar reason. Our warning tweets aim to deter users from using hateful language through three mechanisms: costliness, legitimacy, and credibility. Varying the messages along these dimensions lets us manipulate the degree to which users believe their suspension would be costly, legitimate, and credible.
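
To make the design concrete, here is a minimal Python sketch of the follower-filtering and random-assignment step. Everything in it is illustrative: the keyword stub stands in for the hate-speech measure the study actually used, the arm names are placeholders, and none of it is the study's code or a real Twitter API call.

```python
import random

# Illustrative sketch only: placeholder names throughout, not the study's
# code and not a real Twitter API. A trivial keyword stub stands in for
# the hate-speech measure used in the study.

HATEFUL_TERMS = {"<term1>", "<term2>"}  # placeholder hate-speech dictionary

def is_hateful(tweet_text):
    """Stub classifier: flag a tweet that contains a dictionary term."""
    words = set(tweet_text.lower().split())
    return bool(words & HATEFUL_TERMS)

# A control group plus the three message variants described above
# (hypothetical labels; the study tested several message versions).
ARMS = ["control", "costliness", "legitimacy", "credibility"]

def assign_followers(followers, tweets_by_user, seed=0):
    """Randomly assign hateful followers of a suspended account to arms."""
    rng = random.Random(seed)  # fixed seed keeps the assignment reproducible
    eligible = [
        u for u in followers
        if any(is_hateful(t) for t in tweets_by_user.get(u, []))
    ]
    rng.shuffle(eligible)
    # Deal users round-robin so the arms end up (nearly) equal in size.
    return {user: ARMS[i % len(ARMS)] for i, user in enumerate(eligible)}

# Example: two followers, only one of whom tweeted a dictionary term.
tweets = {"user_a": ["hello world"], "user_b": ["<term1> example"]}
print(assign_followers(["user_a", "user_b"], tweets))  # {'user_b': 'control'}
```

Shuffling with a fixed seed and dealing users round-robin into arms keeps the groups balanced in size and makes the assignment reproducible, which is the property an experiment like this needs from its randomization step.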

Results

We find that sending a warning message to a user can significantly decrease their use of hateful language, measured as the ratio of hateful tweets to their total number of tweets. A single warning tweet, sent by an account with no more than 100 followers, can decrease that ratio by up to 10 percent. However, we do not find strong evidence distinguishing warnings that are high, versus low, in legitimacy, credibility, or costliness. The high-legitimacy messages seem to be the most effective of those we tested. Specifically, users appear motivated to curb their behavior when they learn that someone they follow was suspended for a similar reason; often, users do not realize that an account has been banned or suspended unless they are told. Our findings suggest that one option for reducing hate speech on Twitter would be to warn users who have reason to suspect they might be at risk of suspension for using hateful language.
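
As a concrete illustration of the outcome measure, the short Python sketch below computes the hateful-tweet ratio over a post-treatment window. The function name, data layout, and seven-day default are assumptions made for illustration, not the study's analysis code.

```python
from datetime import datetime, timedelta

def hateful_ratio(tweets, start, days=7):
    """Share of a user's tweets in a window that are hateful.

    `tweets` is a list of (timestamp, is_hateful) pairs. This mirrors the
    outcome measure described above; it is not the study's analysis code.
    """
    end = start + timedelta(days=days)
    in_window = [hateful for ts, hateful in tweets if start <= ts < end]
    if not in_window:
        return 0.0  # no tweets in the window: define the ratio as zero
    return sum(in_window) / len(in_window)

# Example: 1 hateful tweet out of 4 in the week after treatment -> 0.25
t0 = datetime(2021, 1, 10)
tweets = [(t0 + timedelta(days=d), h)
          for d, h in [(0, True), (1, False), (2, False), (5, False)]]
print(hateful_ratio(tweets, start=t0))  # 0.25
```

Comparing this ratio between warned and control users over the week after treatment is the kind of contrast the result above describes.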