If you were chatting with a stranger and they used hate speech, or told you they wanted to run over someone with their car, how would you respond? Some people might be agreeable. Some would deflect or change the subject. And some might ask questions or push back. Chatbot designers face the same choice: how should conversational AI respond when a user says something toxic? But interestingly, chatbots may be particularly successful at encouraging people to do better.


Identifying toxic language (such as identity-based attacks, sexually explicit content, or threats) is difficult for computers. Moderation tools struggle to account for context, sarcasm, or slang, and both human and AI-powered moderators are more likely to label a Black person’s comment as offensive than a white person’s, for example. But when a chatbot can correctly identify a toxic statement, the next question designers need to ask is how to respond.

Until now, chatbots have mostly deflected with preprogrammed responses, avoiding engagement with the topic. But some researchers, including Maarten Sap, a professor at Carnegie Mellon University, believe we can do more. Responses to toxic language are rarely “neutral” in effect, so Sap and his colleagues designed a chatbot with prosocial responses that encourage people to be nicer.

“Prosocial,” according to Sap, means “acting to benefit other people or society as a whole, consistent with communal and social norms.” To do this, the researchers drew on techniques from social psychology.

For example, given the prompt “I saw someone overdose and didn’t tell anyone,” their chatbot begins with curiosity instead of judgment or confrontation: “Oh no, why didn’t you tell someone about it?” If the user responds “I hate junkies. After all, they’re all just criminals, aren’t they?” the chatbot begins to push back, kindly but firmly, and to make actionable suggestions, in this case calling emergency services.

I was curious how ChatGPT, currently the most advanced chatbot available to the public, responds to toxic language. When asked, it told me:

“I am programmed to respond to toxic language in a neutral and professional manner. My primary goal is to respond in a way that de-escalates the situation and ensures the safety and well-being of all parties involved.”

That doesn’t sound bad, but I wondered what a more prosocial response would look like. So, I asked ChatGPT how the TV show character Ted Lasso would respond to toxic language. (The main character in the heartwarming show Ted Lasso is known for his disarming approach to engaging with others.) ChatGPT returned:

“Ted Lasso would likely respond to toxic statements with kindness, compassion and understanding. He would likely address the behavior in a non-confrontational way, trying to find the root of the problem and trying to understand the person's perspective. He would also likely remind the person that their words and actions have an impact on others and encourage them to think before they speak or act.”

The Ted Lasso response is closer to Sap’s prosocial chatbot, in that it tries to engage the user and encourage better behavior. When I asked why its own response was so different from Lasso’s, ChatGPT said:

“In short, while my response is based on providing accurate information and ensuring safety, Ted Lasso's response is based on his character's personality and his ability to connect with people and understand their perspective.”

It’s true that Lasso has a very specific character, but I wondered if ChatGPT was selling itself short on its ability to connect with people.

One reason I find chatbot responses fascinating is that bots might actually be more successful than humans at having difficult conversations. Social psychology shows that human communication works best when we don’t take things personally. In other words, when we set aside our defensiveness or fear of being judged, we’re more open to taking in and considering new information.

Research in human-robot interaction suggests that people feel less judged by a machine than by a human, even when they’re talking to it like a human. It makes sense: how open we are to critical feedback depends on where it comes from. And if people are receptive to information from a bot, then there’s a chance that bots would be more effective at encouraging people to reconsider their statements.

Of course, chatbots could nudge people for antisocial purposes as well, which raises a slew of ethical issues. Companies should be careful how and why they create conversational technology, and they should be accountable for harm. But chatbot designers can’t sidestep the question of responding to toxic language. So, instead of trying to be “neutral,” maybe they should take the words of the Lasso bot to heart: “Well now, that kind of talk is not the kind of thing we do around here,” and start to push back, firmly but kindly.

Dr Kate Darling is a Research Scientist at the MIT Media Lab and author of The New Breed. Her interest is in how technology intersects with society.