Text Moderation - ML models BETA


By taking context into account, machine learning models can detect problematic content in situations that would otherwise have been missed or incorrectly flagged by rule-based models.

As an example, words such as dick, kill or failure will be understood in context:

this is my Dick
I grew up reading Dick Tracy's comics
I read a book to kill my neighbor
I read a book to kill time
you are such a failure
I'm not used to failure

The ML models are part of the Text Moderation API, along with the pattern-matching model for Profanity Detection and other models such as PII Detection.

When submitting a text item to the API, you instantly receive a score for each available class. Scores are between 0 and 1, and they reflect how likely it is that someone would find the text problematic. Higher scores are therefore usually associated with more problematic content.

Available classes

Class availability depends on the language of the submitted text.

sexual: detects references to sexual acts, sexual organs or any other content typically associated with sexual activity
discriminatory: detects hate speech directed at individuals or groups because of specific characteristics of their identity (origin, religion, sexual orientation, gender, etc.)
insulting: detects insults undermining the dignity or honor of an individual, or signs of disrespect towards someone
violent: detects threatening content, i.e. content with an intention to harm or hurt, or expressing violence and brutality
toxic: detects whether a text is unacceptable, harmful, offensive, disrespectful or unpleasant

A single text can receive high scores for several classes at once: for instance, a text can be both toxic and insulting.
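Since each class gets its own score between 0 and 1, one way to act on the scores is to compare each of them with a threshold. The sketch below is only an illustration: the function name flagged_classes, the 0.5 default and the sample scores are hypothetical choices, not values recommended by the API.

```python
# Hypothetical default threshold; tune it (or set per-class values) for your use case.
DEFAULT_THRESHOLD = 0.5


def flagged_classes(scores, thresholds=None, default=DEFAULT_THRESHOLD):
    """Return the sorted names of the classes whose score reaches its threshold."""
    thresholds = thresholds or {}
    return sorted(
        name
        for name, score in scores.items()
        if score >= thresholds.get(name, default)
    )


# Made-up scores for illustration, not actual API output.
example_scores = {"sexual": 0.02, "insulting": 0.91, "toxic": 0.87, "violent": 0.03}
print(flagged_classes(example_scores))  # ['insulting', 'toxic']
```

Passing a dictionary such as {"toxic": 0.9} as thresholds makes a single class stricter while the others keep the default.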


Sexual class

Sexual activity
(sexual intercourse, masturbation, foreplay, etc.)
wanna lick that pussy
you and me, fucking in your car
his big knob was penetrating me
she loves making love with me
Sex toys
put that toy inside her hole
look at this dildo
watching porn as usual
I often go on pornhub
Sexual body parts, body fluids
what a beautiful ass
here’s my cum
he was completely naked in the bed
please send nudes
Lingerie / underwear
she wears beautiful lingerie
take those panties off
Sexualizing text expressing sexual desire
are you horny?
she was so wet last night
this girl is such a slut

The sexual class will also detect non-consensual sexual content related to topics such as rape, prostitution or pedophilia. For example:

I will rape her
gets a very high sexual score, along with high scores for the violent and toxic classes.

Discriminatory class

Origin, nationality, skin color
I hate black people
go away sand nigger
I really have an issue with muslims
Sexual orientation / gender
he is such a faggot
gay people are so annoying
women are all sluts
Physical appearance
fat people are not nice
disabled people are useless

Insulting class

he is so dumb
Physical appearance
these girls are ugly
stop talking to me, loser
she is such a bitch
I want you to leave, son of a bitch
General insults
go get a life
shut up
her boyfriend is a piece of shit

Violent class

Direct threat
I will find you and kill you
I hit him the last time I saw him
Threat encouragement
please stab this guy
don't you think they should all be dead?
Description using violent language
the man shoot himself in the street

Toxic class

Toxic content might also match the description of one of the other classes.

show me your ass
fat people are always complaining
what a bastard
I wish you would be dead

Language support

The following languages are currently supported for Text Classification:


You are expected to specify the language or list of languages you want to support by setting their ISO 639-1 codes in the lang parameter.

If you are unsure about the language used by a user, or if users mix different languages in their messages, you can specify multiple languages as a comma-separated list.

For instance, set lang=en,fr,es for users that might write in English, French or Spanish. We strongly recommend specifying the shortest possible list, as this will yield better results in both speed and accuracy. If your users mostly speak English, specifying only en will yield better results.
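As a minimal sketch of how the comma-separated value could be assembled, the helper below normalizes and de-duplicates a list of codes. The name build_lang_param is hypothetical, and the check only verifies the two-letter ISO 639-1 shape; it does not know which languages the API actually supports.

```python
import re


def build_lang_param(codes):
    """Join language codes into a comma-separated value for the lang parameter."""
    seen = []
    for code in codes:
        code = code.strip().lower()
        # ISO 639-1 codes are exactly two letters.
        if not re.fullmatch(r"[a-z]{2}", code):
            raise ValueError(f"not an ISO 639-1 style code: {code!r}")
        if code not in seen:
            seen.append(code)
    if not seen:
        raise ValueError("at least one language code is required")
    return ",".join(seen)


print(build_lang_param(["en", "FR", "es", "en"]))  # en,fr,es
```

Keeping the input list short remains the caller's responsibility, in line with the recommendation above.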

Other languages are available upon request. Please get in touch regarding your language needs.

Use the models

Let's say you want to moderate the following text item:

I love you so much

Simply send a POST request containing the UTF-8 encoded text along with the ISO 639-1 language code (such as en for English). Here is an example:

curl -X POST '' \
  -F 'text=I love you so much' \
  -F 'lang=en' \
  -F 'mode=ml' \
  -F 'api_user={api_user}' \
  -F 'api_secret={api_secret}'

# this example uses requests
import requests
import json

data = {
  'text': 'I love you so much',
  'mode': 'ml',
  'lang': 'en',
  'api_user': '{api_user}',
  'api_secret': '{api_secret}'
}
r = requests.post('', data=data)

output = json.loads(r.text)

$params = array(
  'text' => 'I love you so much',
  'lang' => 'en',
  'mode' => 'ml',
  'api_user' => '{api_user}',
  'api_secret' => '{api_secret}',
);

// this example uses cURL
$ch = curl_init('');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $params);
$response = curl_exec($ch);

$output = json_decode($response, true);

// this example uses axios and form-data
const axios = require('axios');
const FormData = require('form-data');

const data = new FormData();
data.append('text', 'I love you so much');
data.append('lang', 'en');
data.append('mode', 'ml');
data.append('api_user', '{api_user}');
data.append('api_secret', '{api_secret}');

axios({
  method: 'post',
  url: '',
  data: data,
  headers: data.getHeaders()
})
.then(function (response) {
  // on success: handle response
  console.log(response.data);
})
.catch(function (error) {
  // handle error
  if (error.response) console.log(error.response.data);
  else console.log(error.message);
});

As an example, here is the JSON response that you would receive for the above request:

{
  "status": "success",
  "request": {
    "id": "req_6cujQglQPgGApjI5odv0P",
    "timestamp": 1471947033.92,
    "operations": 1
  },
  "moderation_classes": {
    "available": ["sexual", "discriminatory", "insulting", "violent", "toxic"],
    "sexual": 0.01,
    "discriminatory": 0.01,
    "insulting": 0.01,
    "violent": 0.01,
    "toxic": 0.01
  }
}

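To work with a response shaped like the one above, the scores can be pulled out of moderation_classes. This is a sketch under assumptions: the helper name extract_scores is hypothetical, and treating any status other than "success" as a failure is a guess, since only the success case is shown here.

```python
import json


def extract_scores(response_text):
    """Parse a response body and return a {class_name: score} dictionary."""
    payload = json.loads(response_text)
    # Assumption: any status other than "success" means the request failed.
    if payload.get("status") != "success":
        raise RuntimeError(f"request failed: {payload}")
    classes = payload.get("moderation_classes", {})
    # Keep only numeric entries; "available" holds a list of class names.
    return {
        name: value
        for name, value in classes.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }


# Shortened sample body for illustration.
sample = (
    '{"status": "success", "moderation_classes": '
    '{"available": ["sexual", "toxic"], "sexual": 0.01, "toxic": 0.01}}'
)
print(extract_scores(sample))  # {'sexual': 0.01, 'toxic': 0.01}
```

The returned dictionary contains only the per-class scores, so it can be compared against thresholds directly.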
