Text Moderation Principles


Sightengine's Text Moderation API is useful to moderate any type of text content: comments, messages, chats, posts and even usernames.

Sightengine offers two different approaches to Text Moderation:

  1. Text classification based on deep learning, to moderate text based on semantic and in-context meaning.
  2. Rule-based pattern-matching algorithms, to flag specific words or phrases.

The text classification models excel at interpreting full sentences and understanding linguistic subtleties. The pattern-matching algorithms excel at detecting specific words or phrases, and keep working even in the presence of heavy obfuscation by users trying to circumvent basic filters. Users can also add their own words and expressions to the pattern-matching algorithms.
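To make the two approaches concrete, here is a minimal sketch of how a request to each might be assembled. The endpoint URL, the `mode` parameter and its values (`ml` for classification, `standard` for rule-based matching), and the credential parameter names are assumptions for illustration; check the official API reference for the exact shape.

```python
# Sketch: building request parameters for the two moderation approaches.
# Endpoint and parameter names below are assumptions, not confirmed values.

API_ENDPOINT = "https://api.sightengine.com/1.0/text/check.json"  # assumed

def build_params(text, approach, api_user="YOUR_USER", api_secret="YOUR_SECRET"):
    """Return query parameters for one of the two moderation approaches."""
    if approach == "classification":
        mode = "ml"        # deep-learning text classification (assumed value)
    elif approach == "rules":
        mode = "standard"  # rule-based pattern matching (assumed value)
    else:
        raise ValueError(f"unknown approach: {approach}")
    return {
        "text": text,
        "lang": "en",
        "mode": mode,
        "api_user": api_user,
        "api_secret": api_secret,
    }
```

The parameters would then be sent as a GET or POST request with any HTTP client; keeping the two approaches behind a single helper makes it easy to run both on the same text and combine the results.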

Text classification models

The text classification models are multi-label models: they return one severity value per class, allowing customers to make granular moderation decisions. The following classes are available:

  - sexual: detects references to sexual acts, sexual organs or any other content typically associated with sexual activity
  - discriminatory: detects hate speech directed at individuals or groups because of specific characteristics of their identity (origin, religion, sexual orientation, gender, etc.)
  - insulting: detects insults undermining the dignity or honor of an individual, or signs of disrespect towards someone
  - violent: detects threatening content, i.e. content with an intention to harm or hurt, or expressing violence and brutality
  - toxic: detects whether a text is unacceptable, harmful, offensive, disrespectful or unpleasant
See how to use the Text classification models
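Because the models are multi-label, a moderation decision typically means comparing each class's severity score against a threshold. The sketch below assumes a response layout with one score per class under a `moderation_classes` key; the actual field names may differ, so treat this as illustrative.

```python
# Sketch: turning per-class severity scores into a moderation decision.
# The response layout ("moderation_classes" holding one score per class)
# is an assumption for illustration.

CLASSES = ["sexual", "discriminatory", "insulting", "violent", "toxic"]

def flagged_classes(response, threshold=0.5):
    """Return the classes whose severity meets or exceeds the threshold."""
    scores = response.get("moderation_classes", {})
    return [c for c in CLASSES if scores.get(c, 0.0) >= threshold]

# A hypothetical response for an insulting, toxic message:
sample = {"moderation_classes": {"sexual": 0.01, "discriminatory": 0.02,
                                 "insulting": 0.91, "violent": 0.10,
                                 "toxic": 0.84}}
# flagged_classes(sample) -> ["insulting", "toxic"]
```

Per-class thresholds (e.g. a stricter cutoff for discriminatory content than for mildly toxic content) are a natural extension, which is exactly the granularity the multi-label design is meant to enable.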

Rule-based pattern matching

The pattern-matching algorithms detect several predefined categories of content.

You can also define a custom whitelist to force the API to disregard any words or phrases that you feel shouldn't be flagged.
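The effect of a whitelist can be pictured as a post-filter over the matches the rules produce: any match whose text appears in the whitelist is dropped. The sketch below is a purely local illustration of that idea (the match structure shown is hypothetical), not the API's actual server-side mechanism.

```python
# Sketch: a local whitelist post-filter over rule-based matches.
# The match dictionaries ({"type": ..., "match": ...}) are a hypothetical
# structure used only to illustrate the whitelist behavior.

def apply_whitelist(matches, whitelist):
    """Drop flagged matches whose text is whitelisted (case-insensitive)."""
    allowed = {w.lower() for w in whitelist}
    return [m for m in matches if m["match"].lower() not in allowed]

# Classic false positive: a place name that contains a profane substring.
matches = [{"type": "profanity", "match": "Scunthorpe"},
           {"type": "profanity", "match": "badword"}]
# apply_whitelist(matches, ["scunthorpe"]) keeps only the real profanity.
```

In practice the whitelist lives on the API side, so the filtering happens before results are returned; the local version above is only meant to show what "disregard" means for flagged matches.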