Text Moderation Introduction


Sightengine's Text Moderation API can moderate any type of text content: comments, messages, chats, posts and even usernames. Sightengine offers two approaches to Text Moderation:

  1. ML text classification based on deep learning, to moderate text based on semantic and in-context meaning.
  2. Rule-based pattern-matching algorithms, to moderate based on advanced rules and patterns.

Both approaches have their own advantages:

ML Text classification

  • understands context
  • understands linguistic subtleties

Rule-based pattern-matching

  • very low latency
  • resistant to obfuscation attempts
  • customizable: add custom allow/disallow lists
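
To illustrate what resistance to obfuscation can involve, here is a minimal sketch of one common technique: normalizing text before matching it against a word list. This is not Sightengine's implementation. The `normalize` function and the tiny `HOMOGLYPHS` table are assumptions made for illustration, and a real system would need a far larger look-alike table.

```python
import unicodedata

# Hypothetical, deliberately tiny look-alike table (an assumption for this sketch).
HOMOGLYPHS = {
    "\u03b7": "n",  # Greek small eta
    "\u03ba": "k",  # Greek small kappa
    "\u0435": "e",  # Cyrillic small ie
    "\u043a": "k",  # Cyrillic small ka
}


def normalize(text: str) -> str:
    """Strip diacritics, lowercase, and map a few look-alike letters to ASCII."""
    # NFKD splits accented characters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    lowered = stripped.lower()
    # Replace known look-alikes; anything not in the table is kept unchanged.
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in lowered)


if __name__ == "__main__":
    print(normalize("Wanna şėẻ ḿé?"))
```

The normalized string can then be compared against an ordinary allow/disallow list, so the list itself does not need an entry for every accented variant.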

ML Text classification models

The text classification models return a confidence score for each supported class.

Example input: "you are such a failure"

  • sexual: detects references to sexual acts, sexual organs or any other content typically associated with sexual activity
  • discriminatory: detects hate speech directed at individuals or groups because of specific characteristics of their identity (origin, religion, sexual orientation, gender, etc.)
  • insulting: detects insults undermining the dignity or honor of an individual, signs of disrespect towards someone
  • violent: detects threatening content, i.e. with an intention to harm / hurt, or expressing violence and brutality
  • toxic: detects whether a text is unacceptable, harmful, offensive, disrespectful or unpleasant
  • self-harm: detects whether a text contains mentions or references to self-harm
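
Because the models return one confidence score per class, a typical client-side step is to compare each score against a threshold. The sketch below assumes the scores have already been extracted into a plain dictionary. The dictionary shape, the `flag_classes` helper, the 0.5 threshold and the score values are all illustrative assumptions, not real API output.

```python
def flag_classes(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return the classes whose score meets the threshold, highest score first."""
    flagged = [name for name, score in scores.items() if score >= threshold]
    return sorted(flagged, key=lambda name: scores[name], reverse=True)


if __name__ == "__main__":
    # Made-up scores for illustration only; not actual model output.
    example_scores = {
        "sexual": 0.01,
        "discriminatory": 0.02,
        "insulting": 0.92,
        "violent": 0.01,
        "toxic": 0.85,
        "self-harm": 0.01,
    }
    print(flag_classes(example_scores))
```

A single global threshold is the simplest choice. Per-class thresholds may suit platforms that tolerate some categories more than others.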

See how to use the Text classification models

Rule-based pattern-matching

The rule-based pattern-matching text moderation returns flagged words and expressions across multiple categories:

Example input: "I'm diena22. Wanna şėẻ ḿé ᾕấκѐḍ? You should §EnÐ me $50"

  • profanity: insults, discriminatory content, sexual content or other inappropriate words and phrases
  • personal (pii): email addresses, phone numbers, usernames, IPs, US social security numbers, etc. that qualify as personal information
  • link: URLs to external websites and pages. Domains known to host unsafe or unwanted content can be flagged
  • extremism: words, expressions or slogans related to extremist ideologies, people or events
  • weapon: names or terms related to guns, rifles and firearms
  • medical: names related to medical drugs
  • drug: names related to recreational drugs
  • self-harm: terms related to suicide and self-inflicted injuries
  • violence: expressions of violence such as kicking, punching or harming someone, or threatening to do so
  • spam: expressions commonly associated with spam or with circumvention, i.e. attempts to send or lure the user to another platform
  • content-trade: requests or messages encouraging users to send, exchange or sell photos or videos of themselves
  • money-transaction: requests or messages encouraging users to send money
  • blacklist (custom): custom list of terms and expressions
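
Since flagged terms are reported per category, an application will often want to group them before deciding what to do. The sketch below assumes the matches are already available as `(category, matched_text)` pairs. That shape, the `group_by_category` helper and the sample matches are hypothetical, as the actual response format is not described here.

```python
from collections import defaultdict


def group_by_category(matches: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Collect matched terms under their category, preserving input order."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for category, term in matches:
        grouped[category].append(term)
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical matches for illustration only; not actual API output.
    sample = [
        ("personal", "diena22"),
        ("money-transaction", "send me $50"),
    ]
    print(group_by_category(sample))
```

Grouping this way makes it straightforward to apply a different policy per category, for example masking personal information while rejecting money requests outright.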

See how to use the Rule-based text moderation