Text Moderation Introduction

Introduction

Sightengine's Text Moderation API moderates any type of text content: comments, messages, chats, posts, and even usernames. Sightengine offers two approaches to text moderation:

ML text classification based on deep learning, to moderate text based on semantic and in-context meaning.
Rule-based pattern-matching algorithms, to moderate based on advanced rules and patterns.

Both approaches have their own advantages:

ML Text classification

understands context
understands linguistic subtleties

Rule-based pattern-matching

very low latency
resistant to obfuscation attempts
customizable: add custom allow/disallow lists
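
To illustrate why a rule-based approach can resist obfuscation and support custom lists, here is a minimal sketch in Python. It is not Sightengine's implementation: the `LOOKALIKES` table, `normalize` and `find_disallowed` are hypothetical names introduced for this example, and a real system would handle far more character substitutions (including cross-script look-alikes such as Greek or Cyrillic letters, which this sketch ignores).

```python
import re
import unicodedata
from typing import List, Set

# Hypothetical look-alike table (an assumption for this sketch);
# a production system would need far more entries.
LOOKALIKES = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "@": "a", "$": "s"}
)


def normalize(text: str) -> str:
    """Fold accents, case and a few look-alike characters to plain letters."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks (accents, dots, hooks) left by decomposition.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower().translate(LOOKALIKES)


def find_disallowed(text: str, disallow: Set[str]) -> List[str]:
    """Return the disallowed terms found in text, sorted alphabetically."""
    words = set(re.findall(r"[a-z]+", normalize(text)))
    return sorted(words & disallow)


# The custom disallow list here is made up for illustration.
print(find_disallowed("Wanna şėẻ ḿé? fr33 m0n3y", {"see", "money"}))
```

Normalizing before matching means one list entry covers many spellings of the same word, which is the property that makes custom allow/disallow lists practical.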

ML Text classification models

The text classification models return a confidence score for each supported class.

Example input: you are such a failure

Class	Description
sexual	detects references to sexual acts, sexual organs or any other content typically associated with sexual activity
discriminatory	detects hate speech directed at individuals or groups because of specific characteristics of their identity (origin, religion, sexual orientation, gender, etc.)
insulting	detects insults undermining the dignity or honor of an individual, signs of disrespect towards someone
violent	detects threatening content, i.e. with an intention to harm / hurt, or expressing violence and brutality
toxic	detects whether a text is unacceptable, harmful, offensive, disrespectful or unpleasant
self-harm	detects whether a text contains mentions or references to self-harm
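
Because each class comes back with its own confidence score, a common way to act on the result is to compare every score against a threshold. The sketch below assumes the scores have already been extracted into a plain dictionary keyed by class name; the exact response format is not described here, and `flagged_classes`, `DEFAULT_THRESHOLD` and the sample scores are hypothetical values chosen for illustration.

```python
from typing import Dict, List, Optional

DEFAULT_THRESHOLD = 0.5  # assumed cut-off; tune per class for your platform


def flagged_classes(
    scores: Dict[str, float],
    thresholds: Optional[Dict[str, float]] = None,
) -> List[str]:
    """Return the classes whose score meets its threshold, highest score first."""
    thresholds = thresholds or {}
    hits = [
        name
        for name, score in scores.items()
        if score >= thresholds.get(name, DEFAULT_THRESHOLD)
    ]
    return sorted(hits, key=lambda name: scores[name], reverse=True)


# Made-up scores for illustration only; not actual API output.
example_scores = {
    "sexual": 0.01,
    "discriminatory": 0.02,
    "insulting": 0.92,
    "violent": 0.01,
    "toxic": 0.88,
    "self-harm": 0.03,
}

print(flagged_classes(example_scores))                  # ['insulting', 'toxic']
print(flagged_classes(example_scores, {"toxic": 0.9}))  # ['insulting']
```

Per-class thresholds let a platform be stricter on some classes (for example self-harm) than on others without changing the rest of the pipeline.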

See how to use the Text classification models

Rule-based pattern-matching

The rule-based pattern-matching text moderation returns flagged words and expressions across multiple categories:

Example input: I'm diena22. Wanna şėẻ ḿé ᾕấκѐḍ? You should §EnÐ me $50

Class	Description
profanity	insults, discriminatory content, sexual content or other inappropriate words and phrases
personal (pii)	email addresses, phone numbers, usernames, IP addresses, US Social Security numbers and other data that qualifies as personal information
link	URLs to external websites and pages. Domains known to host unsafe or unwanted content can be flagged
extremism	words, expressions or slogans related to extremist ideologies, people or events
weapon	names or terms related to guns, rifles and firearms
medical	names related to medical drugs
drug	names related to recreational drugs
self-harm	terms related to suicide and self-inflicted injuries
violence	expressions of violence such as kicking, punching or harming someone, or threatening to do so
spam	expressions commonly associated with spam or with circumvention, i.e. attempts to send or lure the user to another platform
content-trade	requests or messages encouraging users to send, exchange or sell photos or videos of themselves
money-transaction	requests or messages encouraging users to send money
blacklist (custom)	custom list of terms and expressions
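
Since the rule-based approach returns flagged words and expressions per category, downstream code typically groups the matches and maps categories to an action. The following sketch assumes each match has been reduced to a record with a `category` and a `match` field; that record shape, the `BLOCKING` policy, and the function names are assumptions made for this example, not the documented response format.

```python
from collections import defaultdict
from typing import Dict, List

# Assumed policy: categories that reject a message outright.
BLOCKING = {"extremism", "content-trade", "money-transaction"}


def group_matches(matches: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group flagged expressions by the category they were flagged under."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for item in matches:
        grouped[item["category"]].append(item["match"])
    return dict(grouped)


def decide(matches: List[Dict[str, str]]) -> str:
    """Return 'reject', 'review' or 'accept' based on the flagged categories."""
    categories = {item["category"] for item in matches}
    if categories & BLOCKING:
        return "reject"
    return "review" if categories else "accept"


# Hand-written records for illustration only; not actual API output.
sample = [
    {"category": "personal", "match": "diena22"},
    {"category": "money-transaction", "match": "send me $50"},
]

print(group_matches(sample))
print(decide(sample))  # reject
```

Keeping the policy in a single set makes it easy to tighten or relax moderation per category without touching the matching logic.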

See how to use the Rule-based text moderation

Next Steps

GUIDE

Rule-based Moderation

Learn how to use our rule-based text moderation.

GUIDE

Text Classification Models

Learn how to use our deep-learning-based text moderation models.
