How does text moderation work?
Our Text Moderation solution is entirely automated: no human moderators view or rate your content. This makes very fast turnaround times and high scalability possible, and it keeps your content private, since your text is never shown to a human reviewer.
With Text Moderation, you simply submit any type of text (message, comment, description, review, etc.) to our API. The API instantly responds with the moderation details. Any objectionable content found will be flagged and described to help you block, modify or review it.
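As an illustration of the "block, modify or review" step, here is a minimal Python sketch of how an application might act on flagged content. The response shape (a list of matches with `category`, `start` and `end` fields), the `POLICY` table and the function names are assumptions made for this example, not the actual API format; check the API reference for the real field names.

```python
from typing import Dict, List

# Hypothetical policy: which action to take for each flagged category.
# Category names and actions are illustrative assumptions.
POLICY: Dict[str, str] = {
    "extremism": "block",
    "link": "review",
    "profanity": "modify",
}

# Actions ordered from least to most severe.
SEVERITY = ["accept", "modify", "review", "block"]


def choose_action(matches: List[dict]) -> str:
    """Return the most severe action required by any flagged match."""
    action = "accept"
    for match in matches:
        # Unknown categories fall back to manual review.
        candidate = POLICY.get(match["category"], "review")
        if SEVERITY.index(candidate) > SEVERITY.index(action):
            action = candidate
    return action


def mask_matches(text: str, matches: List[dict]) -> str:
    """Replace each flagged span [start, end) with asterisks."""
    chars = list(text)
    for match in matches:
        for i in range(match["start"], min(match["end"], len(chars))):
            chars[i] = "*"
    return "".join(chars)


# Hypothetical moderation result for the text below.
text = "You are a jerk!"
matches = [{"category": "profanity", "start": 10, "end": 14}]

action = choose_action(matches)
cleaned = mask_matches(text, matches) if action == "modify" else text
```

With this policy, a profanity match leads to the "modify" action and the flagged word is masked, while an extremist reference would cause the whole message to be blocked.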
Approaches to text moderation
Sightengine offers two different approaches to Text Moderation:
- Text Classification models based on deep learning. These models interpret full sentences and linguistic subtleties, so they moderate text based on its semantic, in-context meaning. The available classes are: sexual, discriminatory, insulting, violent, toxic. For each class, the API returns a score between 0 and 1, and a single text can match several classes at once. See the Text Classification documentation to learn more.
- Rule-based pattern-matching algorithms. These flag specific words or phrases, even when they are heavily obfuscated. The moderation categories for Rule-based Detection are: profanity (sexual, insulting, discriminatory or other inappropriate words), personal details (for instance email addresses or phone numbers), links, misleading usernames, extremist references, weapon names, and medical or recreational drugs. You can also create custom lists so that the API flags any words or content you choose. See the Rule-based Detection documentation to learn more.
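To make the per-class scores of the Text Classification approach concrete, the sketch below shows one way to turn them into a list of flagged classes. The class names come from the list above; the flat `scores` dictionary, the default threshold of 0.5 and the function name are assumptions chosen for illustration, not values recommended by the API.

```python
from typing import Dict, List, Optional

# Class names as listed above.
CLASSES = ("sexual", "discriminatory", "insulting", "violent", "toxic")


def flagged_classes(
    scores: Dict[str, float],
    thresholds: Optional[Dict[str, float]] = None,
    default_threshold: float = 0.5,  # assumed value, tune for your use case
) -> List[str]:
    """Return every class whose score reaches its threshold.

    A single text can be flagged for several classes at once.
    """
    thresholds = thresholds or {}
    return [
        name
        for name in CLASSES
        if scores.get(name, 0.0) >= thresholds.get(name, default_threshold)
    ]


# Hypothetical scores for one message.
scores = {
    "sexual": 0.01,
    "discriminatory": 0.02,
    "insulting": 0.91,
    "violent": 0.05,
    "toxic": 0.78,
}

default_flags = flagged_classes(scores)
strict_toxic_flags = flagged_classes(scores, thresholds={"toxic": 0.8})
```

Per-class thresholds let you be stricter on some classes than others: here the message is flagged as both insulting and toxic with the default threshold, but only as insulting once the toxic threshold is raised to 0.8.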
Other frequent questions
- How do text classification models work?
- How do rule-based pattern-matching algorithms work?
- How does Sightengine's text moderation differ from keyword filtering? How do you prevent users from circumventing word filters?
- Is it possible to customize the Text Moderation API to my needs?
- What languages does text moderation work with?
- Can you automatically detect the language of the text I want to moderate?
- What is the maximum text length?
- How does pricing for text moderation work?