How do text classification models work?
Machine learning models take context into account, so they can detect problematic content that rule-based models would otherwise miss or flag incorrectly.
When you submit a text item to the API, you instantly receive a score for each available class. Scores range from 0 to 1 and reflect how likely it is that someone would find the text problematic: higher scores indicate more problematic content. Note that the API may return several high scores for a single text if the text matches multiple classes.
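As a rough illustration, the Python sketch below submits a text item and prints the per-class scores. The endpoint URL, parameter names and response shape are assumptions made for this example; refer to the Text Classification documentation for the exact request format.

```python
import requests

# Minimal sketch -- the endpoint URL, parameter names and response shape
# below are assumptions for illustration only; see the Text Classification
# documentation for the exact request format.
API_URL = "https://api.sightengine.com/1.0/text/check.json"  # assumed endpoint

def classify_text(text: str, lang: str = "en") -> dict:
    """Submit a text item and return a dict of class -> score (0 to 1)."""
    response = requests.get(
        API_URL,
        params={
            "text": text,
            "lang": lang,
            "api_user": "YOUR_API_USER",      # placeholder credentials
            "api_secret": "YOUR_API_SECRET",
        },
        timeout=10,
    )
    response.raise_for_status()
    payload = response.json()
    # Assumed response shape: a mapping of class names to scores,
    # e.g. {"sexual": 0.01, "insulting": 0.87, "toxic": 0.92, ...}
    classes = payload.get("moderation_classes", payload)
    return {name: score for name, score in classes.items()
            if isinstance(score, (int, float))}

# A single text can score high on several classes at once:
scores = classify_text("some user-generated comment")
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.2f}")
```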
Class availability depends on the language of the submitted text. The available classes for Text Classification are the following:
| Class | Description |
| --- | --- |
| sexual | detects references to sexual acts, sexual organs or any other content typically associated with sexual activity |
| discriminatory | detects hate speech directed at individuals or groups because of specific characteristics of their identity (origin, religion, sexual orientation, gender, etc.) |
| insulting | detects insults that undermine the dignity or honor of an individual, or signs of disrespect towards someone |
| violent | detects threatening content, i.e. content expressing an intention to harm or hurt, or depicting violence and brutality |
| toxic | detects whether a text is unacceptable, harmful, offensive, disrespectful or unpleasant |
See the Text Classification documentation to learn more.
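Because each class returns its own score, a common pattern is to apply a separate threshold per class before deciding what to do with the text. The snippet below is a sketch of that idea; the threshold values are illustrative and should be tuned to your own tolerance for each class.

```python
# Per-class decision thresholds (illustrative values -- tune them to your
# own tolerance for each class).
THRESHOLDS = {
    "sexual": 0.8,
    "discriminatory": 0.6,
    "insulting": 0.7,
    "violent": 0.6,
    "toxic": 0.7,
}

def flagged_classes(scores: dict) -> list:
    """Return the classes whose score meets or exceeds their threshold."""
    return [name for name, score in scores.items()
            if score >= THRESHOLDS.get(name, 1.0)]

def moderate(scores: dict) -> str:
    """Reject the text if any class crosses its threshold, otherwise approve."""
    hits = flagged_classes(scores)
    return f"reject ({', '.join(hits)})" if hits else "approve"

print(moderate({"sexual": 0.02, "insulting": 0.91, "toxic": 0.88}))
# -> reject (insulting, toxic)
```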
Other frequent questions
- How does text moderation work?
- How do ruled-based pattern-matching algorithms work?
- How does Sightengine's text moderation differ from keyword filtering? How do you prevent users from circumventing word filters?
- Is it possible to customize the Text Moderation API to my needs?
- What languages does text moderation work with?
- Can you automatically detect the language of the text I want to moderate?
- What is the maximum text length?
- How does pricing for text moderation work?