Because they take context into account, machine learning models can detect problematic content that rule-based models would miss or incorrectly flag.
When you submit a text item to the API, you instantly receive a score for each available class. Scores range from 0 to 1 and reflect how likely it is that someone would find the text problematic, so higher scores generally indicate more problematic content. Note that the API may return several high scores for a single text if it matches multiple classes.
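Below is a minimal sketch of such a request in Python. The endpoint URL, parameter names (`text`, `lang`, `api_user`, `api_secret`), and authentication scheme are assumptions for illustration; consult the Text Classification documentation for the actual values.

```python
import requests

# Hypothetical endpoint and parameters -- see the Text Classification
# documentation for the real URL, field names, and authentication.
API_URL = "https://api.example.com/1.0/text/check.json"

response = requests.post(
    API_URL,
    data={
        "text": "You are a complete idiot.",
        "lang": "en",                  # language of the submitted text
        "api_user": "YOUR_API_USER",   # placeholder credentials
        "api_secret": "YOUR_API_SECRET",
    },
    timeout=10,
)

# The response contains one score per available class, each between 0 and 1.
print(response.json())
```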
Class availability depends on the language of the submitted text. The available classes for Text Classification are the following:
| Class | Description |
|---|---|
| sexual | detects references to sexual acts, sexual organs, or any other content typically associated with sexual activity |
| discriminatory | detects hate speech directed at individuals or groups because of specific characteristics of their identity (origin, religion, sexual orientation, gender, etc.) |
| insulting | detects insults that undermine the dignity or honor of an individual, or signs of disrespect towards someone |
| violent | detects threatening content, i.e. content expressing an intention to harm or hurt, or depicting violence and brutality |
| toxic | detects whether a text is unacceptable, harmful, offensive, disrespectful or unpleasant |
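Since several classes can score high on the same text, a common pattern is to compare each class score against a threshold and collect every class that fires. The sketch below assumes the API response can be reduced to a mapping of class name to score; the 0.8 threshold is an arbitrary example value you should tune per class against your own data.

```python
THRESHOLD = 0.8  # example cut-off; tune per class in practice

def flagged_classes(scores: dict[str, float], threshold: float = THRESHOLD) -> list[str]:
    """Return every class whose score meets or exceeds the threshold."""
    return [name for name, score in scores.items() if score >= threshold]

# Example scores for a single text -- note that two classes exceed the threshold.
scores = {
    "sexual": 0.01,
    "discriminatory": 0.02,
    "insulting": 0.91,
    "violent": 0.05,
    "toxic": 0.87,
}
print(flagged_classes(scores))  # ['insulting', 'toxic']
```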
See the Text Classification documentation to learn more.