What do the confidence scores returned by the API mean? How can I use the scores to decide on which action to take?

What are these scores?

Images and Videos

For Image and Video Moderation, a score between 0 and 1 is returned for each moderation category. This value reflects the model’s confidence: a score closer to 1 indicates a higher probability that the image or video contains what that category's model is looking for.

Text

For Text Moderation, scores are returned only when using the Text Classification models. The API returns a score between 0 and 1 for each available class, reflecting how likely it is that someone would find the text problematic.

What should I do with them?

With these scores, you can define a threshold for each moderation class and use it to decide what to do with the content you submitted. Thresholds should be adapted to your use case and depend on how tolerant you are of false negatives and false positives. A lower threshold catches more problematic content but also flags more acceptable content by mistake (false positives); a higher threshold does the opposite.

You can create your own moderation rules internally using the scores and your thresholds.
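As a rough illustration of such an internal rule, the sketch below compares each returned score with a per-class threshold and lists the classes that were exceeded. The class names, threshold values, and the shape of the scores dictionary are hypothetical and chosen only for the example; they are not the API's actual response format.

```python
from typing import Dict, List

# Hypothetical class names and threshold values, for illustration only.
# Tune each threshold to your own tolerance for false positives/negatives.
THRESHOLDS: Dict[str, float] = {
    "nudity": 0.5,
    "weapon": 0.8,
    "offensive": 0.7,
}


def flagged_classes(scores: Dict[str, float],
                    thresholds: Dict[str, float]) -> List[str]:
    """Return, in alphabetical order, the classes whose score meets
    or exceeds its threshold. Classes missing from `scores` are
    treated as having a score of 0."""
    return sorted(
        name
        for name, limit in thresholds.items()
        if scores.get(name, 0.0) >= limit
    )


# Assumed example scores, already extracted from an API response.
example_scores = {"nudity": 0.12, "weapon": 0.91, "offensive": 0.70}
print(flagged_classes(example_scores, THRESHOLDS))
```

An empty list means no threshold was reached, so the content could be accepted under this rule.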

For Image Moderation, we provide features that determine which action (accept, reject...) should be taken based on the image content. For example, Fully Automated Moderation can automatically reject an image when its score is above a predefined value.

Alternatively, Hybrid Moderation can flag a submitted image for review when its score falls between two predefined thresholds.
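The same two-threshold logic can be sketched on your own side. The function below is a minimal illustration, not the built-in Fully Automated or Hybrid Moderation feature; the function name, the default threshold values, and the choice to treat boundary scores as belonging to the stricter action are all assumptions.

```python
def decide(score: float, review_at: float = 0.3, reject_at: float = 0.8) -> str:
    """Map a confidence score to an action using two thresholds.

    score >= reject_at              -> "reject"
    review_at <= score < reject_at  -> "review"
    score < review_at               -> "accept"

    The default thresholds are illustrative values, not recommendations.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if review_at > reject_at:
        raise ValueError("review_at must not exceed reject_at")
    if score >= reject_at:
        return "reject"
    if score >= review_at:
        return "review"
    return "accept"


for s in (0.05, 0.45, 0.92):
    print(s, decide(s))
```

Setting both thresholds to the same value removes the review band, which corresponds to a fully automated accept-or-reject rule.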
