Duplicate and Near-Duplicate Detection


The Near-Duplicate Detection model identifies images that are exact duplicates or near-duplicates of one another. It supports the following use cases:

  • Image blacklists and disallow lists: Blacklist images and prevent them from (re)appearing on your site or app, for instance copyrighted images, illegal images, or previously removed images.
  • Spam & theft prevention: Detect spammy and unwanted behaviors. Prevent users from submitting the same image multiple times, and from submitting other users' photos.
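
As a rough illustration of the disallow-list use case, the sketch below keeps a set of image fingerprints and checks new submissions against it. It is not the service's API: the `DisallowList` class, the 64-bit integer fingerprints, and the `max_distance` threshold are all hypothetical, and the fingerprints are assumed to come from some upstream near-duplicate model.

```python
class DisallowList:
    """Hypothetical in-memory disallow list keyed by 64-bit image fingerprints."""

    def __init__(self, max_distance=4):
        # Maximum number of differing bits still treated as a near-duplicate
        # (an assumed threshold, chosen only for this example).
        self.max_distance = max_distance
        self._entries = {}  # fingerprint -> label

    def add(self, fingerprint, label):
        """Register a fingerprint that must not reappear."""
        self._entries[fingerprint] = label

    def match(self, fingerprint):
        """Return the label of the first near-duplicate entry, or None."""
        for known, label in self._entries.items():
            differing_bits = bin(known ^ fingerprint).count("1")
            if differing_bits <= self.max_distance:
                return label
        return None


blocked = DisallowList(max_distance=4)
blocked.add(0xF0F0F0F0F0F0F0F0, "previously-removed-image")

near = 0xF0F0F0F0F0F0F0F1  # differs from the stored fingerprint by 1 bit
far = 0x0F0F0F0F0F0F0F0F   # differs from the stored fingerprint by 64 bits

print(blocked.match(near))
print(blocked.match(far))
```

A linear scan is enough for a small list; a production system would normally use an index built for nearest-neighbor lookups.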

What is detected as duplicates

Duplicate detection works with all types of images, both natural (such as standard photos) and artificial (drawings, logos, abstract content, and so on). It detects duplicates of an image across a wide range of transformations and modifications, many of which are commonly used to evade duplicate detection. Here are some examples:

Resolution, size and format changes

Changes to the image dimensions, resolution, format, or encoding (such as PNG, JPG, or WEBP).

Overlays, watermarks, text, logos

The deduplication is robust to the addition of overlays such as embedded text, watermarks, and logos.

Cropping and reframing

The deduplication model is crop-resistant, meaning that images that have been cropped or have had a border or frame added are still detected.

Color changes and filters

Color modifications such as converting the image to black-and-white or sepia, changing the saturation, brightness, contrast, or hue, and other color manipulations.
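
To give an intuition for why a fingerprint can survive resolution and brightness changes, here is a minimal sketch of a classical "average hash". This is a swapped-in textbook technique, not the method used by the model described on this page, and it is far less robust (it does not handle cropping, rotation, or overlays). The function names and the synthetic test images are assumptions made for the example.

```python
def average_hash(pixels, grid=8):
    """Toy average hash of a grayscale image given as a list of rows (0-255).

    Assumes the height and width are multiples of `grid`.
    """
    height, width = len(pixels), len(pixels[0])
    block_h, block_w = height // grid, width // grid
    # Downsample to grid x grid cells by averaging each block of pixels.
    cells = []
    for gy in range(grid):
        for gx in range(grid):
            block = [pixels[y][x]
                     for y in range(gy * block_h, (gy + 1) * block_h)
                     for x in range(gx * block_w, (gx + 1) * block_w)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the overall mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Synthetic 16x16 images: a left-to-right and a top-to-bottom gradient.
original = [[x * 10 for x in range(16)] for y in range(16)]
different = [[y * 10 for x in range(16)] for y in range(16)]

# Brightness change: add a constant to every pixel (no clipping here).
brighter = [[p + 40 for p in row] for row in original]

# Resolution change: 2x nearest-neighbor upscale to 32x32.
upscaled = [[p for p in row for _ in range(2)]
            for row in original for _ in range(2)]

base = average_hash(original)
print(hamming(base, average_hash(brighter)))
print(hamming(base, average_hash(upscaled)))
print(hamming(base, average_hash(different)))
```

In this toy setup the brightened and upscaled copies hash identically to the original, while the unrelated gradient differs in many bits. Real-world edits are messier, which is why a dedicated model is needed for the transformations listed on this page.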


Rotation

The deduplication model detects a duplicate even if the image has been rotated.

Stretching and horizontal flips

Compressing, stretching (changing the aspect ratio), or flipping the image.


Other edits

Other types of image edits where specific parts of the image are modified, enlarged, transformed, or have their colors changed.
