Text Detection

Last updated December 8th, 2017

Overview

The text detection API can help you determine if an image contains natural text or artificial text.

Image with natural text and artificial text

Principles

Text detection does not use any image metadata to determine whether text is present in an image. The file extension, the metadata, and the file name do not influence the result. Classification relies only on the pixel content of the image.

The model works with many writing systems, including Latin, Chinese, Japanese, Korean, Arabic, Hebrew, and Hindi.

Latin
Hebrew
Arabic

Use-cases

  • Require that users submit or upload images without artificial text
  • Hide ads that have been artificially added
  • Filter images containing personal information such as phone numbers, email addresses or usernames
  • Detect watermarks

Limitations

  • Text smaller than 5% of the width or height of the image is not detected.

Natural text

The returned value is between 0 and 1. A value closer to 1 indicates that the image contains natural text, while a value closer to 0 indicates that it does not.

Natural text value 0.99562

Artificial text

The returned value is between 0 and 1. A value closer to 1 indicates that the image contains artificial text, while a value closer to 0 indicates that it does not.

Artificial text value 0.98049
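As a rough sketch of how the two scores might be turned into yes/no decisions, the snippet below applies a cutoff to each one. The field names has_natural and has_artificial come from the sample response further down this page; the helper name classify_text and the 0.5 threshold are assumptions made for illustration, not values recommended by the API.

```python
def classify_text(text_result, threshold=0.5):
    """Decide which kinds of text are likely present in an image.

    text_result is the "text" object of an API response.
    threshold is a hypothetical cutoff; tune it for your own use case.
    """
    return {
        "natural": text_result.get("has_natural", 0.0) >= threshold,
        "artificial": text_result.get("has_artificial", 0.0) >= threshold,
    }


# Scores taken from the sample response on this page
decision = classify_text({"has_natural": 0.19986, "has_artificial": 0.99932})
print(decision)
```

A higher threshold reduces false positives at the cost of missing borderline images, so the right value depends on how strict your moderation needs to be.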

Boxes

Each box is described by four values (x1, y1, x2, y2) that locate a piece of text in the image.

Each box also has a label that indicates the type of text: text-natural or text-artificial.
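To draw or crop a box, the coordinates usually need to be expressed in pixels. The sketch below assumes that x1 and x2 are fractions of the image width, that y1 and y2 are fractions of the image height, and that (x1, y1) is the top-left corner. This matches the values between 0 and 1 seen in the sample response, but it is an assumption rather than something this page states. The helper name box_to_pixels and the 1200 by 800 image size are hypothetical.

```python
def box_to_pixels(box, image_width, image_height):
    """Scale a box with relative coordinates to integer pixel coordinates.

    Assumes x1/x2 are fractions of the width and y1/y2 fractions of the height.
    Returns (left, top, right, bottom).
    """
    left = round(box["x1"] * image_width)
    top = round(box["y1"] * image_height)
    right = round(box["x2"] * image_width)
    bottom = round(box["y2"] * image_height)
    return left, top, right, bottom


# Box taken from the sample response on this page; the image size is made up
box = {"x1": 0.18466, "y1": 0.01757, "x2": 0.8555, "y2": 0.244, "label": "text-artificial"}
print(box["label"], box_to_pixels(box, 1200, 800))
```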

Use the model

To start, you need to create an account to retrieve your API keys. Then you must install the SDK that corresponds to your programming language.


# install cURL: https://curl.haxx.se/download.html


pip install sightengine


composer require sightengine/client-php


npm install sightengine --save

Detect text in an image

Let's say you want to moderate the following image:

You can either submit a public URL to the image, or upload the raw binary image. The examples below submit a public URL:


curl -X GET -G 'https://api.sightengine.com/1.0/check.json' \
    -d 'models=text' \
    -d 'api_user={api_user}&api_secret={api_secret}' \
    -d 'url=https://d3m9459r9kwism.cloudfront.net/img/examples/example2.jpg'
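Without an SDK, the cURL call above amounts to a GET request carrying four query parameters. The following sketch builds the same request URL with only the Python standard library. The helper name build_check_url is hypothetical, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://api.sightengine.com/1.0/check.json"


def build_check_url(api_user, api_secret, image_url, models="text"):
    """Build the GET URL equivalent to the cURL example (hypothetical helper)."""
    params = {
        "models": models,
        "api_user": api_user,
        "api_secret": api_secret,
        "url": image_url,  # urlencode percent-encodes the image URL
    }
    return API_ENDPOINT + "?" + urlencode(params)


request_url = build_check_url(
    "{api_user}",
    "{api_secret}",
    "https://d3m9459r9kwism.cloudfront.net/img/examples/example2.jpg",
)
print(request_url)
```

The resulting URL can be passed to any HTTP client, and the JSON body of the response has the shape shown in the result section.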


# if you haven't already, install the SDK with 'pip install sightengine'
from sightengine.client import SightengineClient
client = SightengineClient('{api_user}','{api_secret}')
output = client.check('text').set_url('https://d3m9459r9kwism.cloudfront.net/img/examples/example2.jpg')


// if you haven't already, install the SDK with 'composer require sightengine/client-php'
use \Sightengine\SightengineClient;
$client = new SightengineClient('{api_user}','{api_secret}');
$output = $client->check(['text'])->set_url('https://d3m9459r9kwism.cloudfront.net/img/examples/example2.jpg');


// if you haven't already, install the SDK with 'npm install sightengine --save'
var sightengine = require('sightengine')('{api_user}','{api_secret}');
sightengine.check(['text']).set_url('https://d3m9459r9kwism.cloudfront.net/img/examples/example2.jpg').then(function(result) {
    // The result of the API
}).catch(function(err) {
    // Error
});

Here is the result:

{
    "status": "success",
    "request": {
        "id": "req_22Qd0gUNmRH4GCYLvYtN6",
        "timestamp": 1512483673.1405,
        "operations": 1
    },
    "text": {
        "has_artificial": 0.99932,
        "has_natural": 0.19986,
        "boxes": [
            {
                "x1": 0.18466,
                "y1": 0.01757,
                "x2": 0.8555,
                "y2": 0.244,
                "label": "text-artificial"
            }
        ]
    },
    "media": {
        "id": "med_22Qdfb5s97w8EDuY7Yfjp",
        "uri": "https://sightengine.com/assets/img/examples/text1-1200.jpg"
    }
}
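One possible way to consume this response is sketched below: it parses the JSON body, checks the status field, and keeps only the boxes with a given label. The helper name boxes_with_label is hypothetical, and the sample string is a trimmed copy of the response above.

```python
import json


def boxes_with_label(response_text, label):
    """Parse a response body and return the boxes carrying the given label."""
    response = json.loads(response_text)
    if response.get("status") != "success":
        raise ValueError("request did not succeed")
    boxes = response.get("text", {}).get("boxes", [])
    return [box for box in boxes if box.get("label") == label]


# Trimmed copy of the sample response shown on this page
sample = """
{
    "status": "success",
    "text": {
        "has_artificial": 0.99932,
        "has_natural": 0.19986,
        "boxes": [
            {"x1": 0.18466, "y1": 0.01757, "x2": 0.8555, "y2": 0.244,
             "label": "text-artificial"}
        ]
    }
}
"""

for box in boxes_with_label(sample, "text-artificial"):
    print(box["x1"], box["y1"], box["x2"], box["y2"])
```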
                    
                
