
Personal Information Detection

Overview

The Personal Information Detection model detects personally identifiable information (PII) in any user-generated text, such as comments, messages, posts and reviews.

Principles

The Personal Information Detection model detects the following types of personal information:

  • Email addresses
  • US and Canada phone numbers
  • UK phone numbers
  • French phone numbers
  • IP addresses (both IPv4 and IPv6)
  • US social security numbers (SSN)

Please contact us if you need us to detect any other type of personal information.

Whenever personal information is detected, the matched string is returned to you along with its position in the submitted text, so that you can take action.

Detection strength

The Personal Information Detection model is considerably more robust than basic regex-based approaches.

Users often obfuscate email addresses to try to evade filters. Examples include phonetic replacements such as rick(at)gmail(dot)com, and removing or altering parts of the address, as in rick_gmail.

Phone numbers can be written in many different ways, depending on the country. Examples include +1 800 000 0000, 800-000-0000 and +1(800)000-0000, along with thousands of other variations.
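To illustrate the gap described above, here is a minimal sketch of what a basic regex filter does with the obfuscated examples. The pattern `NAIVE_EMAIL` and the helper `naive_hits` are hypothetical names made up for this illustration; they are not part of the API and say nothing about how the model itself works.

```python
import re

# A deliberately simple email pattern, of the kind a basic filter might use.
# This is an illustrative assumption, not the model's detection logic.
NAIVE_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def naive_hits(text):
    """Return every substring that the naive pattern matches."""
    return NAIVE_EMAIL.findall(text)

samples = [
    "rick@gmail.com",         # plain address
    "rick(at)gmail(dot)com",  # phonetic replacement
    "rick_gmail",             # parts removed or altered
]

for sample in samples:
    print(sample, "->", naive_hits(sample))
```

Only the plain address is matched here; the two obfuscated variants pass through the naive filter unnoticed.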

Use the model

Send the UTF-8 encoded text along with its ISO 639-1 language code (such as en for English). Here is an example:


curl -X POST 'https://api.sightengine.com/1.0/text/check.json' \
  -F 'text=I am rick(at)gmail(dot)com or 1(800)343-3598' \
  -F 'lang=en' \
  -F 'mode=standard' \
  -F 'api_user={api_user}' \
  -F 'api_secret={api_secret}'


# this example uses requests
import requests
import json

data = {
  'text': 'I am rick(at)gmail(dot)com or 1(800)343-3598',
  'mode': 'standard',
  'lang': 'en',
  'api_user': '{api_user}',
  'api_secret': '{api_secret}'
}
r = requests.post('https://api.sightengine.com/1.0/text/check.json', data=data)

output = json.loads(r.text)


$params = array(
  'text' => 'I am rick(at)gmail(dot)com or 1(800)343-3598',
  'lang' => 'en',
  'mode' => 'standard',
  'api_user' => '{api_user}',
  'api_secret' => '{api_secret}',
);

// this example uses cURL
$ch = curl_init('https://api.sightengine.com/1.0/text/check.json');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $params);
$response = curl_exec($ch);
curl_close($ch);

$output = json_decode($response, true);


// this example uses axios and form-data
const axios = require('axios');
const FormData = require('form-data');

const data = new FormData();
data.append('text', 'I am rick(at)gmail(dot)com or 1(800)343-3598');
data.append('lang', 'en');
data.append('mode', 'standard');
data.append('api_user', '{api_user}');
data.append('api_secret', '{api_secret}');

axios({
  url: 'https://api.sightengine.com/1.0/text/check.json',
  method: 'post',
  data: data,
  headers: data.getHeaders()
})
.then(function (response) {
  // on success: handle response
  console.log(response.data);
})
.catch(function (error) {
  // handle error
  if (error.response) console.log(error.response.data);
  else console.log(error.message);
});

The JSON response lists every piece of personal information found, along with its start and end positions within the submitted text.


{
  "status": "success",
  "request": {
    "id": "req_6cujQglQPgGApjI5odv0P",
    "timestamp": 1471947033.92,
    "operations": 1
  },
  "profanity": {
    "matches": []
  },
  "personal": {
    "matches": [
      {
        "type": "email",
        "match": "rick(at)gmail(dot)com",
        "start": 5,
        "end": 25
      },
      {
        "type": "phone_number_us",
        "match": "1(800)343-3598",
        "start": 30,
        "end": 43
      }
    ]
  },
  "link": {
    "matches": []
  }
}
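One way to act on these matches is to mask them before the text is displayed. The sketch below assumes that `start` and `end` are zero-based character offsets and that `end` is inclusive, which is what the sample response above suggests (offsets 5 to 25 cover the 21 characters of the email). The `redact` helper is a hypothetical name for this illustration, not part of the API.

```python
def redact(text, matches, mask="*"):
    """Replace every matched span with mask characters, keeping the text length."""
    chars = list(text)
    for m in matches:
        # Assumption: 'end' is inclusive, as the sample response suggests.
        for i in range(m["start"], m["end"] + 1):
            chars[i] = mask
    return "".join(chars)

text = "I am rick(at)gmail(dot)com or 1(800)343-3598"

# The entries of output["personal"]["matches"] from the sample response.
matches = [
    {"type": "email", "match": "rick(at)gmail(dot)com", "start": 5, "end": 25},
    {"type": "phone_number_us", "match": "1(800)343-3598", "start": 30, "end": 43},
]

print(redact(text, matches))
```

Masking in place rather than deleting keeps the offsets of any remaining matches valid, so the order in which matches are processed does not matter.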

Any other needs?

See our full list of models for details on other filters and checks you can run on your images and videos.
