URL and Link Moderation

Overview

URL and Link Moderation can be used to detect and filter links wherever they appear:

In text such as messages and reviews, through the Text Moderation API
Embedded as text in images or videos, through the Text in Image/Video Moderation API
Embedded as QR codes in images or videos, through the QR code Moderation API

Image containing a flagged QR code linking to a harmful website

Features

URL moderation works out of the box and automatically flags links and URLs from more than 5 million domain names. The domain lists are updated weekly.

The API detects a broad spectrum of unwanted or unsafe websites, across many categories. It also provides you with the following capabilities:

Smart handling of redirects and URL shorteners
Detection of deceptive/spoofing techniques such as punycode attacks
Detection of obfuscated URLs and links
Coverage across all your content: text, images and videos (embedded links, QR codes and so on)

The link moderation model does not analyze the content of the target website in real time. Instead, it categorizes links using our internal databases of known domains and pages, which are updated weekly.

Detection categories

The API returns all links that have been detected. When applicable, the API also returns the category to which the link or URL belongs. Here is the list of categories that Sightengine will flag for you:

Category	Description
unsafe	sites presenting a risk for visitors, such as phishing, malware, scams
adult	sites containing porn, erotica, escort services
gambling	legal and illegal casinos, money games
drugs	sites promoting or selling recreational drugs
hate	extremist or hateful content
custom	your own custom disallow lists and allow lists

All categories apart from custom work out of the box. Together they cover links and URLs from more than 5 million domains known to host unwanted content. Our lists are updated weekly to reflect the ever-changing nature of the web.
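As an illustration of how these categories might drive a moderation decision, here is a minimal sketch. The category names come from the table above; the action names, the `POLICY` mapping and the `action_for` helper are hypothetical and not part of the API, so adapt them to your own rules.

```python
# Hypothetical policy table: the keys are the categories listed above,
# the actions ("block", "review", "allow") are illustrative assumptions.
POLICY = {
    "unsafe": "block",
    "adult": "block",
    "hate": "block",
    "gambling": "review",
    "drugs": "review",
    "custom": "review",
}


def action_for(category):
    """Map a link category to an action.

    A link with no category (None) is treated as allowed; an unknown
    category falls back to manual review.
    """
    if category is None:
        return "allow"
    return POLICY.get(category, "review")


if __name__ == "__main__":
    for cat in ("unsafe", "gambling", None):
        print(cat, "->", action_for(cat))
```

Sending unknown categories to review rather than allowing them is a conservative choice: if new categories are introduced later, they will not silently pass through.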

Moderate URLs in Text Messages

Here is how you can detect and moderate URLs in text messages:


curl -X POST 'https://api.sightengine.com/1.0/text/check.json' \
  -F 'text=Come check this page: http://harmfulsiteexample.com' \
  -F 'lang=en' \
  -F 'mode=rules' \
  -F 'api_user={api_user}' \
  -F 'api_secret={api_secret}'


# this example uses requests
import requests
import json

data = {
  'text': 'Come check this page: http://harmfulsiteexample.com',
  'mode': 'rules',
  'lang': 'en',
  'api_user': '{api_user}',
  'api_secret': '{api_secret}'
}
r = requests.post('https://api.sightengine.com/1.0/text/check.json', data=data)

output = json.loads(r.text)


$params = array(
  'text' => 'Come check this page: http://harmfulsiteexample.com',
  'lang' => 'en',
  'mode' => 'rules',
  'api_user' => '{api_user}',
  'api_secret' => '{api_secret}',
);

// this example uses cURL
$ch = curl_init('https://api.sightengine.com/1.0/text/check.json');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $params);
$response = curl_exec($ch);
curl_close($ch);

$output = json_decode($response, true);


// this example uses axios and form-data
const axios = require('axios');
const FormData = require('form-data');

const data = new FormData();
data.append('text', 'Come check this page: http://harmfulsiteexample.com');
data.append('lang', 'en');
data.append('mode', 'rules');
data.append('api_user', '{api_user}');
data.append('api_secret', '{api_secret}');

axios({
  url: 'https://api.sightengine.com/1.0/text/check.json',
  method:'post',
  data: data,
  headers: data.getHeaders()
})
.then(function (response) {
  // on success: handle response
  console.log(response.data);
})
.catch(function (error) {
  // handle error
  if (error.response) console.log(error.response.data);
  else console.log(error.message);
});

Request parameters

Parameter	Type	Description
text	string	UTF-8 encoded text to moderate
mode	string	comma-separated list of modes. Modes are rules for the rule-based model or ml for ML models
categories	string	comma-separated list of categories to check. Possible values: profanity, personal, link, drug, weapon, violence, self-harm, medical, extremism, spam, content-trade, money-transaction (optional)
lang	string	comma-separated list of target languages
opt_countries	string	comma-separated list of target countries for phone number detection (optional)
list	string	id of a custom list to be used for rule-based moderation (optional)
api_user	string	your API user id
api_secret	string	your API secret
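The optional categories parameter can presumably be used to restrict the check to links only. The sketch below builds such a request payload; `build_text_payload` is a hypothetical helper, and combining `categories=link` with `mode=rules` is an assumption based on the table above rather than a confirmed recipe.

```python
def build_text_payload(text, api_user, api_secret, lang="en", categories=("link",)):
    """Build the form fields for a text check request.

    Hypothetical helper: the field names come from the parameter table,
    the default of checking only the "link" category is an assumption.
    """
    payload = {
        "text": text,
        "mode": "rules",
        "lang": lang,
        "api_user": api_user,
        "api_secret": api_secret,
    }
    if categories:
        # The API expects a comma-separated list of categories.
        payload["categories"] = ",".join(categories)
    return payload


if __name__ == "__main__":
    payload = build_text_payload(
        "Come check this page: http://harmfulsiteexample.com",
        "{api_user}",
        "{api_secret}",
    )
    print(payload["categories"])
```

The resulting dictionary can be passed as `data=` to `requests.post`, as in the Python example above.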

The JSON response describes the URLs that have been detected under the link key. The response also reports other detected elements such as profanity, personal information and grawlix. Check the text moderation guide to learn more about those capabilities.


{
  "status": "success",
  "request": {
    "id": "req_6cujQglQPgGApjI5odv0P",
    "timestamp": 1471947033.92,
    "operations": 1
  },
  "profanity": {
    "matches": []
  },
  "personal": {
    "matches": []
  },
  "link": {
    "matches": [
      {
        "type": "url",
        "category": "unsafe",
        "match": "http://harmfulsiteexample.com"
      }
    ]
  }
}
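Once the response has been decoded, the categorized links can be pulled out of the link key. This is a minimal sketch based on the sample response above; `flagged_links` is a hypothetical helper, and treating a missing category as "not flagged" is an assumption.

```python
def flagged_links(output):
    """Return (url, category) pairs for every detected link that has a category.

    Assumes the response shape shown in the sample: output["link"]["matches"]
    is a list of objects with "match" and, when applicable, "category".
    """
    matches = output.get("link", {}).get("matches", [])
    return [
        (m.get("match"), m.get("category"))
        for m in matches
        if m.get("category")
    ]


if __name__ == "__main__":
    sample = {
        "status": "success",
        "link": {
            "matches": [
                {
                    "type": "url",
                    "category": "unsafe",
                    "match": "http://harmfulsiteexample.com",
                }
            ]
        },
    }
    print(flagged_links(sample))
    # prints [('http://harmfulsiteexample.com', 'unsafe')]
```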

Moderate URLs in Images/Videos

Here is how you can detect and moderate URLs in images or videos:


curl -X GET -G 'https://api.sightengine.com/1.0/check.json' \
    -d 'models=text-content,qr-content' \
    -d 'api_user={api_user}&api_secret={api_secret}' \
    --data-urlencode 'url=https://sightengine.com/assets/img/examples/example-qr-600.jpg'


# this example uses requests
import requests
import json

params = {
  'url': 'https://sightengine.com/assets/img/examples/example-qr-600.jpg',
  'models': 'text-content,qr-content',
  'api_user': '{api_user}',
  'api_secret': '{api_secret}'
}
r = requests.get('https://api.sightengine.com/1.0/check.json', params=params)

output = json.loads(r.text)


$params = array(
  'url' =>  'https://sightengine.com/assets/img/examples/example-qr-600.jpg',
  'models' => 'text-content,qr-content',
  'api_user' => '{api_user}',
  'api_secret' => '{api_secret}',
);

// this example uses cURL
$ch = curl_init('https://api.sightengine.com/1.0/check.json?'.http_build_query($params));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);

$output = json_decode($response, true);


// this example uses axios
const axios = require('axios');

axios.get('https://api.sightengine.com/1.0/check.json', {
  params: {
    'url': 'https://sightengine.com/assets/img/examples/example-qr-600.jpg',
    'models': 'text-content,qr-content',
    'api_user': '{api_user}',
    'api_secret': '{api_secret}',
  }
})
.then(function (response) {
  // on success: handle response
  console.log(response.data);
})
.catch(function (error) {
  // handle error
  if (error.response) console.log(error.response.data);
  else console.log(error.message);
});

The JSON response contains a description of URLs that have been detected either as text under the text key, or as QR codes under the qr key.


{
  "status": "success",
  "request": {
    "id": "req_6cujQglQPgGApjI5odv0P",
    "timestamp": 1471947033.92,
    "operations": 2
  },
  "text": {
    "personal": [],
    "link": [],
    "social": [],
    "profanity": []
  },
  "qr": {
    "personal": [],
    "link": [
      {
        "type": "url",
        "category": "unsafe",
        "match": "http://harmfulsiteexample.com"
      }
    ],
    "social": [],
    "profanity": []
  }
}
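To handle links found as text and links found in QR codes in one pass, the two keys can be walked together. The sketch below assumes each entry in a link list is an object with type, category and match keys, like the matches in the text response; `links_by_source` is a hypothetical helper.

```python
def links_by_source(output):
    """Collect detected links from both the "text" and "qr" keys.

    Assumes output[source]["link"] is a list of objects carrying "match"
    and, when applicable, "category". Missing keys are treated as empty.
    """
    found = []
    for source in ("text", "qr"):
        for entry in output.get(source, {}).get("link", []):
            found.append(
                {
                    "source": source,
                    "url": entry.get("match"),
                    "category": entry.get("category"),
                }
            )
    return found


if __name__ == "__main__":
    sample = {
        "status": "success",
        "text": {"personal": [], "link": [], "social": [], "profanity": []},
        "qr": {
            "personal": [],
            "link": [
                {
                    "type": "url",
                    "category": "unsafe",
                    "match": "http://harmfulsiteexample.com",
                }
            ],
            "social": [],
            "profanity": [],
        },
    }
    for link in links_by_source(sample):
        print(link["source"], link["category"], link["url"])
```

Keeping the source alongside each link makes it possible to apply different rules to QR codes and visible text if needed.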

Any other needs?

See our full list of Image/Video models for details on other filters and checks you can run on your images and videos. You might also want to check our Text models to moderate text-based content: messages, reviews, comments, usernames...
