Self-harm and mental-health-related content encompasses a wide variety of situations:
Self-harm-related content ranges from the most extreme cases, such as someone expressing an immediate intention to commit suicide, to more subtle ones, such as expressions of sadness or sorrow, or dissatisfaction with one's own body image.
Suicide is the 12th leading cause of death in the US and the second leading cause among people aged 10-34 (Curtin et al., 2022). Depression affects 3.8% of the world's population and is a leading cause of self-injury and self-harm.
[Chart: Mental Health Services Received in the Past Year, Among Adults Aged 18 or Older with Any Mental Illness in the Past Year, 2008-2020 (Substance Abuse and Mental Health Services Administration, 2021)]
As a result, self-harm-related content is prevalent and frequently encountered, especially among younger users. In 2021, Instagram took action on roughly 17 million pieces of suicide and self-injury content.
"In recent years, there has been increasing acknowledgement of the important role mental health plays in achieving global development goals."
World Health Organization
Mental health is an increasingly important topic, especially among younger users. Its importance is reflected on social media, in apps and in virtual worlds as users increasingly express, share and discuss their issues online.
In some countries, such as the UK, promoting suicide is illegal, and promoting self-harm may soon become illegal as well. As such, messages promoting suicide, anorexia or other potentially dangerous behaviors should be moderated by platforms.
Should all related content be treated as promotion? Should people in distress be censored or punished? Applications and platforms may choose to moderate and take action on content expressing sorrow, grief or dissatisfaction, but there is no specific legal obligation to do so.
However, platforms hosting user-generated content have an ethical responsibility to act:
Beyond the legal and ethical reasons to act, cases of promoted or even simply described suicide or self-harm will damage the experience and reputation of the platform or application.
Detecting and removing self-harm-related content is a very difficult task, for multiple reasons:
That said, apps and platforms typically resort to four levels of detection measures:
Platforms should consider the great help that other users can provide: they can report alarming messages to the trust and safety team so that these are reviewed by moderators and handled by the platform. User reporting is a simple way to moderate text: users only need a way to submit a report. The main disadvantage is the delay involved in handling a potentially large volume of reports.
To simplify the handling and prioritization of user reports, it is important that users have the option to categorize their reports, with categories such as self-harm or suicide.
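As a rough sketch, the user-selected category can map directly to a review priority so that self-harm and suicide reports jump to the front of the queue. The category names, priority values and report fields below are illustrative assumptions, not a standard taxonomy.

```python
from dataclasses import dataclass

# Assumed category -> priority mapping; self-harm and suicide reports
# are reviewed first.
CATEGORY_PRIORITY = {
    "suicide_or_self_harm": 0,
    "harassment": 1,
    "spam": 2,
    "other": 3,
}

@dataclass
class UserReport:
    reporter_id: str
    reported_content_id: str
    category: str
    comment: str = ""

    @property
    def priority(self) -> int:
        # Unknown categories fall back to the lowest priority.
        return CATEGORY_PRIORITY.get(self.category, max(CATEGORY_PRIORITY.values()))

# Reports can then be reviewed in priority order, e.g. with a heap or a sorted queue.
reports = [
    UserReport("u42", "post_981", "spam"),
    UserReport("u17", "post_555", "suicide_or_self_harm", "The author says they want to hurt themselves"),
]
for report in sorted(reports, key=lambda r: r.priority):
    print(report.category, report.reported_content_id)
```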
Human moderators are useful for detecting potential self-harm or suicide, even if, according to Twitter, "judging behavior based on online posts alone is challenging". This means that moderators (and even users) should be provided with clear guidelines: what the warning signs are, what constitutes suicidal or dangerous thoughts, and so on. For instance, someone describing self-harm or suicide without identifying as suicidal should probably be treated differently from a person who identifies as suicidal, has already attempted suicide or frequently posts messages about self-harm, depression or suicide.
In addition to the challenges inherent to human moderation, such as slower speed and lower consistency, there are a few challenges specific to self-harm that need to be taken into account:
Keywords can be used to pre-filter comments and messages by detecting words related to mental health, such as:
Keywords such as these will necessarily flag many false positives, that is, messages that contain those words without being problematic. But they are still helpful: human moderators can then focus on a shorter list of messages to verify, leading to faster response times.
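A minimal keyword pre-filter can be implemented with case-insensitive, word-boundary matching, routing any hit to human review. The handful of keywords below is an illustrative sample only, not a vetted lexicon.

```python
import re

# Illustrative sample only; a production list would be curated with clinical
# and trust-and-safety input, and cover misspellings and slang.
KEYWORDS = ["suicide", "kill myself", "self-harm", "cut myself", "end my life"]

# \b ensures keywords do not match inside unrelated words.
PATTERN = re.compile(r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b", re.IGNORECASE)

def flag_for_review(message: str) -> bool:
    """Return True if the message contains a watched keyword and should be queued for human review."""
    return PATTERN.search(message) is not None

print(flag_for_review("I want to end my life"))         # True  -> send to moderators
print(flag_for_review("This game is killer, love it"))  # False -> no keyword hit
```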
Of course, there is a risk of missing cases by using only keywords since many worrying messages do not contain any problematic keywords:
ML models may complement the keyword and human-moderation approaches by detecting such cases, with the help of annotated datasets containing self-harm or suicide-related content.
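As a sketch of this complementary approach, a lightweight text classifier can be trained on annotated examples and used to score messages that contain no explicit keyword. The tiny inline dataset and the TF-IDF plus logistic-regression pipeline below are assumptions for illustration, not the model any particular platform uses.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative dataset; real systems rely on large, carefully annotated corpora.
texts = [
    "I can't take it anymore, I want to disappear",
    "Nobody would miss me if I was gone",
    "Had a great time at the beach today",
    "This new update is terrible, so many bugs",
]
labels = [1, 1, 0, 0]  # 1 = potential self-harm risk, 0 = benign

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

# predict_proba gives a risk score that can be thresholded or used to rank
# messages for human review, catching cases with no explicit keyword.
score = model.predict_proba(["I just want everything to stop"])[0][1]
print(f"risk score: {score:.2f}")
```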
Automated moderation can also be applied to images and videos submitted by users, by detecting depictions of self-harm, self-injury, violence or weapons.
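In practice this usually means calling an automated image-moderation service on upload. The endpoint, parameters and response shape in the sketch below are purely hypothetical placeholders meant to show the general flow, not a real API.

```python
import requests

# Hypothetical moderation endpoint and response schema, for illustration only.
MODERATION_URL = "https://moderation.example.com/v1/check-image"

def check_image(path: str) -> bool:
    """Return True if the (hypothetical) service flags the image for self-harm or violence."""
    with open(path, "rb") as f:
        response = requests.post(
            MODERATION_URL,
            files={"media": f},
            data={"categories": "self-harm,violence,weapons"},
            timeout=10,
        )
    response.raise_for_status()
    result = response.json()  # assumed shape: {"self-harm": 0.91, "violence": 0.05, ...}
    return any(score >= 0.8 for score in result.values())

# if check_image("upload.jpg"):
#     queue_for_human_review("upload.jpg")  # hypothetical downstream step
```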
Some solutions provide help to platforms and applications hosting user-generated content. For example, kokocares is a free initiative whose goal is to make mental health accessible to everyone. They created a Suicide Prevention Kit that provides keywords to detect risky situations in real time, support resources for users and dashboards and statistics to measure outcomes.
In any case, detected suspicious messages should be handled appropriately. When risky situations are detected, platforms should triage them and handle each according to its priority.
Comments that suggest an upcoming act or great violence, like "I'm going to cut myself tonight" or "I'm so depressed I want to die", are considered high-risk messages.
Because they are urgent, platforms should set up a clearly defined process for immediate escalation that could include steps like:
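What such a process looks like in code depends on the platform, but a minimal escalation hook might resemble the sketch below; the specific actions (alerting an on-call reviewer, surfacing crisis-support resources, hiding the content) and the stub functions are assumptions for illustration.

```python
import logging

logger = logging.getLogger("tns.escalation")

def escalate_high_risk(user_id: str, message_id: str, text: str) -> None:
    """Immediate escalation path for high-risk messages (steps are illustrative assumptions)."""
    # 1. Alert the on-call trust-and-safety reviewer so the case is seen within minutes.
    logger.critical("HIGH RISK message %s from user %s: %r", message_id, user_id, text)

    # 2. Surface crisis-support resources to the user (e.g. local helpline numbers).
    show_support_resources(user_id)

    # 3. Restrict the visibility of the content while the case is reviewed.
    hide_content(message_id)

# Stub implementations standing in for real platform services.
def show_support_resources(user_id: str) -> None:
    print(f"Showing crisis-support resources to {user_id}")

def hide_content(message_id: str) -> None:
    print(f"Hiding content {message_id} pending review")

escalate_high_risk("u17", "post_555", "I'm going to cut myself tonight")
```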
Lower-risk messages are less urgent or less obvious, such as "Sometimes I just feel so lonely" or "I guess hurting myself is not a solution". A user who once mentioned being anxious is not necessarily at risk of self-harm or suicide, for example, but if the same user talks about their anxiety problems every day, the risk is greater.
Platforms could decide to take some time to better understand the context and see if the user posts more similar comments.
Setting up a moderation log that allows moderators to record the incidents encountered for a specific user is a way to track possibly at-risk users.
The process defined for high-risk situations can then be applied to users who turn out to be genuinely at risk.
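A very simple version of such a log counts flagged incidents per user and reuses the high-risk process once a threshold is crossed. The in-memory storage and the threshold value below are illustrative assumptions; a real system would persist incidents so moderators can review a user's history.

```python
from collections import defaultdict
from datetime import datetime, timezone

# In-memory log for illustration; a real platform would persist incidents
# so moderators can review a user's history over time.
incident_log: dict[str, list[dict]] = defaultdict(list)

ESCALATION_THRESHOLD = 3  # assumed: repeated lower-risk signals become an at-risk case

def log_incident(user_id: str, message_id: str, note: str) -> None:
    """Record a lower-risk incident reported by a moderator and escalate repeat cases."""
    incident_log[user_id].append({
        "message_id": message_id,
        "note": note,
        "at": datetime.now(timezone.utc),
    })
    if len(incident_log[user_id]) >= ESCALATION_THRESHOLD:
        # Reuse the high-risk process for users who appear to be genuinely at risk.
        print(f"User {user_id} has {len(incident_log[user_id])} logged incidents: escalate")

log_incident("u17", "post_101", "mentions feeling lonely")
log_incident("u17", "post_102", "talks about anxiety again")
log_incident("u17", "post_103", "says hurting themselves crossed their mind")
```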
Mental health is a growing concern, both in real life and online. It can be difficult to detect risky messages related to self-harm, suicide, anxiety, depression or eating disorders, especially as context is so important. However, it is essential to be able to detect and handle these situations quickly, even at less urgent stages, as the results of early intervention have proven very promising.