Want to become a pro at spotting AI-generated images? Think you’ve got what it takes? Ready, set, go!
AI-generated images are increasingly common online and can foster creativity. At the same time, they can be used for misinformation, impersonation, fake news and other fraudulent activities. This is particularly true in the current electoral context, where new deepfakes appear daily on social networks and are sometimes hard for a human to identify. Some are even shared by presidential candidates themselves: yes, the scene may be quite unlikely, but the quality is often so impressive it makes you want to believe it.
Sightengine has been working in Trust & Safety for 11 years. We develop automated solutions to detect content that is AI-generated or manipulated. We also want to help users understand how deceptive some images can be, especially when they look real to the naked eye.
That’s why we decided to build a game to challenge our website’s visitors and test their GenAI detection skills. We’re excited to share the results in this blog post!
The “AI or not?” game is very simple.
Click on the link and you’ll see the first image. Is it GenAI or real? We’ll let you find out. Have a look at the image, pay attention to the details, and also to the big picture. Take all the time you need to decide, but no cheating: you may only use your human abilities. Choose between AI or Not AI and see your score go up to 1/1, or stay at 0…
Do this for 25 images and you’ll get an accuracy score. If you like the game, you can keep going with 50, 75, 100 or more images. Try to improve before seeing your final stats and comparing them to others!
Game interface
The images in the game are a mix of real, license-free photographs and images generated with AI models such as Stable Diffusion or Midjourney. The AI images were generated with prompts, i.e. natural-language instructions given to a GenAI model to produce an output, and these prompts were intentionally designed to mislead you. The sample you'll see typically contains a wide variety of images: cityscapes and more natural locations, portraits and representations of objects, and even more abstract or artistic images.
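For the curious, here is a minimal sketch of how prompt-to-image generation works in practice, using the open-source diffusers library. The checkpoint and prompt below are illustrative assumptions, not the exact models or prompts used for the game.

```python
# Minimal sketch: generating an image from a text prompt with the
# open-source diffusers library. The checkpoint and prompt are
# illustrative, not the ones actually used for the game.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # a publicly available Stable Diffusion checkpoint
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # run on GPU for reasonable generation speed

# The prompt is the natural-language input; every detail (style, lighting,
# graininess) nudges the model toward a more believable output.
prompt = "grainy 35mm film photo of a woman lying in tall grass, golden hour"
image = pipe(prompt).images[0]
image.save("generated.png")
```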
Please note that we did not do any post-processing on the generated images. No additional filters, no edits: these are the raw outputs.
So, how sharp were those eyes? We analyzed the figures and the results are in! It turns out that spotting AI-generated images is becoming increasingly difficult. While most users expect to be right more than 90% of the time, in reality they are not.
Here are some stats:
- The results presented here are based on 150 images, half of them AI and half of them real, rated by more than 2,500 players since March.
- The average accuracy across players is 71%.
- On average, players classified 68% of real images correctly and 73% of AI images correctly (consistent with the overall 71%, since the set is split evenly between real and AI images).
Overall, 90% of images received more than 50% of correct votes, meaning that many of you could successfully identify GenAI. But the real story lies in the details: while some images were almost too easy to classify, some were very divisive among players and others had even the most confident guessers racking their brains.
Believe it or not, the following images are entirely made up of fake pixels.
The hardest image in the dataset was the following one, showing a woman lying in the grass. The face is nearly flawless, the grass and dress look great, the aspect ratio is unusual (it's not square), and the image is grainy because the prompt specifically asked for a grainy rendering.
AI image of a woman lying in grass. Only 28% of test takers correctly labelled this image.
The second hardest image in the dataset did not include a human face. The luminosity and framing of this image are not perfect: the image is a bit dark, and there is some backlighting, which is typical of amateur photography. The church is centered in the frame, and even though the trees give a rather artistic effect to the overall scene, they are not properly framed. Finally, the thin black band on the right suggests that this image could be a silver print.
All of these elements give an impression of authenticity, a notion that is rarely associated with GenAI.
AI image of a church in a snowy scene. Only 30% of test takers got it right.
Now that we've seen the hard examples, let's move to the easy ones. There were 20 AI images that got more than 90% correct votes.
The easiest AI images to spot are often those with unrealistic human faces. Even though they have improved and continue to do so, GenAI models are known to struggle with human bodies, limbs and faces. When it comes to faces, the skin is often too perfect and the face completely symmetrical, with no flaws. They can even look like they've been pulled straight out of an animated movie.
See the kid below, for instance: isn't he too cute to be true? Even the mud stains seem particularly well arranged on his face, when we all know that toddlers playing in the dirt don't exactly end up looking like that...
AI image showing a kid whose skin is flawless, a bit too smooth to be true. 92% of test takers got it right.
Other factors can also make an image easy to identify as AI, such as the plausibility of the scene or of the background. Thanks to their world knowledge, humans generally have no difficulty identifying what is not plausible. This is particularly true when the historical context, the time period and/or the location shown in the image do not fit together.
Have a look at the example below. What would these gladiators be doing in the middle of NYC? They don't match the era or the place! And if you're still not convinced (movie shoots are common in New York, after all), check the background: the 'Warriors' sign sits oddly on the road, just behind a bus that also seems out of place.
AI image showing gladiators in NYC that don't match the era or the place, with an odd background. 97% of test takers correctly labelled this image as AI.
Real images are generally more difficult to identify. While overall correct votes are pretty high for AI images, only 23% of real images are recognized as such by at least 80% of players.
The ones players hesitated over the least are those that could have been captured by anyone, but certainly not by a professional photographer! By that we mean photos that are poorly framed, have lighting issues, or lack a clear or interesting subject.
Players found the real image below easy to recognize, probably because it combines all three factors: it has no obvious subject of interest, or at least the interest is not visible in the frame, and it is rather dark.
Real image showing tourists in a bus, with very dark lighting, poor framing and an uninteresting subject
Then, there are the most difficult cases. The ones that were missed by almost all players. The ones that require super-robotic skills.
It may be surprising, but the bench image below is absolutely real. Humans can take incredible pictures too! And these great pictures are often wrongly labelled as AI-generated. But image-generation models don't always produce improbable or magnificent images: it's sometimes enough to specify in the prompt that you want an old-fashioned or naturalistic style to obtain a result closer to what a non-professional human could produce.
Real image showing a park bench in autumn, taken in a very professional shoot
Finally, there are images that caused a fair amount of uncertainty and disagreement among players (about half 'AI' and half 'real' votes)… These account for a bit more than 20% of all images rated, which is far from negligible!
And you, how would you rate the examples below? Why not try your luck and skills in the game?
Image showing a woman playing golf
Image showing a traditional Hindu wedding ritual
Image showing a woman reading on a train
Human perception helps us detect what seems a little too good to be true, or a little too unusual. While humans achieve 71% accuracy, which is better than random guessing (50%), they consistently perform worse than they expect. We would also like to stress that we did not post-process the AI images to correct their flaws; doing so would have led to even lower accuracies. In addition, the rapid advancement of AI technology suggests that this task will only get tougher. AI models are getting better at replicating plausible real-life scenes, and distinguishing between real and generated content is becoming increasingly challenging, even for trained eyes.
This is where solutions such as watermarking and automated detection come into play. We think that AI itself can help ensure we know which is which, by automatically detecting AI-generated or manipulated content.
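As a rough illustration, automated detection can be as simple as an HTTP call to an image-check API such as Sightengine's. The sketch below is a non-authoritative example: the model name, parameters and response fields are assumptions based on public documentation, so check the current docs for the exact values.

```python
# Sketch of automated AI-image detection via an HTTP API.
# The 'genai' model name and the response field used below are assumptions;
# consult the API documentation for the exact current values.
import requests

params = {
    "models": "genai",            # assumed model name for AI-generated image detection
    "api_user": "YOUR_API_USER",  # placeholder credentials
    "api_secret": "YOUR_API_SECRET",
}

with open("image_to_check.jpg", "rb") as f:
    response = requests.post(
        "https://api.sightengine.com/1.0/check.json",
        files={"media": f},
        data=params,
        timeout=30,
    )

result = response.json()
# The response is expected to include a confidence score between 0 and 1
# indicating how likely the image is to be AI-generated.
score = result.get("type", {}).get("ai_generated")
print(f"AI-generated score: {score}")
```

In practice, such a score would be compared against a threshold tuned to your own tolerance for false positives and false negatives.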
Have a look at the resources below to learn more:
Learn how Sightengine performed in an independent AI-media detection benchmark, outperforming competitors with advanced methodologies.
This is a guide to detecting, moderating and handling online promotion, spam and fraud in texts and images.