Want to become a pro at spotting AI-generated images? Think you’ve got what it takes? Ready, set, go!
AI-generated images are increasingly common online and can foster creativity. At the same time, they can be used for misinformation, impersonation, fake news and other fraudulent activities. This is particularly true in the current electoral context, where new deepfakes appear daily on social networks and are sometimes hard for a human to identify. Some are even shared by presidential candidates themselves: yes, the scene may be quite unlikely, but the quality is often so impressive it makes you want to believe it.
Sightengine has been working in Trust & Safety for 11 years. We develop automated solutions to detect content that is AI-generated or manipulated. We also want to help users understand how deceptive some images can be, especially when they look real to the naked eye.
That’s why we decided to build a game to challenge our website’s visitors and test their GenAI detection skills. We’re excited to share the results in this blog post!
The “AI or not?” game is very simple.
Click on the link and you’ll see the first image. Is it GenAI or real? We’ll let you find out. Have a look at the image, pay attention to the details, and also to the big picture. Take all the time you need to decide, but no cheating: you may only use your human abilities. Choose between AI or Not AI and see your score go up to 1/1, or stay at 0…
Do this for 25 images and you’ll get an accuracy score. If you like the game, you can keep going with 50, 75, 100 or more images. Try to improve before seeing your final stats and comparing them to others!
Game interface
The images in the game are a mix of real, license-free photographs and images generated with AI models such as Stable Diffusion or Midjourney. The AI images were generated with prompts, i.e. natural-language instructions given to a GenAI model to produce an output, and these prompts were intentionally designed to mislead you. The sample you'll see typically contains a wide variety of images: cityscapes and more natural locations, portraits and representations of objects, and even more abstract or artistic images.
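For the curious, here is a minimal sketch of how prompt-to-image generation works in practice, using the open-source diffusers library. The checkpoint and prompt below are illustrative assumptions, not the exact models or prompts used for the game.

```python
# Minimal sketch: generating an image from a text prompt with the
# open-source diffusers library. The checkpoint and prompt are
# illustrative, not the ones actually used for the game.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # a publicly available Stable Diffusion checkpoint
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # run on GPU for reasonable generation speed

# The prompt is the natural-language input; every detail (style, lighting,
# graininess) nudges the model toward a more believable output.
prompt = "grainy 35mm film photo of a woman lying in tall grass, golden hour"
image = pipe(prompt).images[0]
image.save("generated.png")
```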
Please note that we did not do any post-processing on the generated images. No additional filters, no edits: these are the raw outputs.
So, how sharp were those eyes? We analyzed the figures and the results are in! It turns out that spotting AI-generated images is becoming increasingly difficult. While most users expect to be right more than 90% of the time, in reality they are not.
Here are some stats:
- The results presented here are based on 150 images, half of them AI and half of them real, rated by more than 2,500 players since March.
- The average accuracy across players is 71%.
- On average, players classified 68% of real images correctly and 73% of AI images correctly (consistent with the overall 71%, since the set is split evenly between real and AI images).
Overall, 90% of images received more than 50% of correct votes, meaning that many of you could successfully identify GenAI. But the real story lies in the details: while some images were almost too easy to classify, some were very divisive among players and others had even the most confident guessers racking their brains.
Believe it or not, the following images are entirely made up of fake pixels.
The hardest image in the dataset was the following one, showing a woman lying in the grass. The face is nearly flawless, the grass and dress look great, the aspect ratio is unusual (it's not square), and the image is grainy because the prompt specifically asked for a grainy rendering.
AI image of a woman lying in grass. Only 28% of test takers correctly labelled this image.
The second hardest image in the dataset did not include a human face. The luminosity and framing of this image are not perfect: the image is a bit dark, and there is some backlighting, which is typical of amateur photography. The church is centered in the frame, and even though the trees give a rather artistic effect to the overall scene, they are not properly framed. Finally, the thin black band on the right suggests that this image could be a silver print.
All of these elements give an impression of authenticity, a notion that is rarely associated with GenAI.
AI image of a church in a snowy scene. Only 30% of test takers got it right.
Now that we've seen the hard examples, let's move to the easy ones. There were 20 AI images that got more than 90% correct votes.
The easiest AI images to spot are often those with unrealistic human faces. Even though they have improved and continue to do so, GenAI models are known to struggle with human bodies, limbs and faces. When it comes to faces, the skin is often too perfect and the face completely symmetrical, with no flaws. They can even look like they've been pulled straight out of an animated movie.
See the kid below, for instance: isn't he too cute to be true? Even the mud stains seem particularly well arranged on his face, when we all know that toddlers playing in the dirt don't exactly end up looking like that...
AI image showing a kid whose skin is flawless, a bit too smooth to be true. 92% of test takers got it right.
Other factors can also make an image easy to identify as AI, such as the plausibility of the scene or of the background. Thanks to their world knowledge, humans generally have no difficulty identifying what is not plausible. This is particularly true when the historical context, the time period and/or the location shown in the image do not fit together.
Have a look at the example below. What would these gladiators be doing in the middle of NYC? They don't match the era or the place! And if you're still not convinced (movie shoots are common in New York, after all), check the background: the 'Warriors' sign sits oddly on the road, just behind a bus that also seems out of place.
AI image showing gladiators in NYC that don't match the era or the place, with an odd background. 97% of test takers correctly labelled this image as AI.
Real images are generally more difficult to identify. While overall correct votes are pretty high for AI images, only 23% of real images are recognized as such by at least 80% of players.
The ones players hesitated over the least are those that could have been captured by anyone, but certainly not by a professional photographer! By that we mean photos that are poorly framed, have lighting issues, or lack a clear or interesting subject.
Players found the real image below easy to recognize, probably because it combines all three factors: it has no obvious subject of interest, or at least the interest is not visible in the frame, and it is rather dark.
Real image showing tourists in a bus, with very dark lighting, poor framing and an uninteresting subject
Then, there are the most difficult cases. The ones that were missed by almost all players. The ones that require super-robotic skills.
It may be surprising, but the bench image below is absolutely real. Humans can take incredible pictures too! And these great pictures are often wrongly labelled as AI-generated. But image-generation models don't always produce improbable or magnificent images: it's sometimes enough to specify in the prompt that you want an old-fashioned or naturalistic style to obtain a result closer to what a non-professional human could produce.
Real image showing a park bench in autumn, taken in a very professional shoot
Finally, there are images that caused a fair amount of uncertainty and disagreement among players (about half 'AI' and half 'real' votes)… These account for a bit more than 20% of all images rated, which is far from negligible!
And you, how would you rate the examples below? Why not try your luck and skills in the game?
Image showing a woman playing golf
Image showing a traditional Hindu wedding ritual
Image showing a woman reading on a train
Human perception helps us detect what seems a little too good to be true, or a little too unusual. While humans achieve 71% accuracy, which is better than random guessing (50%), they consistently perform worse than they expect. We would also like to stress that we did not post-process the AI images to correct their flaws; doing so would have led to even lower accuracies. In addition, the rapid advancement of AI technology suggests that this task will only get tougher. AI models are getting better at replicating plausible real-life scenes, and distinguishing between real and generated content is becoming increasingly challenging, even for trained eyes.
This is where solutions such as watermarking and automated detection come into play. We think that AI itself can help ensure we know which is which, by automatically detecting AI-generated or manipulated content.
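As a rough illustration, automated detection can be as simple as an HTTP call to an image-check API such as Sightengine's. The sketch below is a non-authoritative example: the model name, parameters and response fields are assumptions based on public documentation, so check the current docs for the exact values.

```python
# Sketch of automated AI-image detection via an HTTP API.
# The 'genai' model name and the response field used below are assumptions;
# consult the API documentation for the exact current values.
import requests

params = {
    "models": "genai",            # assumed model name for AI-generated image detection
    "api_user": "YOUR_API_USER",  # placeholder credentials
    "api_secret": "YOUR_API_SECRET",
}

with open("image_to_check.jpg", "rb") as f:
    response = requests.post(
        "https://api.sightengine.com/1.0/check.json",
        files={"media": f},
        data=params,
        timeout=30,
    )

result = response.json()
# The response is expected to include a confidence score between 0 and 1
# indicating how likely the image is to be AI-generated.
score = result.get("type", {}).get("ai_generated")
print(f"AI-generated score: {score}")
```

In practice, such a score would be compared against a threshold tuned to your own tolerance for false positives and false negatives.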
Have a look at the resources below to learn more:
Learn how Sightengine performed in an independent AI-media detection benchmark, outperforming competitors with advanced methodologies.
This is a guide to detecting, moderating and handling online promotion, spam and fraud in texts and images.