Deepfakes and AI-Generated Content — What Detection Actually Tells You in 2026
Spotting Manipulation
Deepfake detection tools exist, but their limitations are as important as their capabilities. This topic explains how deepfakes are created, what detection can and cannot establish, and the broader question of how to maintain trust in visual evidence in an era of increasingly capable generative AI — including the 'liar's dividend' risk that deepfakes make it easier to deny real footage.
How Deepfakes Are Made: The Technology Without the Code
The term 'deepfake' — a portmanteau of 'deep learning' and 'fake' — refers to synthetic audio, video, or image content generated using machine learning systems, typically to make a real person appear to say or do something they never said or did. The word originated on an online forum in 2017 when early face-swap videos generated with neural networks began circulating, initially in harmful non-consensual contexts.
Understanding the underlying technology at a conceptual level — without requiring any programming knowledge — helps clarify both what deepfakes can and cannot do, and why detection is a technically difficult problem.
Generative Adversarial Networks (GANs)
The first generation of sophisticated deepfake video was produced primarily using Generative Adversarial Networks, or GANs. A GAN consists of two neural networks trained simultaneously and in opposition: a generator, which tries to create fake content convincing enough to pass as real, and a discriminator, which tries to distinguish real from generated content. Through thousands of iterations, each network improves in response to the other — the generator producing more convincing output as the discriminator gets better at spotting flaws.
The result, after sufficient training, is a generator capable of producing synthetic faces, voices, or video frames that are statistically similar to the training data and often indistinguishable from real content on casual inspection.
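For readers who do want to see the idea in code, the opposing objectives of the two networks can be sketched in a few lines. This is a minimal illustration, not a working GAN: it assumes the common binary cross-entropy formulation, and the function names and the example scores are hypothetical values chosen to show the trend. No neural network is trained here; the scores stand in for what a discriminator might output (1.0 means "looks real", 0.0 means "looks generated").

```python
import math


def bce(prediction: float, target: float) -> float:
    """Binary cross-entropy for a single prediction in (0, 1)."""
    eps = 1e-12  # keeps log() away from zero
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(scores_real, scores_fake):
    """Low when real samples score near 1 and generated samples score near 0."""
    losses = [bce(s, 1.0) for s in scores_real]
    losses += [bce(s, 0.0) for s in scores_fake]
    return sum(losses) / len(losses)


def generator_loss(scores_fake):
    """Low when the discriminator scores generated samples near 1 (it is fooled)."""
    return sum(bce(s, 1.0) for s in scores_fake) / len(scores_fake)


# Hypothetical discriminator scores at two points in training.
early = {"real": [0.90, 0.95], "fake": [0.05, 0.10]}  # fakes are easy to spot
late = {"real": [0.60, 0.55], "fake": [0.50, 0.45]}   # fakes look almost real

for name, scores in (("early", early), ("late", late)):
    d = discriminator_loss(scores["real"], scores["fake"])
    g = generator_loss(scores["fake"])
    print(f"{name}: discriminator loss={d:.3f}, generator loss={g:.3f}")
```

The two losses pull in opposite directions: as the generated samples become more convincing, the generator's loss falls while the discriminator's rises. Training alternates small updates to each network to reduce its own loss, which is the back-and-forth described above.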
Diffusion models
By 2022–2023, diffusion models had largely superseded GANs for image generation and were increasingly used for video. A diffusion model learns to reverse the process of gradually adding random noise to an image: given an image corrupted with noise, the model learns to predict and remove that noise, step by step, until a coherent image emerges from what started as a random field. This approach produces images of exceptional photorealism and is the basis of widely used tools including DALL-E, Stable Diffusion, and Midjourney.
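The noising process described above can also be sketched in code. The example below is a simplified illustration under stated assumptions: it uses a linear noise schedule with 1,000 steps (a common textbook choice, not something any particular tool is confirmed to use), a four-value list as a stand-in for an image, and hypothetical function names. The denoising step is given the exact noise that was added, whereas a trained model would have to predict it and would remove it over many small steps rather than all at once.

```python
import math
import random


def alpha_bar(t: int, num_steps: int = 1000,
              beta_start: float = 1e-4, beta_end: float = 0.02) -> float:
    """Share of the original signal remaining after t noising steps.

    Assumes a linear schedule: each step removes a slightly larger share.
    """
    product = 1.0
    for step in range(t):
        beta = beta_start + (beta_end - beta_start) * step / (num_steps - 1)
        product *= 1.0 - beta
    return product


def add_noise(pixels, t, rng):
    """Jump directly to step t of the noising process.

    Returns the noisy values and the Gaussian noise that was mixed in.
    """
    a = alpha_bar(t)
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [math.sqrt(a) * x + math.sqrt(1.0 - a) * n
             for x, n in zip(pixels, noise)]
    return noisy, noise


def remove_noise(noisy, predicted_noise, t):
    """Undo add_noise given an estimate of the noise.

    In a real diffusion model a neural network supplies predicted_noise.
    """
    a = alpha_bar(t)
    return [(y - math.sqrt(1.0 - a) * n) / math.sqrt(a)
            for y, n in zip(noisy, predicted_noise)]


rng = random.Random(0)
image = [0.2, -0.5, 0.9, 0.0]  # stand-in for pixel values scaled to [-1, 1]
noisy, true_noise = add_noise(image, t=500, rng=rng)
# Perfect noise estimate, used here purely for illustration.
recovered = remove_noise(noisy, true_noise, t=500)
print("signal remaining at step 500: ", alpha_bar(500))
print("signal remaining at step 1000:", alpha_bar(1000))
```

The sketch shows why learning to predict noise is enough: if the noise estimate is accurate, the original values can be recovered from the noisy ones. By the final step almost none of the original signal remains, which is why generation can start from pure random noise.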
For video deepfakes, the two dominant techniques as of 2026 remain GAN-based face-swapping (replacing one person's face in real footage with another's) and diffusion-based full video generation, which is more computationally intensive.
Voice deepfakes
Audio deepfakes (synthetic voice cloning) apply similar approaches to speech. Modern voice cloning systems can reproduce an individual's voice characteristics with high fidelity from as little as a few seconds of sample audio. They are widely considered among the most practically dangerous forms of synthetic media because they are cheaper to produce than video and harder to verify by ear.
ENISA's 2025 Threat Landscape report flagged voice cloning as a growing vector in fraud operations (see the full report for current figures). [1]
Footnotes
1. The ENISA (European Union Agency for Cybersecurity) Threat Landscape report is published annually and covers AI-related threats including synthetic media. The 2025 report, covering July 2024 to June 2025, is available at enisa.europa.eu.