Introduction to Classic Deepfake Detection Models

An introduction to classic deep learning models and their evolution in the field of Deepfake detection.

Introduction

With the rapid development of Generative Adversarial Networks (GANs) and Diffusion Models, Deepfake content is becoming increasingly prevalent on the internet. Effectively identifying these forged images and videos has become a critical issue in multimedia forensics and information security.

Core Content

This article introduces several milestone works in Deepfake detection, grouped by the type of feature they exploit:

1. Spatial Feature-based Models

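  • MesoNet: A compact CNN that focuses on mesoscopic image properties, the level between pixel-level noise (largely destroyed by video compression) and high-level semantics, which makes it suitable for compressed facial images.
  • Xception-based (FaceForensics++): A representative transfer-learning approach that fine-tunes an XceptionNet pre-trained on ImageNet; it served as a strong baseline in the FaceForensics++ benchmark.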
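To make "compact" concrete, the sketch below walks the Meso-4 variant of MesoNet layer by layer and counts its trainable parameters. The layer configuration (a 256×256×3 input, four conv blocks with 8, 8, 16 and 16 filters, "same" padding, batch normalization, max-pooling of 2, 2, 2 and 4, then a 16-unit hidden layer and one sigmoid output) is reproduced from memory of the MesoNet paper and should be treated as an assumption to check against the original. The names `CONV_BLOCKS`, `DENSE_UNITS` and `trace_meso4` are hypothetical.

```python
# Assumed Meso-4 configuration (from memory of the MesoNet paper):
# each conv block is (filters, kernel_size, pool_size); all convs use 'same' padding.
CONV_BLOCKS = [(8, 3, 2), (8, 5, 2), (16, 5, 2), (16, 5, 4)]
DENSE_UNITS = 16  # hidden fully connected layer before the output unit


def trace_meso4(height=256, width=256, channels=3):
    """Return (flattened feature size, trainable parameter count) for Meso-4."""
    params = 0
    for filters, k, pool in CONV_BLOCKS:
        params += k * k * channels * filters + filters  # conv weights + biases
        params += 2 * filters  # trainable batch-norm gamma and beta
        channels = filters
        height //= pool  # 'same' conv keeps H and W; max-pooling shrinks them
        width //= pool
    flat = height * width * channels
    params += flat * DENSE_UNITS + DENSE_UNITS  # hidden dense layer
    params += DENSE_UNITS + 1  # single sigmoid output unit
    return flat, params


if __name__ == "__main__":
    flat, params = trace_meso4()
    print(flat, params)  # prints "1024 27977"
```

Under these assumptions the whole detector has under 28k trainable parameters, and more than half of them sit in the first dense layer. An ImageNet-scale backbone such as Xception is larger by orders of magnitude, which is the trade-off between the two spatial approaches listed above.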

2. Temporal Feature-based Models

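  • Deepfake Stacked RNN: Extracts features from each frame with a CNN and passes the frame sequence to a recurrent network, capturing temporal inconsistencies between frames that single-frame detectors miss.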
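The CNN-then-RNN pattern can be sketched in a few lines. The example below assumes per-frame feature vectors have already been produced by some CNN (here they are hypothetical numbers). It uses a plain Elman RNN as a simplified stand-in for the LSTM-style cells such models commonly use, and random weights as placeholders for trained parameters. `rnn_video_score` is a hypothetical name, and its output is only meaningful once the weights are trained.

```python
import math
import random


def rnn_video_score(frame_features, hidden_size=4, seed=0):
    """Run a minimal Elman RNN over per-frame feature vectors; return P(fake)."""
    rng = random.Random(seed)
    in_size = len(frame_features[0])
    # Randomly initialised weights stand in for trained parameters.
    w_x = [[rng.uniform(-0.5, 0.5) for _ in range(in_size)] for _ in range(hidden_size)]
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)] for _ in range(hidden_size)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]

    h = [0.0] * hidden_size  # hidden state carries information across frames
    for x in frame_features:
        # h_t = tanh(W_x x_t + W_h h_{t-1}); the comprehension reads the old h
        h = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_x[j], x))
                + sum(w * hi for w, hi in zip(w_h[j], h))
            )
            for j in range(hidden_size)
        ]
    # Only the final hidden state, a summary of the whole clip, is classified.
    logit = sum(w * hi for w, hi in zip(w_out, h))
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    # Hypothetical 3-dimensional CNN features for a 10-frame clip.
    frames = [[0.1 * t, 0.2, -0.1 * t] for t in range(10)]
    print(rnn_video_score(frames))
```

The key design point is that the decision depends on the order of frames, so a flicker or identity jump between frames can change the score even when every individual frame looks plausible.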

3. Frequency Domain-based Models

  • F3-Net: Identifies forgeries through frequency-aware image decomposition and local frequency statistics, which retain forgery cues even in heavily compressed images.
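The idea of splitting an image into frequency bands and summarizing each band can be illustrated with a hand-written DCT. This is a simplified sketch loosely inspired by F3-Net, not its method: F3-Net learns its band filters and computes statistics over sliding windows, whereas the code below applies a fixed split of one 8×8 block by the index sum u + v. The cut-off values and the names `dct2` and `band_log_energy` are hypothetical.

```python
import math


def dct2(block):
    """Orthonormal 2D DCT-II of a square block given as a list of lists."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                          * math.cos(math.pi * (2 * y + 1) * v / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def band_log_energy(block, low_cut=4, high_cut=8, eps=1e-8):
    """Log mean |coefficient| per band; bands split by u + v (hypothetical cut-offs)."""
    coeffs = dct2(block)
    bands = {"low": [], "mid": [], "high": []}
    n = len(block)
    for u in range(n):
        for v in range(n):
            s = u + v
            name = "low" if s < low_cut else "mid" if s < high_cut else "high"
            bands[name].append(abs(coeffs[u][v]))
    # Log scaling keeps large low-frequency values from swamping small high-frequency ones.
    return {name: math.log(sum(vals) / len(vals) + eps) for name, vals in bands.items()}


if __name__ == "__main__":
    flat = [[1.0] * 8 for _ in range(8)]  # smooth patch: energy in the DC term
    checker = [[(-1.0) ** (i + j) for j in range(8)] for i in range(8)]  # fine texture
    print(band_log_energy(flat))
    print(band_log_energy(checker))
```

A smooth patch concentrates its energy in the low band, while a pixel-level checkerboard puts most of it in the high band. Upsampling and blending artifacts in forged faces tend to perturb these band statistics, which is the kind of signal a frequency-domain detector learns to pick up.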

4. Biological Feature-based Models

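  • Lip-sync Check: Checks whether lip movements are synchronized with the speech audio.
  • Blink Detection: Exploits the fact that early Deepfake models often failed to generate natural blinking, largely because their training images rarely showed faces with closed eyes.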
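Both biological cues can be approximated with simple per-frame signals. The sketch below makes two substitutions and should not be read as the published detectors, which use learned models. For blinking it uses the eye aspect ratio (EAR) heuristic of Soukupová and Čech (2016), computed from six eye landmarks that are assumed to come from some facial landmark detector. For lip sync it uses a plain Pearson correlation between mouth opening and audio loudness. The threshold, minimum run length, and all function names are hypothetical.

```python
import math


def eye_aspect_ratio(p):
    """EAR from six eye landmarks p[0]..p[5] as (x, y): corners at p[0] and p[3]."""
    vertical = math.dist(p[1], p[5]) + math.dist(p[2], p[4])  # upper-to-lower lid
    horizontal = math.dist(p[0], p[3])  # corner-to-corner width
    return vertical / (2.0 * horizontal)


def count_blinks(ear_series, threshold=0.2, min_frames=2):
    """Count runs of at least `min_frames` consecutive frames with EAR below `threshold`."""
    blinks, run = 0, 0
    for ear in ear_series:
        if ear < threshold:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:  # a blink still in progress at the last frame
        blinks += 1
    return blinks


def lip_audio_correlation(mouth_open, audio_energy):
    """Pearson correlation between per-frame mouth opening and audio loudness."""
    n = len(mouth_open)
    ma, mb = sum(mouth_open) / n, sum(audio_energy) / n
    cov = sum((a - ma) * (b - mb) for a, b in zip(mouth_open, audio_energy))
    va = sum((a - ma) ** 2 for a in mouth_open)
    vb = sum((b - mb) ** 2 for b in audio_energy)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


if __name__ == "__main__":
    open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
    print(eye_aspect_ratio(open_eye))
    print(count_blinks([0.3, 0.3, 0.1, 0.1, 0.3, 0.1, 0.3, 0.05, 0.05, 0.05]))
    print(lip_audio_correlation([0, 1, 2, 3], [0, 2, 4, 6]))
```

In practice, a blink rate far below what is normal for a person over a long clip, or a weak correlation between mouth motion and speech energy, would only raise suspicion. Neither signal is conclusive, and newer generators have largely learned to reproduce natural blinking.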

Reflections

References