Introduction to Classic Deepfake Detection Models

Fri, 06 Mar 2026 21:35:00 +0800

Introduction

With the rapid development of Generative Adversarial Networks (GANs) and Diffusion Models, Deepfake content is becoming increasingly prevalent on the internet. Effectively identifying these forged images and videos has become a critical issue in multimedia forensics and information security.

Core Content

This article introduces some milestone works in the field of Deepfake detection:

1. Spatial Feature-based Models

MesoNet: A compact CNN that focuses on mesoscopic (mid-level) artifacts in compressed facial images, where pixel-level noise cues are largely destroyed by compression.
Xception-based (FaceForensics++): A representative transfer-learning approach that fine-tunes an Xception network pre-trained on a large-scale dataset (ImageNet) for forgery classification.

2. Temporal Feature-based Models

Deepfake Stacked RNN: Utilizes the continuity between video frames to capture forgery traces.

3. Frequency Domain-based Models

F3-Net: Identifies forgeries through frequency domain decomposition and frequency statistics.
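
To make the idea of frequency decomposition concrete, here is a minimal sketch that transforms a small grayscale patch with a 2D DCT and sums the energy in low, mid, and high bands. It is not F3-Net's actual pipeline: the function names `dct_2d` and `band_energies` and the band cut-offs are hypothetical choices made for illustration only.

```python
import math

def dct_2d(block):
    """Orthonormal 2D DCT-II of a square block given as a list of lists."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

def band_energies(block, low_cut=2, high_cut=8):
    """Sum squared DCT coefficients per band, indexed by u + v.

    low: u + v < low_cut, mid: low_cut <= u + v < high_cut, high: the rest.
    The cut-offs are illustrative assumptions, not values from F3-Net.
    """
    coeffs = dct_2d(block)
    n = len(coeffs)
    energy = {"low": 0.0, "mid": 0.0, "high": 0.0}
    for u in range(n):
        for v in range(n):
            if u + v < low_cut:
                band = "low"
            elif u + v < high_cut:
                band = "mid"
            else:
                band = "high"
            energy[band] += coeffs[u][v] ** 2
    return energy

# A flat patch keeps its energy in the low band; a checkerboard pushes it high.
flat = [[1.0] * 8 for _ in range(8)]
checker = [[(-1.0) ** (x + y) for y in range(8)] for x in range(8)]
flat_energy = band_energies(flat)
checker_energy = band_energies(checker)
```

A detector could feed such per-band statistics, computed over many local patches, to a classifier. The nested-loop DCT is written for readability and is far too slow for real images.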

4. Biological Feature-based Models

Lip-sync Check: Checks whether lip movements are synchronized with the accompanying speech.
Blink Detection: Looks for unnatural or missing blinks, which early Deepfake models often struggled to generate.
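
One simple way to illustrate the blink cue is the eye aspect ratio (EAR), a common landmark-based heuristic. This is a swapped-in technique for illustration and not necessarily what any specific blink-based Deepfake detector uses. The sketch assumes six eye landmarks per frame are already available from some face-landmark detector; the threshold and minimum run length are hypothetical defaults.

```python
import math

def eye_aspect_ratio(eye):
    """EAR from six (x, y) landmarks p1..p6 ordered around the eye contour.

    p1 and p4 are the eye corners; (p2, p6) and (p3, p5) are vertical pairs.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)

def count_blinks(ear_series, threshold=0.2, min_frames=2):
    """Count runs of at least min_frames consecutive frames with EAR below threshold.

    threshold and min_frames are assumed values and would need tuning per dataset.
    """
    blinks = 0
    run = 0
    for ear in ear_series:
        if ear < threshold:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:  # a blink still in progress at the end of the clip
        blinks += 1
    return blinks

open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
ear_open = eye_aspect_ratio(open_eye)
n_blinks = count_blinks([0.3, 0.3, 0.1, 0.1, 0.3, 0.1, 0.3, 0.1, 0.1, 0.1])
```

A clip whose blink count over a long duration is implausibly low could then be flagged for closer inspection, though newer generators reproduce blinking much better, so this cue is weak on its own.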

Reflections


Attention Is All You Need — Deep Dive into the Transformer Architecture

Fri, 06 Mar 2026 20:00:00 +0800

📄 Background

Before Transformers, sequence-to-sequence tasks relied heavily on RNN/LSTM architectures, which suffered from two major bottlenecks:

Sequential computation prevents parallelization
Long-range dependency degradation over long sequences

Vaswani et al. proposed the Transformer at NeurIPS 2017. It relies entirely on attention mechanisms to model global dependencies, with no recurrence and no convolution.

🔑 Core Mechanisms

Self-Attention

For input $X \in \mathbb{R}^{n \times d}$, we compute Query, Key, Value projections:

$$Q = XW^Q, \quad K = XW^K, \quad V = XW^V$$
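
The projections above can be sketched in a few lines of plain Python. The attention step that follows them in the code uses the standard scaled dot-product formula from the original paper, $\mathrm{softmax}(QK^\top / \sqrt{d_k})\,V$, which this excerpt does not reach. The helper names (`matmul`, `transpose`, `softmax`, `self_attention`) are illustrative, and the list-of-lists representation is chosen only to keep the example dependency-free.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention for x of shape (n, d).

    Returns the attended output and the (n, n) attention weights.
    """
    q = matmul(x, w_q)  # Q = X W^Q
    k = matmul(x, w_k)  # K = X W^K
    v = matmul(x, w_v)  # V = X W^V
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights

# Three tokens with d = 2; identity projections keep the example easy to follow.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, weights = self_attention(x, identity, identity, identity)
```

Every row of `weights` sums to 1, so each output row is a convex combination of the rows of $V$. Because all rows are computed from the same matrix products, nothing forces token $t$ to wait for token $t-1$, which is the parallelism that recurrent models lack.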
