<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Research Notes on Yang</title><link>/en/research/</link><description>Recent content in Research Notes on Yang</description><generator>Hugo -- gohugo.io</generator><language>en-US</language><lastBuildDate>Fri, 06 Mar 2026 21:35:00 +0800</lastBuildDate><atom:link href="/en/research/index.xml" rel="self" type="application/rss+xml"/><item><title>Introduction to Classic Deepfake Detection Models</title><link>/en/research/deepfake-detection-models/</link><pubDate>Fri, 06 Mar 2026 21:35:00 +0800</pubDate><guid>/en/research/deepfake-detection-models/</guid><description>&lt;h2 id="introduction"&gt;&lt;a href="#introduction" class="header-anchor"&gt;&lt;/a&gt;Introduction
&lt;/h2&gt;&lt;p&gt;With the rapid development of Generative Adversarial Networks (GANs) and Diffusion Models, Deepfake content is becoming increasingly prevalent on the internet. Effectively identifying these forged images and videos has become a critical issue in multimedia forensics and information security.&lt;/p&gt;
&lt;h2 id="core-content"&gt;&lt;a href="#core-content" class="header-anchor"&gt;&lt;/a&gt;Core Content
&lt;/h2&gt;&lt;p&gt;This article surveys several milestone works in the field of Deepfake detection:&lt;/p&gt;
&lt;h3 id="1-spatial-feature-based-models"&gt;&lt;a href="#1-spatial-feature-based-models" class="header-anchor"&gt;&lt;/a&gt;1. Spatial Feature-based Models
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;MesoNet&lt;/strong&gt;: Focuses on mesoscopic artifacts in compressed facial videos, an intermediate level of analysis chosen because compression destroys microscopic noise cues while purely semantic cues are hard to model.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Xception-based (FaceForensics++)&lt;/strong&gt;: A representative transfer-learning approach: an Xception network pre-trained on ImageNet is fine-tuned for binary real/fake classification (a minimal sketch follows this list).&lt;/li&gt;
&lt;/ul&gt;
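&lt;p&gt;A minimal fine-tuning sketch of the Xception approach, assuming the &lt;code&gt;timm&lt;/code&gt; library and a dummy batch standing in for real face crops; these are illustrative choices, not the exact FaceForensics++ pipeline:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python" data-lang="python"&gt;# Sketch: fine-tune an ImageNet-pretrained Xception for real/fake
# classification, in the spirit of the FaceForensics++ baseline.
import timm
import torch
from torch import nn

# "xception" is timm's id for the classic Xception architecture
model = timm.create_model("xception", pretrained=True, num_classes=2)
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
criterion = nn.CrossEntropyLoss()

model.train()
images = torch.randn(4, 3, 299, 299)   # dummy batch; stands in for face crops
labels = torch.randint(0, 2, (4,))     # 0 = real, 1 = fake
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
&lt;/code&gt;&lt;/pre&gt;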
&lt;h3 id="2-temporal-feature-based-models"&gt;&lt;a href="#2-temporal-feature-based-models" class="header-anchor"&gt;&lt;/a&gt;2. Temporal Feature-based Models
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Deepfake Stacked RNN&lt;/strong&gt;: Exploits the temporal continuity between video frames: per-frame features are aggregated by a recurrent network so that frame-to-frame inconsistencies expose forgery traces (sketched after this list).&lt;/li&gt;
&lt;/ul&gt;
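&lt;p&gt;A sketch of the general CNN-plus-recurrent pattern behind such detectors; the ResNet-18 backbone and layer sizes are assumptions for illustration, not the paper&amp;rsquo;s exact configuration:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python" data-lang="python"&gt;# Sketch: a CNN embeds each frame, an LSTM aggregates the sequence,
# and a linear head scores the clip as real or fake.
import torch
from torch import nn
from torchvision.models import resnet18

class TemporalDetector(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        backbone = resnet18(weights="IMAGENET1K_V1")
        backbone.fc = nn.Identity()          # expose 512-d frame features
        self.backbone = backbone
        self.lstm = nn.LSTM(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, clips):                # clips: (batch, frames, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.backbone(clips.flatten(0, 1)).view(b, t, -1)
        _, (h_n, _) = self.lstm(feats)       # final hidden state
        return self.head(h_n[-1])            # (batch, 2) logits

logits = TemporalDetector()(torch.randn(2, 8, 3, 224, 224))
&lt;/code&gt;&lt;/pre&gt;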
&lt;h3 id="3-frequency-domain-based-models"&gt;&lt;a href="#3-frequency-domain-based-models" class="header-anchor"&gt;&lt;/a&gt;3. Frequency Domain-based Models
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;F3-Net&lt;/strong&gt;: Identifies forgeries through frequency-aware decomposition and local frequency statistics, exposing artifacts that are hard to spot in the spatial domain (the intuition is sketched after this list).&lt;/li&gt;
&lt;/ul&gt;
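&lt;p&gt;F3-Net learns its frequency decomposition end to end; the sketch below only illustrates the underlying intuition with a fixed 2-D DCT, and the band split is an arbitrary assumption for demonstration:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python" data-lang="python"&gt;# Sketch: compare low- vs. high-frequency energy of an image.
# Generated faces often carry abnormal high-frequency statistics.
import numpy as np
from scipy.fftpack import dct

def dct2(gray):
    """2-D type-II DCT of a grayscale (H, W) float array."""
    return dct(dct(gray, axis=0, norm="ortho"), axis=1, norm="ortho")

gray = np.random.rand(128, 128)              # stand-in for a face crop
spectrum = np.abs(dct2(gray))
h, w = spectrum.shape
low = spectrum[: h // 4, : w // 4].mean()    # low-frequency band energy
high = spectrum[h // 2 :, w // 2 :].mean()   # high-frequency band energy
print(f"low/high band energy ratio: {low / high:.2f}")
&lt;/code&gt;&lt;/pre&gt;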
&lt;h3 id="4-biological-feature-based-models"&gt;&lt;a href="#4-biological-feature-based-models" class="header-anchor"&gt;&lt;/a&gt;4. Biological Feature-based Models
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Lip-sync Check&lt;/strong&gt;: Checks whether lip movements stay synchronized with the accompanying speech.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Blink Detection&lt;/strong&gt;: Early Deepfake generators often failed to produce natural blinking, so abnormal blink statistics serve as a cue (see the eye-aspect-ratio sketch after this list).&lt;/li&gt;
&lt;/ul&gt;
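&lt;p&gt;A sketch of the eye-aspect-ratio (EAR) cue behind blink detection; it assumes six eye landmarks per frame are already available from some landmark detector, and the 0.2 threshold is a common heuristic rather than a fixed standard:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python" data-lang="python"&gt;# Sketch: the eye aspect ratio drops sharply when the eye closes;
# an unnaturally low blink count over a clip is a weak forgery cue.
import numpy as np

def eye_aspect_ratio(eye):
    """eye: (6, 2) landmarks: corner, two top, corner, two bottom."""
    v1 = np.linalg.norm(eye[1] - eye[5])     # vertical distance 1
    v2 = np.linalg.norm(eye[2] - eye[4])     # vertical distance 2
    h = np.linalg.norm(eye[0] - eye[3])      # horizontal distance
    return (v1 + v2) / (2.0 * h)

def count_blinks(ear_series, thresh=0.2):
    """Count open-to-closed transitions in a per-frame EAR series."""
    closed = np.less(ear_series, thresh)     # eye considered shut
    open_before = np.logical_not(closed[:-1])
    return int(np.sum(np.logical_and(closed[1:], open_before)))

ear = [0.30, 0.31, 0.12, 0.11, 0.29, 0.30]   # toy EAR trace with one blink
print(count_blinks(ear))                     # 1
&lt;/code&gt;&lt;/pre&gt;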
&lt;h2 id="reflections"&gt;&lt;a href="#reflections" class="header-anchor"&gt;&lt;/a&gt;Reflections
&lt;/h2&gt;&lt;h2 id="references"&gt;&lt;a href="#references" class="header-anchor"&gt;&lt;/a&gt;References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://arxiv.org/abs/1901.00596" target="_blank" rel="noopener"
 &gt;FaceForensics++: Learning to Detect Manipulated Facial Images&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://arxiv.org/abs/1809.00888" target="_blank" rel="noopener"
 &gt;MesoNet: a Compact Facial Video Forgery Detection Network&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Attention Is All You Need — Deep Dive into the Transformer Architecture</title><link>/en/research/attention-is-all-you-need/</link><pubDate>Fri, 06 Mar 2026 20:00:00 +0800</pubDate><guid>/en/research/attention-is-all-you-need/</guid><description>&lt;h2 id="-background"&gt;&lt;a href="#-background" class="header-anchor"&gt;&lt;/a&gt;📄 Background
&lt;/h2&gt;&lt;p&gt;Before Transformers, sequence-to-sequence tasks relied heavily on &lt;strong&gt;RNN/LSTM&lt;/strong&gt; architectures, which suffered from two major bottlenecks:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Sequential computation&lt;/strong&gt; prevents parallelization&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Long-range dependency degradation&lt;/strong&gt; over long sequences&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Vaswani et al. proposed the &lt;strong&gt;Transformer&lt;/strong&gt; at NIPS 2017 (the conference now known as NeurIPS), relying entirely on attention mechanisms to model global dependencies — no recurrence, no convolution.&lt;/p&gt;
&lt;h2 id="-core-mechanisms"&gt;&lt;a href="#-core-mechanisms" class="header-anchor"&gt;&lt;/a&gt;🔑 Core Mechanisms
&lt;/h2&gt;&lt;h3 id="self-attention"&gt;&lt;a href="#self-attention" class="header-anchor"&gt;&lt;/a&gt;Self-Attention
&lt;/h3&gt;&lt;p&gt;For input $X \in \mathbb{R}^{n \times d}$, we compute Query, Key, Value projections:&lt;/p&gt;
&lt;p&gt;$$Q = XW^Q, \quad K = XW^K, \quad V = XW^V$$&lt;/p&gt;
&lt;p&gt;These projections feed the paper&amp;rsquo;s scaled dot-product attention, where $d_k$ is the key dimension:&lt;/p&gt;
&lt;p&gt;$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$&lt;/p&gt;</description></item><item><title>Research Insights: How to Read Papers Efficiently</title><link>/en/research/how-to-read-papers/</link><pubDate>Thu, 05 Mar 2026 18:00:00 +0800</pubDate><guid>/en/research/how-to-read-papers/</guid><description>&lt;h2 id="the-three-pass-approach"&gt;&lt;a href="#the-three-pass-approach" class="header-anchor"&gt;&lt;/a&gt;The Three-Pass Approach
&lt;/h2&gt;&lt;p&gt;A classic method from Keshav&amp;rsquo;s &lt;em&gt;&amp;ldquo;How to Read a Paper&amp;rdquo;&lt;/em&gt;:&lt;/p&gt;
&lt;h3 id="pass-1-510-minutes--get-the-big-picture"&gt;&lt;a href="#pass-1-510-minutes--get-the-big-picture" class="header-anchor"&gt;&lt;/a&gt;Pass 1: 5–10 minutes — Get the big picture
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Read: title, abstract, introduction&lt;/li&gt;
&lt;li&gt;Scan all figure captions and conclusions&lt;/li&gt;
&lt;li&gt;Decide: is this worth reading in depth?&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="pass-2-1-hour--grasp-the-content"&gt;&lt;a href="#pass-2-1-hour--grasp-the-content" class="header-anchor"&gt;&lt;/a&gt;Pass 2: ~1 hour — Grasp the content
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Read the body, &lt;strong&gt;skip proofs&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Focus on experimental setup and key figures&lt;/li&gt;
&lt;li&gt;Note limitations the authors acknowledge&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="pass-3-deep-read--replication-level-understanding"&gt;&lt;a href="#pass-3-deep-read--replication-level-understanding" class="header-anchor"&gt;&lt;/a&gt;Pass 3: Deep read — Replication-level understanding
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;Derive every equation&lt;/li&gt;
&lt;li&gt;Challenge every assumption&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="my-note-template"&gt;&lt;a href="#my-note-template" class="header-anchor"&gt;&lt;/a&gt;My Note Template
&lt;/h2&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-markdown" data-lang="markdown"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Problem
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Key Insight
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Method
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Experiments
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## My Take
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Action Items
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="recommended-tools"&gt;&lt;a href="#recommended-tools" class="header-anchor"&gt;&lt;/a&gt;Recommended Tools
&lt;/h2&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Tool&lt;/th&gt;
 &lt;th&gt;Purpose&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Zotero&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Reference management&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Obsidian&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Connected note-taking&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Connected Papers&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Citation graph visualization&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Semantic Scholar&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;AI-assisted literature search&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;</description></item></channel></rss>