Qi Li, Weining Wang, and colleagues published a comprehensive face swapping survey and the CASIA FaceSwapping benchmark on arXiv on April 27, 2026. The benchmark targets the fragmented evaluation landscape across GAN and diffusion model methods.
Key facts
- Paper submitted to arXiv on April 27, 2026.
- CASIA FaceSwapping benchmark has balanced demographic distributions.
- Survey organizes methods into five paradigms.
- Standardized protocols assess robustness across attribute variations.
- Code available at github.com/CASIA-NLPRAI/face-swapping-survey.
Face swapping research has advanced rapidly with GANs and diffusion models, but evaluation remains a mess. A new arXiv paper from CASIA researchers attempts to clean it up.
The Problem: Fragmented Evaluation

Existing methods are scattered across five paradigms, and each uses its own datasets, metrics, and protocols. According to the preprint, this makes apples-to-apples comparison impossible. Prior surveys focused on deepfake generation or detection, not face swapping as a standalone problem.
The CASIA FaceSwapping Solution
The team introduces CASIA FaceSwapping, a benchmark designed for balanced demographic distributions and explicit attribute variations—skin tone, age, gender, facial hair. The dataset enables controlled robustness testing that prior benchmarks lacked.
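The paper does not say how demographic balance is quantified. One simple diagnostic, common in dataset auditing, is the ratio of the least- to most-frequent group per attribute (1.0 means perfectly balanced). A minimal sketch, assuming per-sample attribute labels are available (the attribute names here are illustrative, not from the paper's release):

```python
from collections import Counter

def balance_ratio(labels):
    """Ratio of least- to most-frequent group; 1.0 = perfectly balanced."""
    counts = Counter(labels)
    return min(counts.values()) / max(counts.values())

# Toy labels for a hypothetical "age group" attribute
age_groups = ["young", "middle", "senior", "young", "middle", "senior"]
print(balance_ratio(age_groups))  # 1.0 — each group appears equally often
```

Running such a check per attribute (skin tone, age, gender, facial hair) is one way a benchmark user could verify the claimed balance on their own split.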
Standardized protocols accompany the dataset, covering identity preservation, attribute transfer, and artifact detection. Per the paper, extensive experiments on representative methods reveal performance characteristics and limitations that were previously obscured by inconsistent evaluation.
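The paper's exact metric implementations are not described here. Identity preservation in face-swapping evaluation is typically scored as cosine similarity between face-recognition embeddings of the source face and the swapped result (e.g., from an ArcFace-style model). A minimal NumPy sketch under that assumption, with plain vectors standing in for real embeddings:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identity_preservation(source_embs, swapped_embs):
    """Mean cosine similarity between source and swapped-face embeddings.

    In a real protocol the embeddings would come from a pretrained
    face-recognition model; here they are toy vectors for illustration.
    """
    sims = [cosine_similarity(s, w) for s, w in zip(source_embs, swapped_embs)]
    return float(np.mean(sims))

src = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
swp = [np.array([1.0, 0.0]), np.array([1.0, 1.0])]
score = identity_preservation(src, swp)  # averages 1.0 and ~0.707
```

A higher score means the swapped face retains more of the source identity; standardizing which recognition model supplies the embeddings is exactly the kind of detail a shared protocol pins down.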
Why This Matters
The unique take: Benchmark fragmentation is the primary bottleneck preventing face swapping from moving from research toy to production tool. Without standardized evaluation, claims of "high fidelity" are meaningless. CASIA FaceSwapping provides the first principled framework to compare methods fairly, which could accelerate progress toward controllable, robust face swapping for applications like film production and privacy-preserving avatars.
The survey itself organizes methods into five paradigms—autoencoder-based, GAN-based, diffusion-based, 3D-aware, and hybrid—systematically analyzing design principles. This taxonomy alone is valuable for researchers navigating the field.
Limitations
The paper does not disclose the exact dataset size or number of identities in CASIA FaceSwapping. It also does not release model weights or training code, only the evaluation framework on GitHub. The benchmark's adoption will depend on community buy-in, which is uncertain given the existing fragmentation.
What to watch
Watch for community adoption of CASIA FaceSwapping over the next 6 months. If major face swapping papers (e.g., from Meta, ByteDance, or academic groups) begin citing the benchmark in evaluations, it could become the de facto standard. Also track whether the authors release model weights or training code.