Blind Face Restoration Survey

Authors: S. Sharipov, B. Nutfullin, N. Maloyan
Published: International Journal of Open Information Technologies, 2023

Abstract

This survey provides a comprehensive review of deep learning methods for blind face restoration—recovering high-quality face images from degraded inputs without knowing the degradation type. We analyze GAN-based, diffusion-based, and hybrid approaches.

Background

Face images captured in real-world conditions frequently suffer from a variety of degradations: low resolution from distant or low-quality cameras, compression artifacts from lossy storage and transmission, motion blur, noise from poor lighting, and combinations of all of the above. Blind face restoration refers to the task of recovering a high-quality face image from such a degraded input without explicit knowledge of what degradation was applied. This "blind" setting is significantly more challenging than non-blind restoration, where the degradation model is known, because the network must simultaneously infer both the degradation and the clean image.
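The degradation model described above is commonly simulated during training as a chain of blur, downsampling, and noise. As a minimal illustration (not any specific paper's pipeline; the kernel size, sigma, and noise level here are arbitrary choices), a first-order synthetic pipeline for a grayscale image in [0, 1] might look like:

```python
import numpy as np

def gaussian_kernel(size=9, sigma=2.0):
    # 1-D Gaussian kernel, normalized to sum to 1 (used separably).
    ax = np.arange(size) - size // 2
    k = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def degrade(img, scale=4, sigma=2.0, noise_std=0.02, rng=None):
    """Blur -> downsample -> add noise: a toy x = D(k * y) + n pipeline."""
    if rng is None:
        rng = np.random.default_rng(0)
    k = gaussian_kernel(sigma=sigma)
    # Separable Gaussian blur: convolve rows, then columns.
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, blurred)
    low = blurred[::scale, ::scale]  # naive strided downsampling
    noisy = low + rng.normal(0.0, noise_std, low.shape)
    return np.clip(noisy, 0.0, 1.0)
```

In the blind setting, a restoration network sees only the output of such a pipeline (with randomized parameters) and must invert it without being told which operations were applied.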

The problem has gained considerable attention due to its practical relevance across numerous domains. Surveillance and forensic applications often need to enhance faces captured by low-quality security cameras. Photo restoration and archival digitization require recovering detail from old, damaged, or low-resolution photographs. Social media platforms and video conferencing tools employ face enhancement to improve visual quality under bandwidth constraints. Beyond these direct applications, face restoration serves as a preprocessing step for downstream tasks such as face recognition, expression analysis, and identity verification, where input quality directly affects system performance.

The field has evolved rapidly over the past several years, driven by advances in generative modeling. Early approaches based on simple convolutional networks produced overly smooth results that lacked fine detail. The introduction of generative adversarial networks (GANs) brought a qualitative leap in restoration quality, and more recently, diffusion-based generative models have emerged as a competitive alternative. This survey aims to provide a structured overview of these developments, categorizing methods by their underlying generative framework and analyzing their respective strengths and limitations.

Methodology

Our survey systematically reviewed the literature on deep learning-based blind face restoration, covering publications from major computer vision and machine learning venues through early 2023. We organized the methods into several broad categories based on their architectural approach. The first and most extensively studied category encompasses GAN-based methods, which leverage adversarial training to produce sharp, realistic outputs. Within this group, we distinguish between approaches that train task-specific generators from scratch and those that exploit pretrained generative models -- particularly StyleGAN and its variants -- as learned priors for the face manifold.

Methods leveraging pretrained GAN priors, such as GFP-GAN, GPEN, and CodeFormer, represent a particularly influential line of work. These approaches encode the degraded input into the latent space of a pretrained face generator and decode it into a high-quality output, effectively using the generator's learned knowledge of face structure and texture to fill in missing or corrupted details. CodeFormer introduced a discrete codebook approach with a controllable fidelity-quality tradeoff, allowing users to balance between faithfulness to the input and the perceptual quality of the output. We analyze the trade-offs inherent in these design choices, including the tension between identity preservation and visual quality.
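The core operation shared by discrete-codebook methods is vector quantization: each continuous encoder feature is replaced by its nearest entry in a learned codebook. (CodeFormer itself predicts code indices with a transformer rather than by nearest-neighbor lookup; the sketch below shows only the generic quantization step, with hypothetical array shapes.)

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector to its nearest codebook entry.

    features: (n, d) array of continuous encoder outputs
    codebook: (k, d) array of learned code vectors
    Returns (indices, quantized vectors).
    """
    # Squared Euclidean distance between every feature and every code.
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]
```

Because the decoder only ever sees valid codebook entries, severely corrupted features are snapped onto the learned face manifold, which is what gives these methods their robustness to heavy degradation.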

The survey also covers the emerging category of diffusion model-based restoration methods, which replace the adversarial training framework with iterative denoising processes. These methods have shown promise in producing diverse, high-quality outputs and avoiding some of the training instability issues associated with GANs. Additionally, we review hybrid approaches that combine elements of different frameworks, as well as methods that incorporate geometric face priors such as facial landmarks, parsing maps, and 3D morphable models to guide the restoration process. For each category, we analyze representative methods on standard benchmarks using both distortion metrics (PSNR, SSIM) and perceptual metrics (FID, LPIPS).
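Of the metrics named above, PSNR is the simplest to state precisely: it is the mean squared error between restored and ground-truth images, expressed in decibels relative to the peak signal value. A minimal implementation (assuming images normalized to [0, 1]):

```python
import numpy as np

def psnr(ref, test, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means lower pixel-wise distortion."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```

SSIM, FID, and LPIPS are more involved (windowed statistics and pretrained-network features, respectively) and are typically computed with established library implementations rather than reimplemented.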

Results

Our comparative analysis revealed several key trends in the field. GAN-prior-based methods consistently achieved the best perceptual quality scores (lowest FID and LPIPS), producing outputs that appeared sharp and realistic to human observers. However, these methods sometimes hallucinated facial details -- generating plausible but incorrect features, particularly when the input was severely degraded. This identity drift is a fundamental concern for applications in forensics and surveillance, where fidelity to the original identity is paramount.

Methods incorporating explicit face structure priors (landmarks, parsing maps, 3D models) showed improved robustness on heavily degraded inputs, as the geometric guidance helped constrain the restoration to anatomically plausible configurations. However, these approaches added computational overhead and could propagate errors when the prior estimation itself failed on low-quality inputs. Diffusion-based methods demonstrated competitive quality with greater output diversity, but at significantly higher computational cost due to the iterative sampling process, making them less suitable for real-time or high-throughput applications at the time of our survey.
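The computational cost of diffusion-based restoration follows directly from the sampling procedure: one denoising-network evaluation per timestep, often hundreds of steps per image. A toy DDPM-style ancestral sampling loop makes this explicit (the `denoise_fn` here is a stand-in for a trained network predicting the noise component; the schedule and shapes are illustrative):

```python
import numpy as np

def ddpm_sample(denoise_fn, shape, betas, rng=None):
    """Toy ancestral sampling: start from pure noise, iteratively denoise.

    denoise_fn(x, t) is assumed to predict the noise component eps.
    Each loop iteration costs one full network forward pass.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)
    for t in range(len(betas) - 1, -1, -1):
        eps = denoise_fn(x, t)
        # DDPM posterior-mean update for x_{t-1} given x_t.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x
```

A GAN-based restorer produces its output in a single forward pass, which is why, at the time of the survey, diffusion methods were at a latency disadvantage despite their quality and diversity.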

Across all methods, we observed a persistent tension between distortion metrics and perceptual metrics. Approaches optimizing pixel-level fidelity (high PSNR, SSIM) tended to produce blurry results, while those optimizing perceptual quality (low FID, LPIPS) could diverge from the ground truth in pixel space. This distortion-perception tradeoff is a well-known phenomenon in image restoration, and it has particular implications for face restoration where both sharpness and identity preservation matter. The choice of evaluation protocol significantly influenced which methods appeared superior, underscoring the importance of multi-metric assessment in this field.
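The distortion-perception tension admits a simple numerical illustration: when the ground truth is ambiguous, the MSE-optimal prediction is the average of the plausible answers, which is blurry and not itself a valid sample. A contrived two-hypothesis example (the "edges" below are invented for illustration):

```python
import numpy as np

# Two equally likely ground truths: a sharp edge at one of two positions.
gt_a = np.array([0.0, 0.0, 1.0, 1.0])
gt_b = np.array([0.0, 1.0, 1.0, 1.0])

mean_pred = (gt_a + gt_b) / 2   # blurry average (the MMSE estimate)
sharp_pred = gt_a               # commit to one sharp hypothesis

def expected_mse(pred):
    # Expected distortion over the two equally likely ground truths.
    return 0.5 * np.mean((pred - gt_a) ** 2) + 0.5 * np.mean((pred - gt_b) ** 2)
```

The blurry average achieves lower expected MSE (and hence higher expected PSNR) than either sharp guess, even though it looks like neither possible ground truth; this is the mechanism behind the blurry outputs of pixel-fidelity-optimized restorers.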

Discussion

The rapid progress in blind face restoration reflects broader advances in generative modeling, and several open challenges remain. Generalization to real-world degradations -- as opposed to synthetically degraded benchmarks -- continues to be difficult, as real images exhibit complex, spatially varying degradation patterns that synthetic pipelines do not fully capture. Identity preservation under severe degradation remains an unsolved problem, with current methods often producing outputs that look like a plausible face but not necessarily the correct individual. Evaluation methodology is also an active area of discussion, as no single metric adequately captures all aspects of restoration quality.

Looking forward, we identify several promising research directions. The integration of diffusion models with face-specific priors could combine the quality advantages of iterative generation with the structural constraints that prevent identity drift. Video face restoration, which adds temporal consistency constraints, is a natural and practically important extension that remains underexplored. Finally, the development of standardized real-world benchmarks -- with genuinely degraded images rather than synthetic approximations -- would provide a more meaningful basis for comparing methods and tracking progress toward practical deployment.
