Yongqiang Du, Haoran Liu, Shengjie He, Songnan Chen
Blind image inpainting, the task of detecting corrupted regions with diverse patterns in an image and then generating plausible content for those regions, remains a challenging yet practical problem in computer vision. In this paper, we propose InViT, a novel model for blind image inpainting that combines a pre-trained Generative Adversarial Network (GAN) with a learnable Vision Transformer (ViT). InViT consists of two phases: mask prediction and image inpainting. Benefiting from the latent feature space learned from the full training data via GAN inversion, a pre-trained StyleGAN provides reliable cues about corrupted regions for mask prediction. By incorporating the predicted mask into the inpainting phase, we design a Vision Transformer with a mask-aware self-attention mechanism that captures long-range dependencies between pixels during content reconstruction. In addition, we propose a Prompt-augment Contextual Aggregation module to improve the plausibility of the content generated for corrupted regions. Extensive experiments on several benchmark datasets for blind image inpainting demonstrate that InViT achieves state-of-the-art performance compared to existing methods, in terms of both quantitative metrics and qualitative visual quality.
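The abstract does not specify how the predicted mask enters the self-attention computation; a common formulation is to bias the attention logits so that tokens from corrupted regions contribute little as attention targets. The sketch below illustrates that idea only; the function name, the additive log-bias, and single-head NumPy form are assumptions, not the paper's actual InViT implementation.

```python
import numpy as np

def mask_aware_attention(x, mask, w_q, w_k, w_v):
    """Single-head self-attention with a mask-derived logit bias.

    x:    (n, d) token features
    mask: (n,) floats, 1.0 for tokens in corrupted regions, 0.0 otherwise
    w_q, w_k, w_v: (d, d) projection matrices

    NOTE: illustrative sketch, not the InViT formulation from the paper.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)
    # Suppress attention *to* corrupted tokens so that reconstruction
    # of each pixel is driven by valid (uncorrupted) context.
    logits = logits + np.log(1e-6) * mask[None, :]
    # Numerically stable softmax over the key axis.
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With this biasing scheme the attention rows still sum to one, but columns corresponding to masked tokens receive near-zero weight, which is one simple way long-range dependencies from clean regions can dominate the reconstruction.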