Abstract
Face image de-occlusion and inpainting is a challenging computer vision problem with several practical uses, and it is employed in many image preprocessing applications. The impressive results achieved by generative adversarial networks (GANs) in image processing have drawn increasing attention from the scientific community to facial de-occlusion and inpainting in recent years. Recent architectural developments include two-stage networks following a coarse-to-fine approach and networks guided by landmarks, semantic segmentation maps, or edge maps. Moreover, improved convolutions enlarge the receptive field and filter the values passed to the next layer, and attention layers create relationships between local and distant information. This article presents a brief review of recent developments in GAN-based techniques for de-occlusion and inpainting of face images. In addition, it describes and analyzes network architectures and building blocks. Finally, we identify current limitations and propose directions for future research.
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.
1 Introduction
Facial de-occlusion and inpainting are special instances of image inpainting. They are used in the restoration of damaged images [26], removal of unwanted content [26] and data augmentation [8]. Moreover, face de-occlusion and inpainting are important preprocessing steps in many computer vision tasks with numerous applications. This is because occlusions break the entire structure of the face and hide the identity of the subject, resulting in performance degradation in many applications [37]. Occlusions degrade the performance of face parsing [30, 32], object and face detection [21] and facial expression analysis [32]. Furthermore, occlusions hide landmarks used in face alignment and frontalization [2, 41].
Face inpainting also poses several challenges that are difficult to overcome. First, the human face carries biometric information unique to each subject, revealing identity, age, sex, emotions, ethnicity, and even culture and religion. This biometric information must be preserved in the restored image. Second, there are many plausible solutions for filling the missing holes in an image, where the ground-truth is just one option. For example, given a face covered with a surgical mask, the mouth may be smiling in the ground-truth image whereas it is closed after reconstruction. Third, the set of possible solutions is restricted by the overall content, as the restoration must preserve the subject’s skin and hair texture, facial symmetry, structure and expression, along with variations of illumination and pose. Fourth, occlusions can appear anywhere in the image and may be of any shape and size. Large occlusions covering both sides of the face are more difficult to restore than small ones covering just one side. Fifth, unique facial marks such as makeup, tattoos, scars, stains, wrinkles, and accessories are difficult to recover with no reference image. Finally, the restored area must be visually consistent with the neighboring region, creating an imperceptible transition between them [1, 38, 44].
Researchers developed new methods to overcome these challenges and improve image quality. The most prominent methods are GAN-based networks that are able to reconstruct an image with photo-realism. Modifications in the GAN architecture with the inclusion of new building blocks, network elements and loss functions address specific facial inpainting issues. This review (see Footnote 1) summarizes these developments, building a solid foundation for future research. The rest of the article is organized as follows. Section 2 describes network architecture and components and presents methods of training stability. Section 3 discusses the current limitations found in the literature. Finally, we conclude in Sect. 4.
2 Theoretical Background
The network used for image inpainting consists of a number of components and elements that contribute to the final result. This section discusses the main network structures found in the literature.
2.1 Network Architecture
Generative Adversarial Networks (GAN). Generative Adversarial Networks (GAN) consist of a generator and discriminator networks [12]. The generator creates images from simple random noise, usually following a uniform or spherical Gaussian distribution [13]. The discriminator is a classifier that distinguishes between real and fake images. Both networks play an adversarial game in which the generator tries to fool the discriminator by gradually improving the image quality. They are trained in alternation until the discriminator is unable to distinguish the synthetic from real images [12]. Figure 1 shows the GAN architecture.
Original GAN architecture proposed by Goodfellow et al. [12]. The generator receives a random noise vector as input and creates fake images. The discriminator is a classifier that evaluates whether the image is real or fake.
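The adversarial game can be sketched in a few lines of NumPy. This is a minimal illustration of the losses behind the minimax objective of [12]; the helper names (`bce`, `discriminator_loss`, `generator_loss`) are hypothetical, and `d_real`/`d_fake` stand for the discriminator's probability outputs on a real and a generated image.

```python
import numpy as np

def bce(p, label):
    # Binary cross-entropy between a predicted probability and a 0/1 label.
    eps = 1e-12
    return -(label * np.log(p + eps) + (1 - label) * np.log(1 - p + eps))

def discriminator_loss(d_real, d_fake):
    # The discriminator wants D(real) -> 1 and D(G(z)) -> 0.
    return bce(d_real, 1) + bce(d_fake, 0)

def generator_loss(d_fake):
    # The generator wants its output to be scored as real.
    return bce(d_fake, 1)
```

Training alternates between minimizing `discriminator_loss` over the discriminator's weights and `generator_loss` over the generator's weights; the generator improves precisely by pushing `d_fake` towards 1.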
In image inpainting and de-occlusion, the generator input is a set of occluded images instead of random noise. After training, only the generator is used to infer new images and the discriminator is removed. Figure 2 illustrates the basic GAN architecture used in image de-occlusion and inpainting. Variations of this architecture found in the literature are described in the next sections.
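One common way of assembling the generator input is to zero out the occluded pixels and append the binary mask as an extra channel. The following NumPy sketch illustrates this; the function name and channel layout are assumptions for illustration, not a specific model's interface.

```python
import numpy as np

def make_occluded_input(image, mask):
    """Zero out the occluded pixels and append the binary mask as an extra
    channel. image: (H, W, C) in [0, 1]; mask: (H, W), 1 inside the hole."""
    holed = image * (1 - mask)[..., None]          # remove occluded pixels
    return np.concatenate([holed, mask[..., None]], axis=-1)

img = np.random.rand(64, 64, 3)
mask = np.zeros((64, 64))
mask[20:40, 20:40] = 1                             # a square occlusion
x = make_occluded_input(img, mask)                 # (64, 64, 4) generator input
```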
Two-Stage Network. Splitting the inpainting process into two or more stages improves the image quality. In this setting, each stage is responsible for a portion of the restoration process. The most common approaches are coarse-to-fine and prior information.
In the coarse-to-fine approach, the first stage creates an initial coarse prediction of the de-occluded image and the second stage takes the result of the first stage as input and refines the prediction [45]. This method gained popularity for its higher performance compared to single-stage networks [4, 9, 14, 44, 45]. Figure 3 illustrates a two-stage network with the coarse-to-fine approach.
Prior information such as landmarks, edges, or semantic segmentation maps provides spatial and structural information, guiding the inpainting process. This allows the inpainting network to build the face with realistic structure and facial expressions. In general, the prior network is trained to detect landmarks, edges or semantic segmentation and create the respective maps, which are used by the inpainting network to guide the completion process. The landmark map improves the perceptual quality of the image, providing spatial consistency in unaligned faces [38]. The effect of landmarks in image inpainting is so strong that swapping the maps of two persons changes their identities and facial expressions [43]. The edge generator network predicts the edge map of the occlusion-free image, which is later used to guide the inpainting process [36, 42, 50]. The generator receives the masked grayscale ground-truth image, the masked edge map and the binary mask indicating the occluded area. Likewise, a parsing network creates an occlusion-free semantic segmentation map of the occluded input image, which guides the de-occlusion process [35, 46]. Alternatively, the parsing network can provide semantic regularization, where the semantic segmentation map of the generated image is compared with the ground-truth [11, 27].
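The semantic regularization variant amounts to a cross-entropy term between the parsing network's output on the inpainted image and the ground-truth segmentation. A minimal NumPy sketch, with hypothetical argument names and a toy example appended:

```python
import numpy as np

def semantic_regularization(parse_fake, parse_real):
    """Cross-entropy between the parsing network's class probabilities on
    the inpainted image (H, W, K) and ground-truth labels (H, W)."""
    h, w = parse_real.shape
    # Pick, at each pixel, the probability assigned to the true class.
    probs = parse_fake[np.arange(h)[:, None], np.arange(w)[None, :], parse_real]
    return -np.mean(np.log(probs + 1e-12))

labels = np.zeros((4, 4), dtype=int)                 # all pixels: class 0
perfect = np.zeros((4, 4, 3)); perfect[..., 0] = 1   # parser certain and correct
uniform = np.full((4, 4, 3), 1 / 3)                  # parser maximally unsure
```

A perfect parsing of the generated image incurs near-zero loss, while an uninformative one incurs \(\log K\); minimizing this term pushes the generator to produce faces whose parsing matches the ground-truth layout.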
2.2 Generator
In GANs, the generator is any neural network able to create the probability distribution of real data [12, 13]. Then, sampling from this distribution generates completely new images. The input to the generator can be a vector of random noise, the incomplete image, semantic segmentation map, edge map, landmarks or binary mask. In face image de-occlusion and inpainting, the generator can be an encoder-decoder, U-net, multi-branch network or any variation.
Encoder-Decoder and U-Net. An encoder-decoder is a generative model trained to reconstruct the input data in an unsupervised way [34]. The network has a symmetric architecture comprising an encoder and a decoder. The encoder consists of a stack of down-sampling layers that compress the original data into a low-dimensional representation. The decoder contains a series of up-sampling layers that recover the original information. Optionally, a bottleneck layer can be inserted between the encoder and the decoder. This layer converts the encoder’s last layer into a vector with similar functionality to the random noise vector in the original GAN.
The encoder-decoder architecture with the bottleneck layer is appropriate for image inpainting with GANs. The encoder converts the occluded image into a vector, and the decoder reconstructs the de-occluded face from this vector.
The U-Net has a similar architecture to the encoder-decoder; the main difference is the skip connections concatenating each encoder layer with the corresponding symmetrical decoder layer. In the original architecture [40], the U-Net encoder is a series of \(3\times 3\) convolutions followed by ReLU and \(2\times 2\) max pooling, while the decoder is a series of up-sampling layers with a \(2\times 2\) kernel, a concatenation with the corresponding encoder layer and \(3\times 3\) convolutions with ReLU. Figure 4 illustrates the encoder-decoder and U-Net architectures.
The encoder-decoder and U-Net architectures consist of an encoder with down-sampling layers, a bottleneck layer in the middle and a decoder with up-sampling layers. The U-Net has skip connections concatenating each encoder layer with the corresponding decoder layer. Left: Encoder-decoder. Right: U-Net.
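The skip connection that distinguishes U-Net from a plain encoder-decoder can be shown with toy pooling and up-sampling operators. This NumPy sketch traces a single contraction/expansion step and the channel concatenation; the function names are illustrative, not from any cited architecture.

```python
import numpy as np

def avg_pool2(x):
    # 2x2 average pooling on an (H, W, C) feature map.
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample2(x):
    # Nearest-neighbour 2x up-sampling.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def unet_skip(x):
    # One contraction and one expansion step, plus the skip concatenation.
    enc = avg_pool2(x)                        # encoder feature (H/2, W/2, C)
    dec = upsample2(enc)                      # decoder feature back at (H, W, C)
    return np.concatenate([x, dec], axis=-1)  # skip doubles the channels to 2C

x = np.random.rand(8, 8, 4)
y = unet_skip(x)                              # (8, 8, 8)
```

The concatenated encoder channels carry fine spatial detail (here, the input itself) that the down-sampled path has averaged away, which is why skip connections help reconstruct sharp textures.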
Modified versions of both encoder-decoder and U-Net are commonly used in the generator of GANs used in face de-occlusion and inpainting. The variations include adding dilated convolution [21, 25, 26, 28, 29, 35, 46, 47], SE block [21, 25, 35, 47], HDC [10] and self attention blocks [33].
2.3 Discriminator
The discriminator is a classifier that calculates the probability that the image is real rather than synthesized [12]. However, since the inpainted region is a fraction of the entire image, the discriminator is biased towards a generated image being real, resulting in poor inpainting quality. This section describes variations of discriminators that address this issue.
Local and Global Discriminators. The combination of global and local discriminators improves the reconstruction realism and consistency. The global discriminator evaluates the entire image, while the local discriminator judges a small patch around the reconstructed area. The objective function is the sum of the loss functions applied to each discriminator [27]. A less common variation combines the outputs of both discriminators and converts them into a single number representing the probability that the image is real or reconstructed. Specifically, the outputs of both discriminators concatenate and then pass through a fully connected layer. In this setting, the loss is calculated at the combined output [19]. Figure 5 shows an example of an architecture of local and global discriminators with combined outputs. The architecture is a stack of \(5\times 5\) convolutions with stride 2 followed by a fully-connected layer that outputs a 1024 vector. The concatenated output of both discriminators passes through a fully-connected layer with sigmoid activation [19].
Network architecture. It consists of one generator and two discriminators. The generator takes the occluded image as input and outputs the occlusion-free image. Two discriminators learn to distinguish the synthesized contents as real or fake. The global discriminator evaluates the entire image, while the local discriminator centers on a small area around the damaged region [19].
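The combined-output variant of [19] can be sketched as follows. Fixed random projections stand in for the trained convolutional stacks and fully-connected layers, so this is a structural illustration only; the function names and crop parameters are assumptions.

```python
import numpy as np

def features(patch, dim=1024):
    # Stand-in for the conv stack + FC layer mapping an image (or patch)
    # to a 1024-d feature vector: a fixed random projection, not a trained net.
    rng = np.random.default_rng(0)
    w = rng.standard_normal((patch.size, dim)) / np.sqrt(patch.size)
    return patch.ravel() @ w

def combined_discriminator(image, top, left, size=32):
    # Global branch sees the full image, local branch a crop around the
    # restored region; their features are concatenated and mapped to a
    # single real/fake probability.
    local = image[top:top + size, left:left + size]
    joint = np.concatenate([features(image), features(local)])
    logit = joint.sum() / np.sqrt(joint.size)   # stand-in for the final FC layer
    return 1 / (1 + np.exp(-logit))             # sigmoid probability

p = combined_discriminator(np.random.rand(64, 64), top=16, left=16)
```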
PatchGAN. Instead of evaluating the entire image as being real or fake like the standard discriminator, PatchGAN classifies each patch in the input image. This discriminator runs across the image like a convolution and outputs the average of all patches. PatchGAN models high-frequency details, providing texture and style losses (see Footnote 2) [20]. Figure 6 shows the structure of PatchGAN.
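The per-patch scoring can be sketched with a fixed filter shared across all patch positions, standing in for PatchGAN's trained convolutional weights; patch and stride sizes here are arbitrary illustration values.

```python
import numpy as np

def patchgan_score(image, patch=16, stride=16):
    """Score every patch with the same shared weights and average the
    per-patch probabilities, instead of producing one score per image."""
    w = np.random.default_rng(0).standard_normal((patch, patch))
    h, wd = image.shape
    scores = []
    for i in range(0, h - patch + 1, stride):
        for j in range(0, wd - patch + 1, stride):
            logit = np.sum(image[i:i + patch, j:j + patch] * w)
            scores.append(1 / (1 + np.exp(-logit)))  # per-patch probability
    return np.mean(scores)

s = patchgan_score(np.random.rand(64, 64))
```

Because each score depends only on a local patch, gradients from the discriminator penalize unrealistic texture wherever it occurs, which is the mechanism behind the high-frequency modelling mentioned above.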
SN-PatchGAN. SN-PatchGAN is a fully convolutional spectral-normalized Markovian discriminator. This discriminator computes the loss directly on each point of the last feature map. SN-PatchGAN was designed to inpaint images with regular and irregular shapes of any size and in multiple regions in the image. It provides faster and more stable training, replaces the global and local discriminators and dispenses with the perceptual loss [44]. The original discriminator consists of a stack of layers of \(5\times 5\) convolutions with stride 2 and spectral normalization. SN-PatchGAN can be interpreted as a 3D classifier, where the loss is applied to each feature element on the feature map of the last layer, as illustrated in Fig. 7.
Fully convolutional spectral-normalized Markovian discriminator (SN-PatchGAN). The discriminator loss is applied in the last feature map, resulting in a 3D classifier [44].
2.4 Building Blocks
In the context of this paper, a block is a group of layers working together that executes a specific task. A block can be inserted between two layers in the generator. This section describes the main building blocks used in GANs, such as self-attention, residual blocks and squeeze and excitation.
Self-Attention. Convolutions process local information limited to the kernel shape and size. When the kernel is inside a hole larger than the kernel size, it captures only invalid pixels and is unable to hallucinate meaningful pixels. Therefore, convolutions are not suitable for inpainting regions larger than the kernel size. On the other hand, self-attention is a non-local mechanism that creates relationships between distant regions in the image [48]. Figure 8 shows the self-attention module (see Footnote 3).
The self-attention output is computed as follows. Let \(C\) be the number of channels, \(N\) the number of feature locations from the previous layer, \(x \in \mathbb {R}^{C\times N}\) the previous layer feature map, \(\textbf{W}_f\), \(\textbf{W}_g\), \(\textbf{W}_h \in \mathbb {R}^{\bar{C}\times C}\) and \(\textbf{W}_v \in \mathbb {R}^{C\times \bar{C}}\) the weight matrices, with \(\bar{C} = C/8\). The feature maps \(\textbf{f}\) and \(\textbf{g}\) are calculated as \(f(x)=\textbf{W}_f x\) and \(g(x)=\textbf{W}_g x\), and the attention map is
\[\beta _{j,i} = \frac{\exp (s_{ij})}{\sum _{i=1}^{N} \exp (s_{ij})}, \quad \text {where } s_{ij} = f(x_i)^\top g(x_j),\]
where the softmax \(\beta _{j,i}\) is the probability that the \(i^{th}\) location serves the \(j^{th}\) region. The output of the attention layer \(\textbf{O}=(o_1,...,o_j,...,o_N)\in \mathbb {R}^{C\times N}\) is given by:
\[o_j = \textbf{W}_v \left( \sum _{i=1}^{N} \beta _{j,i}\, h(x_i) \right), \quad \text {where } h(x_i)=\textbf{W}_h x_i.\]
The final output is given by \(y_i=\gamma o_i + x_i\), where \(\gamma \) is a learned parameter.
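The attention computation described above translates almost line-for-line into NumPy. This is a minimal sketch operating on a flattened \((C, N)\) feature map; the weight matrices here are random placeholders, not trained parameters.

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wf, Wg, Wh, Wv, gamma):
    """Self-attention on a flattened (C, N) feature map, following [48]."""
    f, g, h = Wf @ x, Wg @ x, Wh @ x   # (C/8, N) projections
    s = f.T @ g                        # s[i, j] = f(x_i)^T g(x_j)
    beta = softmax(s, axis=0)          # beta[i, j]: location i serves region j
    o = Wv @ (h @ beta)                # o_j = Wv * sum_i beta_{j,i} h(x_i)
    return gamma * o + x               # y_i = gamma * o_i + x_i

rng = np.random.default_rng(0)
C, N = 8, 5
x = rng.standard_normal((C, N))
Wf, Wg, Wh = (rng.standard_normal((C // 8, C)) for _ in range(3))
Wv = rng.standard_normal((C, C // 8))
y = self_attention(x, Wf, Wg, Wh, Wv, gamma=0.5)
```

With \(\gamma = 0\) the layer is the identity, which is how [48] initializes it: the network first relies on local cues and gradually learns to weight the non-local evidence.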
Residual Block. A residual block (ResBlock) consists of a series of convolutional layers with skip connection, i.e., the input adds to the output as illustrated in Fig. 9.
The residual block avoids gradient dispersion in very deep networks [42]; variants replace the standard convolution with dilated convolution [46] or multi-dilated convolution [26]. Moreover, residual networks are easy to optimize [15], train faster and achieve similar losses compared to non-residual networks [24]. The residual block was originally conceived for image classification [15].
Residual blocks are used in the bottleneck layer of encoder-decoders [3, 43, 46], in the contraction and expansion sides of U-Nets [7, 10, 26], or as building blocks of multi-branch networks [31, 32].
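The skip-connection mechanism can be shown with \(1\times 1\) convolutions, which on an \((H, W, C)\) map reduce to per-pixel matrix multiplies. A toy sketch (names hypothetical, two convolutions with a ReLU between them as in [15]):

```python
import numpy as np

def conv1x1(x, w):
    # A 1x1 convolution on an (H, W, C) map is a per-pixel matrix multiply.
    return x @ w

def residual_block(x, w1, w2):
    # conv -> ReLU -> conv, with the input added back, so the identity
    # path lets gradients bypass the transformation.
    out = np.maximum(conv1x1(x, w1), 0)
    out = conv1x1(out, w2)
    return x + out

x = np.random.rand(4, 4, 3)   # non-negative toy feature map
```

With zero weights the block is exactly the identity, which is why residual networks remain easy to optimize even when very deep: each block only has to learn a correction on top of the identity mapping.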
Squeeze and Excitation Blocks. The Squeeze-and-Excitation (SE) block models the relationships between channels in the feature maps [18]. The block performs channel-wise feature re-calibration, strengthening meaningful features and weakening worthless ones. SE blocks fit between two layers, achieving higher performance gain at a small computational cost. The squeeze operation uses global average pooling to aggregate each feature map across its spatial dimension, and the excitation operation is a simple gating that produces a collection of weights that are applied to the feature maps. Figure 10 illustrates the architecture of the SE block.
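The squeeze and excitation operations of [18] fit in a few NumPy lines. This sketch uses random placeholder weights and a reduction ratio of 4; the variable names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def se_block(x, w1, w2):
    """Squeeze-and-Excitation on an (H, W, C) map; w1 reduces C to C/r,
    w2 expands back, following the FC-ReLU-FC-sigmoid gating of [18]."""
    z = x.mean(axis=(0, 1))                  # squeeze: global average pool -> (C,)
    s = sigmoid(np.maximum(z @ w1, 0) @ w2)  # excitation: per-channel weights
    return x * s                             # re-calibrate each channel

x = np.random.rand(4, 4, 8)
w1 = np.random.rand(8, 2)   # reduction ratio r = 4
w2 = np.random.rand(2, 8)
y = se_block(x, w1, w2)
```

Because the gate is computed from the pooled global statistics, each channel is scaled by a weight that depends on all channels, which is the channel-relationship modelling described above.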
2.5 Training Stability
This section presents two approaches to stabilize the training of GANs. Zhang et al. proposed the use of spectral normalization on both the generator and discriminator, as well as employing the two time scale update rule (TTUR) [48].
Two Time Scale Update Rule (TTUR). Using different learning rates for the generator and discriminator in combination with Adam stochastic optimization improves convergence and stability. In the two time scale update rule (TTUR), the learning rate of the generator is generally lower than that of the discriminator. Although the TTUR theory ensures convergence, the appropriate learning rates must be found empirically for each network [16]. The generator learning rate reported in the literature is 1e-4, while reported discriminator rates are 1e-12 [22, 23], 1e-4 [11] and 4e-4 [5, 21, 48].
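In practice TTUR amounts to maintaining two optimizers with different learning rates. A toy sketch with plain SGD standing in for Adam (names hypothetical; the 1e-4 / 4e-4 pair is one of the rate combinations reported above):

```python
import numpy as np

def sgd_step(params, grads, lr):
    # One plain gradient step; TTUR only changes the lr per network.
    return {k: params[k] - lr * grads[k] for k in params}

LR_G, LR_D = 1e-4, 4e-4   # generator updated more slowly than the discriminator
g_params = sgd_step({"w": np.ones(3)}, {"w": np.ones(3)}, LR_G)
d_params = sgd_step({"w": np.ones(3)}, {"w": np.ones(3)}, LR_D)
```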
Spectral Normalization. Spectral normalization is a weight-normalization technique originally proposed to stabilize the training of the discriminator [39]. Spectral normalization is simple to implement, has low computation cost, and further improves stability when applied in combination with gradient penalty.
Furthermore, when employed in both the generator and the discriminator, spectral normalization further reduces the discriminator-to-generator update ratio, decreases the computational cost, and provides more stable training [48]. The spectral normalization is given by Eq. 3:
\[\bar{\textbf{W}}_{SN}(\textbf{W}) = \frac{\textbf{W}}{\eta (\textbf{W})},\]
where \(\eta (\textbf{W})\) is the spectral norm of the matrix \(\textbf{W}\).
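In [39] the spectral norm is estimated cheaply with power iteration rather than a full SVD. The NumPy sketch below runs the iteration to convergence for clarity, whereas Miyato et al. amortize a single iteration per training update.

```python
import numpy as np

def spectral_normalize(W, iters=200):
    """Divide W by an estimate of its spectral norm eta(W), obtained with
    power iteration (sketch; not the exact one-step-per-update variant)."""
    u = np.random.default_rng(0).standard_normal(W.shape[0])
    for _ in range(iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v          # largest singular value, i.e. the spectral norm
    return W / sigma

W = np.random.default_rng(1).standard_normal((6, 4))
W_sn = spectral_normalize(W)   # largest singular value of W_sn is ~1
```

Dividing every weight matrix by its spectral norm bounds the Lipschitz constant of each layer, which is what stabilizes the discriminator's training.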
3 Limitations
Despite the impressive progress in image de-occlusion and inpainting in recent years, several challenges remain to be solved. Given the broad extent of current limitations, it is unlikely that a single solution will address all situations. This section analyzes key limitations identified during the review and proposes open areas for research.
3.1 Datasets
Current models still require large amounts of training data, relying heavily on available datasets, and their generalization to unseen data largely depends on the training dataset. An open research area is the development of models, algorithms, and methods resilient to limited data availability, i.e., models that generalize well from few training samples.
Moreover, despite the variety in available data, there are few datasets created specifically for face de-occlusion and inpainting, and they contain few images compared with other face databases. For this reason, researchers build their own synthetic images from available public face datasets, usually overlaying an object or a binary mask. This approach may be good for model development, but not for inference in real-world scenarios, which require a large occluded-face dataset for testing models.
3.2 Evaluation Metrics
User studies measure qualitative attributes that are hard to evaluate with quantitative methods alone. Since researchers employ different methodologies when conducting qualitative surveys, results cannot be compared across published studies. This situation could be avoided if researchers followed a formal protocol describing the survey process. The protocol might adopt psychophysical similarity measurements already used in the literature, such as the Two Alternative Forced Choice (2AFC) and Just Noticeable Differences (JND) tests used in [49].
Moreover, most quantitative evaluation metrics measure pixel-level statistics that are unable to capture human perception, yet for historical reasons they are still widely used for model comparison. The two most used metrics, PSNR and SSIM, carry a simple relationship between them [17]. On the other hand, feature-level metrics capture higher-level perceptual quality. LPIPS is the only feature-level metric found in the literature, but it still lags behind human-level perception. Further research is needed on versions of LPIPS with higher perceptual fidelity, as well as on quantitative metrics able to evaluate other qualitative attributes such as effective occlusion removal, naturalness, image realism, and consistency.
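The pixel-level nature of PSNR is clear from its definition: it is a direct function of the mean squared error, which explains both its simple relationship to SSIM noted in [17] and its blindness to perceptual quality. A minimal sketch for images in \([0, 1]\):

```python
import numpy as np

def psnr(x, y, peak=1.0):
    # PSNR is a monotone transform of the MSE: it penalizes any pixel
    # difference equally, regardless of perceptual importance.
    mse = np.mean((x - y) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

x = np.zeros((8, 8))
y = x + 0.1          # uniform error of 0.1 -> MSE 0.01 -> 20 dB
```

Two restorations with the same MSE receive the same PSNR even if one contains a perceptually glaring artifact (e.g. mismatched eye colors) and the other a harmless uniform shift, which is exactly the weakness feature-level metrics try to address.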
3.3 Automatic De-occlusion
Most state-of-the-art models require a binary mask with holes marking the occluded area. This is acceptable for single-image restoration and de-occlusion of photographs, but it is unfeasible for videos, real-time de-occlusion and batch processing of several images. Existing automatic approaches often fail to properly detect occlusions, producing artifacts in the restored region [6]. Moreover, there are still few studies in this area.
3.4 Image Quality
Image quality remains an open problem: typical failures include eyes with different colors, distorted mouth and nose shapes, missing ears, texture discontinuity at the border pixels, artifacts, blur, and poor background filling.
Models using prior information such as landmarks, edges and semantic segmentation maps, or another coarse-to-fine approach, rely on the quality of the predictions of these priors. These predictions degrade on occluded faces, in particular when combined with large pose variations such as top or bottom views and profiles.
Large occlusions also degrade performance. The majority of models are trained with 25% of the region missing, and few restore more than 50%. It is especially challenging to remove occlusions covering symmetric parts of the face, for example both eyes, simply because the model does not know the color and shape of the eyes. In such cases, prior information helps with the structure of the face, but the texture remains missing.
3.5 Computational Cost
Current models have high computational cost, restricting their use on edge devices and in real-time applications. Inference time is still too high for real-time applications even with a GPU, and the number of parameters may prevent deployment on edge devices. Moreover, training a model takes a few days and model design takes a few months. More research is needed on better algorithms and more efficient methods to reduce these computational costs.
4 Conclusion
This paper reviewed GAN-based face image inpainting and de-occlusion studies found in the literature. More specifically, we explored network architectures and components, and analyzed current limitations.
The GAN architecture for image inpainting and two-stage networks were described. Encoder-decoder and U-Net are basic generator architectures that can be combined with other components for additional functionalities. They are also used in single and multi-stage architectures. Local and global discriminators, PatchGAN and SN-PatchGAN improve the GAN’s ability to distinguish between real and fake images in local and global levels. Squeeze and excitation blocks perform channel-wise feature re-calibration, weighting the importance of each feature map in a given layer. TTUR, Adam optimizer and spectral normalization accelerate and stabilize GAN training.
Finally, this study discussed the current limitations and challenges found in datasets, evaluation metrics, automatic de-occlusion, image quality and computational cost. Furthermore, we propose insights for future research.
Notes
- 1.
See protocol at https://github.com/vivamoto/bracis-2023.
- 2.
Python code available at https://github.com/znxlwm/pytorch-pix2pix/blob/3059f2af53324e77089bbcfc31279f01a38c40b8/network.py.
- 3.
Python code is available at https://github.com/brain-research/self-attention-gan.
References
Bertalmio, M., Sapiro, G., Caselles, V., Ballester, C.: Image inpainting. In: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 2000, pp. 417–424. ACM Press/Addison-Wesley Publishing Co., USA (2000)
Burgos-Artizzu, X.P., Perona, P., Dollár, P.: Robust face landmark estimation under occlusion. In: 2013 IEEE International Conference on Computer Vision, pp. 1513–1520 (2013)
Cai, J., Han, H., Cui, J., Chen, J., Liu, L., Kevin Zhou, S.: Semi-supervised natural face de-occlusion. IEEE Trans. Inf. Forensics Secur. 16, 1044–1057 (2021)
Cao, S., Sakurai, K.: Face completion with pyramid semantic attention and latent codes. In: Proceedings - 2020 8th International Symposium on Computing and Networking, CANDAR 2020, pp. 1–8. Institute of Electrical and Electronics Engineers Inc. (2020)
Chen, M., Liu, Z., Ye, L., Wang, Y.: Attentional coarse-and-fine generative adversarial networks for image inpainting. Neurocomputing 405, 259–269 (2020)
Chen, Y.A., Chen, W.C., Wei, C.P., Wang, Y.C.: Occlusion-aware face inpainting via generative adversarial networks. In: Proceedings - International Conference on Image Processing, ICIP, vol. 2017-September, pp. 1202–1206. IEEE Computer Society (2018)
Cheung, Y.M., Li, M., Zou, R.: Facial structure guided GAN for identity-preserved face image de-occlusion. In: ICMR 2021 - Proceedings of the 2021 International Conference on Multimedia Retrieval, pp. 46–54. Association for Computing Machinery, Inc. (2021)
Din, N., Javed, K., Bae, S., Yi, J.: Effective removal of user-selected foreground object from facial images using a novel GAN-based network. IEEE Access 8, 109648–109661 (2020)
Dong, J., Zhang, L., Zhang, H., Liu, W.: Occlusion-aware GAN for face de-occlusion in the wild. In: Proceedings - IEEE International Conference on Multimedia and Expo, vol. 2020-July. IEEE Computer Society (2020)
Fang, Y., Li, Y., Tu, X., Tan, T., Wang, X.: Face completion with hybrid dilated convolution. Signal Process. Image Commun. 80, 115664 (2020)
Ge, S., Li, C., Zhao, S., Zeng, D.: Occluded face recognition in the wild by identity-diversity inpainting. IEEE Trans. Circuits Syst. Video Technol. 30(10), 3387–3397 (2020)
Goodfellow, I.J., et al.: Generative adversarial nets. In: Proceedings of the 27th International Conference on Neural Information Processing Systems, NIPS 2014, vol. 2, pp. 2672–2680. MIT Press, Cambridge (2014)
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.: Improved training of Wasserstein GANs. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS 2017, pp. 5769–5779. Curran Associates Inc., Red Hook (2017)
Guo, D., Feng, J., Zhou, B.: Structure-aware image expansion with global attention. In: SIGGRAPH Asia 2019 Technical Briefs, SA 2019, pp. 13–16. Association for Computing Machinery, Inc. (2019)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local nash equilibrium. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc. (2017)
Horé, A., Ziou, D.: Image quality metrics: PSNR vs. SSIM. In: 2010 20th International Conference on Pattern Recognition, pp. 2366–2369 (2010)
Hu, J., Shen, L., Albanie, S., Sun, G., Wu, E.: Squeeze-and-excitation networks. IEEE Trans. Pattern Anal. Mach. Intell. 42(8), 2011–2023 (2020)
Iizuka, S., Simo-Serra, E., Ishikawa, H.: Globally and locally consistent image completion. ACM Trans. Graph. 36(4), 1–14 (2017)
Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5967–5976 (2017)
Jabbar, A., Li, X., Iqbal, M., Malik, A.: FD-stackGAN: face de-occlusion using stacked generative adversarial networks. KSII Trans. Internet Inf. Syst. 15(7), 2547–2567 (2021)
Jam, J., Kendrick, C., Drouard, V., Walker, K., Hsu, G.S., Yap, M.: R-MNet: a perceptual adversarial network for image inpainting. In: Proceedings - 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021, pp. 2713–2722. Institute of Electrical and Electronics Engineers Inc. (2021)
Jam, J., Kendrick, C., Drouard, V., Walker, K., Hsu, G.S., Yap, M.: Symmetric skip connection Wasserstein GAN for high-resolution facial image inpainting. In: VISIGRAPP 2021 - Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, vol. 4, pp. 35–44. SciTePress (2021)
Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-resolution. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 694–711. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46475-6_43
Khan, M., Ud Din, N., Bae, S., Yi, J.: Interactive removal of microphone object in facial images. Electronics 8(10), 1115 (2019)
Li, X., Hu, G., Zhu, J., Zuo, W., Wang, M., Zhang, L.: Learning symmetry consistent deep CNNs for face completion. IEEE Trans. Image Process. 29, 7641–7655 (2020)
Li, Y., Liu, S., Yang, J., Yang, M.H.: Generative face completion. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5892–5900 (2017)
Li, Z., Zhu, H., Cao, L., Jiao, L., Zhong, Y., Ma, A.: Face inpainting via nested generative adversarial networks. IEEE Access 7, 155462–155471 (2019)
Lie, Y., Li, L.: Image inpainting using multi-scale neural network and shift-net. In: Proceedings - 2020 7th International Conference on Information Science and Control Engineering, ICISCE 2020, pp. 704–709. Institute of Electrical and Electronics Engineers Inc. (2020)
Lin, J., Yang, H., Chen, D., Zeng, M., Wen, F., Yuan, L.: Face parsing with ROI tanh-warping. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5647–5656 (2019)
Liu, J., Jung, C.: Facial image inpainting using multi-level generative network. In: Proceedings - IEEE International Conference on Multimedia and Expo, vol. 2019-July, pp. 1168–1173. IEEE Computer Society (2019)
Liu, J., Jung, C.: Facial image inpainting using attention-based multi-level generative network. Neurocomputing 437, 95–106 (2021)
Luo, X., He, X., Qing, L., Chen, X., Liu, L., Xu, Y.: Eyesgan: synthesize human face from human eyes. Neurocomputing 404, 213–226 (2020)
Maggipinto, M., Masiero, C., Beghi, A., Susto, G.A.: A convolutional autoencoder approach for feature extraction in virtual metrology. Procedia Manuf. 17, 126–133 (2018)
Maharjan, R., Ud Din, N., Yi, J.: Image-to-image translation based face de-occlusion. In: Proceedings of SPIE - The International Society for Optical Engineering, vol. 11519. SPIE (2020)
Maheshwari, U., Turlapati, V., Kiruthika, U.: Lucid-GAN: an adversarial network for enhanced image inpainting. In: CIVEMSA 2021 - IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications, Proceedings. Institute of Electrical and Electronics Engineers Inc. (2021)
Mathai, J., Masi, I., Abdalmageed, W.: Does generative face completion help face recognition? In: 2019 International Conference on Biometrics, ICB 2019. Institute of Electrical and Electronics Engineers Inc. (2019)
Maulana, A., Fatichah, C., Suciati, N.: Facial inpainting using generative adversarial network with feature reconstruction and landmark loss to preserve spatial consistency in unaligned face images. Int. J. Intell. Eng. Syst. 13(6), 219–228 (2020)
Miyato, T., Kataoka, T., Koyama, M., Yoshida, Y.: Spectral normalization for generative adversarial networks. In: International Conference on Learning Representations (2018)
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Sadiq, M., Shi, D.: Attentive occlusion-adaptive deep network for facial landmark detection. Pattern Recognit. 125, 108510 (2022)
Wang, F., Li, W., Liu, Y., Gong, Y., Gao, Z., Lu, J.: Face inpainting combining structured forest edge information and gated convolution. In: Proceedings - 2021 3rd International Conference on Natural Language Processing, ICNLP 2021, pp. 213–217. Institute of Electrical and Electronics Engineers Inc. (2021)
Wu, Y., Singh, V., Kapoor, A.: From image to video face inpainting: spatial-temporal nested GAN (STN-GAN) for usability recovery. In: Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020, pp. 2385–2394. Institute of Electrical and Electronics Engineers Inc. (2020)
Yu, J., Lin, Z., Yang, J., Shen, X., Lu, X., Huang, T.: Free-form image inpainting with gated convolution. In: Proceedings of the IEEE International Conference on Computer Vision, vol. 2019-October, pp. 4470–4479. Institute of Electrical and Electronics Engineers Inc. (2019)
Yu, J., Lin, Z., Yang, J., Shen, X., Lu, X., Huang, T.: Generative image inpainting with contextual attention. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 5505–5514. IEEE Computer Society (2018)
Yu, L., Zhu, D., He, J.: Semantic segmentation guided face inpainting based on SN-PatchGAN. In: Proceedings - 2020 13th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics, CISP-BMEI 2020, pp. 110–115. Institute of Electrical and Electronics Engineers Inc. (2020)
Zhang, H., Li, T.: Semantic face image inpainting based on generative adversarial network. In: Proceedings - 2020 35th Youth Academic Annual Conference of Chinese Association of Automation, YAC 2020, pp. 530–535. Institute of Electrical and Electronics Engineers Inc. (2020)
Zhang, H., Goodfellow, I., Metaxas, D., Odena, A.: Self-attention generative adversarial networks. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 97, pp. 7354–7363. PMLR (2019)
Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 586–595 (2018)
Zhu, W., Wang, X., Wu, Y., Zou, G.: A face occlusion removal and privacy protection method for IoT devices based on generative adversarial networks. Wirel. Commun. Mob. Comput. 2021 (2021)
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Ivamoto, V., Simões, R., Kemmer, B., Lima, C. (2023). Occluded Face In-painting Using Generative Adversarial Networks—A Review. In: Naldi, M.C., Bianchi, R.A.C. (eds) Intelligent Systems. BRACIS 2023. Lecture Notes in Computer Science(), vol 14196. Springer, Cham. https://doi.org/10.1007/978-3-031-45389-2_17
Print ISBN: 978-3-031-45388-5
Online ISBN: 978-3-031-45389-2