Resolving Facial Image Defects Through the Integration of Segmentation and Fusion Networks
Research Article
Open Access
CC BY


Zishuo Xia 1*, Yin Ru 2, Yilei Yang 3, Kaiwen Xian 4
1 School of Information Science and Technology, Guangdong University of Foreign Studies
2 School of Computer Science, Gonzaga University, Spokane, USA
3 Aberdeen School of Data Science and Artificial Intelligence, South China Normal University, Foshan, China
4 Shandong Experimental High School, Jinan, China
*Corresponding author: 954643526@qq.com
Published on 20 July 2025
ACE Vol.177
ISSN (Print): 2755-2721
ISSN (Online): 2755-273X
ISBN (Print): 978-1-80590-241-6
ISBN (Online): 978-1-80590-242-3

Abstract

Generating images with dynamic expressions has become a key component of the social media industry. To enhance realism, it is crucial to remove potential distortions in the face image. Unfortunately, videos produced by PIRenderer often exhibit facial blur due to inadequate segmentation between foreground and background. In this paper, GrabCut and MODNet are used to post-process the videos generated by PIRenderer, and their outputs are then fused. The proposed method reduces the influence of the background on the face when generating dynamic expression videos. These post-processing steps optimize dynamic facial expression rendering, mitigate the face distortion problem, and ultimately produce more realistic video output.
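The abstract describes a two-stage post-processing scheme: a hard foreground mask from GrabCut and a soft alpha matte from MODNet are fused, and the fused alpha is used to composite the rendered face over a clean background. The sketch below illustrates one plausible fusion rule in NumPy; `fuse_mattes` and `composite` are illustrative names, the input masks are assumed to have already been produced by GrabCut (e.g. OpenCV's `cv2.grabCut`) and a pretrained MODNet, and the exact fusion used in the paper may differ.

```python
import numpy as np

def fuse_mattes(grabcut_mask: np.ndarray, modnet_matte: np.ndarray) -> np.ndarray:
    """Fuse a hard GrabCut foreground mask (0 = background, 1 = foreground)
    with a soft MODNet alpha matte (values in [0, 1]).

    Inside the GrabCut foreground we keep MODNet's soft alpha (it preserves
    fine detail such as hair); outside it the alpha is forced to zero,
    suppressing background pixels that leaked into the matte.
    """
    return modnet_matte * (grabcut_mask > 0)

def composite(frame: np.ndarray, background: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Alpha-blend the rendered frame over a clean background, per pixel."""
    a = alpha[..., None]            # H x W -> H x W x 1, broadcast over RGB
    return a * frame + (1.0 - a) * background

# Toy 2x2 example: white rendered frame composited over a black background.
grabcut = np.array([[1, 1], [0, 0]], dtype=np.uint8)
matte = np.array([[1.0, 0.6], [0.7, 0.0]])
alpha = fuse_mattes(grabcut, matte)      # [[1.0, 0.6], [0.0, 0.0]]
frame = np.ones((2, 2, 3))
background = np.zeros((2, 2, 3))
out = composite(frame, background, alpha)
```

In this toy example the matte value 0.7 at a pixel GrabCut marks as background is zeroed out, which is exactly the "reduce the influence of background on the face" behaviour the abstract claims for the combined method.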

Keywords:

face modeling, image generation, GrabCut, MODNet, post-processing


References

[1]. Aaron Hertzmann et al. “Image analogies”. In: Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques. SIGGRAPH ’01. New York, NY, USA: Association for Computing Machinery, 2001, pp. 327–340. isbn: 158113374X. doi: 10.1145/383259.383295. url: https://doi.org/10.1145/383259.383295.

[2]. Lei Zhu et al. “Joint Bi-Layer Optimization for Single-Image Rain Streak Removal”. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV). Oct. 2017.

[3]. Aayush Bansal et al. “Recycle-GAN: Unsupervised Video Retargeting”. In: European Conference on Computer Vision. 2018. url: https://api.semanticscholar.org/CorpusID:51987197.

[4]. Olivia Wiles, A. Sophia Koepke, and Andrew Zisserman. “X2Face: A network for controlling face generation by using images, audio, and pose codes”. In: ArXiv abs/1807.10550 (2018). url: https://api.semanticscholar.org/CorpusID:51866642.

[5]. Ayush Tewari et al. “StyleRig: Rigging StyleGAN for 3D Control over Portrait Images”. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. June 2020.

[6]. Yurui Ren et al. “PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering”. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2021), pp. 13739–13748. url: https://api.semanticscholar.org/CorpusID:237562793.

[7]. Lee Chae-Yeon et al. “Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics”. In: 2025. url: https://api.semanticscholar.org/CorpusID:277345403.

[8]. Aliaksandr Siarohin et al. “First Order Motion Model for Image Animation”. In: Neural Information Processing Systems. 2020. url: https://api.semanticscholar.org/CorpusID:202767986.

[9]. Sicheng Xu et al. “Vasa-1: Lifelike audio-driven talking faces generated in real time”. In: Advances in Neural Information Processing Systems 37 (2024), pp. 660–684.

[10]. Ai-Mei Huang, Zhewei Huang, and Shuchang Zhou. “Perceptual Conversational Head Generation with Regularized Driver and Enhanced Renderer”. In: Proceedings of the 30th ACM International Conference on Multimedia (2022). url: https://api.semanticscholar.org/CorpusID:250072874.

[11]. Zhongcong Xu et al. “MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 2024, pp. 1481–1490.

[12]. Youxin Pang et al. “DPE: Disentanglement of Pose and Expression for General Video Portrait Editing”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 2023, pp. 427–436.

[13]. Carsten Rother, Vladimir Kolmogorov, and Andrew Blake. “‘GrabCut’: interactive foreground extraction using iterated graph cuts”. In: ACM Trans. Graph. 23.3 (Aug. 2004), pp. 309–314. issn: 0730-0301. doi: 10.1145/1015706.1015720. url: https://doi.org/10.1145/1015706.1015720.

[14]. Zhanghan Ke et al. “MODNet: Real-Time Trimap-Free Portrait Matting via Objective Decomposition”. In: AAAI Conference on Artificial Intelligence. 2020. url: https://api.semanticscholar.org/CorpusID:246295022.

[15]. Department of Computer Science, Rhodes University. Research Project G02M1682. https://www.cs.ru.ac.za/research/g02m1682/. Accessed: 2025-04-08.

[16]. Richard Zhang et al. “The unreasonable effectiveness of deep features as a perceptual metric”. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2018, pp. 586–595.

Cite this article

Xia, Z.; Ru, Y.; Yang, Y.; Xian, K. (2025). Resolving Facial Image Defects Through the Integration of Segmentation and Fusion Networks. Applied and Computational Engineering, 177, 38-47.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

About volume

Volume title: Proceedings of CONF-MLA 2025 Symposium: Applied Artificial Intelligence Research

ISBN: 978-1-80590-241-6 (Print) / 978-1-80590-242-3 (Online)
Editor: Hisham AbouGrad
Conference website: https://2025.confmla.org/
Conference date: 3 September 2025
Series: Applied and Computational Engineering
Volume number: Vol.177
ISSN: 2755-2721 (Print) / 2755-273X (Online)