Resolving Facial Image Defects Through the Integration of Segmentation and Fusion Networks
Research Article
Open Access
CC BY


Zishuo Xia 1*, Yin Ru 2, Yilei Yang 3, Kaiwen Xian 4
1 School of Information Science and Technology, Guangdong University of Foreign Studies
2 School of Computer Science, Gonzaga University, Spokane, USA
3 Aberdeen School of Data Science and Artificial Intelligence, South China Normal University, Foshan, China
4 Shandong Experimental High School, Jinan, China
*Corresponding author: 954643526@qq.com
Published on 20 July 2025
ACE Vol.177
ISSN (Print): 2755-2721
ISSN (Online): 2755-273X
ISBN (Print): 978-1-80590-241-6
ISBN (Online): 978-1-80590-242-3

Abstract

Generating images with dynamic expressions has become a key component of the social media industry. To enhance realism, it is crucial to remove potential distortions in the face image. Unfortunately, videos produced by PIRenderer often exhibit facial blur due to inadequate segmentation between foreground and background. In this paper, GrabCut and MODNet are used to post-process the videos generated by PIRenderer, and their outputs are then fused. The proposed method reduces the influence of the background on the face when generating dynamic expression videos. These post-processing steps optimize dynamic facial expression rendering, mitigate the face distortion problem, and ultimately produce more realistic video output.
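The abstract describes a two-stage post-processing scheme: a hard foreground mask from GrabCut and a soft alpha matte from MODNet are fused, and the fused alpha is used to composite the rendered face over a clean background. The sketch below illustrates one plausible fusion rule in NumPy; `fuse_mattes` and `composite` are illustrative names, the input masks are assumed to have already been produced by GrabCut (e.g. OpenCV's `cv2.grabCut`) and a pretrained MODNet, and the exact fusion used in the paper may differ.

```python
import numpy as np

def fuse_mattes(grabcut_mask: np.ndarray, modnet_matte: np.ndarray) -> np.ndarray:
    """Fuse a hard GrabCut foreground mask (0 = background, 1 = foreground)
    with a soft MODNet alpha matte (values in [0, 1]).

    Inside the GrabCut foreground we keep MODNet's soft alpha (it preserves
    fine detail such as hair); outside it the alpha is forced to zero,
    suppressing background pixels that leaked into the matte.
    """
    return modnet_matte * (grabcut_mask > 0)

def composite(frame: np.ndarray, background: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Alpha-blend the rendered frame over a clean background, per pixel."""
    a = alpha[..., None]            # H x W -> H x W x 1, broadcast over RGB
    return a * frame + (1.0 - a) * background

# Toy 2x2 example: white rendered frame composited over a black background.
grabcut = np.array([[1, 1], [0, 0]], dtype=np.uint8)
matte = np.array([[1.0, 0.6], [0.7, 0.0]])
alpha = fuse_mattes(grabcut, matte)      # [[1.0, 0.6], [0.0, 0.0]]
frame = np.ones((2, 2, 3))
background = np.zeros((2, 2, 3))
out = composite(frame, background, alpha)
```

In this toy example the matte value 0.7 at a pixel GrabCut marks as background is zeroed out, which is exactly the "reduce the influence of background on the face" behaviour the abstract claims for the combined method.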

Keywords:

face modeling, image generation, GrabCut, MODNet, post-processing


References

[1]. Aaron Hertzmann et al. “Image analogies”. In: Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques. SIGGRAPH ’01. New York, NY, USA: Association for Computing Machinery, 2001, pp. 327–340. isbn: 158113374X. doi: 10.1145/383259.383295. url: https://doi.org/10.1145/383259.383295.

[2]. Lei Zhu et al. “Joint Bi-Layer Optimization for Single-Image Rain Streak Removal”. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV). Oct. 2017.

[3]. Aayush Bansal et al. “Recycle-GAN: Unsupervised Video Retargeting”. In: European Conference on Computer Vision. 2018. url: https://api.semanticscholar.org/CorpusID:51987197.

[4]. Olivia Wiles, A. Sophia Koepke, and Andrew Zisserman. “X2Face: A network for controlling face generation by using images, audio, and pose codes”. In: ArXiv abs/1807.10550 (2018). url: https://api.semanticscholar.org/CorpusID:51866642.

[5]. Ayush Tewari et al. “StyleRig: Rigging StyleGAN for 3D Control over Portrait Images”. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. June 2020.

[6]. Yurui Ren et al. “PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering”. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2021), pp. 13739–13748. url: https://api.semanticscholar.org/CorpusID:237562793.

[7]. Lee Chae-Yeon et al. “Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics”. In: 2025. url: https://api.semanticscholar.org/CorpusID:277345403.

[8]. Aliaksandr Siarohin et al. “First Order Motion Model for Image Animation”. In: Neural Information Processing Systems. 2020. url: https://api.semanticscholar.org/CorpusID:202767986.

[9]. Sicheng Xu et al. “Vasa-1: Lifelike audio-driven talking faces generated in real time”. In: Advances in Neural Information Processing Systems 37 (2024), pp. 660–684.

[10]. Ai-Mei Huang, Zhewei Huang, and Shuchang Zhou. “Perceptual Conversational Head Generation with Regularized Driver and Enhanced Renderer”. In: Proceedings of the 30th ACM International Conference on Multimedia (2022). url: https://api.semanticscholar.org/CorpusID:250072874.

[11]. Zhongcong Xu et al. “MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 2024, pp. 1481–1490.

[12]. Youxin Pang et al. “DPE: Disentanglement of Pose and Expression for General Video Portrait Editing”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 2023, pp. 427–436.

[13]. Carsten Rother, Vladimir Kolmogorov, and Andrew Blake. “‘GrabCut’: interactive foreground extraction using iterated graph cuts”. In: ACM Trans. Graph. 23.3 (Aug. 2004), pp. 309–314. issn: 0730-0301. doi: 10.1145/1015706.1015720. url: https://doi.org/10.1145/1015706.1015720.

[14]. Zhanghan Ke et al. “MODNet: Real-Time Trimap-Free Portrait Matting via Objective Decomposition”. In: AAAI Conference on Artificial Intelligence. 2020. url: https://api.semanticscholar.org/CorpusID:246295022.

[15]. Department of Computer Science, Rhodes University. Research Project G02M1682. https://www.cs.ru.ac.za/research/g02m1682/. Accessed: 2025-04-08.

[16]. Richard Zhang et al. “The unreasonable effectiveness of deep features as a perceptual metric”. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2018, pp. 586–595.

Cite this article

Xia, Z.; Ru, Y.; Yang, Y.; Xian, K. (2025). Resolving Facial Image Defects Through the Integration of Segmentation and Fusion Networks. Applied and Computational Engineering, 177, 38-47.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

About volume

Volume title: Proceedings of CONF-MLA 2025 Symposium: Applied Artificial Intelligence Research

ISBN: 978-1-80590-241-6 (Print) / 978-1-80590-242-3 (Online)
Editor: Hisham AbouGrad
Conference website: https://2025.confmla.org/
Conference date: 3 September 2025
Series: Applied and Computational Engineering
Volume number: Vol.177
ISSN: 2755-2721 (Print) / 2755-273X (Online)