References
[1]. Liu, C., Lin, Q., Zeng, Z. (2024). EmoFace: Audio-driven emotional 3D face animation. In 2024 IEEE Conference on Virtual Reality and 3D User Interfaces (VR) (pp. 387–397). IEEE.
[2]. Chen, Y., Liang, S., Zhou, Z. (2025). HunyuanVideo-Avatar: High-fidelity audio-driven human animation for multiple characters. arXiv preprint arXiv:2505.20156.
[3]. Huang, Y., Wang, J., Zeng, A. (2023). DreamWaltz: Make a scene with complex 3D animatable avatars. Advances in Neural Information Processing Systems, 36, 4566–4584.
[4]. Drobyshev, N., Casademunt, A. B., Vougioukas, K. (2024). EMOPortraits: Emotion-enhanced multimodal one-shot head avatars. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8498–8507).
[5]. Fei, H., Zhang, H., Wang, B. (2024). EmpathyEar: An open-source avatar multimodal empathetic chatbot. arXiv preprint arXiv:2406.15177.
[6]. Wang, H., Weng, Y., Li, Y. (2025). EmotiveTalk: Expressive talking head generation through audio information decoupling and emotional video diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 26212–26221).
[7]. Zhen, R., Song, W., He, Q. (2023). Human–computer interaction system: A survey of talking-head generation. Electronics, 12(1), 218.
[8]. Arcelin, B., & Chaverou, N. (2024). Audio2Rig: Artist-oriented deep learning tool for facial and lip sync animation. In ACM SIGGRAPH 2024 Talks (pp. 1–2). ACM.
[9]. Peng, Z., Hu, W., Ma, J. (2025). SyncTalk++: High-fidelity and efficient synchronized talking heads synthesis using Gaussian splatting. arXiv preprint arXiv:2506.14742.