Research Article
Open Access
CC BY

Kaomoji Fixed Translation as Knowledge Hallucination in LLMs: A Case Study on XiaoHongShu

Jinhan Feng 1*
1 Nanjing Jinling High School, Nanjing, China, 210000
*Corresponding author: fengjinhan1013@gmail.com
Published on 5 November 2025
ACE Vol. 203
ISSN (Print): 2755-2721
ISSN (Online): 2755-273X
ISBN (Print): 978-1-80590-515-8
ISBN (Online): 978-1-80590-516-5

Abstract

Large Language Models (LLMs) produce fluent but sometimes unfounded outputs, a phenomenon commonly called hallucination. On the XiaoHongShu (RED) platform, when users input kaomoji (ASCII or Unicode emoticons), the translation tool often returns a stable, seemingly meaningful Chinese phrase even though the input lacks explicit semantic content. This paper examines why LLMs generate such fixed translations. Building on the concept of hallucination snowballing, classifications of hallucination types, and methods for reducing knowledge hallucination, and drawing on case analysis, literature synthesis, and mechanistic review, it addresses three questions: (1) Why do LLMs produce consistent translations for semantically null inputs? (2) Which hallucination category best fits this case? (3) How do prompt framing, pretraining co-occurrence patterns, and autoregressive decoding contribute? This study argues that kaomoji fixed translation is primarily a form of knowledge hallucination reinforced by prefix consistency and statistical co-occurrence, and it concludes by recommending uncertainty-aware behaviors, prompt-level checks, and data interventions to reduce such errors.
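As a concrete illustration of the input- and prompt-level checks recommended above, the following minimal Python sketch shows one way a translation front end could detect a kaomoji-only input and abstain instead of emitting a fixed translation. The function names (is_semantically_null, guarded_translate, dummy_translate) and the heuristic itself are illustrative assumptions, not components of XiaoHongShu's system or of the method described in the paper.

```python
import unicodedata

def is_semantically_null(text: str) -> bool:
    """Heuristic check: treat the input as semantically null if it contains
    no letter characters (Unicode category L*), i.e. it consists only of
    punctuation, symbols, and whitespace, as in kaomoji such as (^_^).
    Kana-based kaomoji (ones containing a katakana "face") would slip past
    this simple rule and would need a richer check."""
    return not any(unicodedata.category(ch).startswith("L") for ch in text)

def guarded_translate(text: str, translate_fn):
    """Wrap an arbitrary translation callable with an abstention path for
    inputs that the heuristic judges to carry no lexical content."""
    if is_semantically_null(text):
        return "[no translatable content: input looks like an emoticon/kaomoji]"
    return translate_fn(text)

if __name__ == "__main__":
    # Stand-in for a real LLM translation call.
    dummy_translate = lambda s: "<translation of: " + s + ">"
    print(guarded_translate("(｡◕‿◕｡)", dummy_translate))    # abstains
    print(guarded_translate("今天天气很好", dummy_translate))  # translated normally
```

The same idea can also be pushed into the prompt itself, for example by instructing the model to reply "no translatable content" when the input contains no words, pairing this input-side check with the uncertainty-aware behavior recommended in the abstract.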

Keywords:

Large Language Models, knowledge hallucination, kaomoji, fixed translation, consistency bias


Cite this article

Feng, J. (2025). Kaomoji Fixed Translation as Knowledge Hallucination in LLMs: A Case Study on XiaoHongShu. Applied and Computational Engineering, 203, 41-45.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

About volume

Volume title: Proceedings of CONF-SPML 2026 Symposium: The 2nd Neural Computing and Applications Workshop 2025

ISBN: 978-1-80590-515-8 (Print) / 978-1-80590-516-5 (Online)
Editors: Marwan Omar, Guozheng Rao
Conference date: 21 December 2025
Series: Applied and Computational Engineering
Volume number: Vol. 203
ISSN: 2755-2721 (Print) / 2755-273X (Online)