References
[1]. Koehn, P., and Knowles, R. (2017). Six Challenges for Neural Machine Translation. In Proceedings of the First Workshop on Neural Machine Translation.
[2]. Lin, S., Hilton, J., and Evans, O. (2022). TruthfulQA: Measuring How Models Mimic Human Falsehoods. ACL.
[3]. Lewis, P., Perez, E., Piktus, A., Petroni, F., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS.
[4]. Wang, X., Wei, J., Schuurmans, D., Le, Q., et al. (2023). Self-Consistency Improves Chain-of-Thought Reasoning in Language Models. ICLR.
[5]. Dhuliawala, S., Komeili, M., Xu, J., et al. (2024). Chain-of-Verification Reduces Hallucination in Large Language Models. Findings of ACL.
[6]. Yang, L., Yu, Z., Zhang, T., et al. (2024). Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models. NeurIPS.
[7]. Wen, J., et al. (2025). CodePlan: Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning. ICLR.
[8]. Zhang, S., Yu, T., and Feng, Y. (2024). TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space. ACL.
[9]. Izacard, G., and Grave, E. (2021). Leveraging Passage Retrieval with Generative Models for Open-Domain Question Answering (FiD). EACL.
[10]. Yu, Y., Ping, W., Liu, Z., et al. (2024). RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs. NeurIPS.
[11]. Guu, K., Lee, K., Tung, Z., et al. (2020). REALM: Retrieval-Augmented Language Model Pre-Training. ICML.
[12]. Izacard, G., Lewis, P., Lomeli, M., et al. (2023). Atlas: Few-shot Learning with Retrieval-Augmented Language Models. JMLR.
[13]. Gao, L., Dai, Z., et al. (2023). RARR: Researching and Revising What Language Models Say, Using Language Models. ACL.
[14]. Singhal, K., Tu, T., et al. (2023). Large Language Models Encode Clinical Knowledge. Nature.
[15]. Yao, S., Zhao, J., et al. (2023). ReAct: Synergizing Reasoning and Acting in Language Models. ICLR.