Quality Prediction of RAG System Retrieval Based on Machine Learning Algorithms

Chaoyi Yu

doi:10.54254/2755-2721/2025.27800

Applied and Computational Engineering, Open access

Research Article

Chaoyi Yu¹*

¹ School of Computer Science and Engineering, University of Electronic Science and Technology of China, 611731, China

*Corresponding author: 2387663175@qq.com

Published on 14 October 2025

ACE Vol.193

ISSN (Print): 2755-273X

ISSN (Online): 2755-2721

ISBN (Print): 978-1-80590-239-3

ISBN (Online): 978-1-80590-240-9

Abstract

The Retrieval-Augmented Generation (RAG) system improves the accuracy and reliability of generated content by retrieving external knowledge, and it has been widely used in fields such as intelligent question answering and knowledge assistants. Its core performance, however, depends on the quality of the retrieval stage: the relevance and factual consistency of the retrieval results directly determine the effectiveness of the generated content. In real-world scenarios, factors such as query complexity, document noise, and domain differences can easily cause retrieval quality to fluctuate. Traditional manual evaluation is costly and slow, making it difficult to meet real-time optimization requirements, while existing models have limitations in complex feature fusion and parameter optimization. This article therefore proposes a retrieval quality prediction model that combines the Horned Lizard Optimization Algorithm (HLOA), a Convolutional Neural Network (CNN), and a Bidirectional Gated Recurrent Unit (BiGRU). Correlation analysis shows a strong positive correlation between retrieval rank and retrieval usefulness score, meaning that the higher the retrieval rank, the higher the retrieval usefulness score, and a strong negative correlation between query complexity and retrieval usefulness score, meaning that the higher the query complexity, the lower the retrieval usefulness score. Compared with nine other models (decision tree, random forest, AdaBoost, gradient boosting tree, ExtraTrees, CatBoost, XGBoost, LightGBM, and KNN), the proposed model performed best overall, with the lowest MSE (28.617), RMSE (5.349), MAE (4.401), and MAPE (17.355) and the highest R² (0.952). This study provides an effective solution for accurately predicting the retrieval quality of RAG systems and optimizing it in real time, helping to enhance the practical value of RAG technology.
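The correlation findings in the abstract can be illustrated with a plain correlation coefficient. The sketch below is a minimal illustration under stated assumptions: the abstract does not say which correlation measure was used, so Pearson's r is assumed here, and the feature values in the example are hypothetical toy numbers, not the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance sums; the common 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical toy values (assumption), NOT data from the study.
    query_complexity = [1, 2, 3, 4, 5]
    usefulness_score = [90, 80, 72, 60, 55]
    r = pearson_r(query_complexity, usefulness_score)
    # A value near -1 means usefulness falls as complexity rises.
    print(f"r = {r:.3f}")
```

A value of r near +1 means the two quantities rise together (the pattern reported for retrieval rank and usefulness score), and a value near -1 means one falls as the other rises (the pattern reported for query complexity). On the toy values above, r is about -0.994.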

Keywords:

RAG, Horned lizard optimization algorithm, integrated neural network, bidirectional gated recurrent unit, retrieval quality.
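The five error metrics reported in the abstract (MSE, RMSE, MAE, MAPE, and R²) follow standard regression definitions. The sketch below computes them from paired true and predicted usefulness scores. It assumes MAPE is expressed as a percentage (the abstract gives 17.355 without a unit) and that no true value is zero; it illustrates the metric definitions and is not the author's evaluation code.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE, MAPE (in percent), and R^2 for paired values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    if any(t == 0 for t in y_true):
        # MAPE divides by the true value, so zero targets are not allowed.
        raise ValueError("MAPE is undefined when a true value is zero")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    mse = ss_res / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    # Assumption: MAPE is reported as a percentage.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    r2 = 1.0 - ss_res / ss_tot

    return {"mse": mse, "rmse": rmse, "mae": mae, "mape": mape, "r2": r2}
```

For the toy call `regression_metrics([10, 20, 30, 40], [12, 18, 33, 37])`, MSE is 6.5, MAE is 2.5, and R² is 0.948. As a consistency check on the reported figures, RMSE is the square root of MSE, and `math.sqrt(28.617)` is about 5.3495, which agrees with the reported RMSE of 5.349.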
