Quality Prediction of RAG System Retrieval Based on Machine Learning Algorithms
Research Article
Open Access
CC BY

Chaoyi Yu 1*
1 School of Computer Science and Engineering, University of Electronic Science and Technology of China, 611731, China
*Corresponding author: 2387663175@qq.com
Published on 14 October 2025
ACE Vol.193
ISSN (Print): 2755-2721
ISSN (Online): 2755-273X
ISBN (Print): 978-1-80590-239-3
ISBN (Online): 978-1-80590-240-9

Abstract

The Retrieval-Augmented Generation (RAG) system improves the accuracy and reliability of generated content by retrieving external knowledge, and it has been widely used in fields such as intelligent question answering and knowledge assistants. However, its core performance depends on the quality of the retrieval stage: the relevance and factual consistency of the retrieval results directly determine the effectiveness of the generated content. In real-world scenarios, factors such as query complexity, document noise, and domain differences can easily cause retrieval quality to fluctuate. Traditional manual evaluation is costly and slow, making it difficult to meet real-time optimization requirements, while existing models have limitations in complex feature fusion and parameter optimization. Therefore, this article proposes a retrieval quality prediction model that combines the Horned Lizard Optimization Algorithm (HLOA), a Convolutional Neural Network (CNN), and a Bidirectional Gated Recurrent Unit (BiGRU). Correlation analysis shows a strong positive correlation between retrieval rank and the retrieval usefulness score, meaning that the higher the retrieval rank, the higher the retrieval usefulness score, and a strong negative correlation between query complexity and the retrieval usefulness score, meaning that the higher the query complexity, the lower the retrieval usefulness score. Compared with nine models, namely decision tree, random forest, AdaBoost, gradient boosting tree, ExtraTrees, CatBoost, XGBoost, LightGBM, and KNN, the proposed model performed better overall, achieving the lowest MSE (28.617), RMSE (5.349), MAE (4.401), and MAPE (17.355) and the highest R² (0.952). This study provides an effective solution for the accurate prediction and real-time optimization of RAG retrieval quality, helping to enhance the value of RAG technology in practical applications.
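The abstract reports five regression metrics and a correlation analysis without defining them. The Python sketch below shows the standard textbook definitions of MSE, RMSE, MAE, MAPE and R², plus a Pearson correlation coefficient of the kind a correlation analysis typically uses. It is illustrative only: the function names `regression_metrics` and `pearson_r` and all toy values are hypothetical, the page does not say which correlation coefficient the study used (Pearson is an assumption), and treating MAPE as a percentage is likewise an assumption.

```python
import math
from statistics import fmean


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE, MAPE and R^2 with their textbook definitions.

    Assumptions (not stated on this page): MAPE is expressed as a
    percentage, and no true value is zero (MAPE divides by y_true).
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = fmean(e * e for e in errors)
    rmse = math.sqrt(mse)  # RMSE is always the square root of MSE
    mae = fmean(abs(e) for e in errors)
    mape = 100.0 * fmean(abs(e / t) for e, t in zip(errors, y_true))
    mean_true = fmean(y_true)
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = fmean(x), fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical toy values, not data from the study.
    usefulness_true = [50, 60, 70, 80]
    usefulness_pred = [52, 58, 71, 77]
    print(regression_metrics(usefulness_true, usefulness_pred))  # MSE 4.5, MAE 2.0, R2 ~0.964

    query_complexity = [1, 2, 3, 4, 5]
    usefulness = [90, 80, 72, 60, 50]
    print(pearson_r(query_complexity, usefulness))  # strongly negative, about -0.998
```

One consistency check follows directly from these definitions: because RMSE is the square root of MSE, the reported MSE of 28.617 implies an RMSE of √28.617 ≈ 5.349, which matches the value reported in the abstract.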

Keywords:

RAG, horned lizard optimization algorithm, integrated neural network, bidirectional gated recurrent unit, retrieval quality.

Yu, C. (2025). Quality Prediction of RAG System Retrieval Based on Machine Learning Algorithms. Applied and Computational Engineering, 193, 8-16.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

About volume

Volume title: Proceedings of the 3rd International Conference on Machine Learning and Automation

ISBN: 978-1-80590-239-3 (Print) / 978-1-80590-240-9 (Online)
Editor: Hisham AbouGrad
Conference website: https://www.confmla.org/
Conference date: 17 November 2025
Series: Applied and Computational Engineering
Volume number: Vol.193
ISSN: 2755-2721 (Print) / 2755-273X (Online)