Contextual Multi-Armed Bandits for Dynamic News Recommendation: An Empirical Evaluation
Research Article
Open Access
CC BY


Jiashuo Wang 1*
1 Shanxi Agricultural University
*Corresponding author: 20241210823@stu.sxau.edu.cn
Published on 26 November 2025
TNS Vol.151
ISSN (Print): 2753-8818
ISSN (Online): 2753-8826
ISBN (Print): 978-1-80590-559-2
ISBN (Online): 978-1-80590-560-8

Abstract

With the advent of the information explosion era, personalized news recommendation faces critical challenges, including cold-start problems, real-time shifts in user preferences, and information filter bubbles. Traditional collaborative filtering methods rely heavily on historical data and struggle to keep pace with the rapid turnover of news content. This paper proposes a news recommendation solution based on Multi-Armed Bandit (MAB) algorithms, addressing these challenges by balancing exploration and exploitation. The study implements four core algorithms: the ε-greedy algorithm, which balances exploration and exploitation through a probabilistic mechanism; the Upper Confidence Bound (UCB) algorithm, which employs optimistic estimation via confidence upper bounds; Thompson sampling, which performs probability matching within a Bayesian framework; and the contextual linear bandit (LinUCB), which integrates user and news features for personalized recommendations. Experiments conducted on the large-scale MIND news dataset (containing 160,000 news articles, 1 million users, and 15 million click interactions) demonstrate that contextual bandit algorithms outperform traditional methods in click-through rate, dwell time, and recommendation diversity. Thompson sampling achieves the highest click-through rates, while LinUCB excels in convergence speed and recommendation diversity. The experiments confirm that MAB algorithms can adapt effectively to dynamic changes in user preferences, providing a viable solution for real-time news recommendation systems.

Keywords:

News Recommendation, Multi-Armed Bandit, Contextual Bandit, Exploration-Exploitation



Cite this article

Wang, J. (2025). Contextual Multi-Armed Bandits for Dynamic News Recommendation: An Empirical Evaluation. Theoretical and Natural Science, 151, 21-30.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

About volume

Volume title: Proceedings of CONF-CIAP 2026 Symposium: Applied Mathematics and Statistics

ISBN: 978-1-80590-559-2 (Print) / 978-1-80590-560-8 (Online)
Editor: Marwan Omar
Conference date: 27 January 2026
Series: Theoretical and Natural Science
Volume number: Vol.151
ISSN: 2753-8818 (Print) / 2753-8826 (Online)