Contextual Multi-Armed Bandits for Dynamic News Recommendation: An Empirical Evaluation
Research Article
Open Access
CC BY


Jiashuo Wang 1*
1 Shanxi Agricultural University
*Corresponding author: 20241210823@stu.sxau.edu.cn
Published on 26 November 2025
TNS Vol.151
ISSN (Print): 2753-8818
ISSN (Online): 2753-8826
ISBN (Print): 978-1-80590-559-2
ISBN (Online): 978-1-80590-560-8

Abstract

With the advent of the information explosion era, personalized news recommendation faces critical challenges, including cold-start problems, real-time shifts in user preferences, and information filter bubbles. Traditional collaborative filtering methods rely heavily on historical data and struggle to keep pace with the rapid turnover of news content. This paper proposes a news recommendation solution based on Multi-Armed Bandit (MAB) algorithms, addressing these challenges by balancing exploration and exploitation. The study implements four core algorithms: the ε-greedy algorithm, which balances exploration and exploitation through a probabilistic mechanism; the Upper Confidence Bound (UCB) algorithm, which employs optimistic estimation via confidence upper bounds; Thompson sampling, which performs probability matching within a Bayesian framework; and the contextual linear bandit (LinUCB), which integrates user and news features for personalized recommendations. Experiments conducted on the large-scale MIND news dataset (containing 160,000 news articles, 1 million users, and 15 million click interactions) demonstrate that contextual bandit algorithms outperform traditional methods in click-through rate, dwell time, and recommendation diversity. Thompson sampling achieves the highest click-through rates, while LinUCB excels in convergence speed and recommendation diversity. The experiments confirm that MAB algorithms can adapt effectively to dynamic changes in user preferences, providing a viable solution for real-time news recommendation systems.

Keywords:

News Recommendation, Multi-Armed Bandit, Contextual Bandit, Exploration-Exploitation



Cite this article

Wang, J. (2025). Contextual Multi-Armed Bandits for Dynamic News Recommendation: An Empirical Evaluation. Theoretical and Natural Science, 151, 21-30.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

About volume

Volume title: Proceedings of CONF-CIAP 2026 Symposium: Applied Mathematics and Statistics

ISBN: 978-1-80590-559-2 (Print) / 978-1-80590-560-8 (Online)
Editor: Marwan Omar
Conference date: 27 January 2026
Series: Theoretical and Natural Science
Volume number: Vol.151
ISSN: 2753-8818 (Print) / 2753-8826 (Online)