References
[1] Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2–3), 235–256 (2002).
[2] Li, L., Chu, W., Langford, J., Schapire, R.E.: A contextual-bandit approach to personalized news article recommendation. In: Proceedings of the 19th International Conference on World Wide Web, pp. 661–670 (2010).
[3] Agrawal, S., Goyal, N.: Thompson sampling for contextual bandits with linear payoffs. In: International Conference on Machine Learning, pp. 127–135. PMLR (2013).
[4] Wei, L., Srivastava, V.: Nonstationary stochastic multiarmed bandits: UCB policies and minimax regret. arXiv preprint arXiv:2101.08980 (2021).
[5] Zhu, J., Liu, J.: Distributed multi-armed bandit over arbitrary undirected graphs. In: 2021 60th IEEE Conference on Decision and Control (CDC), pp. 6976–6981. IEEE (2021).
[6] Qiu, S., Wang, L., Bai, C., Yang, Z., Wang, Z.: Contrastive UCB: Provably efficient contrastive self-supervised learning in online reinforcement learning. In: International Conference on Machine Learning, pp. 18168–18210. PMLR (2022).
[7] Zhu, R.J., Qiu, Y.: UCB exploration for fixed-budget Bayesian best arm identification. arXiv preprint arXiv:2408.04869 (2024).
[8] Elumar, E.C., Tekin, C., Yağan, O.: Multi-armed bandits with costly probes. IEEE Transactions on Information Theory (2024).
[9] Wu, H., Xu, Y., Cao, S., Liu, J., Takakura, H., Shiratori, N.: Sleeping multi-armed bandit-based path selection in space-ground semantic communication networks. In: 2025 IEEE Wireless Communications and Networking Conference (WCNC), pp. 1–6. IEEE (2025).
[10] Saday, A., Demirel, İ., Yıldırım, Y., Tekin, C.: Federated multi-armed bandits under Byzantine attacks. IEEE Transactions on Artificial Intelligence (2025).