Comparative Analysis of ETC, UCB, and Thompson Sampling for Personalized Video Recommendations on Short-Video Platform

Shuqiao Chen

doi:10.54254/2753-8818/2026.CH30041

Theoretical and Natural Science, Open Access

Research Article

Shuqiao Chen¹*

¹ University of Manchester

*Corresponding author: shuqiao.chen@student.manchester.ac.uk

Published on 26 November 2025

TNS Vol.151

ISSN (Print): 2753-8826

ISSN (Online): 2753-8818

ISBN (Print): 978-1-80590-559-2

ISBN (Online): 978-1-80590-560-8

Abstract

This study empirically compares three canonical Multi-Armed Bandit (MAB) algorithms for short-video recommendation: Explore-Then-Commit (ETC), which relies on a fixed initial exploration phase; Upper Confidence Bound (UCB1), which applies optimism-driven uncertainty estimates; and Thompson Sampling with a Bernoulli likelihood (TS-Bernoulli), which is based on posterior sampling. The goal is to address the exploration-exploitation tradeoff in real-time feed systems. Experiments were conducted on the ShortVideo-Interactions (SVI-200K) dataset, a simulated corpus of ~1.2 million timestamped impressions and clicks from 240,000 user sessions over 30 days, covering ~18,000 unique items to mimic real platform dynamics. Each run used a fixed horizon (T=2000 timesteps) and restricted the candidate set to the top 200 items (K=200). The evaluation spanned three practical scenarios: a stable base setting, an information-scarce cold-start setting (new items with no prior data), and a temporal-shift setting with drifting preferences. Results, aggregated over three pseudo-random seeds (2025, 2026, 2027), show that TS-Bernoulli consistently outperforms the other two algorithms: it achieves the highest Click-Through Rate (CTR) (0.452 in base, 0.402 in cold-start, 0.428 in temporal-shift) and the lowest cumulative regret (418, 518, and 467, respectively). These findings indicate that TS-Bernoulli's posterior sampling adapts robustly to the key challenges of short-video recommendation (information scarcity and non-stationarity), making it a practical algorithm choice for real-world platforms.

Keywords:

Multi-Armed Bandits, Thompson Sampling, Short-Video Recommendation, Cold-Start, Cumulative Regret
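
The three policies compared in the abstract can be sketched on a simulated Bernoulli bandit as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the function name `run_bandit`, the click probabilities in `probs`, the `explore_per_arm` setting for ETC, and the uniform Beta(1, 1) prior for TS-Bernoulli are all hypothetical choices. Only the horizon (T=2000) and the seeds (2025, 2026, 2027) are taken from the abstract, and the simulation is stationary, so it loosely corresponds to the base scenario only.

```python
import math
import random


def run_bandit(policy, probs, horizon, seed, explore_per_arm=2):
    """Simulate one run of a policy; return (ctr, cumulative_regret).

    probs holds the true (hypothetical) click probability of each arm.
    Regret is the expected (pseudo-)regret against the best arm.
    """
    rng = random.Random(seed)
    k = len(probs)
    pulls = [0] * k    # number of times each arm was shown
    clicks = [0] * k   # number of clicks observed per arm
    best = max(probs)
    total_clicks = 0
    regret = 0.0
    committed = None   # arm that ETC commits to after exploring

    for t in range(horizon):
        if policy == "etc":
            if t < explore_per_arm * k:
                arm = t % k  # round-robin exploration phase
            else:
                if committed is None:
                    committed = max(range(k), key=lambda a: clicks[a] / pulls[a])
                arm = committed
        elif policy == "ucb1":
            if t < k:
                arm = t  # show every arm once to initialise the estimates
            else:
                # empirical mean plus optimism bonus sqrt(2 ln t / n_a)
                arm = max(
                    range(k),
                    key=lambda a: clicks[a] / pulls[a]
                    + math.sqrt(2 * math.log(t) / pulls[a]),
                )
        elif policy == "ts":
            # draw one sample per arm from its Beta posterior (assumed Beta(1, 1) prior)
            arm = max(
                range(k),
                key=lambda a: rng.betavariate(1 + clicks[a], 1 + pulls[a] - clicks[a]),
            )
        else:
            raise ValueError(f"unknown policy: {policy}")

        reward = 1 if rng.random() < probs[arm] else 0
        pulls[arm] += 1
        clicks[arm] += reward
        total_clicks += reward
        regret += best - probs[arm]

    return total_clicks / horizon, regret


if __name__ == "__main__":
    # Hypothetical click probabilities for 10 items (not from the SVI-200K data).
    probs = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55]
    for policy in ("etc", "ucb1", "ts"):
        runs = [run_bandit(policy, probs, 2000, seed) for seed in (2025, 2026, 2027)]
        mean_ctr = sum(r[0] for r in runs) / len(runs)
        mean_regret = sum(r[1] for r in runs) / len(runs)
        print(f"{policy}: CTR={mean_ctr:.3f}, regret={mean_regret:.1f}")
```

The sketch tracks expected regret (the gap between the best arm's click probability and the chosen arm's) rather than realised regret, which keeps the metric free of reward noise. The numbers it prints depend entirely on the assumed `probs` and should not be compared with the values reported in the abstract.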
