References
[1]. Cai, R., Ge, J., & Sun, Z. Overview of the Development of AI Pre-trained Large Models. Journal of Chinese Mini-Micro Computer Systems, 2024: 1-12.
[2]. Wang, Y. Unlocking a New Chapter in AI Large-Scale Model Applications: Technological Evolution, Challenges, and Future Prospects. 2024: 18-19.
[3]. Deng, Z., Ma, W., Han, Q. L., Zhou, W., Zhu, X., Wen, S., & Xiang, Y. Exploring DeepSeek: A Survey on Advances, Applications, Challenges and Future Directions. IEEE/CAA Journal of Automatica Sinica, 12(5), 2025: 872-893.
[4]. Turner, R. E. An Introduction to Transformers. arXiv preprint arXiv:2304.10557, 2023.
[5]. Martins, A., Farinhas, A., Treviso, M., Niculae, V., Aguiar, P., & Figueiredo, M. Sparse and continuous attention mechanisms. Advances in Neural Information Processing Systems, 33, 2020: 20989-21001.
[6]. Masoudnia, S., & Ebrahimpour, R. Mixture of experts: a literature survey. Artificial Intelligence Review, 42, 2014: 275-293.
[7]. Wang, C., & Kantarcioglu, M. A Review of DeepSeek Models' Key Innovative Techniques. arXiv preprint arXiv:2503.11486, 2025.
[8]. Gu, Z., Zhang, H., Chen, R., Hu, Y., & Zhang, H. Unpacking Positional Encoding in Transformers: A Spectral Analysis of Content-Position Coupling. arXiv preprint arXiv:2505.13027, 2025.
[9]. Shi, Z., Yu, H., Zhang, K., Liu, F., Shen, D., & Li, C. New Path for the Development of Management Science and Engineering Disciplines Integrating DeepSeek Large Models. Management Science and Engineering, 14, 2025: 640.