References
[1]. Radford A., Narasimhan K., Salimans T., et al. (2018) Improving language understanding by generative pre-training [J/OL]. https://www.mikecaptain.com/resources/pdf/GPT-1.pdf
[2]. Devlin J., Chang M. W., Lee K., et al. (2019) BERT: Pre-training of deep bidirectional transformers for language understanding [J]. arXiv preprint arXiv:1810.04805.
[3]. Zhang B., Davoodi A., Hu Y. H. (2021) A Mixture of Experts Approach for Low-Cost DNN Customization [J]. IEEE Design & Test, 38(4): 52-59.
[4]. Palangi H., Deng L., Shen Y., et al. (2016) Deep sentence embedding using long short-term memory networks: Analysis and application to information retrieval [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(4): 694-707.
[5]. Mikolov T., Sutskever I., Chen K., et al. (2013) Distributed representations of words and phrases and their compositionality [C]// Proceedings of the 26th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2: 3111-3119.
[6]. Zhang J., Qu D., Li Z. (2015) Recurrent Neural Network Language Model Based on Word Vector Features [J]. Pattern Recognition and Artificial Intelligence, 28(4): 299-305.
[7]. Werbos P. J. (1990) Backpropagation through time: what it does and how to do it [J]. Proceedings of the IEEE, 78(10): 1550-1560.
[8]. Yang L., Wu Y. X., Wang J. L., Liu Y. L. (2018) Research on recurrent neural network [J]. Journal of Computer Applications, 38(S2): 1-6, 26.
[9]. Wang K. F., Gou C., Duan Y. J., Lin Y. L., Zheng X. H., Wang F. Y. (2017) Generative Adversarial Networks: The State of the Art and Beyond [J]. Acta Automatica Sinica, 43(3): 321-332.
[10]. Vaswani A., Shazeer N., Parmar N., et al. (2017) Attention is all you need [C]// Advances in Neural Information Processing Systems, pp. 5998-6008.
[11]. Li Z. J., Fan Y., Wu X. J. (2020) Survey of Natural Language Processing Pre-training Techniques [J]. Computer Science, (03): 162-173.
[12]. Radford A., et al. (2018) Improving Language Understanding by Generative Pre-training. OpenAI.
[13]. Devlin J., Chang M. W., Lee K., et al. (2018) BERT: Pre-training of deep bidirectional transformers for language understanding [J]. arXiv preprint arXiv:1810.04805.
[14]. Youqing. (2025) China's Big Model Competition: A Comprehensive Analysis of API Prices, Basic Parameters, and Core Performance. https://www.explinks.com/blog/pr-domestic-large-scale-model-competition/
[15]. Mark Ren. (2025) ChatGPT-O3 vs. Grok-3 vs. DeepSeek-R1: A Comparison of the Three Major AI Large Models - Technical Architecture, Reasoning Ability and Application. https://www.zedyer.com/iot-knowledge/chatgpt-o3-vs-grok-3-vs-deepseek-r1/
[16]. Hao You-Cai-Hua. (2025) Doubao-1.5-pro: ByteDance's latest Doubao large model, with performance surpassing GPT-4o and Claude 3.5 Sonnet. https://zhuanlan.zhihu.com/p/19893505477
[17]. Yu G. T. C. D. D. Z. J. (2025) A comparison of the latest information on DeepSeek, Grok-3, ChatGPT O3 Mini High, and O1 Pro in terms of technical architecture, training data, computing power, generation ability, multimodal support, and applicable scenarios. https://blog.csdn.net/h050210/article/details/145975341
[18]. ByteDance. (2025) Doubao-1.5-pro. https://seed.bytedance.com/zh/special/doubao_1_5_pro/