References
[1]. Radford A., Narasimhan K., Salimans T., et al. (2018) Improving language understanding by generative pre-training [J/OL]. https://www.mikecaptain.com/resources/pdf/GPT-1.pdf
[2]. Devlin J., Chang M. W., Lee K., et al. (2019) BERT: Pre-training of deep bidirectional transformers for language understanding [J]. arXiv preprint arXiv:1810.04805.
[3]. Zhang B., Davoodi A., Hu Y. H. (2021) A Mixture of Experts Approach for Low-Cost DNN Customization [J]. IEEE Design & Test, 38(4): 52-59.
[4]. Palangi H., Deng L., Shen Y., et al. (2016) Deep sentence embedding using long short-term memory networks: Analysis and application to information retrieval [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(4): 694-707.
[5]. Mikolov T., Sutskever I., Chen K., et al. (2013) Distributed representations of words and phrases and their compositionality [C]// Proceedings of the 26th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2: 3111-3119.
[6]. Zhang J., Qu D., Li Z. (2015) Recurrent Neural Network Language Model Based on Word Vector Features [J]. Pattern Recognition and Artificial Intelligence, 28(4): 299-305.
[7]. Werbos P. J. (1990) Backpropagation through time: what it does and how to do it [J]. Proceedings of the IEEE, 78(10): 1550-1560.
[8]. Yang L., Wu Y. X., Wang J. L., Liu Y. L. (2018) Research on recurrent neural network [J]. Journal of Computer Applications, 38(S2): 1-6, 26.
[9]. Wang K. F., Gou C., Duan Y. J., Lin Y. L., Zheng X. H., Wang F. Y. (2017) Generative Adversarial Networks: The State of the Art and Beyond [J]. Acta Automatica Sinica, 43(3): 321-332.
[10]. Vaswani A., Shazeer N., Parmar N., et al. (2017) Attention is all you need [C]// Advances in Neural Information Processing Systems, pp. 5998-6008.
[11]. Li Z. J., Fan Y., Wu X. J. (2020) Survey of Natural Language Processing Pre-training Techniques [J]. Computer Science, (03): 162-173.
[12]. Radford A., et al. (2018) Improving Language Understanding by Generative Pre-training. OpenAI.
[13]. Devlin J., Chang M. W., Lee K., et al. (2018) BERT: Pre-training of deep bidirectional transformers for language understanding [J]. arXiv preprint arXiv:1810.04805.
[14]. Youqing. (2025) China's Big Model Competition: A Comprehensive Analysis of API Prices, Basic Parameters, and Core Performance. https://www.explinks.com/blog/pr-domestic-large-scale-model-competition/
[15]. Mark Ren. (2025) ChatGPT-O3 vs. Grok-3 vs. DeepSeek-R1: A Comparison of the Three Major AI Large Models - Technical Architecture, Reasoning Ability and Application. https://www.zedyer.com/iot-knowledge/chatgpt-o3-vs-grok-3-vs-deepseek-r1/
[16]. Hao You-Cai-Hua. (2025) Doubao-1.5-pro: ByteDance's latest Doubao large model, with performance surpassing GPT-4o and Claude 3.5 Sonnet. https://zhuanlan.zhihu.com/p/19893505477
[17]. Yu G. T. C. D. D. Z. J. (2025) A comparison of the latest information on DeepSeek, Grok-3, ChatGPT O3 Mini High, and O1 Pro in terms of technical architecture, training data, computing power, generation ability, multimodal support, and applicable scenarios. https://blog.csdn.net/h050210/article/details/145975341
[18]. ByteDance. (2025) Doubao-1.5-pro. https://seed.bytedance.com/zh/special/doubao_1_5_pro/