Research Article
Open Access
CC BY

A Survey on Pre-trained Language Models Based on Deep Learning: Technological Development and Applications

Yuansheng Lin 1*
1 Capital University of Business, Beijing, China, 102218
*Corresponding author: 569372381@qq.com
Published on 10 July 2025
ACE Vol.174
ISSN (Print): 2755-2721
ISSN (Online): 2755-273X
ISBN (Print): 978-1-80590-235-5
ISBN (Online): 978-1-80590-236-2

Abstract

With the advent of the big data era and the growth of computing power, deep learning has achieved remarkable breakthroughs in natural language processing (NLP). Pre-trained large language models such as GPT and BERT, trained on large-scale unlabeled corpora, have significantly improved performance on a wide range of NLP tasks, including text generation, question answering, sentiment analysis, and machine translation. This paper reviews recent developments in deep-learning-based pre-trained language models, with a particular focus on the pre-training methods of BERT and GPT. Through a literature review and a comparative analysis of representative models, it examines the core technologies behind pre-trained models in detail. The study finds that the Transformer architecture is the foundation of these models and is largely responsible for their performance gains, but that growing model sizes also bring higher computational costs and reduced interpretability. Future research directions include efficient pre-training methods, model compression and distillation, multimodal integration, and ethical and sustainability concerns.
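To make the abstract's claim about the Transformer concrete, the sketch below implements scaled dot-product attention, the building block shared by BERT and GPT (Vaswani et al., 2017), and contrasts the causal mask used in GPT-style autoregressive pre-training with the unmasked, bidirectional attention used in BERT-style masked-language-model pre-training. This is a minimal, single-head NumPy illustration; the function names, variable names, and shapes are chosen for this example and are not taken from any specific library or from the surveyed models.

```python
# Minimal illustration (not any model's actual implementation) of
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V with an optional mask.
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Single-head scaled dot-product attention over one sequence."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)   # (seq_len, seq_len) similarities
    if mask is not None:
        scores = np.where(mask, scores, -1e9)        # blocked positions get ~zero weight
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over key positions
    return weights @ V

seq_len, d_model = 4, 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(seq_len, d_model))
K = rng.normal(size=(seq_len, d_model))
V = rng.normal(size=(seq_len, d_model))

# GPT-style causal mask: each token attends only to itself and earlier tokens.
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
# BERT-style bidirectional attention: every token attends to every token.
full_mask = np.ones((seq_len, seq_len), dtype=bool)

print(scaled_dot_product_attention(Q, K, V, causal_mask).shape)  # (4, 8)
print(scaled_dot_product_attention(Q, K, V, full_mask).shape)    # (4, 8)
```

In the actual models this computation is applied as multi-head attention over learned projections of token embeddings and stacked in many layers; the sketch only shows the core operation and how the choice of mask distinguishes autoregressive from bidirectional pre-training.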

Keywords:

Deep Learning, Large Language Models, Natural Language Processing, GPT, BERT

Cite this article

Lin, Y. (2025). A Survey on Pre-trained Language Models Based on Deep Learning: Technological Development and Applications. Applied and Computational Engineering, 174, 170-178.

Data availability

The datasets used and/or analyzed during the current study are available from the authors upon reasonable request.

About volume

Volume title: Proceedings of CONF-CDS 2025 Symposium: Data Visualization Methods for Evaluation

ISBN: 978-1-80590-235-5(Print) / 978-1-80590-236-2(Online)
Editors: Marwan Omar, Elisavet Andrikopoulou
Conference date: 30 July 2025
Series: Applied and Computational Engineering
Volume number: Vol.174
ISSN: 2755-2721(Print) / 2755-273X(Online)