Comparison Between Transformer-based Protein Language Models and Traditional Text-based Language Models from a Computer Science Perspective
Research Article
Open Access
CC BY

Chenglin Xu 1*
1 Beijing Jiaotong University
*Corresponding author: 23722029@bjtu.edu.cn
Published on 19 November 2025

Abstract

The Transformer architecture has reshaped natural language processing (NLP) by enabling efficient sequence modeling through self-attention and embedding techniques. Adapting it to domain-specific data such as protein sequences, however, introduces both unique computational challenges and new opportunities. As the sequence space grows, understanding these architectural differences becomes crucial for improving model efficiency and generalization. This study investigates the fundamental differences between protein language models (PLMs) and traditional text-based large language models (LLMs), focusing on their modeling principles, embedding structures, and attention mechanisms. By reviewing and analyzing the relevant literature, the methods adopted by PLMs and LLMs are examined and their distinctive features highlighted. The results show that PLMs, with sparse attention mechanisms and highly linearly separable embeddings, are better suited to extracting patterns from long sequences, whereas text-based LLMs concentrate on semantic dependencies. These differences point to opportunities for cross-domain optimization that can improve the application of Transformers to sequence analysis and generation.
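To make the attention contrast in the abstract concrete, the following minimal sketch (Python with NumPy; the sequence length, model width, window size, and helper names are illustrative assumptions, not values or methods taken from the paper) compares standard dense scaled dot-product attention, in which every position attends to every other position, with a sliding-window sparse variant of the kind used by long-sequence Transformers, in which each position attends only to nearby neighbours.

import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dense_attention(Q, K, V):
    # Standard scaled dot-product attention: every position attends
    # to every other position, so the score matrix is L x L.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    return softmax(scores) @ V

def windowed_attention(Q, K, V, window=8):
    # Sparse (local) variant: each position may only attend to
    # positions within +/- `window` of itself (window size assumed).
    L, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)
    idx = np.arange(L)
    mask = np.abs(idx[:, None] - idx[None, :]) > window
    scores[mask] = -np.inf  # forbid out-of-window attention
    return softmax(scores) @ V

rng = np.random.default_rng(0)
L, d = 64, 16  # toy sequence length and model width (assumed values)
Q, K, V = (rng.standard_normal((L, d)) for _ in range(3))
print(dense_attention(Q, K, V).shape)     # (64, 16)
print(windowed_attention(Q, K, V).shape)  # (64, 16)

For clarity the windowed version still materializes the full L x L score matrix; efficient sparse-attention implementations compute only the allowed entries, which is where the practical savings on long sequences come from.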

Keywords:

Natural Language Processing (NLP), Protein Language Models (PLMs), Large Language Models, Transformer

Cite this article

Xu, C. (2025). Comparison Between Transformer-based Protein Language Models and Traditional Text-based Language Models from a Computer Science Perspective. Applied and Computational Engineering, 207, 55-60.

Data availability

The datasets used and/or analyzed during the current study are available from the authors upon reasonable request.

About volume

Volume title: Proceedings of CONF-SPML 2026 Symposium: The 2nd Neural Computing and Applications Workshop 2025

ISBN: 978-1-80590-539-4(Print) / 978-1-80590-540-0(Online)
Editors: Marwan Omar, Guozheng Rao
Conference date: 21 December 2025
Series: Applied and Computational Engineering
Volume number: Vol.207
ISSN: 2755-2721(Print) / 2755-273X(Online)