Comparison Between Transformer-based Protein Language Models and Traditional Text-based Language Models from a Computer Science Perspective
Research Article
Open Access
CC BY

Chenglin Xu 1*
1 Beijing Jiaotong University
*Corresponding author: 23722029@bjtu.edu.cn
Published on 19 November 2025

Abstract

The Transformer architecture has reshaped natural language processing (NLP) by enabling efficient sequence modeling through self-attention and embedding techniques. Adapting it to domain-specific data such as protein sequences, however, introduces both unique computational challenges and new opportunities. As the sequence space grows, understanding these architectural differences becomes crucial for improving model efficiency and generalization. This study investigates the fundamental differences between protein language models (PLMs) and traditional text-based large language models (LLMs), focusing on their modeling principles, embedding structures, and attention mechanisms. By reviewing and analyzing the relevant literature, the methods adopted by PLMs and LLMs are examined and their distinctive features highlighted. The results show that PLMs, with sparse attention mechanisms and highly linearly separable embeddings, are better suited to extracting patterns from long sequences, whereas text-based LLMs concentrate on semantic dependencies. These differences point to opportunities for cross-domain optimization that can improve the application of Transformers to sequence analysis and generation.
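To make the attention contrast in the abstract concrete, the following minimal sketch (Python with NumPy; the sequence length, model width, window size, and helper names are illustrative assumptions, not values or methods taken from the paper) compares standard dense scaled dot-product attention, in which every position attends to every other position, with a sliding-window sparse variant of the kind used by long-sequence Transformers, in which each position attends only to nearby neighbours.

import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dense_attention(Q, K, V):
    # Standard scaled dot-product attention: every position attends
    # to every other position, so the score matrix is L x L.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    return softmax(scores) @ V

def windowed_attention(Q, K, V, window=8):
    # Sparse (local) variant: each position may only attend to
    # positions within +/- `window` of itself (window size assumed).
    L, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)
    idx = np.arange(L)
    mask = np.abs(idx[:, None] - idx[None, :]) > window
    scores[mask] = -np.inf  # forbid out-of-window attention
    return softmax(scores) @ V

rng = np.random.default_rng(0)
L, d = 64, 16  # toy sequence length and model width (assumed values)
Q, K, V = (rng.standard_normal((L, d)) for _ in range(3))
print(dense_attention(Q, K, V).shape)     # (64, 16)
print(windowed_attention(Q, K, V).shape)  # (64, 16)

For clarity the windowed version still materializes the full L x L score matrix; efficient sparse-attention implementations compute only the allowed entries, which is where the practical savings on long sequences come from.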

Keywords:

Natural Language Processing (NLP), Protein Language Models (PLMs), Large Language Models, Transformer

Cite this article

Xu, C. (2025). Comparison Between Transformer-based Protein Language Models and Traditional Text-based Language Models from a Computer Science Perspective. Applied and Computational Engineering, 207, 55-60.

Data availability

The datasets used and/or analyzed during the current study are available from the authors upon reasonable request.

About volume

Volume title: Proceedings of CONF-SPML 2026 Symposium: The 2nd Neural Computing and Applications Workshop 2025

ISBN: 978-1-80590-539-4(Print) / 978-1-80590-540-0(Online)
Editors: Marwan Omar, Guozheng Rao
Conference date: 21 December 2025
Series: Applied and Computational Engineering
Volume number: Vol.207
ISSN: 2755-2721(Print) / 2755-273X(Online)