Analysis of the Effectiveness of Deep Learning Spam Email Classifiers against Text-Based Attacks
Research Article
Open Access
CC BY


Jianchao Li 1*
1 Guangzhou Nanfang College
*Corresponding author: outlook_165D168DEE309A11@outlook.com
Published on 3 December 2025
ACE Vol.211
ISSN (Print): 2755-2721
ISSN (Online): 2755-273X
ISBN (Print): 978-1-80590-579-0
ISBN (Online): 978-1-80590-580-6

Abstract

With the widespread application of machine learning, and deep learning models in particular, in cybersecurity, spam filtering systems have become increasingly intelligent. Deep learning classifiers, with advantages such as character-level feature learning and semantic invariance, have become the preferred choice for deployment. However, because these models rely on surface text features, they are significantly vulnerable to carefully constructed textual adversarial attacks. Through covert modifications such as synonym substitution and character perturbation, such attacks can mislead a model into misclassifying malicious emails, allowing risky spam such as phishing and fraud messages to pass through the defense system. This study first elaborates on three attack methods: character-level, word-level, and sentence-level attacks. It then discusses the limitations of existing spam email attacks, and comprehensively reviews the key findings of prior research, namely that deep learning models generally suffer a high attack success rate (ASR). The aim is to provide a theoretical basis for building a more robust next-generation spam email filtering system.
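The character-perturbation evasion described above can be illustrated with a minimal, hypothetical sketch (not code from the study): a toy keyword-based "classifier" stands in for a learned model, and Cyrillic homoglyphs replace visually identical Latin letters so the text looks the same to a human but no longer matches the surface features the filter relies on.

```python
# Hypothetical illustration of a character-level adversarial perturbation.
# The filter and trigger phrase are invented stand-ins, not the paper's models.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "i": "\u0456"}  # Cyrillic look-alikes

def naive_spam_filter(text: str) -> bool:
    """Stand-in classifier: flags mail containing an exact trigger phrase."""
    return "free money" in text.lower()

def homoglyph_perturb(text: str) -> str:
    """Swap every eligible Latin letter for its visually identical twin."""
    return "".join(HOMOGLYPHS.get(c, c) for c in text)

original = "Claim your free money now"
adversarial = homoglyph_perturb(original)

print(naive_spam_filter(original))     # True:  the clean text is caught
print(naive_spam_filter(adversarial))  # False: the same-looking text slips through
```

Real attacks against deep classifiers use the same principle at scale, searching for the minimal perturbation that flips the model's prediction while preserving the message's appearance to the recipient.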

Keywords:

Deep Learning, Spam Emails, Character-level Attack, Word-level Attack



Cite this article

Li, J. (2025). Analysis of the Effectiveness of Deep Learning Spam Email Classifiers against Text-Based Attacks. Applied and Computational Engineering, 211, 16-20.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

About volume

Volume title: Proceedings of CONF-SPML 2026 Symposium: The 2nd Neural Computing and Applications Workshop 2025

ISBN: 978-1-80590-579-0(Print) / 978-1-80590-580-6(Online)
Editors: Marwan Omar, Guozheng Rao
Conference date: 21 December 2025
Series: Applied and Computational Engineering
Volume number: Vol.211
ISSN: 2755-2721(Print) / 2755-273X(Online)