EdgeNAT: An Efficient Transformer-Based Model for Edge Detection
Research Article | Open Access | CC BY


Junrong Hu 1*, Junrong Chen 2, Junquan Bi 3, Kani Chen 4
1,2,3,4 The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, 999077, Hong Kong SAR, China
*Corresponding author: jronghu@163.com
Published on 22 October 2025
ACE Vol.197
ISSN (Print): 2755-2721
ISSN (Online): 2755-273X
ISBN (Print): 978-1-80590-465-6
ISBN (Online): 978-1-80590-466-3

Abstract

Edge detection remains a foundational operation in computer vision pipelines, yet the community still grapples with the trade-off between accuracy, crisp localization, and computational efficiency. Convolutional networks excel at local gradient modeling but struggle to maintain global coherence without heavy multi-scale designs, while global self-attention achieves long-range reasoning at quadratic cost. We present EdgeNAT, a Transformer-based edge detector that integrates neighborhood attention with dynamic multi-scale tokenization to realize strong boundary sharpness at markedly lower compute and memory requirements. EdgeNAT employs a lightweight convolutional stem for gradient-preserving tokens, a pyramid of Neighborhood Attention Transformer (NAT) blocks with dilated neighborhoods to enlarge the receptive field without quadratic complexity, and a decoder with deep supervision aligned to boundary thickness. Theoretically, EdgeNAT reduces the attention complexity from O(N²) to O(N·M) with neighborhood size M ≪ N, which translates into consistent efficiency gains for high-resolution imagery. We further introduce a composite loss that couples balanced cross-entropy with a Dice consistency term to discourage thick or fragmented boundaries. Analyses and ablations against recent journal models suggest that EdgeNAT occupies a favorable Pareto region for accuracy–efficiency in edge tasks and boundary rendering. We also provide theoretical complexity profiles and visualizations that clarify how neighborhood size controls the compute–accuracy frontier. Collectively, these results indicate that locality-biased attention with gradient-aware tokens is a principled and practical design for fast, crisp, and transferable edge detection.
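The complexity claim above can be illustrated with a minimal sketch: if each token attends only to an M-token window around it, the cost of attention drops from O(N²) to O(N·M). The NumPy function below is a simplified, single-head, 1-D illustration with identity Q/K/V projections and no dilation; it is not the authors' implementation, which uses learned projections, 2-D dilated neighborhoods, and multiple heads.

```python
import numpy as np

def neighborhood_attention(x, M):
    """Single-head neighborhood attention over a 1-D token sequence.

    Each of the N tokens attends only to (at most) M neighbors, so the
    total work is O(N*M) score computations instead of the O(N^2) of
    full self-attention. Illustrative sketch: identity Q/K/V, no dilation.
    """
    N, d = x.shape
    half = M // 2
    out = np.empty_like(x)
    for i in range(N):
        lo, hi = max(0, i - half), min(N, i + half + 1)
        keys = x[lo:hi]                        # local neighborhood, <= M rows
        scores = keys @ x[i] / np.sqrt(d)      # scaled dot-product scores
        w = np.exp(scores - scores.max())      # numerically stable softmax
        w /= w.sum()
        out[i] = w @ keys                      # convex combination of neighbors
    return out
```

Because each output token is a convex combination of its neighbors, shrinking M biases the model toward local gradient structure, while dilation (not shown) recovers a larger receptive field at the same O(N·M) cost.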

Keywords:

edge detection, Transformer, neighborhood attention, computational efficiency, boundary rendering, deep supervision



Cite this article

Hu, J.; Chen, J.; Bi, J.; Chen, K. (2025). EdgeNAT: An Efficient Transformer-Based Model for Edge Detection. Applied and Computational Engineering, 197, 28-34.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

About volume

Volume title: Proceedings of the 7th International Conference on Computing and Data Science

ISBN: 978-1-80590-465-6 (Print) / 978-1-80590-466-3 (Online)
Editor: Marwan Omar
Conference website: https://2025.confcds.org/
Conference date: 25 September 2025
Series: Applied and Computational Engineering
Volume number: Vol.197
ISSN: 2755-2721 (Print) / 2755-273X (Online)