A Study on Transformer Optimization for Image Processing on Edge Devices
Research Article
Open Access
CC BY


Xiaotian Tong 1*
1 Electrical and Computer Engineering, Duke University, Durham, NC 27708, United States
*Corresponding author: xiaotiantong237@gmail.com
Published on 13 August 2025
ACE Vol.184
ISSN (Print): 2755-2721
ISSN (Online): 2755-273X
ISBN (Print): 978-1-80590-307-9
ISBN (Online): 978-1-80590-308-6

Abstract

Transformer models have achieved groundbreaking success in computer vision tasks, yet their deployment on resource-constrained edge devices remains challenging due to high computational complexity, memory demands, and hardware inefficiencies. This paper presents a holistic optimization framework to address these issues for real-time image processing in edge environments, particularly in autonomous driving systems. We propose a dynamic structured pruning method that adjusts model sparsity based on real-time scene complexity, combined with post-training quantization to compress model size while preserving accuracy. In addition, we co-design the algorithm with FPGA and SoC hardware platforms, leveraging custom sparse kernels, memory hierarchy optimization, and energy-efficient execution techniques. Evaluated on the KITTI and Cityscapes datasets, our method achieves a 55% reduction in inference latency with less than a 2% loss in accuracy, and improves energy efficiency by up to 3.1×. Real-world tests confirm the robustness of the system under diverse operating conditions. This work offers a scalable and adaptable solution for deploying high-performance Transformer models in edge AI applications.
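The abstract's two core compression steps, dynamic structured pruning driven by scene complexity and symmetric post-training quantization, can be illustrated with a minimal sketch. This is not the paper's implementation: the gradient-based complexity proxy, the linear sparsity schedule, and the function names are all illustrative assumptions.

```python
import numpy as np

def scene_complexity(image: np.ndarray) -> float:
    """Illustrative complexity proxy: mean gradient magnitude, normalized to [0, 1]."""
    gy, gx = np.gradient(image.astype(np.float32))
    return float(np.clip(np.hypot(gx, gy).mean() / 255.0, 0.0, 1.0))

def dynamic_sparsity(complexity: float, s_min: float = 0.2, s_max: float = 0.7) -> float:
    """Simple scenes tolerate aggressive pruning; complex scenes retain more capacity."""
    return s_max - (s_max - s_min) * complexity

def prune_heads(head_scores: np.ndarray, sparsity: float) -> np.ndarray:
    """Structured pruning: drop the lowest-scoring fraction of attention heads."""
    k = int(round(len(head_scores) * sparsity))
    keep = np.argsort(head_scores)[k:]  # indices of heads that survive
    return np.sort(keep)

def quantize_int8(w: np.ndarray):
    """Symmetric post-training quantization of a weight tensor to int8."""
    max_abs = float(np.abs(w).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale
```

In this sketch a low-complexity frame maps to a high sparsity ratio, so more attention heads are pruned before the int8 weights are dispatched to the accelerator; the actual scoring and scheduling in the paper are hardware co-designed and more involved.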

Keywords:

Vision Transformers, Dynamic Structured Pruning, Post-Training Quantization, Hardware-Software Co-Design, Edge AI



Cite this article

Tong, X. (2025). A Study on Transformer Optimization for Image Processing on Edge Devices. Applied and Computational Engineering, 184, 7-15.

Data availability

The datasets used and/or analyzed during the current study are available from the authors upon reasonable request.

About volume

Volume title: Proceedings of CONF-MLA 2025 Symposium: Intelligent Systems and Automation: AI Models, IoT, and Robotic Algorithms

ISBN: 978-1-80590-307-9 (Print) / 978-1-80590-308-6 (Online)
Editor: Hisham AbouGrad
Conference website: https://www.confmla.org/
Conference date: 17 November 2025
Series: Applied and Computational Engineering
Volume number: Vol.184
ISSN: 2755-2721 (Print) / 2755-273X (Online)