A Study of YOLO, Transformer and Diffusion Model for Small Object Detection
Research Article
Open Access
CC BY

Tiancheng Hu 1*
1 Department of Mathematics, University College London, London, United Kingdom
*Corresponding author: zcahth3@ucl.ac.uk
Published on 5 November 2025
ACE Vol.204
ISSN (Print): 2755-273X
ISSN (Online): 2755-2721
ISBN (Print): 978-1-80590-517-2
ISBN (Online): 978-1-80590-518-9

Abstract

In recent years, computer vision algorithms have advanced rapidly, and small object detection has become a key task in the field. Compared with medium and large targets, however, small targets cover few pixels, so factors such as background clutter interfere with their detection far more easily, which makes progress harder. Researchers have proposed various methods to address these challenges, and the three most representative families are built on YOLO, Transformers, and diffusion models. This article provides a detailed overview and comparison of these three families. YOLO-based methods are strong in real-time detection and improve small object performance through multi-scale feature enhancement, structural optimization, and modified loss functions. Transformer-based methods improve the accuracy and precision of small object identification by modifying the attention mechanism, adopting hybrid architectures, and fusing multimodal features. Diffusion-based methods adapt the diffusion process itself, for example by diffusing bounding boxes and by building diffusion-based data engines, so that diffusion models can be applied to detection. Finally, this article summarizes the advantages and limitations of these methods and discusses potential directions for future research. The significance of this study lies in providing a unified overview of the three main research paradigms, helping researchers understand current progress, identify existing challenges, and explore new possibilities for advancing small object detection.
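To make the idea of diffusing bounding boxes more concrete, the following is a minimal sketch, not the author's implementation or that of any specific detector, of the standard forward (noising) step x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps applied to normalized box coordinates. A detector built on this idea learns to reverse the corruption, so at inference it can start from random boxes and refine them step by step. The cosine noise schedule, the (cx, cy, w, h) box format, the function names, the example boxes and the scaling factor of 2.0 are all assumptions made for illustration.

```python
import math
import random


def cosine_alpha_bar(t, T, s=0.008):
    """Cumulative signal level alpha_bar_t under a cosine noise schedule (assumed)."""
    def f(u):
        return math.cos((u / T + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def diffuse_boxes(boxes, t, T, scale=2.0, rng=None):
    """Forward-noise normalized (cx, cy, w, h) boxes to diffusion step t.

    boxes: list of 4-tuples with coordinates in [0, 1].
    scale: signal-scaling factor; 2.0 is an illustrative assumption.
    """
    if rng is None:
        rng = random.Random()
    a_bar = cosine_alpha_bar(t, T)
    noisy = []
    for box in boxes:
        coords = []
        for c in box:
            x0 = (2.0 * c - 1.0) * scale          # map [0, 1] to [-scale, scale]
            eps = rng.gauss(0.0, 1.0)             # standard Gaussian noise
            xt = math.sqrt(a_bar) * x0 + math.sqrt(1.0 - a_bar) * eps
            xt = max(-scale, min(scale, xt))      # clamp to the signal range
            coords.append((xt / scale + 1.0) / 2.0)  # map back to [0, 1]
        noisy.append(tuple(coords))
    return noisy


# Hypothetical ground truth: one very small object and one slightly larger one.
gt = [(0.50, 0.50, 0.04, 0.03), (0.20, 0.70, 0.10, 0.08)]
clean = diffuse_boxes(gt, t=0, T=1000, rng=random.Random(0))
noisy = diffuse_boxes(gt, t=1000, T=1000, rng=random.Random(0))
```

With t = 0 the function returns the input boxes unchanged, because alpha_bar_0 = 1; with t = T the output is essentially random boxes clamped to [0, 1], which is the kind of starting point such a detector denoises from.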

Keywords:

Small Object Detection, YOLO, Transformer, Diffusion Model.

Hu, T. (2025). A Study of YOLO, Transformer and Diffusion Model for Small Object Detection. Applied and Computational Engineering, 204, 1-7.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

About volume

Volume title: Proceedings of CONF-MLA 2025 Symposium: Intelligent Systems and Automation: AI Models, IoT, and Robotic Algorithms

ISBN: 978-1-80590-517-2 (Print) / 978-1-80590-518-9 (Online)
Editor: Hisham AbouGrad
Conference date: 12 November 2025
Series: Applied and Computational Engineering
Volume number: Vol.204
ISSN: 2755-2721 (Print) / 2755-273X (Online)