A Study of YOLO, Transformer and Diffusion Model for Small Object Detection

Tiancheng Hu

doi:10.54254/2755-2721/2025.LD29184

Applied and Computational Engineering (Open access)

Tiancheng Hu ¹*

¹ Department of Mathematics, University College London, London, United Kingdom

*Corresponding author: zcahth3@ucl.ac.uk

Published on 5 November 2025

ACE Vol.204

ISSN (Print): 2755-273X

ISSN (Online): 2755-2721

ISBN (Print): 978-1-80590-517-2

ISBN (Online): 978-1-80590-518-9


Abstract

In recent years, computer vision algorithms have advanced rapidly, and small object detection has become a key task in the field. Compared with medium and large objects, however, small objects cover only a few pixels, so factors such as background clutter interfere with their detection far more easily, which makes progress more difficult. Researchers have proposed various methods to address these challenges, and the three most representative frameworks are built on YOLO, Transformer, and diffusion models. This article provides a detailed overview and comparison of these three families. YOLO-based methods stand out for real-time detection and improve small object performance through multi-scale feature enhancement, structural optimization, and modified loss functions. Transformer-based methods improve the accuracy and precision of small object recognition by modifying the attention mechanism, adopting hybrid architectures, and fusing multimodal features. Diffusion-based methods adapt the diffusion process itself to detection, for example by diffusing bounding boxes or by building diffusion-based data engines. Finally, this article summarizes the advantages and limitations of these methods and discusses potential future research directions. The significance of this study lies in providing a unified overview of the three main research paradigms, helping researchers understand current progress, identify existing challenges, and explore new possibilities for advancing small object detection.
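The idea of "diffusion bounding boxes" can be made concrete with a minimal sketch of the forward noising step applied to boxes, in the spirit of DiffusionDet: normalized ground-truth boxes are rescaled and mixed with Gaussian noise as x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, and a detector is trained to reverse this process. The cosine schedule, T = 1000 steps, the signal scale of 2.0, the clamping step, and the function names `cosine_alpha_bar` and `noise_boxes` are illustrative assumptions, not details taken from this article.

```python
import math
import random


def cosine_alpha_bar(t: int, T: int, s: float = 0.008) -> float:
    """Cumulative signal fraction alpha_bar_t under a cosine schedule (assumed choice)."""
    def f(u: float) -> float:
        return math.cos((u / T + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def noise_boxes(boxes, t, T=1000, scale=2.0, rng=None):
    """Forward-diffuse normalized (cx, cy, w, h) boxes to step t.

    boxes: list of 4-tuples with coordinates in [0, 1].
    scale: hypothetical signal scale; boxes are mapped to [-scale, scale] before noising.
    Returns noisy boxes mapped back to [0, 1] and clamped.
    """
    if rng is None:
        rng = random.Random(0)
    a_bar = cosine_alpha_bar(t, T)
    noisy_boxes = []
    for box in boxes:
        noisy = []
        for v in box:
            x0 = (2.0 * v - 1.0) * scale  # map [0, 1] -> [-scale, scale]
            eps = rng.gauss(0.0, 1.0)  # standard Gaussian noise
            xt = math.sqrt(a_bar) * x0 + math.sqrt(1.0 - a_bar) * eps
            u = (xt / scale + 1.0) / 2.0  # map back to [0, 1]
            noisy.append(min(max(u, 0.0), 1.0))  # clamp to a valid box range
        noisy_boxes.append(tuple(noisy))
    return noisy_boxes


if __name__ == "__main__":
    gt = [(0.50, 0.50, 0.02, 0.03), (0.10, 0.90, 0.20, 0.10)]  # one tiny, one larger box
    for step in (0, 250, 1000):
        print(step, noise_boxes(gt, step))
```

At t = 0 the boxes come back unchanged, while near t = T they are essentially random, which is the kind of starting point a diffusion-based detector iteratively refines at inference time.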

Keywords:

Small Object Detection, YOLO, Transformer, Diffusion Model.

