Research on 3D Object Detection Technology Based on Multimodal Fusion
Research Article
Open Access
CC BY

Shijie Lyu 1*
1 Georgia Institute of Technology, North Avenue, Atlanta, GA, 30332, United States of America
*Corresponding author: slm0718m@163.com
Published on 4 July 2025
ACE Vol.173
ISSN (Print): 2755-2721
ISSN (Online): 2755-273X
ISBN (Print): 978-1-80590-231-7
ISBN (Online): 978-1-80590-232-4

Abstract

To address missed detections of long-distance targets in autonomous driving, this study proposes an enhanced 3D object detection model that builds on the CenterFusion framework and fuses camera and millimeter-wave radar data. An early fusion strategy projects the radar data onto the image plane and stacks it with the image data to form a multi-channel input, which improves the model's robustness to interference. An attention mechanism is applied after feature fusion so that the network prioritizes critical information in the fused feature map, improving detection accuracy. The loss function is also modified to mitigate the imbalance between positive and negative samples. Comparative and ablation experiments on the nuScenes dataset show that, relative to the baseline CenterFusion model, the proposed model improves average detection accuracy by 1.5% and the nuScenes Detection Score (NDS) by 2.1%, strengthening its ability to detect long-distance targets.
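The early fusion step can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation: it assumes the radar points are already expressed in the camera frame (extrinsic calibration applied), that the camera follows a pinhole model whose intrinsic matrix K has last row [0, 0, 1], and that each point carries its depth and two velocity components. The channel layout (depth, vx, vz) and the names project_radar_to_image and early_fuse are hypothetical, because the abstract does not say which radar attributes are rendered into the extra input channels.

```python
def project_radar_to_image(points, K, height, width):
    """Render radar points into image-aligned channels (illustrative sketch).

    points: iterable of (x, y, z, vx, vz) tuples in the camera frame, z forward,
            in metres (assumed already transformed from the radar frame).
    K: 3x3 pinhole intrinsic matrix as nested lists, last row [0, 0, 1].
    Returns 3 channels of size height x width holding depth, vx and vz
    (a hypothetical layout); pixels without a radar return stay 0.
    """
    channels = [[[0.0] * width for _ in range(height)] for _ in range(3)]
    for x, y, z, vx, vz in points:
        if z <= 0:  # behind the camera, so there is no valid projection
            continue
        # Pinhole projection: u = (K[0] . p) / z, v = (K[1] . p) / z
        u = (K[0][0] * x + K[0][1] * y + K[0][2] * z) / z
        v = (K[1][0] * x + K[1][1] * y + K[1][2] * z) / z
        col, row = int(round(u)), int(round(v))
        if not (0 <= row < height and 0 <= col < width):
            continue  # projects outside the image
        current = channels[0][row][col]
        # If two returns hit the same pixel, keep the nearer one
        if current == 0.0 or z < current:
            channels[0][row][col] = z
            channels[1][row][col] = vx
            channels[2][row][col] = vz
    return channels


def early_fuse(image, radar_channels):
    """Stack image channels and radar channels along the channel axis."""
    if (len(image[0]) != len(radar_channels[0])
            or len(image[0][0]) != len(radar_channels[0][0])):
        raise ValueError("image and radar channels must share the same H x W")
    return image + radar_channels


# Hypothetical example: a 5 x 5 image and a single radar return
K = [[100.0, 0.0, 2.0], [0.0, 100.0, 2.0], [0.0, 0.0, 1.0]]
radar = project_radar_to_image([(0.1, 0.0, 10.0, 1.0, -2.0)], K, height=5, width=5)
rgb = [[[0.0] * 5 for _ in range(5)] for _ in range(3)]
fused = early_fuse(rgb, radar)  # 6 channels: R, G, B, depth, vx, vz
```

Because the radar channels are pixel-aligned with the image, an ordinary convolutional backbone can consume the stacked tensor without any change other than its number of input channels. Keeping the nearest return per pixel is one simple collision rule, not necessarily the one the paper uses.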
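The abstract does not name the attention module applied after feature fusion, although the reference list includes EPSANet [9]. As a simpler stand-in, the sketch below implements squeeze-and-excitation (SE) style channel attention, a different and more basic technique, to show how a fused C x H x W feature map can be reweighted so that informative channels are emphasized. The weight shapes, the reduction ratio and the function name se_channel_attention are assumptions for illustration.

```python
import math


def se_channel_attention(features, w1, w2):
    """SE-style channel attention on a fused feature map (illustrative stand-in).

    features: C x H x W nested lists (e.g. fused camera + radar features).
    w1: (C // r) x C weights of the bottleneck layer (hypothetical).
    w2: C x (C // r) weights of the expansion layer (hypothetical).
    Returns the reweighted feature map and the per-channel attention scores.
    """
    # Squeeze: global average pooling gives one descriptor per channel
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in features]
    # Excitation: fully connected bottleneck followed by ReLU ...
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    # ... then an expansion layer and a sigmoid, giving scores in (0, 1)
    scores = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
              for row in w2]
    # Scale: multiply every value in channel c by that channel's score
    reweighted = [[[s * v for v in row] for row in ch]
                  for s, ch in zip(scores, features)]
    return reweighted, scores


# Hypothetical example: 4 channels, 2 x 2 spatial size, reduction ratio r = 2
fmap = [[[1.0, 2.0], [3.0, 4.0]] for _ in range(4)]
w1 = [[0.1, 0.0, 0.0, 0.0], [0.0, 0.1, 0.0, 0.0]]
w2 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
out, scores = se_channel_attention(fmap, w1, w2)
```

In this toy example the first two channels receive higher scores than the last two, so after reweighting they dominate the fused map; in a trained network, w1 and w2 are learned so that the scores reflect which channels carry useful evidence.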
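The abstract does not state how the loss function was modified. For context, CenterFusion inherits CenterNet's heatmap loss, a penalty-reduced focal loss that down-weights the many easy negative pixels and the negatives close to an object center. A minimal sketch of that baseline loss follows; it takes the commonly used defaults alpha = 2 and beta = 4 as assumptions and works on flattened heatmap lists for simplicity. It illustrates the kind of loss being tuned, not the paper's modified version.

```python
import math


def heatmap_focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-6):
    """CenterNet-style penalty-reduced focal loss on a flattened heatmap.

    pred: predicted center probabilities in (0, 1).
    target: Gaussian-splatted ground truth, exactly 1.0 at object centers.
    alpha, beta: focusing / penalty-reduction exponents (assumed defaults).
    """
    pos_loss, neg_loss, num_pos = 0.0, 0.0, 0
    for p, y in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1.0:
            # Positive pixel: (1 - p)^alpha shrinks the loss of confident hits
            pos_loss += ((1.0 - p) ** alpha) * math.log(p)
            num_pos += 1
        else:
            # Negative pixel: p^alpha ignores easy negatives, and
            # (1 - y)^beta softens the penalty near a ground-truth center
            neg_loss += ((1.0 - y) ** beta) * (p ** alpha) * math.log(1.0 - p)
    # Normalize by the number of positives, as in CenterNet
    return -(pos_loss + neg_loss) / max(num_pos, 1)
```

Normalizing by the number of positives rather than the number of pixels keeps the loss scale from collapsing when a frame contains only a few objects among many background pixels, which is the imbalance the abstract refers to.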
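The NDS reported in the abstract is defined by the nuScenes benchmark as a weighted combination of mAP and five mean true-positive error metrics (translation, scale, orientation, velocity and attribute errors: mATE, mASE, mAOE, mAVE, mAAE): NDS = (1/10) [5 * mAP + sum over the five metrics of (1 - min(1, mTP))]. The sketch below simply evaluates this formula; the function name and the example values are illustrative.

```python
TP_METRICS = ("mATE", "mASE", "mAOE", "mAVE", "mAAE")


def nuscenes_detection_score(mean_ap, tp_errors):
    """Compute NDS from mAP and the five mean true-positive error metrics.

    mean_ap: mean average precision in [0, 1].
    tp_errors: dict mapping each name in TP_METRICS to its error value.
    """
    missing = [name for name in TP_METRICS if name not in tp_errors]
    if missing:
        raise ValueError(f"missing TP metrics: {missing}")
    # Each error is clipped at 1 and turned into a score in [0, 1]
    tp_score = sum(1.0 - min(1.0, tp_errors[name]) for name in TP_METRICS)
    # mAP is weighted 5, each TP score 1, and the total is normalized by 10
    return (5.0 * mean_ap + tp_score) / 10.0
```

Because mAP carries half of the total weight, an mAP gain raises NDS by half that amount; any remaining change in NDS comes from lower translation, scale, orientation, velocity or attribute errors.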

Keywords:

Autonomous Driving, Sensor Fusion, 3D Object Detection, Early Fusion, Attention Mechanism

Lyu, S. (2025). Research on 3D Object Detection Technology Based on Multimodal Fusion. Applied and Computational Engineering, 173, 22-28.

References

[1]. ARNOLD E, AL-JARRAH O Y, DIANATI M, et al. A survey on 3D object detection methods for autonomous driving applications [J]. IEEE Transactions on Intelligent Transportation Systems, 2019, 20(10): 3782-3795.

[2]. ZHANG Y P, LU J W, ZHOU J. Objects are different: flexible monocular 3D object detection [C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, June 20-25, 2021: 3288-3297.

[3]. ZHENG Z, YUE X, KEUTZER K, et al. Scene-aware learning network for radar object detection [C]//Proceedings of the 2021 International Conference on Multimedia Retrieval, Taipei, China, August 21-24, 2021: 573-579.

[4]. YANG H, WANG W, CHEN M, et al. PVT-SSD: single-stage 3D object detector with point-voxel transformer [C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, June 18-22, 2023: 13476-13487.

[5]. YIN T W, ZHOU X Y, KRÄHENBÜHL P. Multimodal virtual point 3D detection [J]. Advances in Neural Information Processing Systems, 2021, 34: 16494-16507.

[6]. KIM Y, SHIN J, KIM S, et al. CRN: camera radar net for accurate, robust, efficient 3D perception [C]//2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, October 1-6, 2023: 17569-17580.

[7]. NABATI R, QI H R. CenterFusion: center-based radar and camera fusion for 3D object detection [C]//2021 IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, USA, January 3-8, 2021: 1526-1535.

[8]. NOBIS F, GEISSLINGER M, WEBER M, et al. A deep learning-based radar and camera sensor fusion architecture for object detection [C]//2019 Sensor Data Fusion: Trends, Solutions, Applications, Bonn, Germany, October 15-17, 2019: 1-7.

[9]. ZHANG H, ZU K, LU J, et al. EPSANet: an efficient pyramid squeeze attention block on convolutional neural network [C]//Proceedings of the Asian Conference on Computer Vision, Macao, China, December 4-8, 2022: 1161-1177.

[10]. GEIGER A, LENZ P, URTASUN R. Are we ready for autonomous driving? The KITTI vision benchmark suite [C]//2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, USA, June 16-21, 2012: 3354-3361.

[11]. SUN P, KRETZSCHMAR H, DOTIWALLA X, et al. Scalability in perception for autonomous driving: Waymo open dataset [C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, June 13-19, 2020: 2443-2451.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

About volume

Volume title: Proceedings of the 7th International Conference on Computing and Data Science

ISBN: 978-1-80590-231-7 (Print) / 978-1-80590-232-4 (Online)
Editor: Marwan Omar
Conference website: https://2025.confcds.org/
Conference date: 25 September 2025
Series: Applied and Computational Engineering
Volume number: Vol.173
ISSN: 2755-2721 (Print) / 2755-273X (Online)