Research on 3D Object Detection Technology Based on Multimodal Fusion

Shijie Lyu

doi:10.54254/2755-2721/2025.24680

Applied and Computational Engineering (Open access)

Research Article

Shijie Lyu ¹*

¹ Georgia Institute of Technology, North Avenue, Atlanta, GA, 30332, United States of America

* Corresponding author: slm0718m@163.com

Published on 4 July 2025

ACE Vol.173

ISSN (Print): 2755-273X

ISSN (Online): 2755-2721

ISBN (Print): 978-1-80590-231-7

ISBN (Online): 978-1-80590-232-4

Abstract

To address missed detections of long-distance targets in autonomous driving, this study proposes an enhanced 3D object detection model based on the CenterFusion framework that integrates camera and millimeter-wave radar data. An early fusion strategy projects the radar data onto the image plane and combines it with the image data to form a multi-channel input, which enhances the model's robustness against interference. An attention mechanism is added after feature fusion to prioritize the extraction of critical information from the fused feature map and improve detection accuracy. The loss function is optimized to mitigate the imbalance between positive and negative samples. Comparative and ablation experiments on the nuScenes dataset show that the proposed model achieves a 1.5% improvement in average detection accuracy and a 2.1% increase in nuScenes Detection Score (NDS) over the baseline CenterFusion model, improving long-distance target detection.
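The early fusion step described above can be illustrated with a minimal sketch. It assumes the radar points are already expressed in the camera coordinate frame, that a 3x3 pinhole intrinsic matrix is available, and that a single depth channel is appended to the RGB image; the paper may encode other radar attributes (for example velocity or radar cross-section) and may use a different rasterization. The function names `project_radar_to_image` and `early_fusion_input` are hypothetical and do not come from the paper or from CenterFusion.

```python
import numpy as np


def project_radar_to_image(points_cam, intrinsics, image_hw):
    """Project radar points (N, 3), given in camera coordinates, to pixels.

    Returns integer pixel columns u, rows v, and depths for the points that
    lie in front of the camera and fall inside the image bounds.
    """
    h, w = image_hw
    pts = points_cam[points_cam[:, 2] > 0]        # keep points in front of the camera
    uvw = (intrinsics @ pts.T).T                  # homogeneous pixel coordinates
    uv = uvw[:, :2] / uvw[:, 2:3]                 # perspective divide
    u = np.floor(uv[:, 0]).astype(int)
    v = np.floor(uv[:, 1]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    return u[inside], v[inside], pts[inside, 2]


def early_fusion_input(image, points_cam, intrinsics):
    """Stack an (H, W, 3) image with a sparse radar depth map into (H, W, 4).

    Assumption: pixels without a radar return are set to 0, and when several
    points land on the same pixel the nearest depth is kept.
    """
    h, w, _ = image.shape
    u, v, depth = project_radar_to_image(points_cam, intrinsics, (h, w))
    radar = np.full((h, w), np.inf, dtype=np.float32)
    np.minimum.at(radar, (v, u), depth.astype(np.float32))  # nearest return wins
    radar[np.isinf(radar)] = 0.0
    return np.concatenate([image.astype(np.float32), radar[..., None]], axis=-1)
```

The resulting array can be fed to a backbone whose first convolution accepts four input channels. Keeping the nearest return per pixel is one design choice among several; averaging the returns or extending each point into a vertical column of pixels are alternatives a real system might prefer.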

Keywords:

Autonomous Driving, Sensor Fusion, 3D Object Detection, Early Fusion, Attention Mechanism

Cite this article

Lyu, S. (2025). Research on 3D Object Detection Technology Based on Multimodal Fusion. Applied and Computational Engineering, 173, 22-28.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

About volume

Volume title: Proceedings of the 7th International Conference on Computing and Data Science

ISBN: 978-1-80590-231-7(Print) / 978-1-80590-232-4(Online)

Editor: Marwan Omar

Conference website: https://2025.confcds.org/

Conference date: 25 September 2025

Series: Applied and Computational Engineering

Volume number: Vol.173

ISSN: 2755-2721(Print) / 2755-273X(Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish in this series agree to the following terms:

1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.

2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.

3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).