References
[1]. Hart P E, Nilsson N J, Raphael B. A formal basis for the heuristic determination of minimum cost paths [J]. IEEE Transactions on Systems Science and Cybernetics, 1968, 4(2): 100-107.
[2]. Qiang W, Zhongli Z. Reinforcement learning model, algorithms and its application [C]//2011 International Conference on Mechatronic Science, Electric Engineering and Computer (MEC). IEEE, 2011: 1143-1146.
[3]. Lv L, Zhang S, Ding D, et al. Path planning via an improved DQN-based learning policy [J]. IEEE Access, 2019, 7: 67319-67330.
[4]. Zhao Y, Zhang Y, Wang S. A review of mobile robot path planning based on deep reinforcement learning algorithm [C]//Journal of Physics: Conference Series. IOP Publishing, 2021, 2138(1): 012011.
[5]. Liu L, Tian B, Zhao X, et al. UAV autonomous trajectory planning in target tracking tasks via a DQN approach [C]//2019 IEEE International Conference on Real-time Computing and Robotics (RCAR). IEEE, 2019: 277-282.
[6]. Mnih V, Badia A P, Mirza M, et al. Asynchronous methods for deep reinforcement learning [C]//International Conference on Machine Learning. PMLR, 2016: 1928-1937.
[7]. Barto A G, Mahadevan S. Recent advances in hierarchical reinforcement learning [J]. Discrete Event Dynamic Systems, 2003, 13: 341-379.
[8]. Pateria S, Subagdja B, Tan A, et al. Hierarchical reinforcement learning: A comprehensive survey [J]. ACM Computing Surveys (CSUR), 2021, 54(5): 1-35.
[9]. Ng A Y, Russell S J. Algorithms for inverse reinforcement learning [C]//Proceedings of the Seventeenth International Conference on Machine Learning (ICML). 2000: 663-670.
[10]. Jordan S, Chandak Y, Cohen D, et al. Evaluating the performance of reinforcement learning algorithms [C]//International Conference on Machine Learning. PMLR, 2020: 4962-4973.