Optimization Application of PPO Algorithm in Reinforcement Learning in Drone Attitude Balance

Spencer Wang

doi:10.54254/2753-7048/2026.HZ27323

Lecture Notes in Education Psychology and Public MediaOpen access

Optimization Application of PPO Algorithm in Reinforcement Learning in Drone Attitude Balance

Research Article

Open Access

Optimization Application of PPO Algorithm in Reinforcement Learning in Drone Attitude Balance

Spencer Wang ^1*

¹ Princeton International School of Mathematics and Science

^*Corresponding author: spencer.20070706@outlook.com

Published on 2 October 2025

LNEP Vol.125

ISSN (Print): 2753-7056

ISSN (Online): 2753-7048

ISBN (Print): 978-1-80590-407-6

ISBN (Online): 978-1-80590-408-3

Download Cover

Abstract

Robust attitude and position control remain critical challenges for consumer drones. This studies evaluates the performance of a Reinforcement Leaning algorithm (PPO) against traditional control algorithms on drone's stability in both computer simulation and real life situations. Reinforcement Learning is trained with common parameters and rewards for small attitude error, lower angular rates, and efficient control effort. Performance were measured across level 0 to 5 wind in simulation and level 0 to 3 in real life experimentation. Results showed that PPO out-performed traditional PID controller in both computer simulation and real life experimentation. PPO showed better stability than PID with reasonable actuation. These findings indicate that PPO can produce more robust, precise control than fixed-gain controllers.

Keywords:

PPO, Reinforcement Learning, Drone, Attitude Control, Stability

View PDF

References

[1]. T. D. Company, “No Fear of Storms: New DJI M30 Enterprise Can Operate in Heavy Weather [Image].” 2025.

[2]. P. Gui, L. Tang, and S. C. Mukhopadhyay, “MEMS Based IMU for Tilting Measurement: Comparison of Complementary and Kalman Filter Based Data Fusion, ” in Proceedings of the 2015 IEEE 10th Conference on Industrial Electronics and Applications (ICIEA), Auckland, New Zealand, 2015, pp. 2004–2009. doi: 10.1109/ICIEA.2015.7334442.

[3]. P.-J. Bristeau, F. Callou, D. Vissière, and N. Petit, “The Navigation and Control Technology Inside the AR.Drone Micro UAV, ” in Preprints of the 18th IFAC World Congress, Milano, Italy, 2011, pp. 1477– 1484. [Online]. Available: https: //www.asprom.com/drone/PJB.pdf

[4]. W. Koch, R. Mancuso, R. West, and A. Bestavros, “Reinforcement Learning for UAV Attitude Control, ” ACM Transactions on Cyber-Physical Systems, vol. 3, no. 2, pp. 1–21, 2019, doi: 10.1145/3301273.

[5]. M. Okasha, J. Kralev, and M. Islam, “Design and Experimental Comparison of PID, LQR and MPC Stabilizing Controllers for Parrot Mambo Mini-Drone, ” Aerospace, vol. 9, no. 6, p. 298, 2022, doi: 10.3390/aerospace9060298.

[6]. J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and OpenAI, “Proximal Policy Optimization Algorithms, ” 2017.

[7]. J. Schulman, S. Levine, P. Moritz, M. I. Jordan, and P. Abbeel, “Trust Region Policy Optimization, ” arXiv preprint arXiv: 1502.05477, 2015, [Online]. Available: https: //arxiv.org/abs/1502.05477

[8]. W. Chen, K. K. L. Wong, S. Long, and Z. Sun, “Relative Entropy of Correct Proximal Policy Optimization Algorithms with Modified Penalty Factor in Complex Environment, ” Entropy, vol. 24, no. 4, p. 440, 2022, doi: 10.3390/e24040440.

[9]. J. Panerati, H. Zheng, S. Zhou, J. Xu, A. Prorok, and A. P. Schoellig, “Learning to Fly—a Gym Environment with PyBullet Physics for Reinforcement Learning of Multi-agent Quadcopter Control, ” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021, pp. 7512–7519. doi: 10.1109/IROS51168.2021.9635857.

[10]. J. Peksa and D. Mamchur, “A Review on the State of the Art in Copter Drones and Flight Control Systems, ” Sensors, vol. 24, no. 11, p. 3349, 2024, doi: 10.3390/s24113349.

[11]. F. Santoso, M. A. Garratt, and S. G. Anavatti, “State-of-the-Art Intelligent Flight Control Systems in Unmanned Aerial Vehicles, ” IEEE Transactions on Automation Science and Engineering, vol. 15, no. 2, pp. 613–627, 2018, doi: 10.1109/TASE.2017.2651109.

[12]. A. Zulu and S. John, “A Review of Control Algorithms for Autonomous Quadrotors, ” Open Journal of Applied Sciences, vol. 4, no. 14, pp. 547–556, 2014, doi: 10.4236/ojapps.2014.414053.

References

[1]. T. D. Company, “No Fear of Storms: New DJI M30 Enterprise Can Operate in Heavy Weather [Image].” 2025.

[6]. J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and OpenAI, “Proximal Policy Optimization Algorithms, ” 2017.

[10]. J. Peksa and D. Mamchur, “A Review on the State of the Art in Copter Drones and Flight Control Systems, ” Sensors, vol. 24, no. 11, p. 3349, 2024, doi: 10.3390/s24113349.

[12]. A. Zulu and S. John, “A Review of Control Algorithms for Autonomous Quadrotors, ” Open Journal of Applied Sciences, vol. 4, no. 14, pp. 547–556, 2014, doi: 10.4236/ojapps.2014.414053.

Cite this article

Wang,S. (2025). Optimization Application of PPO Algorithm in Reinforcement Learning in Drone Attitude Balance. Lecture Notes in Education Psychology and Public Media,125,1-16.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

About volume

Volume title: Proceedings of CONF-CIAP 2026 Symposium: International Conference on Atomic Magnetometer and Applications

ISBN: 978-1-80590-407-6(Print) / 978-1-80590-408-3(Online)

Editor:

Conference website: https://www.confciap.org/hangzhou.html

Conference date: 16 November 2025

Series: Lecture Notes in Education Psychology and Public Media

Volume number: Vol.125

ISSN: 2753-7048(Print) / 2753-7056(Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish this series agree to the following terms:

1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.

2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.

3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).