Research Article
Open Access
CC BY

Comparative Study of Reinforcement Learning Performance Based on PPO and DQN Algorithms

Ce Tan 1*
1 University of Alberta, Edmonton, Alberta, Canada, T6G 2R3
*Corresponding author: Ceadamtan@gmail.com
Published on 11 July 2025

Abstract

With the rapid development of artificial intelligence technology, reinforcement learning (RL) has emerged as a core research direction in the field of intelligent decision-making. Among the many reinforcement learning algorithms, Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) have gained widespread attention due to their outstanding performance. These two algorithms have been extensively applied in areas such as autonomous driving and game AI, demonstrating strong adaptability and effectiveness. However, despite these widespread applications, systematic comparative studies of their performance differences remain relatively scarce. This study aims to systematically evaluate the differences between DQN and PPO across four performance metrics: convergence speed, stability, sample efficiency, and computational complexity. Combining theoretical analysis with experimental validation, we selected two classic reinforcement learning environments, CartPole (discrete actions) and CarRacing (continuous actions), for a detailed performance assessment. The results show that DQN performs better in discrete-action environments, with faster convergence and higher sample efficiency, whereas PPO demonstrates greater stability and adaptability in continuous-action environments.
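
To make the experimental setup concrete, the following is a minimal, hypothetical sketch (in Python, using Stable-Baselines3) of how DQN and PPO could be trained and evaluated under identical conditions on CartPole-v1. The training budget, policy network, and evaluation protocol shown here are illustrative assumptions, not the configuration used in the paper.

from stable_baselines3 import DQN, PPO
from stable_baselines3.common.evaluation import evaluate_policy

TOTAL_STEPS = 100_000  # illustrative training budget, not the paper's setting

def train_and_evaluate(algo_cls, env_id="CartPole-v1", seed=0):
    # Use the same default MLP policy for both algorithms so the comparison is like-for-like.
    model = algo_cls("MlpPolicy", env_id, seed=seed, verbose=0)
    model.learn(total_timesteps=TOTAL_STEPS)
    # Mean and standard deviation of evaluation returns serve as rough proxies
    # for final performance and stability, respectively.
    mean_ret, std_ret = evaluate_policy(model, model.get_env(), n_eval_episodes=20)
    return mean_ret, std_ret

if __name__ == "__main__":
    for algo_cls in (DQN, PPO):
        mean_ret, std_ret = train_and_evaluate(algo_cls)
        print(f"{algo_cls.__name__}: mean return {mean_ret:.1f} +/- {std_ret:.1f}")

A comparable run on CarRacing would additionally require a discretized action space for DQN, since DQN only supports discrete actions, whereas PPO can act on the continuous control signal directly.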

Keywords:

Reinforcement Learning, Proximal Policy Optimization, Deep Q-Network, Performance Comparison



Cite this article

Tan, C. (2025). Comparative Study of Reinforcement Learning Performance Based on PPO and DQN Algorithms. Applied and Computational Engineering, 175, 30-36.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

About volume

Volume title: Proceedings of CONF-CDS 2025 Symposium: Application of Machine Learning in Engineering

ISBN: 978-1-80590-237-9 (Print) / 978-1-80590-238-6 (Online)
Editors: Marwan Omar, Mian Umer Shafiq
Conference website: https://www.confcds.org
Conference date: 19 August 2025
Series: Applied and Computational Engineering
Volume number: Vol.175
ISSN: 2755-2721 (Print) / 2755-273X (Online)