Reinforcement Learning Interpretability Methods and Decision Making Methods under Constraints
Research Article
Open Access
CC BY


Siheng Ye 1*
1 Leeds College, Southwest Jiaotong University, Chengdu, Sichuan, China
*Corresponding author: map1e@my.swjtu.edu.cn
Published on 14 October 2025

Abstract

Reinforcement learning (RL), a core technology of artificial intelligence, has shown strong potential in robotics, games, and autonomous driving. However, the "black-box" nature of deep RL models makes the decision-making process opaque: users struggle to understand and trust agent behavior, and uninterpretable decisions can have serious consequences in sensitive fields such as healthcare and finance. Moreover, because traditional RL pursues maximum reward, the resulting policies often ignore fairness, producing biased decisions that harm affected groups. This article therefore surveys RL from the two key perspectives of transparency and fairness. The first theme is interpretability-based decision-making methods, which use causal analysis, local explanations, and visualization tools to make decisions transparent. The second is constraint-based decision-making methods, which ensure decision fairness through multi-objective optimization and stepwise constraints. The review covers the methodologies, experimental results, and limitations of representative recent literature. Its significance lies in systematically integrating these methods, revealing the challenges that arise where transparency and fairness interact, promoting the development of more reliable RL systems, and outlining future directions to support the ethical deployment and sustainable innovation of RL in social applications.
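As context for the second theme, fairness through multi-objective optimization, the sketch below illustrates one common idea from the fair multi-objective RL literature: scalarizing per-user returns with a generalized Gini social-welfare function, so that improving the worst-off user counts the most. This is a minimal illustration only, assuming NumPy; `ggf_scalarize` and its default weights are hypothetical choices for the example, not any cited paper's implementation.

```python
import numpy as np

def ggf_scalarize(returns, weights=None):
    """Generalized Gini social-welfare scalarization of a vector of
    per-user returns: sort ascending (worst-off first) and apply
    decreasing weights, which rewards balanced outcomes."""
    r = np.sort(np.asarray(returns, dtype=float))  # worst-off user first
    if weights is None:
        # Decreasing weights 1, 1/2, 1/4, ... (an illustrative choice)
        weights = np.array([2.0 ** -i for i in range(len(r))])
    weights = weights / weights.sum()  # normalize so scores are comparable
    return float(np.dot(weights, r))

# A balanced outcome scores higher than an unequal one with the same total.
fair = ggf_scalarize([5.0, 5.0])    # both users served equally
unfair = ggf_scalarize([9.0, 1.0])  # same total reward, skewed allocation
```

Maximizing such a scalarized objective instead of the plain sum of rewards is one way an RL agent can be steered toward policies that trade a little total reward for a more equitable distribution across users.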

Keywords:

Reinforcement learning, Interpretability, Decision making



Cite this article

Ye, S. (2025). Reinforcement Learning Interpretability Methods and Decision Making Methods under Constraints. Applied and Computational Engineering, 191, 40-45.

Data availability

The datasets used and/or analyzed during the current study are available from the authors upon reasonable request.

About volume

Volume title: Proceedings of CONF-MLA 2025 Symposium: Intelligent Systems and Automation: AI Models, IoT, and Robotic Algorithms

ISBN: 978-1-80590-184-6 (Print) / 978-1-80590-129-7 (Online)
Editor: Hisham AbouGrad
Conference date: 17 November 2025
Series: Applied and Computational Engineering
Volume number: Vol.191
ISSN: 2755-2721 (Print) / 2755-273X (Online)