Reinforcement Learning Interpretability Methods and Decision Making Methods under Constraints
Research Article
Open Access
CC BY


Siheng Ye 1*
1 Leeds College, Southwest Jiaotong University, Chengdu, Sichuan, China
*Corresponding author: map1e@my.swjtu.edu.cn
Published on 14 October 2025

Abstract

Reinforcement learning (RL), a core technology of artificial intelligence, has shown strong potential in robotics, games, and autonomous driving. However, the "black-box" nature of deep RL models makes the decision-making process opaque: users struggle to understand and trust agent behavior, and uninterpretable decisions can have serious consequences in sensitive fields such as healthcare and finance. Moreover, because traditional RL pursues maximum reward, the resulting policies often ignore fairness, producing biased decisions that harm affected groups. This article therefore surveys RL from the two key perspectives of transparency and fairness. The first theme is interpretability-based decision-making methods, which use causal analysis, local explanations, and visualization tools to make decisions transparent. The second is constraint-based decision-making methods, which ensure decision fairness through multi-objective optimization and stepwise constraints. The review covers the methodologies, experimental results, and limitations of representative recent literature. Its significance lies in systematically integrating these methods, revealing the challenges that arise where transparency and fairness interact, promoting the development of more reliable RL systems, and outlining future directions to support the ethical deployment and sustainable innovation of RL in social applications.
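As context for the second theme, fairness through multi-objective optimization, the sketch below illustrates one common idea from the fair multi-objective RL literature: scalarizing per-user returns with a generalized Gini social-welfare function, so that improving the worst-off user counts the most. This is a minimal illustration only, assuming NumPy; `ggf_scalarize` and its default weights are hypothetical choices for the example, not any cited paper's implementation.

```python
import numpy as np

def ggf_scalarize(returns, weights=None):
    """Generalized Gini social-welfare scalarization of a vector of
    per-user returns: sort ascending (worst-off first) and apply
    decreasing weights, which rewards balanced outcomes."""
    r = np.sort(np.asarray(returns, dtype=float))  # worst-off user first
    if weights is None:
        # Decreasing weights 1, 1/2, 1/4, ... (an illustrative choice)
        weights = np.array([2.0 ** -i for i in range(len(r))])
    weights = weights / weights.sum()  # normalize so scores are comparable
    return float(np.dot(weights, r))

# A balanced outcome scores higher than an unequal one with the same total.
fair = ggf_scalarize([5.0, 5.0])    # both users served equally
unfair = ggf_scalarize([9.0, 1.0])  # same total reward, skewed allocation
```

Maximizing such a scalarized objective instead of the plain sum of rewards is one way an RL agent can be steered toward policies that trade a little total reward for a more equitable distribution across users.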

Keywords:

Reinforcement learning, Interpretability, Decision making



Cite this article

Ye, S. (2025). Reinforcement Learning Interpretability Methods and Decision Making Methods under Constraints. Applied and Computational Engineering, 191, 40-45.

Data availability

The datasets used and/or analyzed during the current study are available from the authors upon reasonable request.

About volume

Volume title: Proceedings of CONF-MLA 2025 Symposium: Intelligent Systems and Automation: AI Models, IoT, and Robotic Algorithms

ISBN: 978-1-80590-184-6 (Print) / 978-1-80590-129-7 (Online)
Editor: Hisham AbouGrad
Conference date: 17 November 2025
Series: Applied and Computational Engineering
Volume number: Vol.191
ISSN: 2755-2721 (Print) / 2755-273X (Online)