Dynamic Saliency-Guided Representation Learning for World Models in Atari MsPacman
Research Article
Open Access
CC BY


Hongxi Lyu 1*
1 School of Computer Science, University of Nottingham Ningbo China, Ningbo, 315100, China
*Corresponding author: ssyhl32@nottingham.edu.cn
Published on 28 October 2025
ACE Vol.202
ISSN (Print): 2755-2721
ISSN (Online): 2755-273X
ISBN (Print): 978-1-80590-497-7
ISBN (Online): 978-1-80590-498-4

Abstract

World Models, such as Dreamer, rely on latent representations to learn environment dynamics and support planning. However, standard VAE-based encoders often capture redundant background features, resulting in inefficient training and slower convergence. In particular, reconstructions from vanilla VAEs tend to overlook dynamic elements (such as the player and ghosts in game scenes), prioritizing overall pixel fidelity over task-critical components. We introduce a Dynamic Saliency-Guided Encoder that incorporates a learnable attention mask to prioritize task-relevant regions in visual inputs. The encoder integrates seamlessly into a Dreamer-style architecture with a Recurrent State-Space Model (RSSM) and is optimized end-to-end alongside actor-critic updates. Experiments on the Atari MsPacman environment demonstrate that our method yields clearer reconstructions of salient elements, including maze walls, pellets, and the player character. Quantitatively, we observe a 28% improvement in PSNR for task-critical entities and a 15% increase in average episodic reward over the DreamerV3 baseline [1], indicating improved latent representation quality and sample efficiency in model-based reinforcement learning (MBRL). This work highlights the value of attention-enhanced encoders for scalable, semantically focused representation learning in MBRL.
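The core idea of the abstract, a learnable attention mask that gates the encoder's input so that task-relevant pixels dominate the latent code, can be sketched in miniature. The following is an illustrative NumPy toy, not the paper's implementation: the class name, frame shape, and the single linear projection standing in for the convolutional encoder are all assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SaliencyGatedEncoder:
    """Toy sketch of a saliency-guided encoder: a per-pixel learnable
    logit map yields an attention mask in (0, 1); the input frame is
    gated elementwise by the mask before being projected to a latent
    vector. Shapes and names are illustrative only."""

    def __init__(self, frame_shape=(8, 8), latent_dim=4, seed=0):
        rng = np.random.default_rng(seed)
        # Learnable saliency logits (would be trained end-to-end with
        # the world-model and actor-critic losses in a real system).
        self.mask_logits = np.zeros(frame_shape)
        # Stand-in for the encoder network: one linear projection.
        self.proj = rng.standard_normal(
            (frame_shape[0] * frame_shape[1], latent_dim)) * 0.1

    def attention_mask(self):
        return sigmoid(self.mask_logits)  # values in (0, 1)

    def encode(self, frame):
        gated = frame * self.attention_mask()  # suppress non-salient pixels
        return gated.reshape(-1) @ self.proj   # latent vector
```

With zero-initialized logits the mask starts at a uniform 0.5; during training, gradients flowing through the reconstruction and reward losses would push the mask toward 1 over dynamic, task-critical regions and toward 0 over static background.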

Keywords:

Model-Based Reinforcement Learning, World Models, Variational Autoencoder, Dynamic Attention Mechanism.


References

[1]. D. Hafner, J. Pasukonis, J. Ba, T. Lillicrap, Mastering diverse domains through world models, arXiv preprint arXiv:2301.04104 (2023). DOI: https://doi.org/10.48550/arXiv.2301.04104

[2]. D. Ha, J. Schmidhuber, World models, arXiv preprint arXiv:1803.10122 (2018). DOI: https://doi.org/10.48550/arXiv.1803.10122

[3]. D. Hafner, T. Lillicrap, I. Fischer, R. Villegas, D. Ha, H. Lee, J. Davidson, Learning latent dynamics for planning from pixels, Proc. Mach. Learn. Res. 97 (2019) 2555–2565.

[4]. M. Laskin, A. Srinivas, P. Abbeel, CURL: Contrastive unsupervised representations for reinforcement learning, Proc. Mach. Learn. Res. 119 (2020) 5639–5650.

[5]. I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, A. Lerchner, beta-VAE: Learning basic visual concepts with a constrained variational framework, Int. Conf. Learn. Represent. (2017).

[6]. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby, An image is worth 16x16 words: Transformers for image recognition at scale, Int. Conf. Learn. Represent. (2021).

[7]. S.M.A. Eslami, D.J. Rezende, F. Besse, F. Viola, A.S. Morcos, M. Garnelo, A. Ruderman, A.A. Rusu, I. Danihelka, K. Gregor, D.P. Reichert, L. Buesing, T. Weber, O. Vinyals, D. Rosenbaum, N. Rabinowitz, H. King, C. Hillier, M. Botvinick, D. Wierstra, K. Kavukcuoglu, D. Hassabis, Neural scene representation and rendering, Science 360(6394) (2018) 1204–1210. DOI: https://doi.org/10.1126/science.aar6170

[8]. A. Goyal, R. Islam, D. Strouse, Z. Ahmed, M. Botvinick, H. Larochelle, Y. Bengio, S. Levine, Transfer and exploration via the information bottleneck, Int. Conf. Learn. Represent. (2019).

Cite this article

Lyu, H. (2025). Dynamic Saliency-Guided Representation Learning for World Models in Atari MsPacman. Applied and Computational Engineering, 202, 8-14.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

About volume

Volume title: Proceedings of CONF-MLA 2025 Symposium: Intelligent Systems and Automation: AI Models, IoT, and Robotic Algorithms

ISBN: 978-1-80590-497-7 (Print) / 978-1-80590-498-4 (Online)
Editor: Hisham AbouGrad
Conference date: 12 November 2025
Series: Applied and Computational Engineering
Volume number: Vol.202
ISSN: 2755-2721 (Print) / 2755-273X (Online)