Dynamic Saliency-Guided Representation Learning for World Models in Atari MsPacman
Research Article
Open Access
CC BY


Hongxi Lyu 1*
1 School of Computer Science, University of Nottingham Ningbo China, Ningbo, 315100, China
*Corresponding author: ssyhl32@nottingham.edu.cn
Published on 28 October 2025
ACE Vol.202
ISSN (Print): 2755-2721
ISSN (Online): 2755-273X
ISBN (Print): 978-1-80590-497-7
ISBN (Online): 978-1-80590-498-4

Abstract

World Models, such as Dreamer, rely on latent representations to learn environment dynamics and support planning. However, standard VAE-based encoders often capture redundant background features, resulting in inefficient training and slower convergence. In particular, reconstructions from vanilla VAEs tend to overlook dynamic elements (such as the player and ghosts in game scenes), prioritizing overall pixel fidelity over task-critical components. We introduce a Dynamic Saliency-Guided Encoder that incorporates a learnable attention mask to prioritize task-relevant regions in visual inputs. The encoder integrates seamlessly into a Dreamer-style architecture with a Recurrent State-Space Model (RSSM) and is optimized end-to-end alongside actor-critic updates. Experiments on the Atari MsPacman environment demonstrate that our method yields clearer reconstructions of salient elements, including maze walls, pellets, and the player character. Quantitatively, we observe a 28% improvement in PSNR for task-critical entities and a 15% increase in average episodic reward over the DreamerV3 baseline [1], indicating improved latent representation quality and sample efficiency in model-based reinforcement learning (MBRL). This work highlights the value of attention-enhanced encoders for scalable, semantically focused representation learning in MBRL.
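The core idea of the abstract, a learnable attention mask that gates the encoder's input so that task-relevant pixels dominate the latent code, can be sketched in miniature. The following is an illustrative NumPy toy, not the paper's implementation: the class name, frame shape, and the single linear projection standing in for the convolutional encoder are all assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SaliencyGatedEncoder:
    """Toy sketch of a saliency-guided encoder: a per-pixel learnable
    logit map yields an attention mask in (0, 1); the input frame is
    gated elementwise by the mask before being projected to a latent
    vector. Shapes and names are illustrative only."""

    def __init__(self, frame_shape=(8, 8), latent_dim=4, seed=0):
        rng = np.random.default_rng(seed)
        # Learnable saliency logits (would be trained end-to-end with
        # the world-model and actor-critic losses in a real system).
        self.mask_logits = np.zeros(frame_shape)
        # Stand-in for the encoder network: one linear projection.
        self.proj = rng.standard_normal(
            (frame_shape[0] * frame_shape[1], latent_dim)) * 0.1

    def attention_mask(self):
        return sigmoid(self.mask_logits)  # values in (0, 1)

    def encode(self, frame):
        gated = frame * self.attention_mask()  # suppress non-salient pixels
        return gated.reshape(-1) @ self.proj   # latent vector
```

With zero-initialized logits the mask starts at a uniform 0.5; during training, gradients flowing through the reconstruction and reward losses would push the mask toward 1 over dynamic, task-critical regions and toward 0 over static background.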

Keywords:

Model-Based Reinforcement Learning, World Models, Variational Autoencoder, Dynamic Attention Mechanism.


References

[1]. D. Hafner, J. Pasukonis, J. Ba, T. Lillicrap, Mastering diverse domains through world models, arXiv preprint arXiv:2301.04104 (2023). DOI: https://doi.org/10.48550/arXiv.2301.04104

[2]. D. Ha, J. Schmidhuber, World models, arXiv preprint arXiv:1803.10122 (2018). DOI: https://doi.org/10.48550/arXiv.1803.10122

[3]. D. Hafner, T. Lillicrap, I. Fischer, R. Villegas, D. Ha, H. Lee, J. Davidson, Learning latent dynamics for planning from pixels, Proc. Mach. Learn. Res. 97 (2019) 2555–2565.

[4]. M. Laskin, A. Srinivas, P. Abbeel, CURL: Contrastive unsupervised representations for reinforcement learning, Proc. Mach. Learn. Res. 119 (2020) 5639–5650.

[5]. I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, A. Lerchner, beta-VAE: Learning basic visual concepts with a constrained variational framework, Int. Conf. Learn. Represent. (2017).

[6]. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby, An image is worth 16x16 words: Transformers for image recognition at scale, Int. Conf. Learn. Represent. (2021).

[7]. S.M.A. Eslami, D.J. Rezende, F. Besse, F. Viola, A.S. Morcos, M. Garnelo, A. Ruderman, A.A. Rusu, I. Danihelka, K. Gregor, D.P. Reichert, L. Buesing, T. Weber, O. Vinyals, D. Rosenbaum, N. Rabinowitz, H. King, C. Hillier, M. Botvinick, D. Wierstra, K. Kavukcuoglu, D. Hassabis, Neural scene representation and rendering, Science 360(6394) (2018) 1204–1210. DOI: https://doi.org/10.1126/science.aar6170

[8]. A. Goyal, R. Islam, D. Strouse, Z. Ahmed, M. Botvinick, H. Larochelle, Y. Bengio, S. Levine, Transfer and exploration via the information bottleneck, Int. Conf. Learn. Represent. (2019).

Cite this article

Lyu, H. (2025). Dynamic Saliency-Guided Representation Learning for World Models in Atari MsPacman. Applied and Computational Engineering, 202, 8-14.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

About volume

Volume title: Proceedings of CONF-MLA 2025 Symposium: Intelligent Systems and Automation: AI Models, IoT, and Robotic Algorithms

ISBN: 978-1-80590-497-7 (Print) / 978-1-80590-498-4 (Online)
Editor: Hisham AbouGrad
Conference date: 12 November 2025
Series: Applied and Computational Engineering
Volume number: Vol.202
ISSN: 2755-2721 (Print) / 2755-273X (Online)