References
[1]. D. Hafner, J. Pasukonis, J. Ba, T. Lillicrap, Mastering diverse domains through world models, arXiv preprint arXiv:2301.04104 (2023). DOI: https://doi.org/10.48550/arXiv.2301.04104
[2]. D. Ha, J. Schmidhuber, World models, arXiv preprint arXiv:1803.10122 (2018). DOI: https://doi.org/10.48550/arXiv.1803.10122
[3]. D. Hafner, T. Lillicrap, I. Fischer, R. Villegas, D. Ha, H. Lee, J. Davidson, Learning latent dynamics for planning from pixels, Proc. Mach. Learn. Res. 97 (2019) 2555–2565.
[4]. M. Laskin, A. Srinivas, P. Abbeel, CURL: Contrastive unsupervised representations for reinforcement learning, Proc. Mach. Learn. Res. 119 (2020) 5639–5650.
[5]. I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, A. Lerchner, beta-VAE: Learning basic visual concepts with a constrained variational framework, Int. Conf. Learn. Represent. (2017).
[6]. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby, An image is worth 16x16 words: Transformers for image recognition at scale, Int. Conf. Learn. Represent. (2021).
[7]. S.M.A. Eslami, D.J. Rezende, F. Besse, F. Viola, A.S. Morcos, M. Garnelo, A. Ruderman, A.A. Rusu, I. Danihelka, K. Gregor, D.P. Reichert, L. Buesing, T. Weber, O. Vinyals, D. Rosenbaum, N. Rabinowitz, H. King, C. Hillier, M. Botvinick, D. Wierstra, K. Kavukcuoglu, D. Hassabis, Neural scene representation and rendering, Science 360(6394) (2018) 1204–1210. DOI: https://doi.org/10.1126/science.aar6170
[8]. A. Goyal, R. Islam, D. Strouse, Z. Ahmed, M. Botvinick, H. Larochelle, Y. Bengio, S. Levine, Transfer and exploration via the information bottleneck, Int. Conf. Learn. Represent. (2019).