Uncertainty-Aware Stochastic Hybrid World Models with Neural Map Memory for Autonomous Navigation in Partially Observable Grid Environments
DOI: https://doi.org/10.66279/t95nh857
Keywords: Reinforcement Learning, World Models, Neural Map Memory, Autonomous Navigation, Stochastic Models
Abstract
Autonomous navigation in partially observable environments remains a significant challenge for reinforcement learning agents due to incomplete observations, stochastic dynamics, and uncertainty in spatial perception. World models are effective at learning how an environment changes over time, and neural map architectures are competitive at representing spatial memory; however, most current methods treat these components separately and rarely include explicit uncertainty estimation, reducing navigation reliability and exploration efficiency. This paper introduces SHWM-NM (Stochastic Hybrid World Model with Uncertainty-Aware Neural Map Memory), a unified framework that integrates stochastic latent dynamics modeling, structured neural map memory, and multi-level uncertainty estimation to enhance autonomous navigation. The architecture combines a stochastic hybrid world model with an uncertainty-aware neural map that explicitly represents spatial information and its associated uncertainty, while a policy learning module uses these estimates to guide exploration and decision-making under partial observability. On MiniGrid-based navigation tasks, SHWM-NM significantly outperforms deterministic world model baselines: it increases the average reward from 1.55 to 4.61, raises the success rate from 15% to 46%, and reduces the average trajectory length from 42.5 to 30.7 steps. In addition, epistemic uncertainty decreased from 0.0073 to 0.0017 during training, a modest but consistent gain in model confidence that indicates improved modeling of environment dynamics. These results show that jointly modeling stochastic dynamics, spatial memory, and uncertainty in a single architecture is a promising approach to decision-making and navigation in environments that are not fully observable.
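The core mechanism the abstract describes, an uncertainty-aware spatial memory whose epistemic uncertainty shrinks as the world model observes more of the environment, can be illustrated with a minimal sketch. This is not the paper's implementation: the grid size, ensemble size, scalar per-cell feature, and the `ensemble_predict`/`update_cell` helpers are all hypothetical stand-ins. Epistemic uncertainty is approximated, as is common, by the variance across an ensemble of predictors, and map fusion uses a precision-weighted (Kalman-style) update.

```python
# Minimal sketch (assumed design, not the SHWM-NM implementation):
# epistemic uncertainty as ensemble variance, stored per cell in a
# neural-map-style memory with mean and uncertainty channels.
import numpy as np

rng = np.random.default_rng(0)

GRID = 8          # hypothetical map size
ENSEMBLE = 5      # hypothetical number of dynamics-model ensemble members

# map_mean / map_var play the role of the uncertainty-aware neural map:
# one scalar feature per cell plus its epistemic uncertainty.
map_mean = np.zeros((GRID, GRID))
map_var = np.ones((GRID, GRID))   # unvisited cells start maximally uncertain

def ensemble_predict(obs):
    """Stand-in for an ensemble of learned dynamics models: each member
    returns a noisy prediction of the cell feature; the spread across
    members serves as the epistemic uncertainty estimate."""
    return obs + rng.normal(0.0, 0.1, size=ENSEMBLE)

def update_cell(x, y, obs):
    """Fuse a new observation into the map with a precision-weighted
    update, shrinking uncertainty at visited cells."""
    preds = ensemble_predict(obs)
    mean, var = preds.mean(), preds.var() + 1e-6
    k = map_var[x, y] / (map_var[x, y] + var)      # Kalman-style gain
    map_mean[x, y] += k * (mean - map_mean[x, y])
    map_var[x, y] *= (1.0 - k)

# Visiting a cell repeatedly drives its epistemic uncertainty down,
# mirroring the reported decline of epistemic uncertainty over training.
before = map_var[2, 3]
for _ in range(20):
    update_cell(2, 3, obs=1.0)
after = map_var[2, 3]
```

A policy module could then prioritize high-`map_var` cells to direct exploration toward poorly modeled regions, which is the role the abstract assigns to the uncertainty estimates.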
Data Availability Statement
The data used in this study are available from the corresponding author upon reasonable request.
License
Copyright (c) 2026 Computational Discovery and Intelligent Systems (CDIS)

This work is licensed under a Creative Commons Attribution 4.0 International License.