Abstract:
Reinforcement Learning (RL) agents are highly sensitive to noise, particularly consecutive noisy states, which destabilize training and can trigger catastrophic forgetting; such noise is also inherent in real-world data. While uncertainty estimation has been widely explored for guiding exploration, its role in stabilizing value updates under noisy conditions remains relatively underexplored. In this work, we introduce MASURE (Masksembles for Stable and Uncertainty-aware Reinforcement Learning Environments), a novel framework that integrates Masksembles-based epistemic uncertainty into Q-learning. MASURE employs uncertainty-conscious value updates, leveraging epistemic uncertainty to stabilize learning in noisy environments. We evaluate MASURE both on popular online RL benchmarks with sustained noise spanning consecutive states and on an offline real-world churn prediction task with inherently noisy features, to test training stability. Across both settings, MASURE consistently improves stability and predictive performance, outperforming standard RL agents (DQN, BootstrapDQN) and state-of-the-art uncertainty-estimation baselines (SunriseDQN, IVDQN). In noisy online benchmarks, MASURE achieves higher and more stable returns than IVDQN, while in the offline churn prediction task it attains the highest balanced accuracy (64.3%), surpassing DQN (63.5%), BootstrapDQN (63.8%), SunriseDQN (61.9%), and IVDQN (62.0%). Importantly, MASURE achieves these gains at significantly lower computational cost than deep ensembles, making it suitable for large-scale real-world applications.
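To make the idea of Masksembles-based, uncertainty-conscious value updates concrete, the following is a minimal PyTorch sketch, not the authors' implementation: the network architecture, mask generation, uncertainty measure, and the specific down-weighting scheme are illustrative assumptions (function and class names are hypothetical).

```python
# Minimal sketch (assumed details, not the MASURE reference code):
# a Q-network with Masksembles-style fixed binary masks over the hidden layer,
# and a TD loss whose per-sample weight shrinks with epistemic uncertainty.
import torch
import torch.nn as nn

class MaskedQNet(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=128, n_masks=4, keep_prob=0.5):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, hidden)
        self.fc2 = nn.Linear(hidden, n_actions)
        # Fixed binary masks, sampled once and reused (unlike dropout's
        # per-step resampling); each mask defines one lightweight sub-network.
        masks = (torch.rand(n_masks, hidden) < keep_prob).float()
        self.register_buffer("masks", masks)

    def forward(self, obs):
        h = torch.relu(self.fc1(obs))                 # (B, hidden)
        h = h.unsqueeze(0) * self.masks.unsqueeze(1)  # (K, B, hidden)
        return self.fc2(h)                            # (K, B, n_actions)

def uncertainty_weighted_td_loss(q_net, target_net, batch, gamma=0.99, beta=1.0):
    obs, act, rew, next_obs, done = batch
    q_all = q_net(obs)                                # (K, B, A)
    q_sa = q_all.mean(0).gather(1, act.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q_all = target_net(next_obs)             # (K, B, A)
        next_q_mean = next_q_all.mean(0).max(1).values
        # Epistemic uncertainty as disagreement across masked sub-networks
        # at the greedy next action (one plausible choice, assumed here).
        greedy = next_q_all.mean(0).argmax(1)
        sigma = next_q_all.gather(
            2, greedy.view(1, -1, 1).expand(next_q_all.size(0), -1, 1)
        ).squeeze(-1).std(0)
        target = rew + gamma * (1.0 - done) * next_q_mean
        # Down-weight TD errors on highly uncertain targets to damp the
        # destabilizing effect of noisy states on value updates.
        w = 1.0 / (1.0 + beta * sigma)
    return (w * (q_sa - target) ** 2).mean()
```

Because the K sub-networks share one forward pass through a single backbone, this kind of masked ensemble estimates epistemic uncertainty at a fraction of the cost of training K independent deep-ensemble members, which is the efficiency property the abstract attributes to MASURE.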