Shoaib Mohd Nasti, Zahoor Ahmad Najar, Mohammad Ahsan Chishti
Navigating in unknown environments without prior maps poses a significant challenge for mobile robots due to sparse rewards, dynamic obstacles, and limited prior knowledge. This paper presents an improved Deep Reinforcement Learning (DRL) framework based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm for adaptive mapless navigation. In addition to architectural enhancements, the proposed method offers theoretical benefits by incorporating a latent-state encoder and predictor module that transform high-dimensional sensor inputs into compact embeddings. This compact representation reduces the effective dimensionality of the state space, enabling smoother value-function approximation and mitigating the overestimation errors common in actor–critic methods. The method uses intrinsic rewards derived from prediction error in the latent space to promote exploration of novel states. The intrinsic reward encourages the agent to prioritize uncertain yet informative regions, improving exploration efficiency under sparse extrinsic reward signals and accelerating convergence. Furthermore, training is stabilized by regularizing the latent space with a maximum mean discrepancy (MMD) loss. By enforcing consistent latent dynamics, the MMD constraint reduces variance in target value estimation and results in more stable policy updates. Experimental results in simulated ROS2/Gazebo environments demonstrate that the proposed framework outperforms standard TD3 and other improved TD3 variants. Our model achieves a 93.1% success rate and a 6.8% collision rate, reflecting efficient and safe goal-directed navigation. These findings indicate that combining intrinsic motivation, structured representation learning, and regularization-based stabilization produces more robust and generalizable policies for mapless mobile robot navigation.
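The two latent-space quantities named in the abstract, the prediction-error intrinsic reward and the MMD regularizer, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the squared-error form of the reward, the RBF kernel, the `bandwidth` and `scale` parameters, and the choice of a biased MMD estimator are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def intrinsic_reward(z_pred, z_next, scale=1.0):
    """Exploration bonus from latent prediction error (assumed squared-error form).

    z_pred: predictor output for the next latent state, shape (batch, dim)
    z_next: encoder output for the observed next state, shape (batch, dim)
    """
    return scale * 0.5 * np.sum((z_pred - z_next) ** 2, axis=-1)


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between rows of x and rows of y."""
    sq_dist = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clamp tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth ** 2))


def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of the squared maximum mean discrepancy between two samples."""
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


# Hypothetical usage: a batch of latent embeddings compared against a reference sample.
rng = np.random.default_rng(0)
latents = rng.normal(size=(64, 8))
reference = rng.normal(size=(64, 8))
shifted = reference + 3.0

bonus = intrinsic_reward(latents, reference)  # one bonus per transition
penalty_close = mmd2(latents, reference)      # small when distributions match
penalty_far = mmd2(latents, shifted)          # larger when distributions differ
```

In a training loop of this kind, the bonus would typically be added to the extrinsic reward with a weighting coefficient, and the MMD term added to the encoder loss; both the weighting and the reference distribution are design choices the abstract leaves open.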
Wenxing Liu, Hanlin Niu, Ipek Caliskanelli, Zhengjia Xu, Robert Skilton
Nanxun Duo, Qinzhao Wang, Qiang Lv, Heng Wei, Pei Zhang