JOURNAL ARTICLE

Deep-Reinforcement-Learning-Based Autonomous UAV Navigation With Sparse Rewards

Chao Wang, Jian Wang, Jingjing Wang, Xudong Zhang

Year: 2020 | Journal: IEEE Internet of Things Journal | Vol: 7 (7) | Pages: 6180-6190 | Publisher: Institute of Electrical and Electronics Engineers

Abstract

Unmanned aerial vehicles (UAVs) have the potential to deliver Internet-of-Things (IoT) services from a great height, creating an airborne domain of the IoT. In this article, we address the problem of autonomous UAV navigation in large-scale complex environments by formulating it as a Markov decision process with sparse rewards, and we propose an algorithm named deep reinforcement learning (RL) with nonexpert helpers (LwH). In contrast to prior RL-based methods that invest heavy effort in reward shaping, we adopt a sparse reward scheme, i.e., a UAV is rewarded if and only if it completes a navigation task. The sparse reward scheme ensures that the solution is not biased toward potentially suboptimal directions. However, the absence of intermediate rewards hinders efficient learning, since informative states are rarely encountered. To handle this challenge, we assume that a prior policy (a nonexpert helper), possibly of poor quality, is available to the learning agent. The prior policy guides the agent's exploration of the state space by reshaping the behavior policy used for environmental interaction. It also assists the agent in achieving goals by setting dynamic learning objectives of increasing difficulty. To evaluate the proposed method, we construct a simulator for UAV navigation in large-scale complex environments and compare our algorithm with several baselines. Experimental results demonstrate that LwH significantly outperforms state-of-the-art algorithms for sparse rewards and yields navigation policies comparable to those learned in an environment with dense rewards.
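The paper's LwH algorithm itself is not reproduced here, but the two ingredients the abstract names can be sketched in a toy grid world: a sparse reward that fires only on task completion, and a behavior policy reshaped by a nonexpert helper. All names, the grid setup, and the mixing probability `beta` are illustrative assumptions, not the authors' implementation:

```python
import random

GOAL = (4, 4)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # up, down, right, left

def sparse_reward(state):
    # Sparse scheme: reward if and only if the navigation task is completed.
    # No shaping terms, so the objective is not biased by hand-crafted hints.
    return 1.0 if state == GOAL else 0.0

def helper_policy(state):
    # Nonexpert helper: a greedy Manhattan step toward the goal, ignoring
    # obstacles entirely -- deliberately crude, possibly poor guidance.
    dx, dy = GOAL[0] - state[0], GOAL[1] - state[1]
    if abs(dx) >= abs(dy) and dx != 0:
        return (1 if dx > 0 else -1, 0)
    if dy != 0:
        return (0, 1 if dy > 0 else -1)
    return random.choice(ACTIONS)

def behavior_action(state, q, beta=0.3, eps=0.1):
    # Reshaped behavior policy: with probability beta consult the helper,
    # otherwise act epsilon-greedily on the agent's own Q-values.
    if random.random() < beta:
        return helper_policy(state)
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
```

In this sketch the helper only influences which transitions are collected; learning updates would still target the agent's own Q-values, so the final policy can outgrow the helper's nonexpert advice.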

Keywords:
Reinforcement learning, Markov decision process, Markov process, Artificial intelligence, Machine learning, State space, Autonomous agent, Computer science, Algorithm

Metrics

Cited By: 182
FWCI (Field-Weighted Citation Impact): 28.07
References: 64
Citation Normalized Percentile: 1.00 (in top 1% and top 10%)

Topics

UAV Applications and Optimization
Physical Sciences →  Engineering →  Aerospace Engineering
Robotics and Sensor-Based Localization
Physical Sciences →  Engineering →  Aerospace Engineering
Distributed Control Multi-Agent Systems
Physical Sciences →  Computer Science →  Computer Networks and Communications