JOURNAL ARTICLE

Deep-Reinforcement-Learning-Based Autonomous UAV Navigation With Sparse Rewards

Chao Wang, Jian Wang, Jingjing Wang, Xudong Zhang

Year: 2020 | Journal: IEEE Internet of Things Journal | Vol: 7 (7) | Pages: 6180-6190 | Publisher: Institute of Electrical and Electronics Engineers

Abstract

Unmanned aerial vehicles (UAVs) have the potential to deliver Internet-of-Things (IoT) services from a great height, creating an airborne domain of the IoT. In this article, we address the problem of autonomous UAV navigation in large-scale complex environments by formulating it as a Markov decision process with sparse rewards, and we propose an algorithm named deep reinforcement learning (RL) with nonexpert helpers (LwH). In contrast to prior RL-based methods that invest heavy effort in reward shaping, we adopt a sparse reward scheme, i.e., a UAV is rewarded if and only if it completes a navigation task. The sparse reward scheme ensures that the solution is not biased toward potentially suboptimal directions. However, the absence of intermediate rewards hinders efficient learning, since informative states are rarely encountered. To handle this challenge, we assume that a prior policy (a nonexpert helper), possibly of poor quality, is available to the learning agent. The prior policy guides the agent's exploration of the state space by reshaping the behavior policy used for environmental interaction. It also assists the agent in achieving goals by setting dynamic learning objectives of increasing difficulty. To evaluate the proposed method, we construct a simulator for UAV navigation in large-scale complex environments and compare our algorithm with several baselines. Experimental results demonstrate that LwH significantly outperforms state-of-the-art algorithms for sparse rewards and yields navigation policies comparable to those learned in an environment with dense rewards.
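The paper's LwH algorithm itself is not reproduced here, but the two ingredients the abstract names can be sketched in a toy grid world: a sparse reward that fires only on task completion, and a behavior policy reshaped by a nonexpert helper. All names, the grid setup, and the mixing probability `beta` are illustrative assumptions, not the authors' implementation:

```python
import random

GOAL = (4, 4)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # up, down, right, left

def sparse_reward(state):
    # Sparse scheme: reward if and only if the navigation task is completed.
    # No shaping terms, so the objective is not biased by hand-crafted hints.
    return 1.0 if state == GOAL else 0.0

def helper_policy(state):
    # Nonexpert helper: a greedy Manhattan step toward the goal, ignoring
    # obstacles entirely -- deliberately crude, possibly poor guidance.
    dx, dy = GOAL[0] - state[0], GOAL[1] - state[1]
    if abs(dx) >= abs(dy) and dx != 0:
        return (1 if dx > 0 else -1, 0)
    if dy != 0:
        return (0, 1 if dy > 0 else -1)
    return random.choice(ACTIONS)

def behavior_action(state, q, beta=0.3, eps=0.1):
    # Reshaped behavior policy: with probability beta consult the helper,
    # otherwise act epsilon-greedily on the agent's own Q-values.
    if random.random() < beta:
        return helper_policy(state)
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
```

In this sketch the helper only influences which transitions are collected; learning updates would still target the agent's own Q-values, so the final policy can outgrow the helper's nonexpert advice.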

Keywords:
Reinforcement learning, Markov decision process, Markov process, Artificial intelligence, Machine learning, State space, Autonomous agent, Computer science, Algorithm

Metrics

Cited By: 182
FWCI (Field-Weighted Citation Impact): 28.07
References: 64
Citation Normalized Percentile: 1.00 (in top 1% and top 10%)

Topics

UAV Applications and Optimization
Physical Sciences →  Engineering →  Aerospace Engineering
Robotics and Sensor-Based Localization
Physical Sciences →  Engineering →  Aerospace Engineering
Distributed Control Multi-Agent Systems
Physical Sciences →  Computer Science →  Computer Networks and Communications