Hend AlTair, Tarek Taha, Jorge Dias, Mahmoud Al‐Qutayri
In this paper, we introduce a novel rewarding scheme for the classical POMDP formulation. The proposed scheme reinforces preferences over objectives: it ensures that high-priority preferences accumulate higher rewards, resolves ambiguities, and allows high-priority objectives to be pursued before low-priority ones. To illustrate conflicting objectives, we selected the context of search and rescue, which involves a heterogeneous team facing potentially conflicting multi-objective situations. Our rewarding scheme was tested in simulated scenarios using multiple POMDP solvers. Results from the simulated experiments show that the system can account for the impact on the human (first responder) by enforcing the priority of objectives and iteratively optimizing policies to better suit the first responder and the search and rescue mission goals.
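One way the described priority enforcement could be realized is a reward that weights each objective by its priority rank so that a higher-priority objective always dominates the accumulated reward of lower-priority ones. The following is a minimal sketch, not the paper's actual formulation; the function name, the rank-based exponential weighting, and the example objectives ("reach victim" vs. "conserve battery") are all illustrative assumptions.

```python
def priority_weighted_reward(objective_rewards, priorities, base=10.0):
    """Combine per-objective rewards so a higher-priority objective
    outweighs the rewards of lower-priority ones.

    objective_rewards: dict mapping objective -> reward in [0, 1]
    priorities: dict mapping objective -> rank (0 = highest priority)
    base: weight ratio between adjacent priority levels (assumed)
    """
    total = 0.0
    for obj, r in objective_rewards.items():
        rank = priorities[obj]
        weight = base ** (-rank)  # exponentially smaller weight for lower priority
        total += weight * r
    return total

# Hypothetical search-and-rescue example: reaching the victim
# outranks conserving battery, so its reward dominates.
rewards = {"reach_victim": 0.8, "conserve_battery": 0.5}
ranks = {"reach_victim": 0, "conserve_battery": 1}
print(priority_weighted_reward(rewards, ranks))  # 0.8*1.0 + 0.5*0.1 = 0.85
```

Under this weighting, no accumulation of low-priority rewards can overturn a high-priority preference as long as per-objective rewards stay bounded in [0, 1], which is the dominance property the abstract attributes to the proposed scheme.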