Hayato Sago, Ryo Ariizumi, Toru Asai, Shun‐ichi Azuma
Abstract This paper proposes a new multi-objective reinforcement learning (MORL) algorithm for robotics by extending the policy improvement with path integrals ($\text{PI}^2$) algorithm. In robot motion acquisition problems, most existing MORL algorithms are hard to apply because the state and action spaces are high-dimensional and continuous. However, policy-based algorithms such as $\text{PI}^2$ can solve such problems in the single-objective case. Based on the similarity between $\text{PI}^2$ and evolution strategies (ESs), and on the fact that ESs are well suited to multi-objective optimization, we propose an extension of $\text{PI}^2$ together with some techniques to speed up learning. The effectiveness of the proposed method is shown via numerical simulations.
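To make the similarity between $\text{PI}^2$ and evolution strategies concrete, the following is a minimal sketch of a single-objective, black-box style update. It perturbs the policy parameters with Gaussian noise, evaluates a scalar cost for each perturbation, and moves the parameters by a cost-weighted average of the perturbations. This is not the multi-objective extension proposed in the paper. The function names `pi2_style_update` and `quadratic_cost`, the toy cost, and the values of `n_rollouts`, `sigma`, and `h` are all assumptions chosen for illustration.

```python
import math
import random


def pi2_style_update(theta, cost_fn, rng, n_rollouts=20, sigma=0.3, h=10.0):
    """One probability-weighted-averaging update of the parameter vector theta.

    Simplified sketch: the cost is evaluated once per perturbed parameter
    vector, so the per-time-step weighting of full PI^2 is not modeled.
    """
    # Sample Gaussian perturbations of the parameters (exploration noise).
    eps = [[rng.gauss(0.0, sigma) for _ in theta] for _ in range(n_rollouts)]
    # Evaluate the scalar cost of each perturbed parameter vector.
    costs = [cost_fn([t + e for t, e in zip(theta, ek)]) for ek in eps]
    # Map costs to weights: lower cost gives a higher weight.
    c_min, c_max = min(costs), max(costs)
    span = max(c_max - c_min, 1e-12)  # guard against identical costs
    weights = [math.exp(-h * (c - c_min) / span) for c in costs]
    total = sum(weights)
    probs = [w / total for w in weights]
    # New parameters: old parameters plus the weighted mean perturbation.
    return [
        t + sum(p * ek[i] for p, ek in zip(probs, eps))
        for i, t in enumerate(theta)
    ]


def quadratic_cost(theta):
    """Toy cost (hypothetical): squared distance to the point (1, -2)."""
    return (theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2


rng = random.Random(0)  # fixed seed so the run is reproducible
theta = [0.0, 0.0]
initial_cost = quadratic_cost(theta)
for _ in range(100):
    theta = pi2_style_update(theta, quadratic_cost, rng)
final_cost = quadratic_cost(theta)
```

The sample, evaluate, and recombine loop has the same structure as an evolution strategy with weighted recombination, which is the similarity the abstract refers to. A multi-objective variant would need to replace the scalar cost ranking with a comparison over several objectives, and that step is not shown here.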
Ryo Ariizumi, Hayato Sago, Toru Asai, Shun‐ichi Azuma
Péter Várnai, Dimos V. Dimarogonas
Varnai, Peter; Dimarogonas, Dimos V.
Kosuke Yamamoto, Ryo Ariizumi, Tomohiro Hayakawa, Fumitoshi Matsuno
Hiroyasu Nakano, Ryo Ariizumi, Toru Asai, Shun‐ichi Azuma