Multi-objective reinforcement learning (MORL) for robot motion learning is challenging not only because data are scarce but also because the state and action spaces are high-dimensional and continuous. Most existing MORL algorithms are inadequate in this regard. In single-objective reinforcement learning, however, policy-based algorithms have solved the problem of high-dimensional and continuous state and action spaces. One such algorithm is policy improvement with path integrals ($\mathrm{PI}^{2}$), which has been successful in robot motion learning. $\mathrm{PI}^{2}$ is similar to evolution strategies (ES), and multi-objective optimization is an active research topic for ES algorithms. This paper proposes a MORL algorithm based on $\mathrm{PI}^{2}$ and multi-objective ES that can handle the difficulties of robot motion learning. Its effectiveness is shown via numerical simulations.
Hayato Sago, Ryo Ariizumi, Toru Asai, Shun‐ichi Azuma
Hiroyasu Nakano, Ryo Ariizumi, Toru Asai, Shun‐ichi Azuma
Virgílio Almeida, Lucas N. Alegre, Ana L. C. Bazzan