Path finding is an extensively studied subject in computer science. The path-finding problem is defined as the discovery and plotting of an optimal route between two points on a plane. Existing algorithms that solve this problem are mostly static, rely heavily on prior knowledge of the environment, and require the environment to be deterministic. In real-world applications, however, the environment is often unknown a priori, stochastic, and characterized by several conflicting objectives; in such cases the aforementioned algorithms fail to produce effective results. In this project, we study and apply a reinforcement learning approach to the many-objective path-finding problem called Voting Q-Learning (VoQL), a model-free, on-policy learning algorithm, and use it to determine a set of optimal policies. VoQL employs voting methods borrowed from the field of social choice theory for action selection. In addition to working with the existing voting methods for VoQL, the performance of additional voting methods is studied and evaluated for the first time.
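To illustrate the core idea of voting-based action selection over multiple objectives, the following is a minimal sketch, not the paper's implementation. It assumes a tabular setting in which each objective maintains its own Q-values for the current state, and uses Borda count (one of the classical social-choice voting rules) to aggregate the objectives' preferences into a single action choice; the function name and array shapes are illustrative assumptions.

```python
import numpy as np

def borda_action(q_values):
    """Select an action by Borda count over per-objective Q-values.

    q_values: array of shape (n_objectives, n_actions) holding each
    objective's Q-values for the current state. Each objective ranks
    the actions by Q-value; an action earns Borda points equal to the
    number of actions ranked below it. The action with the highest
    total Borda score across all objectives is selected.
    """
    n_objectives, n_actions = q_values.shape
    scores = np.zeros(n_actions)
    for obj in range(n_objectives):
        # argsort of argsort yields each action's rank (0 = worst)
        ranks = np.argsort(np.argsort(q_values[obj]))
        scores += ranks
    return int(np.argmax(scores))

# Hypothetical Q-values for 3 objectives and 3 actions: each objective
# favors a different action, and Borda count arbitrates among them.
q = np.array([[0.2, 0.9, 0.5],   # objective 1 prefers action 1
              [0.8, 0.1, 0.6],   # objective 2 prefers action 0
              [0.3, 0.4, 0.7]])  # objective 3 prefers action 2
print(borda_action(q))  # → 2 (highest total Borda score)
```

Other voting rules (e.g., plurality or Copeland) can be swapped in by replacing the scoring step, which is what makes the voting-based formulation a convenient testbed for comparing social-choice methods.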