Yuchen Xiao, Joshua Hoffman, Tian Xia, Christopher Amato
We consider the challenges of learning multi-agent/robot macro-action-based deep Q-nets, including how to properly update each macro-action value and how to accurately maintain macro-action-observation trajectories. We address these challenges by first proposing two fundamental frameworks for learning macro-action-value functions and joint macro-action-value functions. Furthermore, we present two new approaches for learning decentralized macro-action-based policies, which involve a new double Q-update rule that facilitates the learning of decentralized Q-nets by using a centralized Q-net for action selection. Our approaches are evaluated both in simulation and on real robots.
Aaron Hao Tan, Federico Pizarro Bejarano, Yuhan Zhu, Richard Ren, Goldie Nejat
Elhadji Amadou Oury Diallo, Toshiharu Sugawara
Hancheng Zhang, Guozheng Li, Chi Harold Liu, Guoren Wang, Jian Tang