Alessandro Trapasso, Anders Jönsson
Coordinating and synchronizing multiple agents in reinforcement learning (RL) presents significant challenges, particularly when concurrent actions and shared objectives are required. We propose a novel framework that integrates Reward Machines (RMs) with Partial-Order Planning (POP) to enhance coordination in multiagent reinforcement learning (MARL). By transforming high-level POP strategies into individual RMs for each agent, our approach explicitly captures action dependencies and concurrency requirements, enabling agents to learn and execute coordinated plans effectively in complex environments. We validate our approach in a grid-based multiagent domain in which agents have to synchronize actions such as jointly accessing limited pathways or collaboratively manipulating objects. The explicit representation of action dependencies and synchronization points in RMs provides a scalable and flexible mechanism to model concurrent actions, enabling agents to focus on relevant tasks and reducing exploration.
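The core mechanism described above, a Reward Machine whose states encode action dependencies and synchronization points, can be sketched minimally. The class and event names below (`RewardMachine`, `step`, the door-synchronization events) are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of a Reward Machine (RM): a finite-state machine over
# high-level events that emits rewards on transitions. Names here are
# illustrative, not from the paper.

class RewardMachine:
    """Tracks task progress via events; rewards transitions between RM states."""

    def __init__(self, initial_state, transitions):
        # transitions: {(state, event): (next_state, reward)}
        self.state = initial_state
        self.transitions = transitions

    def step(self, event):
        """Advance on an observed event; unknown events leave the state unchanged."""
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)
        )
        self.state = next_state
        return reward


# Hypothetical example: an agent must wait at a synchronization point
# ("at_door") until its partner arrives ("partner_at_door"), then the
# two pass through jointly ("through_door").
rm = RewardMachine(
    initial_state="u0",
    transitions={
        ("u0", "at_door"): ("u1", 0.0),          # reached the sync point
        ("u1", "partner_at_door"): ("u2", 0.0),  # partner arrived; synchronized
        ("u2", "through_door"): ("u3", 1.0),     # joint action completed
    },
)

for ev in ["at_door", "partner_at_door", "through_door"]:
    r = rm.step(ev)
print(rm.state)  # RM state after the coordinated sequence
```

In a multiagent setting, each agent would receive its own RM compiled from the partial-order plan, with synchronization events shared across the agents' machines so that reward is only granted when the concurrency constraints are respected.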