Deep reinforcement learning algorithms have been shown to perform well on complex tasks such as video games and chess. For locomotion tasks, however, choosing the right algorithm and hyperparameters remains a challenge for many researchers. This project addressed that issue by determining which of three reinforcement learning algorithms most effectively helps a computer learn to walk, without any external supervision or guidance, in a simulated environment. The project also determined the best learning rate for these algorithms by testing six candidate rates. A walking environment was used because it is considered representative of a large class of reinforcement learning problems. Proximal policy optimization (PPO) was found to be the most effective, followed by trust-region policy optimization (TRPO) and the vanilla policy gradient. All three algorithms performed best with a learning rate of 1e-3.
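To illustrate why the learning rate matters so much in such a comparison, the sketch below trains a vanilla policy gradient (REINFORCE with a running-average baseline) on a toy two-armed bandit rather than a walking environment; the task, reward values, and learning-rate grid are illustrative assumptions, not the paper's setup, but they show the same effect: a rate that is too small barely moves the policy in the available training budget.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def train_reinforce(lr, steps=2000, seed=0):
    """Vanilla policy gradient on a hypothetical 2-armed bandit.

    Arm 1 pays mean 1.0, arm 0 pays mean 0.2 (illustrative values).
    Returns the learned probability of choosing the better arm.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)            # logits of a softmax policy
    means = np.array([0.2, 1.0])   # assumed mean rewards per arm
    baseline = 0.0                 # running-average reward baseline
    for _ in range(steps):
        p = softmax(theta)
        a = rng.choice(2, p=p)
        r = means[a] + rng.normal(0.0, 0.1)
        # grad of log softmax-policy prob: one_hot(a) - p
        grad = -p
        grad[a] += 1.0
        baseline += 0.01 * (r - baseline)
        theta += lr * (r - baseline) * grad  # REINFORCE update
    return softmax(theta)[1]

# Sweep a few learning rates, analogous to the paper's 6-rate comparison
for lr in [1e-4, 1e-3, 1e-2]:
    print(f"lr={lr:g}: P(best arm) = {train_reinforce(lr):.3f}")
```

With a fixed step budget, the smallest rate leaves the policy near uniform while larger rates commit to the better arm; in the full walking benchmark the same trade-off (too small is slow, too large is unstable) is what the six-rate sweep probes.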