Automation technology based on deep reinforcement learning (RL) is developing rapidly, driven in part by advances in the deep learning methods that are combined with RL algorithms. One such combination pairs the Q-learning algorithm with an artificial neural network (ANN), a family of deep learning algorithms within artificial intelligence. Finding the right combination for a given problem remains a challenge for many researchers, although some work also combines RL with non-ANN methods. In addition, most RL approaches use only a single combination, so it remains unclear whether the ideal choice is a single ANN algorithm or several of them. This study proposes a framework design in which the Self-Organizing Map (SOM) algorithm adaptively combines the algorithms and acts as the actor, computing a final Q-value that is updated from one or more Q-values in a sustained and dynamic manner. The resulting framework indicates that SOM can provide an adaptive combination of the algorithms that should be used in deep RL.
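The idea of an SOM that adaptively combines Q-values can be illustrated with a minimal sketch. The abstract does not specify the framework's internals, so everything below is an assumption made for illustration: the class name `SomQCombiner`, the one-dimensional map, the per-unit mixing weights, and the rule that shifts those weights toward the estimators with the lowest TD error are all hypothetical and are not the authors' method.

```python
import math


class SomQCombiner:
    """Hypothetical 1-D SOM: each unit holds a state prototype and mixing weights."""

    def __init__(self, n_units, state_dim, n_estimators, lr=0.5, sigma=1.0):
        # Assumption: prototypes start evenly spread over [0, 1] in every dimension.
        self.prototypes = [
            [i / max(n_units - 1, 1)] * state_dim for i in range(n_units)
        ]
        # Assumption: every unit starts from a uniform mix of all estimators.
        self.mix = [[1.0 / n_estimators] * n_estimators for _ in range(n_units)]
        self.lr = lr
        self.sigma = sigma

    def bmu(self, state):
        """Index of the best matching unit (smallest squared distance to state)."""
        def dist2(proto):
            return sum((p - s) ** 2 for p, s in zip(proto, state))
        return min(range(len(self.prototypes)), key=lambda i: dist2(self.prototypes[i]))

    def combine(self, state, q_values):
        """Final Q-value per action; q_values[k][a] comes from estimator k."""
        weights = self.mix[self.bmu(state)]
        n_actions = len(q_values[0])
        return [
            sum(weights[k] * q_values[k][a] for k in range(len(weights)))
            for a in range(n_actions)
        ]

    def update(self, state, errors):
        """Standard SOM neighbourhood update; errors[k] is estimator k's |TD error|."""
        winner = self.bmu(state)
        # Assumption: the target mix is a softmax over negative errors.
        exps = [math.exp(-e) for e in errors]
        total = sum(exps)
        target = [x / total for x in exps]
        for i in range(len(self.prototypes)):
            # Gaussian neighbourhood on the 1-D map, scaled by the learning rate.
            h = self.lr * math.exp(-((i - winner) ** 2) / (2 * self.sigma ** 2))
            self.prototypes[i] = [
                p + h * (s - p) for p, s in zip(self.prototypes[i], state)
            ]
            self.mix[i] = [m + h * (t - m) for m, t in zip(self.mix[i], target)]


# Usage: two hypothetical estimators that disagree on the best of two actions.
som = SomQCombiner(n_units=3, state_dim=1, n_estimators=2)
q_values = [[1.0, 0.0], [0.0, 1.0]]
before = som.combine([0.0], q_values)
som.update([0.0], errors=[0.0, 5.0])  # estimator 0 was far more accurate here
after = som.combine([0.0], q_values)
```

In this sketch a unit can settle on a single estimator or on a blend of several, depending on which ones have been accurate in its region of the state space. Because each update is a convex step toward a normalized target, the mixing weights of every unit keep summing to one.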