The paper elaborates on a prior proposal for a novelty detector based on an artificial neural network forecaster. In the earlier work, the novelty-based motivational signal was used in place of more conventional techniques (such as the ε-greedy or softmax policies) to drive exploration in the context of V-learning. The current paper provides a more comprehensive study of this approach to the exploration vs. exploitation trade-off, examines the problems that arise when applying it to SARSA and Q-learning, and, with the same goal in mind, presents several improvements on the original design.
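For readers less familiar with the alternatives mentioned above, here is a minimal illustrative sketch of the two conventional exploration policies and of a novelty-bonus-augmented Q-learning update. This is not the paper's actual design: the learned forecaster is replaced by a simple count-based novelty bonus, and all names and parameter values are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon=0.1):
    # With probability epsilon pick a random action, otherwise the greedy one.
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def softmax_policy(q_values, temperature=1.0):
    # Sample an action with probability proportional to exp(Q / T).
    prefs = np.asarray(q_values, dtype=float) / temperature
    prefs -= prefs.max()                       # numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return int(rng.choice(len(q_values), p=probs))

def novelty_bonus(visit_counts, s, a, beta=0.5):
    # Count-based stand-in for a learned novelty signal:
    # rarely tried (state, action) pairs earn a larger intrinsic bonus.
    return beta / np.sqrt(1.0 + visit_counts[s, a])

def q_update(Q, visit_counts, s, a, r, s_next, alpha=0.1, gamma=0.95):
    # Standard tabular Q-learning update, with the extrinsic reward
    # augmented by the intrinsic novelty bonus before the TD target.
    r_total = r + novelty_bonus(visit_counts, s, a)
    visit_counts[s, a] += 1
    td_target = r_total + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
```

With a novelty bonus, exploration pressure comes from the reward signal itself rather than from randomized action selection, which is the substitution the reviewed paper studies.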
Ildefons Magrans de Abril, Ryota Kanai