This paper proposes a reinforcement learning method based on memorizing and retrieving episodes of the learner's own experience. Results of a computer simulation on a simple but typical non-Markovian environment are shown to clarify its performance. The instance-based reinforcement learning method previously proposed by Unemi (1992) is also based on the learner's experiences, memorized without any modification, but it is applicable only to Markovian domains, where a reactive policy suffices for the learner to achieve optimal behavior. The episode-based method not only overcomes perceptual aliasing but also inherits the instance-based method's flexibility with respect to applicable domains.
Xiaoguang Li, Wanting Ji, Jidong Huang