In this paper, we study reinforcement learning (RL) with general function approximation, where either the value function or the model dynamics is approximated by a given abstract hypothesis space. We propose the generalized eluder coefficient (GEC), which measures the hardness of generalization from the historical in-sample error to the prediction error, and further serves to measure the hardness of learning an RL problem. In terms of the algorithmic design, we propose an optimization-based framework for RL with general function approximation, following the general principle of “Optimism in the Face of Uncertainty” (OFU). Compared to existing algorithms, the proposed framework does not explicitly maintain the confidence set, and neatly handles both model-free and model-based problems wi...
Vaneet Aggarwal, Washim Uddin Mondal
Chi Jin, Zhuoran Yang, Zhaoran Wang, Michael I. Jordan
Long-Fei Li, Yu-Jie Zhang, Peng Zhao, Zhi-Hua Zhou
Haque Ishfaq, Qiwen Cui, Việt Dũng Nguyễn, Alex Ayoub, Zhuoran Yang, Zhaoran Wang, Doina Precup, Lin F. Yang