JOURNAL ARTICLE

Reinforcement Learning with a Disentangled Universal Value Function for Item Recommendation

Kai Wang, Zhene Zou, Qilin Deng, Jianrong Tao, Runze Wu, Changjie Fan, Liang Chen, Peng Cui

Year: 2021; Journal: Proceedings of the AAAI Conference on Artificial Intelligence; Vol: 35 (5); Pages: 4427-4435; Publisher: Association for the Advancement of Artificial Intelligence

Abstract

In recent years, there has been great interest in applying reinforcement learning (RL) to recommendation systems (RS), along with many challenges. In this paper, we summarize three key practical challenges of large-scale RL-based recommender systems: massive state and action spaces, a high-variance environment, and the unspecific reward setting in recommendation. All these problems remain largely unexplored in the existing literature and make the application of RL challenging. We develop a model-based reinforcement learning framework, called GoalRec. Inspired by ideas from world models (model-based), value function estimation (model-free), and goal-based RL, we propose a novel disentangled universal value function designed for item recommendation. It can generalize to various goals that the recommender may have, and it disentangles the stochastic environmental dynamics from the high-variance reward signals. As part of the value function, a high-capacity, reward-independent world model is trained to simulate complex environmental dynamics under a certain goal, free from the sparse and high-variance reward signals. Based on the predicted environmental dynamics, the disentangled universal value function is related to the user's future trajectory instead of a monolithic state and a scalar reward. We demonstrate the superiority of GoalRec over previous approaches with respect to the above three practical challenges in a series of simulations and a real application.
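The abstract gives no equations, so the following is only an illustrative sketch of the general idea, not the GoalRec architecture. It assumes, purely for illustration, that the reward-independent world model is a linear map predicting a vector of future-trajectory measurements for each candidate item, and that the goal-conditioned value is the dot product of that prediction with a goal vector. All names (`predict_future_measurements`, `goal_conditioned_values`, `recommend`, the two measurements "clicks" and "dwell") are hypothetical.

```python
import numpy as np


def predict_future_measurements(state, item_features, weights):
    """Hypothetical world model: predicts future-trajectory measurements.

    Assumption for illustration only: a linear map over the concatenated
    (state, item) features. It never sees a reward signal.
    Returns an array of shape (n_items, n_measurements).
    """
    n_items = item_features.shape[0]
    # Repeat the user state once per candidate item.
    tiled_state = np.broadcast_to(state, (n_items, state.shape[0]))
    inputs = np.concatenate([tiled_state, item_features], axis=1)
    return inputs @ weights


def goal_conditioned_values(measurements, goal):
    """Assumed value form: weight the predicted measurements by the goal."""
    return measurements @ goal


def recommend(state, item_features, weights, goal):
    """Pick the candidate item with the highest goal-conditioned value."""
    measurements = predict_future_measurements(state, item_features, weights)
    return int(np.argmax(goal_conditioned_values(measurements, goal)))


# Toy data: 2-dim user state, 2 candidate items, 2 measurements
# (column 0 = "clicks", column 1 = "dwell"; both names are made up).
state = np.array([1.0, 0.0])
item_features = np.array([[1.0, 0.0],
                          [0.0, 1.0]])
weights = np.array([[0.1, 0.1],   # state dim 0
                    [0.0, 0.0],   # state dim 1
                    [1.0, 0.0],   # item feature 0 drives clicks
                    [0.0, 1.0]])  # item feature 1 drives dwell

click_goal = np.array([1.0, 0.0])
dwell_goal = np.array([0.0, 1.0])

item_for_clicks = recommend(state, item_features, weights, click_goal)
item_for_dwell = recommend(state, item_features, weights, dwell_goal)
```

In this toy setup the same world-model prediction is reused for both goals; only the goal vector changes, which is the sense in which the dynamics are kept separate from the reward specification in this sketch.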

Keywords:
Reinforcement learning, Computer science, Recommender system, Variance (accounting), Artificial intelligence, Bellman equation, Function (biology), Machine learning, Value (mathematics), Key (lock), Mathematical optimization, Mathematics

Metrics

Cited By: 22
FWCI (Field Weighted Citation Impact): 5.73
Refs: 61
Citation Normalized Percentile: 0.97
Is in top 1%
Is in top 10%

Topics

Advanced Bandit Algorithms Research
Social Sciences → Decision Sciences → Management Science and Operations Research
Reinforcement Learning in Robotics
Physical Sciences → Computer Science → Artificial Intelligence
Smart Grid Energy Management
Physical Sciences → Engineering → Electrical and Electronic Engineering

Related Documents

JOURNAL ARTICLE

Denoising Item Graph With Disentangled Learning for Recommendation

Liang Zhang, Guannan Liu, Xiaohui Liu, Junjie Wu

Journal: IEEE Transactions on Knowledge and Data Engineering; Year: 2024; Vol: 36 (7); Pages: 2942-2955

JOURNAL ARTICLE

Disentangled Graph Contrastive Learning for Socially-Aware Next-Item Recommendation

Bin Wu, Xun Su, Long Chen, Jing Liang, Yangdong Ye

Journal: IEEE Transactions on Big Data; Year: 2025; Vol: 11 (6); Pages: 3197-3211

JOURNAL ARTICLE

Disentangled Representation Learning for Recommendation

Xin Wang, Hong Chen, Yuwei Zhou, Jianxin Ma, Wenwu Zhu

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence; Year: 2022; Vol: 45 (1); Pages: 408-424

JOURNAL ARTICLE

Learning Disentangled Representations for Recommendation

Jianxin Ma, Chang Zhou, Peng Cui, Hongxia Yang, Wenwu Zhu

Journal: arXiv (Cornell University); Year: 2019; Vol: 32; Pages: 5711-5722

JOURNAL ARTICLE

Deep Reinforcement Learning Framework for Category-Based Item Recommendation

Mingsheng Fu, Anubha Agrawal, Athirai A. Irissappane, Jie Zhang, Liwei Huang, Hong Qu

Journal: IEEE Transactions on Cybernetics; Year: 2021; Vol: 52 (11); Pages: 12028-12041