By enabling retrieval and recommender systems to dynamically obtain user preferences through conversations with users, conversational search and recommendation have become increasingly popular in recent years. This process starts with a request from the user and continues with the system asking clarifying questions or suggesting candidate items or documents. In this way, the system gathers valuable feedback from users to accurately determine their needs. The process repeats until the search or recommendation succeeds or the user abandons the conversation. Due to the dynamic nature of conversational tasks, an agent must be trained to decide among different actions (e.g., asking a question or recommending an item) at each turn of the conversation while learning a unified policy over these decisions. If the agent decides to ask a question, the value of each candidate question is unknown at the outset, so the agent must be trained to predict a question's value from the context provided by previous turns. Moreover, not all words in the user's responses are useful, so the agent must learn to use only the most relevant information. In this thesis, we aim to address all of these challenges. Advances in deep reinforcement learning provide new opportunities for interactive conversational search and recommendation. We study how to develop deep reinforcement learning agents for conversational search and recommendation that interact flexibly with users and satisfy them by reaching the goal of the conversation.
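The idea of a single policy choosing between asking and recommending at each turn can be illustrated with a minimal sketch. This is not the thesis's model; the class name, the linear scoring, and the two-action space are illustrative assumptions, standing in for the learned deep policy described above.

```python
import math
import random

random.seed(0)

# Two action types the agent chooses between at every turn
# (illustrative; a real agent would also pick *which* question or item).
ACTIONS = ("ask_question", "recommend_item")

def softmax(scores):
    """Turn raw action scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class UnifiedPolicy:
    """A single linear policy scoring both action types from one dialogue state."""

    def __init__(self, state_dim):
        # One weight vector per action; all-zero weights give a uniform policy.
        self.weights = {a: [0.0] * state_dim for a in ACTIONS}

    def action_probs(self, state):
        scores = [
            sum(w * s for w, s in zip(self.weights[a], state)) for a in ACTIONS
        ]
        return dict(zip(ACTIONS, softmax(scores)))

    def act(self, state):
        """Sample an action from the policy's distribution."""
        probs = self.action_probs(state)
        r, acc = random.random(), 0.0
        for a in ACTIONS:
            acc += probs[a]
            if r <= acc:
                return a
        return ACTIONS[-1]

policy = UnifiedPolicy(state_dim=4)
state = [0.2, -0.1, 0.5, 0.0]   # a stand-in for an encoded conversation context
probs = policy.action_probs(state)
```

The key design point is that both action types share one state representation and one distribution, so training a single policy trades off asking against recommending rather than learning two disconnected decision rules.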
We believe that a good solution for this task should have the following features: 1) asking appropriate personalized questions in the right order to bring the system closer to the target item; 2) showing some items during the conversation to elicit feedback from the user; 3) keeping the conversation as short as possible; 4) finding appropriate items in a massive collection of possibilities; 5) extracting relevant information from the user's utterances and using it for retrieval or recommendation in the next turn of the conversation. To capture these features, we first introduce a model based on an Actor-Critic algorithm that jointly learns the dialogue policy and the recommendation model. To address the challenge of selecting items from a vast collection, we introduce a tree-structured Actor, where a balanced hierarchical clustering tree is built over the items/questions and selecting an item/question is framed as navigating a path from the root to a leaf of the tree. In each round of the conversation, the proposed model can offer simultaneous recommendations, provided that modality and screen real estate allow. Second, we explore the problem of generating relevant questions for conversational product search by maximizing any desired metric (i.e., the ultimate goal of the conversation), objective, or even an arbitrary user satisfaction signal. We argue that the true values of questions in a conversation are inherently unknown, as the significance of a question can vary with context, individual perspectives, and the evolving nature of the discussion. We therefore estimate the true value of questions by analyzing their answers and their relation to the primary goal of the conversation. We also explore how Large Language Models (LLMs) can generate funnel questions to clarify user preferences during a conversation.
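The root-to-leaf selection idea behind the tree-structured Actor can be sketched as follows. This is a simplified illustration, not the thesis's implementation: the balanced tree here is built by naive halving rather than hierarchical clustering, and `choose_child` stands in for the learned actor network that scores children at each node.

```python
def build_tree(items):
    """Recursively halve the item list to form a balanced binary tree
    (a stand-in for the balanced hierarchical clustering tree)."""
    if len(items) == 1:
        return {"item": items[0]}
    mid = len(items) // 2
    return {"left": build_tree(items[:mid]), "right": build_tree(items[mid:])}

def select_item(node, choose_child):
    """Frame item selection as walking a path from the root to a leaf;
    at each internal node, a policy picks which child to descend into."""
    path = []
    while "item" not in node:
        branch = choose_child(node)   # 'left' or 'right', e.g. from an actor network
        path.append(branch)
        node = node[branch]
    return node["item"], path

tree = build_tree(["item_a", "item_b", "item_c", "item_d"])
# A dummy policy that always descends left, for illustration only.
item, path = select_item(tree, lambda node: "left")
```

The payoff of this structure is that choosing among N items costs a path of O(log N) node-level decisions instead of scoring all N items at once, which is what makes vast item collections tractable for the actor.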
Finally, we study the problem of extracting relevant information from the user’s utterances in a conversation.
Ali Montazeralghaem, James Allan