User Attention-guided Multimodal Dialog Systems

Chen Cui; Wenjie Wang; Xuemeng Song; Minlie Huang; Xin-Shun Xu; Liqiang Nie

doi:10.1145/3331184.3331226

ScienceGate Book Chapters

JOURNAL ARTICLE

User Attention-guided Multimodal Dialog Systems

Chen Cui Wenjie Wang Xuemeng Song Minlie Huang Xin-Shun Xu Liqiang Nie

Year: 2019 Pages: 445-454

DOI: 10.1145/3331184.3331226

Get Full-Text PDF Get Analytical Report

Abstract

As an intelligent way to interact with computers, the dialog system has been catching more and more attention. However, most research efforts only focus on text-based dialog systems, completely ignoring the rich semantics conveyed by the visual cues. Indeed, the desire for multimodal task-oriented dialog systems is growing with the rapid expansion of many domains, such as the online retailing and travel. Besides, few work considers the hierarchical product taxonomy and the users' attention to products explicitly. The fact is that users tend to express their attention to the semantic attributes of products such as color and style as the dialog goes on. Towards this end, in this work, we present a hierarchical User attention-guided Multimodal Dialog system, named UMD for short. UMD leverages a bidirectional Recurrent Neural Network to model the ongoing dialog between users and chatbots at a high level; As to the low level, the multimodal encoder and decoder are capable of encoding multimodal utterances and generating multimodal responses, respectively. The multimodal encoder learns the visual presentation of images with the help of a taxonomy-attribute combined tree, and then the visual features interact with textual features through an attention mechanism; whereas the multimodal decoder selects the required visual images and generates textual responses according to the dialog history. To evaluate our proposed model, we conduct extensive experiments on a public multimodal dialog dataset in the retailing domain. Experimental results demonstrate that our model outperforms the existing state-of-the-art methods by integrating the multimodal utterances and encoding the visual features based on the users' attribute-level attention.

Keywords:

Dialog box Computer science Dialog system Human–computer interaction Multimodal interaction Encoder Artificial intelligence Natural language processing Multimodality Semantics (computer science) World Wide Web

Metrics

Cited By

3.63

FWCI (Field Weighted Citation Impact)

Refs

0.94

Citation Normalized Percentile

Is in top 1%

Is in top 10%

Citation History

Topics

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Speech and dialogue systems

Physical Sciences → Computer Science → Artificial Intelligence

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

User Attention-guided Multimodal Dialog Systems

Abstract

Metrics

Citation History

Topics

Related Documents

Domain-aware Multimodal Dialog Systems with Distribution-based User Characteristic Modeling

Constraining User Response via Multimodal Dialog Interface

Cognitive Attention Network (CAN) for Text and Image Multimodal Visual Dialog Systems

User Comment-Guided Cross-Modal Attention for Interpretable Multimodal Fake News Detection

User Models in Dialog Systems