Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model

Xiaolin Chen; Xuemeng Song; Liqiang Jing; Shuo Li; Linmei Hu; Liqiang Nie

doi:10.1145/3606368

JOURNAL ARTICLE

Year: 2023 Journal: ACM Transactions on Information Systems Vol: 42 (2) Pages: 1-25

Abstract

Text response generation for multimodal task-oriented dialog systems, which aims to generate a proper text response given the multimodal context, is an essential yet challenging task. Although existing efforts have achieved compelling success, they still suffer from two pivotal limitations: (1) they overlook the benefit of generative pretraining, and (2) they ignore the textual context-related knowledge. To address these limitations, we propose a novel dual knowledge-enhanced generative pretrained language model for multimodal task-oriented dialog systems (DKMD), consisting of three key components: dual knowledge selection, dual knowledge-enhanced context learning, and knowledge-enhanced response generation. Specifically, the dual knowledge selection component selects the related knowledge according to both the textual and visual modalities of the given context. Thereafter, the dual knowledge-enhanced context learning component seamlessly integrates the selected knowledge into multimodal context learning from both global and local perspectives, while also exploring the cross-modal semantic relation. Moreover, the knowledge-enhanced response generation component comprises a revised BART decoder, in which an additional dot-product knowledge-decoder attention sub-layer is introduced to explicitly utilize the knowledge to advance text response generation. Extensive experiments on a public dataset verify the superiority of the proposed DKMD over state-of-the-art competitors.
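The abstract does not spell out the decoder internals, so the following is only a rough, pure-Python sketch of where an extra knowledge-decoder attention sub-layer could sit in a BART-style decoder layer. It assumes single-head scaled dot-product attention with no learned projections, places the knowledge sub-layer right after the encoder-decoder (context) attention, uses parameter-free post-norm residual blocks, and omits the feed-forward sub-layer. All function names (`dot_product_attention`, `knowledge_enhanced_decoder_layer`, and so on) are hypothetical; this is not the authors' DKMD implementation.

```python
import math
from typing import List

Matrix = List[List[float]]  # each row is one token vector


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(queries: Matrix, keys: Matrix, values: Matrix,
                          causal: bool = False) -> Matrix:
    """Single-head scaled dot-product attention (assumption: no learned projections).

    With causal=True, query i only attends to key/value rows 0..i.
    """
    d = len(keys[0])
    out = []
    for i, q in enumerate(queries):
        visible = i + 1 if causal else len(keys)
        scores = [sum(qj * kj for qj, kj in zip(q, k)) / math.sqrt(d)
                  for k in keys[:visible]]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values[:visible]))
                    for j in range(len(values[0]))])
    return out


def layer_norm(x: List[float], eps: float = 1e-5) -> List[float]:
    """Layer normalization without learned gain or bias."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]


def add_and_norm(residual: Matrix, update: Matrix) -> Matrix:
    """Residual connection followed by layer norm (post-norm ordering)."""
    return [layer_norm([r + u for r, u in zip(rr, uu)])
            for rr, uu in zip(residual, update)]


def knowledge_enhanced_decoder_layer(decoder_states: Matrix,
                                     context_states: Matrix,
                                     knowledge_states: Matrix) -> Matrix:
    """Hypothetical decoder layer with an extra knowledge-attention sub-layer.

    Assumption: the knowledge sub-layer follows the context attention;
    the feed-forward sub-layer is omitted for brevity.
    """
    # 1) Masked self-attention over the response tokens generated so far.
    h = add_and_norm(decoder_states,
                     dot_product_attention(decoder_states, decoder_states,
                                           decoder_states, causal=True))
    # 2) Standard encoder-decoder attention over the encoded multimodal context.
    h = add_and_norm(h, dot_product_attention(h, context_states, context_states))
    # 3) Additional knowledge-decoder attention over the selected knowledge.
    h = add_and_norm(h, dot_product_attention(h, knowledge_states, knowledge_states))
    return h


if __name__ == "__main__":
    decoder = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2]]
    context = [[0.3, 0.1, 0.2, 0.0], [0.0, 0.4, 0.1, 0.3], [0.2, 0.2, 0.2, 0.2]]
    knowledge = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    out = knowledge_enhanced_decoder_layer(decoder, context, knowledge)
    print(len(out), len(out[0]))  # 2 4
```

Keeping the knowledge attention as its own sub-layer, rather than appending knowledge tokens to the context sequence, is one plausible reading of "explicitly utilizing the knowledge": the decoder computes a separate attention distribution over knowledge entries at every step.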

Metrics

FWCI (Field Weighted Citation Impact): 4.34

Citation Normalized Percentile: 0.93

Topics

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Speech and dialogue systems

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Improving Multiple Documents Grounded Goal-Oriented Dialog Systems via Diverse Knowledge Enhanced Pretrained Language Model

Dual Semantic Knowledge Composed Multimodal Dialog Systems

Semantic–Electromagnetic Inversion With Pretrained Multimodal Generative Model

Layerwised multimodal knowledge distillation for vision-language pretrained model

Adapting Generative Pretrained Language Model for Open-domain Multimodal Sentence Summarization