Open-Ended Multi-Modal Relational Reasoning for Video Question Answering

Haozheng Luo; Ruiyang Qin; Chenwei Xu; Guo Ye; Zening Luo

doi:10.1109/ro-man57019.2023.10309342

Year: 2023 Pages: 363-369

Abstract

In this paper, we introduce a robotic agent designed to analyze its external environment and answer participants' questions. The agent's primary purpose is to assist people through language-based interaction in video-based scenes. Our proposed method integrates video recognition and natural language processing models within the robotic agent. We investigate the key factors affecting human-robot interaction by examining issues that arise between participants and the robot agent. Our experimental findings reveal a positive relationship between trust and interaction efficiency. Furthermore, our model achieves a 2% to 3% performance improvement over other benchmark methods.

Keywords: Computer science, Benchmark, Focus, Question answering, Modal, Artificial intelligence, Robot, Human–computer interaction, Natural language, Natural language processing

Metrics

FWCI (Field Weighted Citation Impact): 0.36

Citation Normalized Percentile: 0.55

Topics

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Human Pose and Action Recognition

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Domain Adaptation and Few-Shot Learning

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Open-Ended Video Question Answering via Multi-Modal Conditional Adversarial Networks

Attention Based Multi-Modal Fusion Architecture for Open-Ended Video Question Answering Systems

Open-Ended Visual Question Answering by Multi-Modal Domain Adaptation

Differentiated Attention with Multi-modal Reasoning for Video Question Answering

A RAG Approach for Multi-Modal Open-ended Lifelog Question-Answering