A RAG Approach for Multi-Modal Open-ended Lifelog Question-Answering

Quang-Linh Tran; Ngo Ngoc Diep Pham; Quoc Trung Truong; Minh Hưng Nguyễn; Hong Phuong Le; Dang Khoi Vu; Văn Minh Nguyễn; Van Kinh Nguyen; L NGUYEN; Tan Le; M. Dang; Binh T. Nguyen; Gareth J. F. Jones; Cathal Gurrin

doi:10.1145/3731715.3733263

JOURNAL ARTICLE

Year: 2025 Pages: 1303-1312

Abstract

Lifelogging is the passive collection, storage and analysis of daily data through wearable sensors. Question Answering (QA) for lifelog data enables natural language interaction with personal daily life records, providing insights into individual routines and behaviours. While this task has great potential for personal analytics and memory augmentation, progress has been limited by the challenges of lifelog management, since lifelogs can comprise enormous multi-modal data sets spanning a lifetime. We introduce a Retrieval-Augmented Generation (RAG) approach to the lifelog QA task. The approach first uses a retrieval model to find the lifelog events that contain the answers, and then a large language model (LLM) to generate answers to the questions. In addition, we construct an open-ended lifelog QA benchmark with 14,187 QA pairs to evaluate the RAG approach. Using an embedding-based retrieval approach with the Stella 1.5B model, our lifelog context retriever achieves 77.67% Recall@5 and 94.35% Recall@20. Combined with the Mistral 7B model, the system achieves 39.54% ROUGE-L and an Accuracy score of 3.475 on a 5-point scale, as scored by GPT-4o. This potentially provides an effective approach to lifelog QA with high performance that does not require fine-tuning.
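As a rough illustration of the retrieval step and the Recall@K metric mentioned in the abstract, the following minimal Python sketch ranks events by cosine similarity to a question embedding and counts how often the gold event appears in the top K. It is not the authors' implementation: the vectors are made-up toy values standing in for real model embeddings, the function names `cosine`, `retrieve` and `recall_at_k` are hypothetical, and it assumes each question has exactly one gold event.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(query_vec, event_vecs, k):
    """Return the ids of the k events most similar to the query."""
    ranked = sorted(
        event_vecs,
        key=lambda event_id: cosine(query_vec, event_vecs[event_id]),
        reverse=True,
    )
    return ranked[:k]

def recall_at_k(queries, event_vecs, k):
    """Fraction of questions whose gold event is among the top-k results.

    Assumes one gold event per question (an assumption of this sketch).
    """
    hits = sum(
        1 for query_vec, gold_id in queries
        if gold_id in retrieve(query_vec, event_vecs, k)
    )
    return hits / len(queries)

# Toy, hand-written vectors; a real system would use model embeddings.
events = {
    "e1": [1.0, 0.0, 0.0],
    "e2": [0.0, 1.0, 0.0],
    "e3": [0.0, 0.0, 1.0],
    "e4": [0.7, 0.7, 0.0],
}
# Each entry pairs a question embedding with the id of its gold event.
queries = [
    ([0.9, 0.1, 0.0], "e1"),
    ([0.1, 0.9, 0.0], "e2"),
    ([0.6, 0.6, 0.1], "e3"),
]

for k in (1, 4):
    print(f"Recall@{k} = {recall_at_k(queries, events, k):.2f}")
```

In a full RAG pipeline, the text of the top-K retrieved events would then be placed in the LLM prompt together with the question; that generation step is omitted here.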

Keywords:

Lifelog; Modal; Computer science; Information retrieval; Question answering; Human–computer interaction

Topics

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Speech and dialogue systems

Physical Sciences → Computer Science → Artificial Intelligence

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Open-Ended Multi-Modal Relational Reasoning for Video Question Answering

Open-Ended Visual Question Answering by Multi-Modal Domain Adaptation

Open-Ended Video Question Answering via Multi-Modal Conditional Adversarial Networks

Attention Based Multi-Modal Fusion Architecture for Open-Ended Video Question Answering Systems

LLQA - Lifelog Question Answering Dataset