Video-Based Cross-Modal Recipe Retrieval

Da Cao; Zhiwang Yu; Hanling Zhang; Jiansheng Fang; Liqiang Nie; Qi Tian

doi:10.1145/3343031.3351067

Year: 2019 Pages: 1685-1693

Abstract

As a natural extension of image-based cross-modal recipe retrieval, the task of retrieving a specific video given a recipe as the query has seldom been explored. Cooking videos contain a variety of hidden temporal and spatial elements. In addition, current image-based cross-modal recipe retrieval approaches mostly emphasize understanding textual and visual content independently, and thereby overlook the interaction between the two. In this work, we propose the new problem of video-based cross-modal recipe retrieval and investigate it thoroughly under the attention paradigm. In particular, we first exploit a parallel-attention network to learn the representations of videos and recipes independently. Next, we propose a co-attention network to explicitly emphasize the cross-modal interactive features between videos and recipes. Meanwhile, a cross-modal fusion sub-network is proposed to learn both the independent and collaborative dynamics, which can enhance the associated representation of videos and recipes. Last but not least, the embedding vectors of videos and recipes produced by the joint network are optimized with a pairwise ranking loss. Extensive experiments on a self-collected dataset verify the effectiveness and rationality of the proposed solution.
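The abstract states that the video and recipe embeddings are optimized with a pairwise ranking loss, but it does not give the exact form. A common choice in cross-modal retrieval is a bidirectional hinge (margin) loss over cosine similarities with in-batch negatives. The pure-Python sketch below assumes that form; the function names, the margin of 0.2, and the use of in-batch negatives are illustrative assumptions, not the authors' implementation. The embeddings are assumed to have already been produced by the attention and fusion networks.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def pairwise_ranking_loss(
    videos: List[Vector], recipes: List[Vector], margin: float = 0.2
) -> float:
    """Assumed bidirectional hinge ranking loss over a batch of matched pairs.

    videos[i] and recipes[i] form a matching pair; every other item in the
    batch is treated as a negative. The margin value is a hypothetical default.
    """
    if len(videos) != len(recipes):
        raise ValueError("videos and recipes must have the same length")
    n = len(videos)
    # sim[i][j] holds the similarity between video i and recipe j.
    sim = [[cosine(videos[i], recipes[j]) for j in range(n)] for i in range(n)]
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Recipe i as query: its matching video i should outrank video j.
            loss += max(0.0, margin - sim[i][i] + sim[j][i])
            # Video i as query: its matching recipe i should outrank recipe j.
            loss += max(0.0, margin - sim[i][i] + sim[i][j])
    return loss / n


def rank_videos(recipe: Vector, videos: List[Vector]) -> List[int]:
    """Return video indices ordered from most to least similar to a recipe query."""
    return sorted(
        range(len(videos)), key=lambda k: cosine(videos[k], recipe), reverse=True
    )
```

For example, when each video embedding already coincides with its own recipe embedding (`[[1.0, 0.0], [0.0, 1.0]]` for both lists), every hinge term is inactive and the loss is 0.0; swapping the recipe order makes every negative outrank its positive and the loss becomes positive. The hinge form only penalizes negatives that come within `margin` of the matching pair, so well-separated pairs contribute no gradient.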
