Although Query-by-Example techniques based on Euclidean distance in a multidimensional feature space have proved effective for image databases, this approach cannot be applied directly to video: the richness and complexity of video data would make the number of dimensions massive. Two recent solutions address this issue, namely Deterministic Quantization (DQ) and Dynamic Temporal Quantization (DTQ). DQ divides a video into equal-length segments and extracts a visual feature vector from each segment; the resulting bag-of-words feature is then encoded by hashing to enable approximate nearest-neighbor search under Hamming distance. A weakness of this approach is its deterministic segmentation of the video data. DTQ improves on DQ by segmenting the video dynamically into variable-length segments, so that the feature vectors extracted from these segments better capture the semantic content of the video. To support very large video databases, it is desirable to minimize the number of segments so as to keep the feature representation as small as possible. We achieve this by using a single video segment (i.e., no segmentation is necessary at all), with even better retrieval performance. Our scheme models video with differential long short-term memory (DLSTM) recurrent neural networks and obtains a highly compact, fixed-size feature representation from the outputs of the DLSTM's hidden states. Each of these features is further compressed into binary bits by quantization-based hashing. Experimental results on two public data sets, UCF101 and MSRActionPairs, show that the proposed video modeling technique outperforms DTQ by a significant margin.
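The final retrieval step described above (quantizing a real-valued feature into binary bits and comparing codes by Hamming distance) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature vectors here are hypothetical stand-ins for DLSTM hidden-state outputs, and the quantizer is a simple sign threshold at zero.

```python
import numpy as np

def binarize(feature):
    # Quantize a real-valued feature vector (standing in for a DLSTM
    # hidden-state output) into a binary code by thresholding at zero.
    return (feature > 0).astype(np.uint8)

def hamming(a, b):
    # Hamming distance: number of bit positions where the codes differ.
    return int(np.count_nonzero(a != b))

# Hypothetical 8-d features for a query video and two database videos.
query = np.array([0.7, -0.2, 0.1, -0.9, 0.3, 0.4, -0.1, 0.6])
db = [
    np.array([0.6, -0.1, 0.2, -0.8, 0.2, 0.5, -0.3, 0.4]),   # similar sign pattern
    np.array([-0.5, 0.9, -0.4, 0.7, -0.2, -0.6, 0.8, -0.3]),  # opposite sign pattern
]

q_code = binarize(query)
codes = [binarize(v) for v in db]
dists = [hamming(q_code, c) for c in codes]
nearest = int(np.argmin(dists))  # index of the best-matching database video
```

Because the codes are binary, the whole database can be scanned with XOR/popcount operations, which is what makes approximate nearest-neighbor search at scale practical.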