JOURNAL ARTICLE

Active post-refined multimodality video semantic concept detection with tensor representation

Abstract

In this paper, we address the problem of multi-modality video representation and semantic concept detection. The interaction and integration of the multiple modalities in video, such as visual, audio, and textual data, are essential to video semantic analysis. Traditionally, videos are represented as vectors in Euclidean space, and learning algorithms are then applied to these vectors in a high-dimensional space for dimension reduction, classification, clustering, and so on. However, the modalities in video not only have their own properties but are also correlated with one another; a simple vector representation weakens the power of these relatively independent modalities and, to some extent, ignores the relations among them. We introduce a higher-order tensor framework for video analysis in which the image, audio, and text modalities of each video shot are represented as a data point by a 3rd-order tensor called a tensorshot. We propose a novel dimension reduction method that explicitly considers the manifold structure of the tensor space arising from the temporally associated co-occurrence of multimodal media data, and we then detect video semantic concepts with classifiers that take tensors as input. Our algorithm preserves the intrinsic structure of the submanifold from which tensorshots are sampled and can also map out-of-sample data points directly. Moreover, we apply an active-learning-based contextual and temporal post-refining strategy to enhance detection accuracy. Experimental results show that our method improves the performance of video semantic concept detection.
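The abstract does not say how a tensorshot is assembled from the three modalities. As a purely illustrative sketch, the snippet below assumes one common construction: the outer product of three per-shot feature vectors, followed by a mode-n unfolding of the kind that tensor dimension-reduction methods typically operate on. The function names (`outer3`, `shape`, `unfold`) and the toy feature values are hypothetical and are not taken from the paper.

```python
from itertools import product


def outer3(image_feat, audio_feat, text_feat):
    """Build T[i][j][k] = image[i] * audio[j] * text[k] (an assumed construction)."""
    return [[[a * b * c for c in text_feat] for b in audio_feat] for a in image_feat]


def shape(tensor):
    """Return the three dimensions of a nested-list 3rd-order tensor."""
    return (len(tensor), len(tensor[0]), len(tensor[0][0]))


def unfold(tensor, mode):
    """Mode-n unfolding: one row per index of `mode`, columns over the other two modes."""
    dims = shape(tensor)
    others = [m for m in range(3) if m != mode]
    rows = []
    for r in range(dims[mode]):
        row = []
        for p, q in product(range(dims[others[0]]), range(dims[others[1]])):
            idx = [0, 0, 0]
            idx[mode], idx[others[0]], idx[others[1]] = r, p, q
            row.append(tensor[idx[0]][idx[1]][idx[2]])
        rows.append(row)
    return rows


# Toy, made-up feature vectors for a single shot (hypothetical values).
image_feat = [1.0, 2.0]
audio_feat = [0.5, 1.0, 1.5]
text_feat = [1.0, 0.0]

tensorshot = outer3(image_feat, audio_feat, text_feat)
audio_unfolding = unfold(tensorshot, 1)  # 3 rows (audio), 4 columns (image x text)
```

Keeping the three modalities on separate tensor axes, rather than concatenating them into one long vector, is what lets a method treat each modality's structure and the cross-modal correlations separately, which is the motivation the abstract gives for the tensor representation.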

Keywords:
Computer science; Tensor (intrinsic definition); Artificial intelligence; Representation (politics); Modality (human–computer interaction); Structure tensor; Dimensionality reduction; Pattern recognition (psychology); Dimension (graph theory); Modalities; Feature vector; Computer vision; Image (mathematics); Mathematics

Metrics

Cited By: 18
FWCI (Field Weighted Citation Impact): 1.77
Refs: 42
Citation Normalized Percentile: 0.88

Topics

Video Analysis and Summarization
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Human Pose and Action Recognition
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Tensor-Based Transductive Learning for Multimodality Video Semantic Concept Detection

Fei Wu, Yanan Liu, Yueting Zhuang

Journal: IEEE Transactions on Multimedia, Year: 2009, Vol: 11 (5), Pages: 868-878

JOURNAL ARTICLE

Transductive Multi-Modality Video Semantic Concept Detection with Tensor Representation

Fei Wu, Yanan Liu, Yueting Zhuang

Journal: Journal of Software, Year: 2009, Vol: 19 (11), Pages: 2853-2868

BOOK-CHAPTER

Improving Automatic Video Retrieval with Semantic Concept Detection

Markus Koskela, Mats Sjöberg, Jorma Laaksonen

Lecture Notes in Computer Science, Year: 2009, Pages: 480-489