Abstract

Automatically annotating video with concepts is key to semantic-level video browsing, search, and navigation. Research on this topic has evolved through two paradigms. The first paradigm uses binary classification to detect each concept in a concept set independently. It has achieved only limited success because it does not model the inherent correlations between concepts, e.g., urban and building. The second paradigm adds a fusion step on top of the individual concept detectors to combine multiple concepts. However, its performance varies because errors incurred in the first detection step can propagate to the second fusion step and degrade the overall performance. To address these issues, we propose a third paradigm that simultaneously classifies concepts and models the correlations between them in a single step, using a novel Correlative Multi-Label (CML) framework. We compare the proposed approach with state-of-the-art approaches from the first and second paradigms on the widely used TRECVID data set, and we report superior performance for the proposed approach.
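
The difference between detecting concepts independently and scoring them jointly can be illustrated with a small toy sketch. It is not the CML formulation from the paper. It assumes a simplified linear scoring function with one hypothetical weight vector per concept (`weights`) and one hypothetical pairwise correlation weight per concept pair (`corr`), and it finds the best label vector by brute-force enumeration, which is only feasible for a handful of concepts.

```python
from itertools import product


def dot(w, x):
    """Plain dot product of two equal-length sequences."""
    return sum(w_i * x_i for w_i, x_i in zip(w, x))


def independent_predict(x, weights):
    """First-paradigm style: one binary decision per concept, no interaction."""
    return tuple(1 if dot(w, x) >= 0 else -1 for w in weights)


def joint_score(x, y, weights, corr):
    """Score a whole label vector y in {-1, +1}^K.

    Assumed form (illustrative only): per-concept terms plus pairwise
    terms that reward label pairs agreeing with the correlation weights.
    """
    unary = sum(y_k * dot(w, x) for y_k, w in zip(y, weights))
    pairwise = sum(
        corr[p][q] * y[p] * y[q]
        for p in range(len(y))
        for q in range(p + 1, len(y))
    )
    return unary + pairwise


def joint_predict(x, weights, corr):
    """Single-step prediction: the label vector with the highest joint score."""
    candidates = product((-1, 1), repeat=len(weights))
    return max(candidates, key=lambda y: joint_score(x, y, weights, corr))


# Toy data (hypothetical values): concept 0 = "urban", concept 1 = "building".
x = [1.0, 0.2]
weights = [
    [1.0, 0.0],   # "urban" detector is confident
    [-0.1, 0.0],  # "building" detector is weakly negative
]
corr = [
    [0.0, 0.5],   # positive correlation between "urban" and "building"
    [0.0, 0.0],
]

print(independent_predict(x, weights))   # (1, -1)
print(joint_predict(x, weights, corr))   # (1, 1)
```

In this toy case the independent detectors reject "building", while the joint score accepts it because the weak negative evidence is outweighed by its correlation with the confidently detected "urban". Since classification and correlation are evaluated in one scoring function, there is no separate fusion step for first-stage errors to propagate into.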

Keywords:
Computer science; Correlative; Set (abstract data type); Fuse (electrical); Semantics (computer science); Key (lock); Artificial intelligence; Annotation; Machine learning; Data mining; Information retrieval

Metrics

Cited By: 476
FWCI (Field Weighted Citation Impact): 44.61
Refs: 28
Citation Normalized Percentile: 1.00

Topics

Text and Document Classification Technologies
Physical Sciences → Computer Science → Artificial Intelligence
Advanced Image and Video Retrieval Techniques
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Video Analysis and Summarization
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition