JOURNAL ARTICLE

Video Summarization by Learning Deep Side Semantic Embedding

Yitian Yuan, Tao Mei, Peng Cui, Wenwu Zhu

Year: 2017
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Vol: 29 (1)
Pages: 226-237
Publisher: Institute of Electrical and Electronics Engineers

Abstract

With the rapid growth of video content, video summarization, which focuses on automatically selecting important and informative parts from videos, is becoming increasingly crucial. However, the problem is challenging due to its subjectiveness. Previous research, which predominantly relies on manually designed criteria or resource-intensive human annotations, often fails to achieve satisfying results. We observe that the side information associated with a video (e.g., surrounding text such as titles, queries, descriptions, and comments) represents a kind of human-curated semantics of video content. This side information, although valuable for video summarization, is overlooked in existing approaches. In this paper, we present a novel deep side semantic embedding (DSSE) model to generate video summaries by leveraging the freely available side information. The DSSE constructs a latent subspace by correlating the hidden layers of two uni-modal autoencoders, which embed the video frames and the side information, respectively. Specifically, by interactively minimizing the semantic relevance loss and the feature reconstruction loss of the two uni-modal autoencoders, the comparable common information between video frames and side information can be more completely learned. Therefore, their semantic relevance can be more effectively measured. Finally, semantically meaningful segments are selected from videos by minimizing their distances to the side information in the constructed latent subspace. We conduct experiments on two datasets (Thumb1K and TVSum50) and demonstrate that DSSE outperforms several state-of-the-art approaches to video summarization.
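The final selection step described in the abstract (choosing the segments closest to the side information in the latent subspace) can be illustrated with a minimal sketch. This is not the authors' implementation: the embeddings are assumed to have already been produced by the two autoencoders, Euclidean distance is an assumed choice of metric, and the names `euclidean`, `select_segments`, and the toy vectors are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def select_segments(segment_embeddings, side_embedding, k):
    """Pick the k segments nearest to the side information.

    segment_embeddings: one latent vector per video segment (assumed given).
    side_embedding: latent vector of the side information (assumed given).
    Returns the chosen segment indices in temporal (ascending) order.
    """
    distances = [euclidean(e, side_embedding) for e in segment_embeddings]
    # Rank segment indices from closest to farthest.
    ranked = sorted(range(len(distances)), key=lambda i: distances[i])
    # Keep the k closest and restore temporal order for playback.
    return sorted(ranked[:k])


# Toy latent vectors (hypothetical values, for illustration only).
segments = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9], [0.7, 0.3]]
side = [0.0, 1.0]
summary = select_segments(segments, side, k=2)
print(summary)  # [1, 2]
```

In this toy case, segments 1 and 2 lie nearest to the side-information vector, so they form the summary. The summary length `k` is a free parameter of the sketch.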

Keywords:
Automatic summarization; Computer science; Relevance (law); Subspace topology; Semantics (computer science); Information retrieval; Embedding; Artificial intelligence; Modal; Machine learning

Metrics

Cited By: 90
FWCI (Field Weighted Citation Impact): 4.19
Refs: 68
Citation Normalized Percentile: 0.95

Topics

Video Analysis and Summarization
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition