Video Summarization by Learning Deep Side Semantic Embedding

Yitian Yuan; Tao Mei; Peng Cui; Wenwu Zhu

doi:10.1109/tcsvt.2017.2771247

JOURNAL ARTICLE

Year: 2017

Journal: IEEE Transactions on Circuits and Systems for Video Technology

Vol: 29 (1)

Pages: 226-237

Publisher: Institute of Electrical and Electronics Engineers

Abstract

With the rapid growth of video content, video summarization, which focuses on automatically selecting important and informative parts from videos, is becoming increasingly crucial. However, the problem is challenging because it is inherently subjective. Previous research, which predominantly relies on manually designed criteria or resource-intensive human annotations, often fails to achieve satisfactory results. We observe that the side information associated with a video (e.g., surrounding text such as titles, queries, descriptions, and comments) represents a kind of human-curated semantics of video content. This side information, although valuable for video summarization, is overlooked in existing approaches. In this paper, we present a novel deep side semantic embedding (DSSE) model to generate video summaries by leveraging the freely available side information. The DSSE constructs a latent subspace by correlating the hidden layers of two uni-modal autoencoders, which embed the video frames and the side information, respectively. Specifically, by interactively minimizing the semantic relevance loss and the feature reconstruction loss of the two uni-modal autoencoders, the comparable common information between video frames and side information can be learned more completely, so their semantic relevance can be measured more effectively. Finally, semantically meaningful segments are selected from videos by minimizing their distances to the side information in the constructed latent subspace. We conduct experiments on two datasets (Thumb1K and TVSum50) and demonstrate that DSSE outperforms several state-of-the-art approaches to video summarization.
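The abstract does not give the exact loss formulas or the selection procedure, so the following is only a minimal sketch of the two ideas it names: a combined objective (semantic relevance plus feature reconstruction) and segment selection by distance to the side information in the latent subspace. Everything here is an assumption made for illustration: the squared Euclidean distance, the `weight` trade-off parameter, the top-`k` selection rule, and the function names `joint_loss` and `select_segments`. It is not the authors' implementation.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def joint_loss(seg_feat, seg_recon, side_feat, side_recon,
               seg_latent, side_latent, weight=1.0):
    """Assumed combined objective: relevance + weight * reconstruction.

    relevance: distance between the two latent codes (segment vs. side info).
    reconstruction: error of each autoencoder on its own input.
    """
    relevance = sq_dist(seg_latent, side_latent)
    reconstruction = sq_dist(seg_feat, seg_recon) + sq_dist(side_feat, side_recon)
    return relevance + weight * reconstruction


def select_segments(segment_latents, side_latent, k):
    """Pick the k segments closest to the side information (assumed rule).

    Returns segment indices in temporal order so the summary plays in sequence.
    """
    ranked = sorted(range(len(segment_latents)),
                    key=lambda i: sq_dist(segment_latents[i], side_latent))
    return sorted(ranked[:k])


# Hypothetical 2-D latent codes for four video segments and one title embedding.
segments = [[0.9, 0.1], [0.1, 0.8], [0.2, 0.9], [0.7, 0.6]]
side = [0.0, 1.0]
summary = select_segments(segments, side, k=2)
print(summary)
```

In this toy example the second and third segments lie nearest to the side-information vector, so they form the summary. In a trained model the latent codes would come from the hidden layers of the two autoencoders rather than being written by hand.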

Keywords:

Automatic summarization; Computer science; Relevance (law); Subspace topology; Semantics (computer science); Information retrieval; Embedding; Artificial intelligence; Modal; Machine learning

Metrics

FWCI (Field Weighted Citation Impact): 4.19

Citation Normalized Percentile: 0.95

Topics

Video Analysis and Summarization

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Learning deep semantic attributes for user video summarization

Deep Reinforcement Learning for Video Summarization with Semantic Reward

Video Summarization Using Deep Semantic Features

Video summarization by learning semantic information

Deep attentive and semantic preserving video summarization