Unsupervised Video Summarization with Attentive Conditional Generative Adversarial Networks

Xufeng He; Hua Yang; Tao Song; Zongpu Zhang; Zhengui Xue; Ruhui Ma; Neil M. Robertson; Haibing Guan

doi:10.1145/3343031.3351056

JOURNAL ARTICLE

Year: 2019; Pages: 2296-2304

Abstract

With the rapid growth of video data, video summarization techniques play a key role in reducing the effort needed to explore video content by generating concise but informative summaries. Although supervised video summarization approaches have been well studied and have achieved state-of-the-art performance, unsupervised methods are still in high demand because high-quality annotations are intrinsically difficult to obtain. In this paper, we propose a novel yet simple unsupervised video summarization method with attentive conditional Generative Adversarial Networks (GANs). First, we build our framework upon GANs in an unsupervised manner. Specifically, the generator produces high-level weighted frame features and predicts frame-level importance scores, while the discriminator tries to distinguish between weighted frame features and raw frame features. Furthermore, we use a conditional feature selector to guide the GAN model to focus on the more important temporal regions of the video. Second, we are the first to introduce frame-level multi-head self-attention for video summarization, which learns long-range temporal dependencies along the whole video sequence and overcomes the local constraints of recurrent units such as LSTMs. Extensive evaluations on two datasets, SumMe and TVSum, show that our proposed framework surpasses state-of-the-art unsupervised methods by a large margin and even outperforms most supervised methods. Additionally, we conduct an ablation study to reveal the influence of each component and parameter setting in our framework.
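The generator-side idea in the abstract (frame-level multi-head self-attention, followed by per-frame importance scores that weight the raw frame features) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy sizes, the random stand-in weights, the sigmoid scoring layer, and the omission of an output projection are all assumptions made for illustration, and the discriminator and conditional feature selector are left out.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    """Small random weights standing in for learned parameters (assumption)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def multi_head_self_attention(frames, num_heads, rng):
    """Return (context, attention) for a T x d list of frame features."""
    d = len(frames[0])
    assert d % num_heads == 0, "feature size must divide evenly across heads"
    d_head = d // num_heads
    head_outputs, head_weights = [], []
    for _ in range(num_heads):
        q = matmul(frames, random_matrix(d, d_head, rng))
        k = matmul(frames, random_matrix(d, d_head, rng))
        v = matmul(frames, random_matrix(d, d_head, rng))
        # Every frame attends to every frame in the sequence, so temporal
        # distance does not limit which frames can influence each other.
        k_t = [list(col) for col in zip(*k)]
        scores = matmul(q, k_t)
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        head_outputs.append(matmul(weights, v))
        head_weights.append(weights)
    # Concatenate per-head outputs back into a T x d matrix
    # (no output projection in this sketch).
    context = [sum((head[t] for head in head_outputs), []) for t in range(len(frames))]
    return context, head_weights

def importance_scores(context, rng):
    """Map each attended frame feature to a score in (0, 1) with a sigmoid."""
    w = [rng.gauss(0.0, 0.1) for _ in range(len(context[0]))]
    return [1.0 / (1.0 + math.exp(-sum(c * wi for c, wi in zip(row, w)))) for row in context]

def weight_frames(frames, scores):
    """Scale each raw frame feature by its predicted importance score."""
    return [[s * x for x in row] for row, s in zip(frames, scores)]

rng = random.Random(0)
frames = random_matrix(6, 8, rng)  # 6 frames with 8-dim features (toy sizes)
context, attention = multi_head_self_attention(frames, num_heads=2, rng=rng)
scores = importance_scores(context, rng)
weighted = weight_frames(frames, scores)
```

In the adversarial setup the abstract describes, `weighted` would play the role of the generator output that a discriminator tries to tell apart from the raw `frames`, while `scores` would be used to select the summary.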

Keywords:

Automatic summarization; Computer science; Artificial intelligence; Discriminator; Margin (machine learning); Generative grammar; Frame (networking); Unsupervised learning; Feature (linguistics); Key frame; Pattern recognition (psychology); Feature learning; Deep learning; Generative model; Focus (optics); Generator (circuit theory); Machine learning; Power (physics)

Metrics

FWCI (Field Weighted Citation Impact): 4.06

Citation Normalized Percentile: 0.95

Topics

Video Analysis and Summarization

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Recurrent generative adversarial networks for unsupervised WCE video summarization

Self-Attention Based Generative Adversarial Networks For Unsupervised Video Summarization

Unsupervised Video Summarization with Adversarial LSTM Networks

Unsupervised Video Summarization With Cycle-Consistent Adversarial LSTM Networks

Unsupervised Tumor Characterization via Conditional Generative Adversarial Networks