JOURNAL ARTICLE

Self-Attention Based Generative Adversarial Networks For Unsupervised Video Summarization

Abstract

In this paper, we study the problem of producing a comprehensive video summary following an unsupervised approach that relies on adversarial learning. We build on a popular method in which a Generative Adversarial Network (GAN) is trained to create representative summaries that are indistinguishable from the originals. Introducing an attention mechanism into the architecture for the selection, encoding and decoding of video frames demonstrates the efficacy of self-attention and the transformer in modeling temporal relationships for video summarization. We propose the SUM-GAN-AED model, which uses a self-attention mechanism for frame selection, combined with LSTMs for encoding and decoding. We evaluate the performance of the SUM-GAN-AED model on the SumMe, TVSum and COGNIMUSE datasets. Experimental results indicate that using self-attention as the frame selection mechanism outperforms the state of the art on SumMe and achieves performance comparable to the state of the art on TVSum and COGNIMUSE.
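The frame-selection component described in the abstract rests on standard scaled dot-product self-attention over per-frame feature vectors. A minimal NumPy sketch of that core operation is given below; it omits the learned query/key/value projections, the LSTM encoder/decoder, and the adversarial training loop, and is an illustration of the mechanism rather than the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(frames):
    """Scaled dot-product self-attention over frame features.

    frames: (T, D) array, one D-dimensional feature vector per frame.
    Returns the (T, D) attended features and the (T, T) attention
    weights. For illustration Q = K = V = frames (no learned
    projections, unlike a full transformer layer).
    """
    d_k = frames.shape[-1]
    attn = softmax(frames @ frames.T / np.sqrt(d_k), axis=-1)
    return attn @ frames, attn

# Toy example: 5 frames with 8-dimensional features.
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))
out, attn = self_attention(x)
```

Each row of `attn` sums to 1 and expresses how strongly one frame attends to every other frame; in a summarizer these weights can feed the frame-importance scores used for selection.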

Keywords:
Video summarization; Unsupervised learning; Generative adversarial network; Self-attention; Transformer; Deep learning; Machine learning; Computer vision

Metrics

Cited By: 11
FWCI (Field-Weighted Citation Impact): 2.00
Refs: 28
Citation Normalized Percentile: 0.84

Topics

Video Analysis and Summarization
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition