M. Dhanushree, R. Priya, P. Aruna, R. Bhavani
In the past decade, video summarisation has emerged as one of the most challenging research fields in video understanding. Video summarisation abstracts an original video by extracting its most informative parts or key events. Generic video summarisation is particularly challenging because the key events do not correspond to specific activities; in such circumstances, extensive spatial features are needed to identify video events. Thus, a stacked encoder-decoder architecture with a residual learning network (SERNet) model is proposed for generating dynamic summaries of generic videos. In the proposed model, GoogleNet features are extracted for each frame. A bi-directional gated recurrent unit (GRU) encodes the video features, and a GRU decodes them. Both the encoder and decoder leverage residual learning to extract hierarchical dense spatial features and thereby increase the video summarisation F-score. Experiments are conducted on the SumMe and TVSum datasets. Experimental results demonstrate that the proposed SERNet model achieves F-scores of 55.6 on SumMe and 64.23 on TVSum. Comparison of the proposed SERNet model against state-of-the-art approaches indicates its robustness.
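The pipeline described above can be sketched structurally: per-frame features (stand-ins for GoogleNet descriptors) pass through a bi-directional GRU encoder and a GRU decoder, each wrapped in a residual connection, to produce per-frame importance scores. This is a minimal illustrative sketch under assumed toy dimensions and a hypothetical scoring head, not the authors' exact SERNet design.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell (update gate z, reset gate r, candidate state)."""
    def __init__(self, d_in, d_hid):
        s = 1.0 / np.sqrt(d_hid)
        self.Wz, self.Uz = rng.uniform(-s, s, (d_hid, d_in)), rng.uniform(-s, s, (d_hid, d_hid))
        self.Wr, self.Ur = rng.uniform(-s, s, (d_hid, d_in)), rng.uniform(-s, s, (d_hid, d_hid))
        self.Wh, self.Uh = rng.uniform(-s, s, (d_hid, d_in)), rng.uniform(-s, s, (d_hid, d_hid))
        self.d_hid = d_hid

    def step(self, x, h):
        z = sigmoid(self.Wz @ x + self.Uz @ h)
        r = sigmoid(self.Wr @ x + self.Ur @ h)
        h_cand = np.tanh(self.Wh @ x + self.Uh @ (r * h))
        return (1 - z) * h + z * h_cand

    def run(self, xs):
        h = np.zeros(self.d_hid)
        out = []
        for x in xs:
            h = self.step(x, h)
            out.append(h)
        return np.array(out)

def bigru_encode(xs, fwd, bwd):
    """Bi-directional encoding: concatenate forward and backward passes."""
    hf = fwd.run(xs)
    hb = bwd.run(xs[::-1])[::-1]
    return np.concatenate([hf, hb], axis=1)

# Toy sizes (assumed): real GoogleNet frame descriptors are much larger.
d_feat, d_hid, T = 16, 16, 10
frames = rng.standard_normal((T, d_feat))  # stand-in per-frame features

enc_f, enc_b = GRUCell(d_feat, d_hid), GRUCell(d_feat, d_hid)
dec = GRUCell(2 * d_hid, 2 * d_hid)

enc_out = bigru_encode(frames, enc_f, enc_b)
# Residual learning around the encoder: add the (tiled) input features back.
enc_out = enc_out + np.concatenate([frames, frames], axis=1)

# Residual connection around the decoder as well.
dec_out = dec.run(enc_out) + enc_out

# Hypothetical linear scoring head: per-frame importance scores in (0, 1);
# top-scoring frames would form the dynamic summary.
W_score = rng.uniform(-0.1, 0.1, 2 * d_hid)
scores = sigmoid(dec_out @ W_score)
print(scores.shape)  # one importance score per frame
```

The residual additions require the stacked input and hidden dimensions to match; in practice a learned projection would be used when they differ.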