Video Summarization Using Deep 3D ConvNets with Multi-Attention

Muhammad Shahzaib Alam; Adnan Ahmed; Danyal Aftab; Muhammad Rafi

doi:10.1109/iccit58132.2023.10273932

JOURNAL ARTICLE

Year: 2023; Pages: 433-441

Abstract

In this paper, we propose a novel method for key-shot-based video summarization that combines 3D ConvNets with multi-attention. The process starts by encoding the video data into time-variant frames in 3D, followed by two steps of visual attention. The first step learns attention weights for the features inside each frame. The second step learns attention weights across all frames, thereby assigning each frame an importance score between 0 and 1 for the target summary. The current state-of-the-art method uses 2D ConvNets with self-attention, which loses the dependency of each frame on the next and causes self-attention to focus on fewer features. In that approach, the keyframes and their relation to time are not maintained. Experiments evaluating the proposed approach on two standard video summarization datasets, (i) SumMe and (ii) TVSum, produced significant improvements. We report a new state of the art for video summarization on these datasets.
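The two attention steps described in the abstract can be sketched in plain Python. This is a minimal illustration, not the authors' architecture. It assumes per-frame region features have already been produced by some 3D ConvNet encoder (random numbers stand in for them here). The names `intra_frame_attention`, `frame_scores`, `w_feat`, and `w_frame`, the scaled dot-product form of the frame-level attention, and the sigmoid output are all assumptions made for illustration, and the weights are random rather than learned.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def intra_frame_attention(frame_feats, w_feat):
    """Step 1 (assumed form): weight the feature vectors inside one frame.

    frame_feats is a list of region feature vectors for a single frame.
    Returns the attention-pooled frame vector and the attention weights.
    """
    weights = softmax([dot(f, w_feat) for f in frame_feats])
    dim = len(frame_feats[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frame_feats))
              for d in range(dim)]
    return pooled, weights


def frame_scores(video_feats, w_feat, w_frame):
    """Step 2 (assumed form): attend over all frames, then score each frame.

    Each frame attends to every frame with scaled dot-product attention,
    and a sigmoid maps the resulting context to a score in (0, 1).
    """
    pooled = [intra_frame_attention(f, w_feat)[0] for f in video_feats]
    dim = len(pooled[0])
    scores = []
    for query in pooled:
        attn = softmax([dot(query, key) / math.sqrt(dim) for key in pooled])
        context = [sum(a * p[d] for a, p in zip(attn, pooled))
                   for d in range(dim)]
        scores.append(sigmoid(dot(context, w_frame)))
    return scores


# Hypothetical stand-in for 3D ConvNet output: T frames, R regions, D dims.
random.seed(0)
T, R, D = 6, 4, 8
video_feats = [[[random.uniform(-1, 1) for _ in range(D)] for _ in range(R)]
               for _ in range(T)]
w_feat = [random.uniform(-1, 1) for _ in range(D)]   # untrained, illustrative
w_frame = [random.uniform(-1, 1) for _ in range(D)]  # untrained, illustrative

_, region_weights = intra_frame_attention(video_feats[0], w_feat)
scores = frame_scores(video_feats, w_feat, w_frame)
print([round(s, 3) for s in scores])
```

In a trained model the two weight vectors would be learned from annotated importance scores. The sketch only shows how feature-level attention feeds frame-level attention and how each frame ends up with a score strictly between 0 and 1.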

Keywords:

Automatic summarization; Computer science; Frame (networking); Artificial intelligence; Task (project management); Key (lock); Process (computing); Dependency (UML); Encoding (memory); Relation (database); Computer vision; Pattern recognition (psychology); Data mining

Metrics

FWCI (Field Weighted Citation Impact): 0.18

Citation Normalized Percentile: 0.43

Topics

Video Analysis and Summarization

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Multi-Modal Video Summarization Using Deep Learning and Attention Mechanisms

Video Summarization with LSTM and Deep Attention Models

Video Summarization with Anchors and Multi-Head Attention

Video Summarization using Multi-Scale Dilated Hybrid Attention Mechanisms

Deep hierarchical LSTM networks with attention for video summarization