JOURNAL ARTICLE

Video Description Model Based on Temporal-Spatial and Channel Multi-Attention Mechanisms

Jie Xu, Haoliang Wei, Linke Li, Qiuru Fu, Jinhong Guo

Year: 2020   Journal: Applied Sciences   Vol: 10 (12)   Pages: 4312   Publisher: Multidisciplinary Digital Publishing Institute

Abstract

Video description plays an important role in intelligent imaging technology. Attention mechanisms are widely used in deep-learning-based video description models, and most existing models use a temporal-spatial attention mechanism to improve accuracy: temporal attention captures the global features of a video, whereas spatial attention captures local features. However, because each channel of a convolutional neural network (CNN) feature map carries its own spatial semantic information, merely dividing the CNN features into regions and then applying spatial attention is insufficient. In this paper, we propose a temporal-spatial and channel attention mechanism that enables the model to exploit various video features and ensures consistent visual features across sentence descriptions, improving the model's performance. In addition, to demonstrate the effectiveness of the attention mechanism, we propose a video visualization model based on the video description. Experimental results show that our model achieves good performance on the Microsoft Video Description (MSVD) dataset and some improvement on the Microsoft Research-Video to Text (MSR-VTT) dataset.
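The abstract does not give the paper's equations, so the following is only a minimal pure-Python sketch of how channel, spatial, and temporal soft attention can be stacked over CNN video features. Assumptions that are not from the paper: features are a nested list of shape T×R×C (frames × regions × channels), the decoder hidden state `h` has length C, attention scores are plain dot products, and channel attention is applied first using each channel's mean activation. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Features = List[List[Vector]]  # feats[t][r][c]: frame t, region r, channel c


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights: Vector, vectors: List[Vector]) -> Vector:
    """Return sum_i weights[i] * vectors[i]."""
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]


def channel_attention(feats: Features, h: Vector) -> Tuple[Features, Vector]:
    """One weight per channel, shared by all frames and regions (assumed design)."""
    T, R, C = len(feats), len(feats[0]), len(feats[0][0])
    # Summarize each channel by its mean activation over all frames and regions.
    summary = [
        sum(feats[t][r][c] for t in range(T) for r in range(R)) / (T * R)
        for c in range(C)
    ]
    beta = softmax([summary[c] * h[c] for c in range(C)])
    # Multiply by C so that uniform weights (1/C each) leave the features unchanged.
    reweighted = [
        [[feats[t][r][c] * beta[c] * C for c in range(C)] for r in range(R)]
        for t in range(T)
    ]
    return reweighted, beta


def spatial_attention(frame: List[Vector], h: Vector) -> Tuple[Vector, Vector]:
    """Attend over the regions of one frame (local features)."""
    alpha = softmax([dot(region, h) for region in frame])
    return weighted_sum(alpha, frame), alpha


def temporal_attention(frame_vecs: List[Vector], h: Vector) -> Tuple[Vector, Vector]:
    """Attend over per-frame vectors (global, video-level features)."""
    gamma = softmax([dot(f, h) for f in frame_vecs])
    return weighted_sum(gamma, frame_vecs), gamma


def multi_attention(feats: Features, h: Vector):
    """Channel -> spatial -> temporal attention; returns the context vector and all weights."""
    feats, beta = channel_attention(feats, h)
    frame_vecs, alphas = [], []
    for frame in feats:
        vec, alpha = spatial_attention(frame, h)
        frame_vecs.append(vec)
        alphas.append(alpha)
    context, gamma = temporal_attention(frame_vecs, h)
    return context, beta, alphas, gamma


if __name__ == "__main__":
    # Two frames, two regions per frame, three channels (toy numbers).
    feats = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
             [[1.0, 0.0, 1.0], [0.0, 0.0, 1.0]]]
    context, beta, alphas, gamma = multi_attention(feats, [1.0, 0.0, 0.0])
    print("channel weights:", beta)
    print("temporal weights:", gamma)
    print("context:", context)
```

In a trained captioning model the scores would come from learned projections of the features and the hidden state rather than raw dot products, and the resulting context vector would feed the sentence decoder at each word step. The per-channel, per-region, and per-frame weights returned here are also the kind of quantities an attention visualization would display.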
