JOURNAL ARTICLE

Stacked Multimodal Attention Network for Context-Aware Video Captioning

Yi Zheng, Yuejie Zhang, Rui Feng, Tao Zhang, Weiguo Fan

Year: 2021   Journal: IEEE Transactions on Circuits and Systems for Video Technology   Vol: 32 (1)   Pages: 31-42   Publisher: Institute of Electrical and Electronics Engineers

Abstract

Recent neural models for video captioning usually employ an attention-based encoder-decoder framework. However, current approaches mainly attend to the motion and object features of the video when generating the caption, and ignore potentially useful historical information. In addition, exposure bias and vanishing gradient problems persist in current caption generation models. In this paper, we propose a novel video captioning framework, named Stacked Multimodal Attention Network (SMAN). It adopts additional visual and textual historical information as context features during caption generation, employs a stacked architecture to process different features gradually, and uses reinforcement learning and a coarse-to-fine training strategy to further improve the generated results. Both quantitative and qualitative experiments on the MSVD and MSR-VTT benchmark datasets show the effectiveness and feasibility of our framework. The code is available at https://github.com/zhengyi123456/SMAN .

Keywords:
Closed captioning, Computer science, Context (archaeology), Artificial intelligence, Encoder, Benchmark (surveying), Natural language processing, Process (computing), Visualization, Artificial neural network, Information retrieval, Image (mathematics), Programming language

Metrics

Cited By: 39
FWCI (Field Weighted Citation Impact): 3.17
Refs: 63
Citation Normalized Percentile: 0.93

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Video Analysis and Summarization
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
