JOURNAL ARTICLE

Video description method with fusion of instance-aware temporal features

Abstract

Video understanding remains challenging, particularly the task of describing a video's visual content in natural language. Existing encoder-decoder models for video struggle to extract deep semantic information and to capture the complex contextual semantics of a video sequence. Moreover, different visual elements in a video contribute unequally to the generated description. In this paper, we propose a video description method that fuses instance-aware temporal features. We extract local instance features along the temporal sequence to strengthen the model's perception of instances over time, and we apply spatial attention to perform a weighted fusion of the temporal features. Finally, we use a bidirectional long short-term memory (BiLSTM) network to encode the contextual semantics of the video sequence, which helps generate higher-quality descriptions. Experiments on two public datasets show that the method performs well across several evaluation metrics.
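
The weighted-fusion step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring function, feature dimensions, or network layout, so everything here is an assumption made for illustration: the dot-product scoring, the `query` vector (a stand-in for a learned context vector), and the function names `softmax` and `attention_fuse` are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features, query):
    """Fuse several feature vectors into one by attention weighting.

    features: list of equal-length vectors, e.g. one per instance or
              spatial region in a frame (illustrative assumption).
    query:    vector of the same length; hypothetical stand-in for a
              learned context vector.
    Returns (fused_vector, weights).
    """
    # Assumed scoring rule: dot product between each feature and the query.
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(query)
    # Weighted sum of the feature vectors, one dimension at a time.
    fused = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return fused, weights


# Toy data: three 2-D "instance" features for a single frame.
frame_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
fused, weights = attention_fuse(frame_features, query)
print(weights)
print(fused)
```

In this toy run, the features that align with the query receive larger weights than the one that does not, so they dominate the fused vector. In a pipeline like the one the abstract outlines, one fused vector per frame would then be passed to the bidirectional long short-term memory encoder; that stage is omitted here.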

Keywords:
Computer science; Encoder; Semantics (computer science); ENCODE; Artificial intelligence; Field (mathematics); Sequence (biology); Perception; Information retrieval

Topics

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition:
Multimodal Machine Learning Applications
Video Analysis and Summarization
Advanced Image and Video Retrieval Techniques
