Abstract

Video object detection is more challenging than image object detection because of deteriorated frame quality. To enhance the feature representation, state-of-the-art methods propagate temporal information into the deteriorated frame by aligning and aggregating entire feature maps from multiple nearby frames. However, restricted by the low storage efficiency of feature maps and their vulnerable content-address allocation, these methods do not fully exploit long-term temporal information. In this work, we propose the first object guided external memory network for online video object detection. Storage efficiency is handled by object guided hard-attention, which selectively stores valuable features, and long-term information is protected by storing it in an addressable external data matrix. A set of read/write operations is designed to accurately propagate, allocate, and delete multi-level memory features under object guidance. We evaluate our method on the ImageNet VID dataset and achieve state-of-the-art performance as well as a good speed-accuracy tradeoff. Furthermore, by visualizing the external memory, we show the detailed object-level reasoning process across frames.
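
The read/write/delete idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class name `ExternalMemory`, the age-based deletion rule, and the cosine-similarity soft read are all assumptions chosen for illustration, since the abstract does not specify these details. The sketch only shows the general pattern of storing box-selected feature vectors as rows of an addressable matrix and reading them back by content.

```python
import numpy as np


class ExternalMemory:
    """Toy addressable feature memory (hypothetical sketch, not the paper's design)."""

    def __init__(self, capacity, dim):
        self.capacity = capacity
        self.slots = np.zeros((0, dim), dtype=np.float32)  # one row per stored feature
        self.age = np.zeros((0,), dtype=np.int64)          # writes since each row was stored

    def write(self, feature_map, boxes):
        """Hard attention: store only feature vectors that fall inside detected boxes."""
        h, w, _ = feature_map.shape
        mask = np.zeros((h, w), dtype=bool)
        for x0, y0, x1, y1 in boxes:        # boxes in feature-map coordinates
            mask[y0:y1, x0:x1] = True
        selected = feature_map[mask]        # shape (n_selected, dim)
        self.age += 1                       # existing rows grow older
        self.slots = np.concatenate([self.slots, selected], axis=0)
        self.age = np.concatenate(
            [self.age, np.zeros(len(selected), dtype=np.int64)]
        )
        self._delete_oldest()

    def _delete_oldest(self):
        """Assumed deletion rule: drop the oldest rows once capacity is exceeded."""
        if len(self.slots) > self.capacity:
            keep = np.argsort(self.age, kind="stable")[: self.capacity]
            keep.sort()                     # preserve storage order of survivors
            self.slots, self.age = self.slots[keep], self.age[keep]

    def read(self, query, temperature=0.1):
        """Assumed read rule: softmax over cosine similarity between query and rows."""
        if len(self.slots) == 0:
            return query.copy()
        q = query / (np.linalg.norm(query) + 1e-8)
        m = self.slots / (np.linalg.norm(self.slots, axis=1, keepdims=True) + 1e-8)
        logits = (m @ q) / temperature
        weights = np.exp(logits - logits.max())
        weights /= weights.sum()
        return weights @ self.slots         # weighted sum of stored features
```

In this sketch, storing only in-box feature vectors is what keeps the memory small relative to caching whole feature maps, and keeping rows in a separate matrix means they persist across frames until the deletion rule removes them.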

Keywords:
Computer science, Feature (linguistics), Object (grammar), Artificial intelligence, Frame (networking), Object detection, Computer vision, Process (computing), Representation (politics), Set (abstract data type), Video tracking, Feature extraction, Auxiliary memory, Pattern recognition (psychology), Computer hardware

Metrics

Cited By: 113
FWCI (Field Weighted Citation Impact): 8.23
Refs: 79
Citation Normalized Percentile: 0.98

Topics

Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Dual optical flow network-guided video object detection

Wan‐Qing Yu, Jing Yu, Xinqi Shi, Chuangbai Xiao

Journal: Journal of Image and Graphics Year: 2021 Vol: 26 (10) Pages: 2473-2484
JOURNAL ARTICLE

Video Object Detection Guided by Object Blur Evaluation

Yujie Wu, Hong Zhang, Yawei Li, Y. F. Yang, Ding Yuan

Journal: IEEE Access Year: 2020 Vol: 8 Pages: 208554-208565
JOURNAL ARTICLE

Video Sparse Transformer With Attention-Guided Memory for Video Object Detection

Masato Fujitake, Akihiro Sugimoto

Journal: IEEE Access Year: 2022 Vol: 10 Pages: 65886-65900
JOURNAL ARTICLE

Temporal feature enhancement network with external memory for live-stream video object detection

Masato Fujitake, Akihiro Sugimoto

Journal: Pattern Recognition Year: 2022 Vol: 131 Pages: 108847-108847