JOURNAL ARTICLE

Object-Level Feature Memory and Aggregation for Live-Stream Video Object Detection

Abstract

This paper proposes an object-level feature memory module that uses attention mechanisms to exploit spatial and temporal context in videos. Compared with still-image object detectors, video object detectors consider features along the spatiotemporal dimensions, which leads to higher accuracy. However, previous video object detection methods often performed memory and fusion at the feature-map level when integrating features across different frames. Such approaches not only impose significant computational and memory burdens but also introduce considerable noise. To address these challenges, we introduce object-level feature memory, which retains features from previous frames while reducing memory and computational overhead, yielding a substantial improvement in the performance of video object detectors. Experiments conducted on the UA-DETRAC dataset validate the effectiveness of our approach in live-stream video object detection scenarios. Our method achieves 66.73% AP based on YOLOX-S, which is 4.0% AP higher than the baseline YOLOX-S. Our code is released at https://github.com/Liyi4578/0FMA.
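The abstract describes an object-level memory that stores per-object feature vectors from past frames and fuses them into current detections via attention. The paper's exact module is not reproduced here; the following is a minimal sketch of that idea, assuming a FIFO memory bank, scaled dot-product attention, and residual fusion (the class and method names are illustrative, not from the released code).

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class ObjectFeatureMemory:
    """FIFO memory of per-object feature vectors from previous frames.

    Storing only object-level features (one D-dim vector per detected
    object) is far cheaper than caching whole feature maps, which is the
    memory/computation saving the abstract refers to.
    """

    def __init__(self, dim, capacity=100):
        self.capacity = capacity
        self.bank = np.empty((0, dim))  # (N_mem, D)

    def update(self, feats):
        # Append the newest object features; evict the oldest past capacity.
        self.bank = np.concatenate([self.bank, feats])[-self.capacity:]

    def aggregate(self, query_feats):
        # Current-frame object features attend to the memory bank
        # (scaled dot-product attention), then fuse residually.
        if self.bank.shape[0] == 0:
            return query_feats  # nothing to aggregate on the first frame
        d = query_feats.shape[1]
        attn = softmax(query_feats @ self.bank.T / np.sqrt(d), axis=1)
        return query_feats + attn @ self.bank

# Per-frame usage: aggregate current detections against memory, then
# push the (pre-fusion) current features into the bank for later frames.
mem = ObjectFeatureMemory(dim=256, capacity=100)
cur = np.random.randn(5, 256)          # features of 5 detected objects
fused = mem.aggregate(cur)             # temporally enhanced features
mem.update(cur)
```

In a real detector the query features would come from the ROI/head features of each detection, and the fused features would feed the classification and regression branches.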

Keywords:
Computer science; Object detection; Artificial intelligence; Computer vision; Video tracking; Pattern recognition

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 39
Citation Normalized Percentile: 0.22

Topics

Advanced Image and Video Retrieval Techniques
Video Surveillance and Tracking Methods
Visual Attention and Saliency Detection
All topics classified under Physical Sciences → Computer Science → Computer Vision and Pattern Recognition.