JOURNAL ARTICLE

Spatial-Temporal Feature Aggregation Network For Video Object Detection

Abstract

Video object detection is a challenging problem in computer vision. In this paper, we propose a novel spatial-temporal feature aggregation network to address it. Specifically, we present an instance-level feature aggregation module that complements traditional pixel-level feature aggregation, in which a new movement estimation module learns instance movements across frames. Graph Convolutional Networks (GCNs) are then applied to capture the temporal relations among instances across frames and to implement instance-level feature aggregation. Finally, we combine pixel-level and instance-level features with learnable soft weights to exploit their complementary information. Our framework is simple to implement and supports end-to-end training, and extensive experiments show that it achieves state-of-the-art performance on the ImageNet VID dataset.
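The abstract gives no formulas, so the following is only a minimal sketch of the two ideas it names: aggregating instance features over a graph that links instances across frames, and fusing pixel-level and instance-level features with learnable soft weights. The function names, the mean-aggregation rule, and the softmax over two scalar logits are all assumptions made for illustration, not the authors' implementation. A real GCN layer would also apply a learned weight matrix and a nonlinearity, which are omitted here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def graph_aggregate(features, adjacency):
    """One mean-aggregation step over an instance graph (hypothetical helper).

    features:  list of N feature vectors, one per detected instance.
    adjacency: N x N list of 0/1 entries linking instances across frames.
    A self-loop is added so each instance keeps its own feature.
    """
    n = len(features)
    dim = len(features[0])
    aggregated = []
    for i in range(n):
        neighbours = [j for j in range(n) if j == i or adjacency[i][j]]
        aggregated.append(
            [sum(features[j][d] for j in neighbours) / len(neighbours)
             for d in range(dim)]
        )
    return aggregated


def soft_fuse(pixel_feat, instance_feat, logits):
    """Blend two feature vectors with softmax-normalised weights.

    logits: two scalars that would be learned during training (assumption).
    """
    w_pixel, w_instance = softmax(logits)
    return [w_pixel * p + w_instance * q
            for p, q in zip(pixel_feat, instance_feat)]


# Three instances of one object in consecutive frames, linked as a chain.
features = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
aggregated = graph_aggregate(features, adjacency)

# Equal logits give equal weights to the pixel-level and instance-level features.
fused = soft_fuse([2.0, 2.0], aggregated[0], [0.0, 0.0])
```

Normalising the two weights with a softmax keeps them positive and summing to one, so the fused feature stays on the same scale as its inputs. Other weighting schemes would fit the abstract's description equally well.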

Keywords:
Computer science; Feature (linguistics); Artificial intelligence; Graph; Pixel; Relation (database); Pattern recognition (psychology); Object detection; Object (grammar); Feature learning; Feature extraction; Computer vision; Data mining; Theoretical computer science

Metrics

Cited By: 6
FWCI (Field Weighted Citation Impact): 0.42
Refs: 35
Citation Normalized Percentile: 0.60

Topics

Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Video Surveillance and Tracking Methods
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Multilevel Spatial-Temporal Feature Aggregation for Video Object Detection

Chao Xu, Jiangning Zhang, Mengmeng Wang, Guanzhong Tian, Yong Liu

Journal: IEEE Transactions on Circuits and Systems for Video Technology Year: 2022 Vol: 32 (11) Pages: 7809-7820

JOURNAL ARTICLE

Temporal Context Enhanced Feature Aggregation for Video Object Detection

Fei He, Naiyu Gao, Qiaozhe Li, Senyao Du, Xin Zhao, Kaiqi Huang

Journal: Proceedings of the AAAI Conference on Artificial Intelligence Year: 2020 Vol: 34 (07) Pages: 10941-10948

DISSERTATION

Real-Time Video Object Detection with Temporal Feature Aggregation

Meihong Chen

University: uO Research (University of Ottawa) Year: 2021

JOURNAL ARTICLE

Temporal-adaptive sparse feature aggregation for video object detection

Fei He, Qiaozhe Li, Xin Zhao, Kaiqi Huang

Journal: Pattern Recognition Year: 2022 Vol: 127 Pages: 108587-108587