JOURNAL ARTICLE

Temporal Context Enhanced Feature Aggregation for Video Object Detection

Fei He, Naiyu Gao, Qiaozhe Li, Senyao Du, Xin Zhao, Kaiqi Huang

Year: 2020
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Vol: 34 (07)
Pages: 10941-10948
Publisher: Association for the Advancement of Artificial Intelligence

Abstract

Video object detection is a challenging task because of the appearance deterioration present in certain video frames. One typical solution is to aggregate neighboring features to enhance per-frame appearance features. However, such a method ignores the temporal relations between the aggregated frames, which are critical for improving video recognition accuracy. To handle the appearance deterioration problem, this paper proposes a temporal context enhanced network (TCENet) that exploits temporal context information through temporal aggregation for video object detection. To handle the displacement of objects in videos, a novel DeformAlign module is proposed to align the spatial features from frame to frame. Instead of adopting a fixed-length window fusion strategy, a temporal stride predictor is proposed to adaptively select video frames for aggregation, which makes it possible to exploit variable temporal information and requires fewer video frames for aggregation to achieve better results. Our TCENet achieves state-of-the-art performance on the ImageNet VID dataset and has a faster runtime. Without bells and whistles, our TCENet achieves 80.3% mAP by aggregating only 3 frames.
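The general idea of aggregating a few neighboring frames chosen by a temporal stride can be illustrated with a minimal sketch. This is not TCENet: the stride is passed in by hand rather than predicted, there is no DeformAlign step, and the cosine-similarity softmax weighting is a generic stand-in for whatever the paper uses. The names `select_frames`, `cosine`, and `aggregate` are hypothetical, and each frame is reduced to a plain feature vector for brevity.

```python
import math


def select_frames(t, stride, num_frames, k=3):
    """Pick k frame indices centred on t, spaced by `stride`, clamped to the video.

    Assumption: in the paper the stride comes from a learned predictor;
    here it is simply an argument.
    """
    half = k // 2
    return [min(max(t + i * stride, 0), num_frames - 1)
            for i in range(-half, half + 1)]


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def aggregate(features, t, stride, k=3):
    """Weighted average of the selected frames' features.

    Weights are a softmax over each selected frame's cosine similarity
    to the reference frame t (an illustrative choice, not the paper's).
    """
    idx = select_frames(t, stride, len(features), k)
    ref = features[t]
    sims = [cosine(ref, features[i]) for i in idx]
    peak = max(sims)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * features[i][d] for w, i in zip(weights, idx))
            for d in range(len(ref))]
```

A larger stride reaches frames that are further apart in time while still aggregating only `k` frames, which is the trade-off the abstract attributes to adaptive frame selection.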

Keywords:
Computer science; Artificial intelligence; Aggregate (composite); Exploit; Frame (networking); Computer vision; Context (archaeology); Feature (linguistics); Object (grammar); Inter frame; Video tracking; Object detection; Frame rate; Task (project management); Pattern recognition (psychology); Reference frame

Metrics

Cited By: 37
FWCI (Field Weighted Citation Impact): 1.88
Refs: 61
Citation Normalized Percentile: 0.89

Topics

Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Video Surveillance and Tracking Methods
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Visual Attention and Saliency Detection
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
