Spatial-Temporal Feature Aggregation Network For Video Object Detection

Chen Zhu; Weihai Li; Chi Fei; Bin Liu; Nenghai Yu

doi:10.1109/icassp40776.2020.9054080

JOURNAL ARTICLE

Year: 2020 Pages: 1858-1862

Abstract

Video object detection is a challenging problem in computer vision. In this paper, we propose a novel spatial-temporal feature aggregation network to address it. Specifically, we present an instance-level feature aggregation module that complements traditional pixel-level feature aggregation, in which we build a new movement estimation module to learn instance movements across frames. A Graph Convolutional Network (GCN) is then applied to capture temporal relations among instances across frames and to implement instance-level feature aggregation. Finally, we combine pixel-level and instance-level features using learnable soft weights to exploit their complementary information. Our framework is simple to implement and supports end-to-end training, and extensive experiments show that it achieves state-of-the-art performance on the ImageNet VID dataset.
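The abstract's last two steps, relating instances across frames through a graph and then fusing pixel-level and instance-level features with soft weights, can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no architectural details, so the function names `graph_aggregate` and `fuse`, the cosine-similarity adjacency, and the two-logit softmax are all assumptions made for illustration. The graph step is also a simplified, weight-free propagation standing in for a real GCN layer, which would additionally apply a learned weight matrix and a nonlinearity.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def graph_aggregate(instances):
    """One weight-free graph propagation step (assumed stand-in for a GCN layer).

    Each instance feature is replaced by a similarity-weighted average of all
    instance features, itself included. Edge weights are a softmax over cosine
    similarities, so every row of the adjacency sums to 1.
    """
    aggregated = []
    for u in instances:
        weights = softmax([cosine(u, v) for v in instances])
        aggregated.append(
            [sum(w * v[d] for w, v in zip(weights, instances)) for d in range(len(u))]
        )
    return aggregated


def fuse(pixel_feat, instance_feat, logits):
    """Combine two feature vectors with soft weights.

    `logits` holds two scalars that would be learned during training in a real
    model; the softmax turns them into weights that sum to 1.
    """
    w_pixel, w_inst = softmax(logits)
    return [w_pixel * p + w_inst * q for p, q in zip(pixel_feat, instance_feat)]


# Hypothetical instance features collected from several frames.
instances = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
aggregated = graph_aggregate(instances)

# Equal logits give equal weight to the pixel-level and instance-level features.
fused = fuse([0.5, 0.5], aggregated[0], logits=[0.0, 0.0])
```

Because the soft weights come from a softmax, the fused feature is always a convex combination of the two inputs, which is one common way to let a network trade off two complementary feature sources.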

Keywords:

Computer science; Feature (linguistics); Artificial intelligence; Graph; Pixel; Relation (database); Pattern recognition (psychology); Object detection; Object (grammar); Feature learning; Feature extraction; Computer vision; Data mining; Theoretical computer science

Metrics

FWCI (Field Weighted Citation Impact): 0.42

Citation Normalized Percentile: 0.60

Topics

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Human Pose and Action Recognition

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Video Surveillance and Tracking Methods

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Multilevel Spatial-Temporal Feature Aggregation for Video Object Detection

Patchwise Temporal–Spatial Feature Aggregation Network for Object Detection in Satellite Video

Temporal Context Enhanced Feature Aggregation for Video Object Detection

Real-Time Video Object Detection with Temporal Feature Aggregation

Temporal-adaptive sparse feature aggregation for video object detection