DISSERTATION

Real-Time Video Object Detection with Temporal Feature Aggregation

Meihong Chen

Year: 2021
University: uO Research (University of Ottawa)
Publisher: University of Ottawa

Abstract

In recent years, many high-performance networks have been proposed for single-image object detection, and an obvious way to build a video detection network is to start from a state-of-the-art single-image detector. However, video object detection remains challenging because individual video frames are often of lower quality, so temporal information must be incorporated to obtain high-quality detections. In this thesis, we design a novel interleaved architecture that combines a 2D convolutional network with a 3D temporal network, using YOLOv3 as the base detector. To exploit inter-frame information, we propose feature aggregation based on the temporal network, which uses Appearance-Preserving 3D convolution (AP3D) to extract features that are aligned along the temporal dimension. Our multi-scale detector and multi-scale temporal network exchange information both within each scale and across scales. The temporal network takes 4, 8, or 16 input frames, and we name the corresponding variants TemporalNet-4, TemporalNet-8, and TemporalNet-16. On the ImageNet VID 2017 dataset, our approach achieves 77.1% mAP (mean Average Precision) with TemporalNet-4 and 80.9% mAP with TemporalNet-16, a competitive result on this video object detection benchmark. The network also runs in real time, at 35 ms per frame.
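The abstract does not spell out how AP3D aligns features before the temporal convolution. The sketch below is a simplified, hypothetical illustration of the general idea, assuming per-frame feature maps of shape (C, H, W) from the 2D detector. Each neighbouring frame is re-sampled onto the reference (middle) frame's pixel grid using softmax-weighted cosine similarity. The aligned clip is then fused by a learned weighted sum over time and channels, which is equivalent to a 3D convolution with a (T, 1, 1) kernel. It is not the thesis's or AP3D's actual implementation: it omits any handling of regions with no match in the neighbouring frame. The names `align_to_center`, `temporal_aggregate`, `weights` and `temperature` are illustrative assumptions.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def align_to_center(center, neighbor, temperature=0.1):
    """Re-sample `neighbor` onto the pixel grid of `center` (hypothetical helper).

    center, neighbor: feature maps of shape (C, H, W).
    Every position of `center` gathers a convex combination of all `neighbor`
    positions, weighted by the softmax of their cosine similarity.
    """
    C, H, W = center.shape
    q = center.reshape(C, H * W)
    k = neighbor.reshape(C, H * W)
    qn = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-6)
    kn = k / (np.linalg.norm(k, axis=0, keepdims=True) + 1e-6)
    sim = qn.T @ kn                            # (HW_center, HW_neighbor) cosine similarities
    attn = softmax(sim / temperature, axis=1)  # each row sums to 1
    aligned = k @ attn.T                       # (C, HW_center)
    return aligned.reshape(C, H, W)


def temporal_aggregate(clip, weights, temperature=0.1):
    """Fuse a clip of T frames into one feature map for the reference frame.

    clip:    (T, C, H, W) per-frame features (assumed to come from the 2D detector).
    weights: (C_out, C, T) learned temporal kernel (hypothetical).
    Returns: (C_out, H, W).
    """
    T = clip.shape[0]
    c = T // 2  # reference (middle) frame index
    aligned = np.stack([
        clip[t] if t == c else align_to_center(clip[c], clip[t], temperature)
        for t in range(T)
    ])  # (T, C, H, W): every frame now lies on the reference frame's grid
    # Same result as a 3D convolution with a (T, 1, 1) kernel and no temporal
    # padding: a learned weighted sum over time steps and input channels.
    return np.einsum("oct,tchw->ohw", weights, aligned)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, C, H, W = 4, 8, 13, 13  # T = 4 mirrors the TemporalNet-4 input length
    clip = rng.standard_normal((T, C, H, W))
    weights = rng.standard_normal((16, C, T)) * 0.1
    fused = temporal_aggregate(clip, weights)
    print(fused.shape)  # (16, 13, 13)
```

In this sketch, each neighbouring frame needs an (HW × HW) similarity matrix, so its cost grows with the square of the feature-map size and linearly with the clip length T (4, 8 or 16 here). This is one reason such alignment would normally be applied on downsampled detector features rather than on raw frames.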

Keywords:
Computer science; Artificial intelligence; Object detection; Computer vision; Pattern recognition
