SSVOD: Semi-Supervised Video Object Detection with Sparse Annotations

Tanvir Mahmud; Chun-Hao Liu; Burhaneddin Yaman; Diana Marculescu

doi:10.1109/wacv57701.2024.00663

Year: 2024 Pages: 6759-6768

Abstract

Despite significant progress in semi-supervised learning for image object detection, several key issues remain unaddressed for video object detection: (1) The performance of supervised video object detection depends heavily on the availability of annotated frames. (2) Although frames within a video are strongly correlated, annotating a large number of frames per video is expensive, time-consuming, and often redundant. (3) Existing semi-supervised techniques for static images can hardly exploit the temporal motion dynamics inherent in videos. In this paper, we introduce SSVOD, an end-to-end semi-supervised video object detection framework that exploits the motion dynamics of videos to leverage large-scale unlabeled frames alongside sparse annotations. To selectively assemble robust pseudo-labels across groups of frames, we introduce flow-warped predictions from nearby frames for temporal-consistency estimation. In particular, we introduce cross-IoU-based and cross-divergence-based selection methods over a set of estimated predictions to obtain robust pseudo-labels for bounding boxes and class labels, respectively. To strike a balance between confirmation bias and uncertainty noise in pseudo-labels, we propose a confidence-threshold-based combination of hard and soft pseudo-labels. Our method achieves significant performance improvements over existing methods on the ImageNet-VID, Epic-KITCHENS, and YouTube-VIS datasets. Code is available at https://github.com/enyacgroup/SSVOD.git.
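The abstract does not specify how flow-warped predictions or the cross-IoU score are computed, so what follows is only a minimal sketch under stated assumptions, not the SSVOD implementation (the linked repository is the authoritative source). It assumes a dense optical-flow field is already available as `flow[y][x] = (dx, dy)`, warps a neighboring frame's box by the mean flow inside it, and scores each candidate box by its average IoU with all other candidates, keeping the most temporally consistent one. The helper names `warp_box` and `cross_iou_select`, the mean-flow warping, and the `min_score` cutoff of 0.5 are hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Flow = List[List[Tuple[float, float]]]   # flow[y][x] = (dx, dy) toward the key frame


def warp_box(box: Box, flow: Flow) -> Box:
    """Hypothetical helper (assumption): shift a neighbor-frame box by the mean flow inside it."""
    h, w = len(flow), len(flow[0])
    # Clip to the flow field and gather the flow vectors of pixels inside the box.
    x1, y1 = max(0, int(box[0])), max(0, int(box[1]))
    x2, y2 = min(w, int(box[2])), min(h, int(box[3]))
    vecs = [flow[y][x] for y in range(y1, y2) for x in range(x1, x2)]
    if not vecs:
        return box  # box lies outside the flow field; leave it unchanged
    dx = sum(v[0] for v in vecs) / len(vecs)
    dy = sum(v[1] for v in vecs) / len(vecs)
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


def iou(a: Box, b: Box) -> float:
    """Standard intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cross_iou_select(candidates: List[Box], min_score: float = 0.5) -> Optional[Tuple[Box, float]]:
    """Hypothetical cross-IoU rule (assumption): keep the box with the highest mean IoU to the others.

    Returns None when there are fewer than two candidates, or when even the
    most consistent box agrees poorly with the rest (score below min_score).
    """
    if len(candidates) < 2:
        return None
    scores = []
    for i, box in enumerate(candidates):
        others = [iou(box, c) for j, c in enumerate(candidates) if j != i]
        scores.append(sum(others) / len(others))
    best_idx = max(range(len(candidates)), key=lambda i: scores[i])
    if scores[best_idx] < min_score:
        return None
    return candidates[best_idx], scores[best_idx]


# Toy example with uniform 1-pixel horizontal motion on a 64x64 field (made-up numbers).
flow_prev = [[(1.0, 0.0)] * 64 for _ in range(64)]   # frame t-1 -> t
flow_next = [[(-1.0, 0.0)] * 64 for _ in range(64)]  # frame t+1 -> t
candidates = [
    (10.0, 10.0, 20.0, 20.0),                        # detection on frame t
    warp_box((9.0, 10.0, 19.0, 20.0), flow_prev),    # (10.0, 10.0, 20.0, 20.0)
    warp_box((12.0, 10.0, 22.0, 20.0), flow_next),   # (11.0, 10.0, 21.0, 20.0)
    (40.0, 40.0, 50.0, 50.0),                        # spurious, inconsistent box
]
best = cross_iou_select(candidates)  # keeps (10.0, 10.0, 20.0, 20.0), score ~0.606
```

Because each box is scored against every other estimate, a single spurious box (the one at (40, 40, 50, 50) here) cannot be selected unless most of the other estimates agree with it; raising `min_score` trades pseudo-label recall for precision.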
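The class-label side can be sketched in a similarly hedged way: treat each estimate's class-probability vector as a distribution, score each by its mean divergence to the others, keep the least divergent one, and then emit a one-hot (hard) label when its top probability clears a threshold while keeping the soft distribution otherwise. The symmetric KL divergence, the threshold `tau = 0.9`, the rule that confident predictions become hard labels, and the names `cross_divergence_select` and `hard_or_soft` are all assumptions for illustration; the paper's exact divergence and combination rule may differ.

```python
import math
from typing import List, Tuple

Dist = List[float]  # class-probability vector summing to 1


def kl(p: Dist, q: Dist, eps: float = 1e-8) -> float:
    """KL(p || q); eps avoids log(0) and division by zero."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def sym_kl(p: Dist, q: Dist) -> float:
    """Symmetric KL; an assumed choice of divergence, not taken from the paper."""
    return 0.5 * (kl(p, q) + kl(q, p))


def cross_divergence_select(dists: List[Dist]) -> Tuple[Dist, float]:
    """Hypothetical helper: keep the estimate with the lowest mean divergence to the rest."""
    if len(dists) < 2:
        return dists[0], 0.0  # nothing to compare against
    scores = []
    for i, p in enumerate(dists):
        others = [sym_kl(p, q) for j, q in enumerate(dists) if j != i]
        scores.append(sum(others) / len(others))
    best_idx = min(range(len(dists)), key=lambda i: scores[i])
    return dists[best_idx], scores[best_idx]


def hard_or_soft(dist: Dist, tau: float = 0.9) -> Dist:
    """Hypothetical rule (assumption): one-hot label if max prob >= tau, else keep the soft vector."""
    conf = max(dist)
    if conf >= tau:
        k = dist.index(conf)
        return [1.0 if i == k else 0.0 for i in range(len(dist))]
    return list(dist)


# Class estimates for one object: frame t plus two flow-warped neighbors (made-up numbers).
estimates = [
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.10, 0.10, 0.80],  # temporally inconsistent estimate
]
selected, score = cross_divergence_select(estimates)  # picks [0.70, 0.20, 0.10]
soft_label = hard_or_soft(selected)                    # 0.70 < 0.9, soft vector kept
hard_label = hard_or_soft([0.95, 0.03, 0.02])          # 0.95 >= 0.9, becomes one-hot
```

Under this reading, hard labels are reserved for confident predictions, where collapsing to a single class is least likely to reinforce a mistake, while uncertain predictions keep their full distribution instead of being forced into a possibly wrong class.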
