Abstract

Weakly supervised object detection aims to reduce the amount of supervision required to train detection models. Such models are traditionally learned from images or videos labelled only with the object class, not the object bounding box. In this work, we leverage not only the object class labels but also the action labels associated with the data. We show that the action depicted in an image or video can provide strong cues about the location of the associated object. We learn a spatial prior for the object that depends on the action (e.g. the "ball" is close to the "leg of the person" in "kicking ball") and incorporate this prior to jointly train an object detection and action classification model. We evaluate our weakly supervised object detection model on both video and image datasets. Our approach outperforms the current state-of-the-art (SOTA) method by more than 6% in mAP on the Charades video dataset.
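
The idea of an action-dependent spatial prior can be illustrated with a small sketch. This is not the authors' model: the abstract says the prior is learned, whereas the sketch below assumes a fixed isotropic Gaussian centred on a hypothetical anchor point (the person's foot for "kicking ball"). The names gaussian_prior and rerank, the sigma value, and the example boxes and scores are all illustrative assumptions.

```python
import math

def gaussian_prior(box, anchor, sigma):
    """Unnormalised Gaussian weight of a box centre relative to an anchor point.

    Assumption: a fixed isotropic Gaussian stands in for the learned prior.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    d2 = (cx - anchor[0]) ** 2 + (cy - anchor[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))

def rerank(proposals, scores, anchor, sigma=50.0):
    """Multiply each proposal score by the spatial prior and renormalise."""
    weighted = [s * gaussian_prior(b, anchor, sigma)
                for b, s in zip(proposals, scores)]
    total = sum(weighted)
    if total == 0.0:
        # Prior vanished everywhere; fall back to the raw scores.
        return list(scores)
    return [w / total for w in weighted]

# Hypothetical "kicking ball" example: anchor is the foot keypoint (x, y).
foot = (100.0, 200.0)
proposals = [(90, 190, 110, 210),   # box centred on the foot
             (300, 20, 340, 60)]    # distant box with a higher raw score
scores = [0.4, 0.6]

posterior = rerank(proposals, scores, foot)
best = max(range(len(proposals)), key=lambda i: posterior[i])
print(best, posterior)
```

In this toy setting the proposal near the foot wins even though its raw score is lower, which is the kind of cue the abstract attributes to the action label.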

Keywords:
Artificial intelligence; Object detection; Computer science; Minimum bounding box; Leverage (statistics); Object (grammar); Bounding overwatch; Computer vision; Pattern recognition (psychology); Viola–Jones object detection framework; Machine learning; Image (mathematics); Face detection; Facial recognition system

Metrics

Cited By: 28
FWCI (Field Weighted Citation Impact): 2.67
Refs: 75
Citation Normalized Percentile: 0.92

Topics

Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
