Abstract

Video object detection is a tough task due to the deteriorated quality of video sequences captured under complex environments. Currently, this area is dominated by a series of feature enhancement based methods, which distill beneficial semantic information from multiple frames and generate enhanced features through fusing the distilled information. However, the distillation and fusion operations are usually performed at either frame level or instance level with external guidance using additional information, such as optical flow and feature memory. In this work, we propose a dual semantic fusion network (abbreviated as DSFNet) to fully exploit both frame-level and instance-level semantics in a unified fusion framework without external guidance. Moreover, we introduce a geometric similarity measure into the fusion process to alleviate the influence of information distortion caused by noise. As a result, the proposed DSFNet can generate more robust features through the multi-granularity fusion and avoid being affected by the instability of external guidance. To evaluate the proposed DSFNet, we conduct extensive experiments on the ImageNet VID dataset. Notably, the proposed dual semantic fusion network achieves, to the best of our knowledge, the best performance of 84.1% mAP among the current state-of-the-art video object detectors with ResNet-101 and 85.4% mAP with ResNeXt-101 without using any post-processing steps.
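
The abstract does not give the fusion formula, so the following is only a minimal sketch of the general idea of similarity-weighted feature fusion across frames. It assumes cosine similarity as the weighting measure and a softmax over the similarities; both are illustrative choices, not the geometric similarity measure or the architecture of DSFNet. The function names (`cosine_similarity`, `softmax`, `fuse_features`) are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(reference, supports):
    """Enhance a reference-frame feature with features from support frames.

    Assumption (not from the paper): each candidate feature is weighted by
    the softmax of its cosine similarity to the reference feature, and the
    enhanced feature is the weighted sum of all candidates.
    """
    candidates = [reference] + list(supports)
    weights = softmax([cosine_similarity(reference, c) for c in candidates])
    dim = len(reference)
    return [sum(w * c[i] for w, c in zip(weights, candidates)) for i in range(dim)]


if __name__ == "__main__":
    reference = [1.0, 0.0]
    supports = [[0.9, 0.1], [0.0, 1.0]]  # one similar frame, one dissimilar frame
    print(fuse_features(reference, supports))
```

In this sketch a support feature that resembles the reference contributes more to the fused result than a dissimilar one, which is the intuition behind down-weighting distorted or noisy information during fusion.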

Keywords:
Computer science, Dual (grammatical number), Object (grammar), Object detection, Artificial intelligence, Fusion, Computer vision, Pattern recognition (psychology)

Metrics

Cited By: 28
FWCI (Field Weighted Citation Impact): 2.10
Refs: 76
Citation Normalized Percentile: 0.89

Topics

Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Video Surveillance and Tracking Methods
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Dual Selection Network for Video Object Detection

Tianxiang Hou, Qiang Qi, Yang Lu, Kaiwen Du, Hanzi Wang

Journal: 2022 IEEE International Conference on Multimedia and Expo (ICME), Year: 2022, Vol: 23, Pages: 1-6

JOURNAL ARTICLE

Dual optical flow network-guided video object detection

Wan‐Qing Yu, Jing Yu, Xinqi Shi, Chuangbai Xiao

Journal: Journal of Image and Graphics, Year: 2021, Vol: 26 (10), Pages: 2473-2484

JOURNAL ARTICLE

Semantic-guided complementary fusion network for salient object detection

Kunqian Yang, Chuan He

Journal: Neurocomputing, Year: 2025, Vol: 622, Pages: 129383-129383