JOURNAL ARTICLE

SpVOS: Efficient Video Object Segmentation With Triple Sparse Convolution

Weihao Lin, Tao Chen, Chong Yu

Year: 2023; Journal: IEEE Transactions on Image Processing; Vol: 32; Pages: 5977-5991; Publisher: Institute of Electrical and Electronics Engineers

Abstract

Semi-supervised video object segmentation (Semi-VOS), which requires annotating only the first frame of a video in order to segment the subsequent frames, has received increasing attention recently. Among existing Semi-VOS pipelines, memory-matching-based methods are becoming the main research stream, as they can fully exploit temporal sequence information to obtain high-quality segmentation results. Although these methods achieve promising performance, the overall framework still suffers from heavy computation overhead, caused mainly by the per-frame dense convolution between high-resolution feature maps and each kernel filter. We therefore propose SpVOS, a sparse VOS baseline built on a novel triple sparse convolution that reduces the computation cost of the overall VOS framework. The designed triple gate takes both spatial and temporal redundancy between adjacent video frames into account and adaptively makes a triple decision on how to apply the sparse convolution at each pixel. This controls the computation overhead of each layer while preserving enough discriminative capability to distinguish similar objects and avoid error accumulation. A mixed sparse training strategy, coupled with a training objective that includes a sparsity constraint, is also developed to balance segmentation performance against computation cost. Experiments are conducted on two mainstream VOS datasets, DAVIS and YouTube-VOS. Results show that SpVOS outperforms other state-of-the-art sparse methods and performs comparably to the typical non-sparse VOS baseline: it reaches an overall score of 83.04% on the DAVIS-2017 validation set (baseline: 82.88%) and 79.29% on the YouTube-VOS validation set (baseline: 80.36%), while saving up to 42% of FLOPs. This indicates its potential for resource-constrained scenarios.
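The abstract describes the triple gate only at a high level. The Python sketch below illustrates the general idea of per-pixel three-way gating for a sparse convolution and how skipped work translates into saved multiply-accumulates (MACs). It is not the authors' implementation. The three decisions (SKIP, REUSE, COMPUTE), the threshold rules, the names `triple_gate`, `sparse_conv` and `sparsity_penalty`, and the squared-gap form of the sparsity term are all hypothetical. SpVOS learns its gate and trains with its own sparsity-constrained objective rather than using hand-set thresholds, and the sketch uses a single-channel 3×3 kernel for simplicity.

```python
from typing import List, Tuple

Grid = List[List[float]]

# Hypothetical labels for the three per-pixel gate decisions (illustrative only).
SKIP, REUSE, COMPUTE = 0, 1, 2
KERNEL_MACS = 9  # multiply-accumulates per output pixel for a single-channel 3x3 kernel


def conv3x3_at(x: Grid, k: Grid, i: int, j: int) -> float:
    """3x3 cross-correlation at pixel (i, j) with zero padding."""
    h, w = len(x), len(x[0])
    acc = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            ii, jj = i + di, j + dj
            if 0 <= ii < h and 0 <= jj < w:
                acc += x[ii][jj] * k[di + 1][dj + 1]
    return acc


def triple_gate(x_t: Grid, x_prev: Grid, tau_spatial: float, tau_temporal: float) -> List[List[int]]:
    """Hand-set stand-in for a learned gate: pick one of three decisions per pixel."""
    gate = []
    for row_t, row_p in zip(x_t, x_prev):
        g_row = []
        for v_t, v_p in zip(row_t, row_p):
            if abs(v_t) < tau_spatial:
                g_row.append(SKIP)  # weak activation: assumed spatially redundant
            elif abs(v_t - v_p) < tau_temporal:
                g_row.append(REUSE)  # nearly unchanged since the previous frame
            else:
                g_row.append(COMPUTE)  # changed pixel: run the full convolution
        gate.append(g_row)
    return gate


def sparse_conv(x_t: Grid, y_prev: Grid, k: Grid, gate: List[List[int]]) -> Tuple[Grid, int]:
    """Apply the gate; return the output map and the MACs actually spent."""
    h, w = len(x_t), len(x_t[0])
    y = [[0.0] * w for _ in range(h)]  # SKIP pixels stay at 0.0
    macs = 0
    for i in range(h):
        for j in range(w):
            if gate[i][j] == COMPUTE:
                y[i][j] = conv3x3_at(x_t, k, i, j)
                macs += KERNEL_MACS  # counted as a dense layer would, padding included
            elif gate[i][j] == REUSE:
                y[i][j] = y_prev[i][j]  # copy the previous frame's output
    return y, macs


def sparsity_penalty(gate: List[List[int]], target_ratio: float, lam: float) -> float:
    """Assumed sparsity term: squared gap between the compute ratio and a target ratio."""
    n = sum(len(row) for row in gate)
    ratio = sum(g == COMPUTE for row in gate for g in row) / n
    return lam * (ratio - target_ratio) ** 2


# Toy 4x4 example: row 0 goes dark, pixel (2, 2) changes, everything else is static.
x_prev = [[1.0] * 4 for _ in range(4)]
x_t = [[1.0] * 4 for _ in range(4)]
x_t[0] = [0.0] * 4
x_t[2][2] = 3.0
y_prev = [[5.0] * 4 for _ in range(4)]
box = [[1.0] * 3 for _ in range(3)]

gate = triple_gate(x_t, x_prev, tau_spatial=0.1, tau_temporal=0.5)
y, macs = sparse_conv(x_t, y_prev, box, gate)
dense_macs = KERNEL_MACS * 16
print(macs, 1 - macs / dense_macs)  # 9 0.9375
```

In this toy frame only one of the 16 pixels runs the 3×3 kernel, so the layer spends 9 of the 144 MACs a dense layer would use. The gate's own cost is ignored here. These numbers are purely illustrative and are unrelated to the saving of up to 42% of FLOPs reported for SpVOS.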

Keywords:
Computer science; Artificial intelligence; Segmentation; Kernel (algebra); Computer vision; Pattern recognition (psychology); Convolution (computer science); Mathematics
