JOURNAL ARTICLE

Boosting Video Object Segmentation via Robust and Efficient Memory Network

Yadang Chen, Dingwei Zhang, Yuhui Zheng, Zhi-Xin Yang, Enhua Wu, Haixing Zhao

Year: 2023
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Vol: 34 (5)
Pages: 3340-3352
Publisher: Institute of Electrical and Electronics Engineers

Abstract

Recently, memory-based methods have exhibited remarkable performance in Video Object Segmentation (VOS) by employing non-local pixel-wise matching between the query and memory. Nevertheless, these methods suffer from two limitations: 1) non-local pixel-wise matching can result in the incorrect segmentation of background distractor objects, and 2) memory features with substantial temporal redundancy consume significant computing resources and reduce the inference speed. To address these limitations, we first propose a local attention mechanism to suppress background features, and we introduce a novel training framework based on contrastive learning to ensure that the network learns reliable and robust pixel-wise correspondence between query and memory. We adaptively determine whether to update the memory based on the variation of foreground objects. Next, we propose a dynamic memory bank, which utilizes a lightweight and differentiable soft modulation gate to determine the number of memory features to remove along the temporal dimension. This allows efficient and flexible management of memory features. Our network achieves competitive results (e.g., 92.1% on DAVIS 2016 val, 87.6%/81.3% on DAVIS 2017 val/test, 87.0% on YouTube-VOS 2018 val) compared with state-of-the-art methods while maintaining a faster inference speed of 25+ FPS. Moreover, our network demonstrates a favorable balance between performance and speed when dealing with the long-time video dataset.
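
The first limitation named in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation; it is a minimal, standard-library-only illustration of softmax-weighted pixel-wise memory readout, with an optional window that restricts each query pixel to nearby memory pixels. The function name `read_memory`, the `radius` parameter, the Chebyshev-distance window, and the zero-vector fallback are all assumptions made for this example.

```python
import math


def read_memory(query_keys, query_pos, memory_keys, memory_values,
                memory_pos, radius=None):
    """Return one read-out value vector per query pixel.

    Each query pixel attends to memory pixels through a softmax over
    dot-product similarity. With radius=None the matching is non-local
    (every memory pixel competes). With an integer radius, memory pixels
    farther than `radius` grid cells (Chebyshev distance) are masked out.
    This windowing rule is an illustrative assumption, not the paper's.
    """
    dim = len(memory_values[0])
    readout = []
    for q_key, (q_row, q_col) in zip(query_keys, query_pos):
        scores = []
        for m_key, (m_row, m_col) in zip(memory_keys, memory_pos):
            distance = max(abs(q_row - m_row), abs(q_col - m_col))
            if radius is None or distance <= radius:
                scores.append(sum(a * b for a, b in zip(q_key, m_key)))
            else:
                scores.append(-math.inf)  # excluded from the softmax
        top = max(scores)
        if top == -math.inf:
            # Hypothetical fallback: no memory pixel inside the window.
            readout.append([0.0] * dim)
            continue
        # Numerically stable softmax; masked entries get weight 0.
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        readout.append([
            sum(w * value[d] for w, value in zip(weights, memory_values)) / total
            for d in range(dim)
        ])
    return readout


# One query pixel at (0, 0); the memory holds a foreground pixel at (0, 0)
# and a look-alike background distractor at (5, 5) with the same key.
query_keys, query_pos = [[1.0, 0.0]], [(0, 0)]
memory_keys = [[1.0, 0.0], [1.0, 0.0]]
memory_values = [[1.0], [0.0]]  # 1.0 = foreground, 0.0 = background
memory_pos = [(0, 0), (5, 5)]

non_local = read_memory(query_keys, query_pos, memory_keys, memory_values, memory_pos)
local = read_memory(query_keys, query_pos, memory_keys, memory_values, memory_pos, radius=1)
```

In this toy setup the non-local readout is 0.5, because the distant distractor receives the same weight as the true match, while the windowed readout is 1.0. A real system would operate on learned feature maps with tensor operations rather than Python lists.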

Keywords:
Computer science; Artificial intelligence; Segmentation; Boosting (machine learning); Inference; Redundancy (engineering); Pixel; Pattern recognition (psychology); Computer vision

Metrics

Cited By: 12
FWCI (Field Weighted Citation Impact): 2.18
Refs: 52
Citation Normalized Percentile: 0.86

Topics

Visual Attention and Saliency Detection (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Advanced Image and Video Retrieval Techniques (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Advanced Neural Network Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)