JOURNAL ARTICLE

Video Object Segmentation with Dynamic Memory Networks and Adaptive Object Alignment

Abstract

In this paper, we propose a novel solution for object-matching based semi-supervised video object segmentation, where the target object masks in the first frame are provided. Existing object-matching based methods focus on matching the raw object features of the current frame against those of the first/previous frames. However, two issues remain unsolved by these methods. As the appearance of the video object changes drastically over time, 1) parts/details of the object that were previously unseen appear in the current frame (e.g. view/scale changes), so the annotation in the first frame is incomplete; and 2) even for the parts/details that were seen before, their relative positions change (e.g. pose changes/camera motion), leading to misalignment in object matching. To obtain complete information about the target object, we propose a novel object-based dynamic memory network that exploits the visual contents of all past frames. To solve the misalignment problem caused by position changes of visual contents, we propose an adaptive object alignment module that incorporates a region translation function, which aligns object proposals towards templates in the feature space. Our method achieves state-of-the-art results on the latest benchmark datasets, DAVIS 2017 ($\mathcal{J}$ of 81.4% and $\mathcal{F}$ of 87.5% on the validation set) and YouTube-VOS (overall score of 82.7% on the validation set), with a very efficient inference time (0.16 second/frame on the DAVIS 2017 validation set). Code is available at: https://github.com/liang4sx/DMN-AOA.
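
The idea of matching the current frame against a memory built from all past frames can be illustrated with a generic key-value attention read. The sketch below is an assumption-laden toy, not the authors' architecture: the class name `ObjectMemory`, its `write`/`read` methods, and the two-dimensional feature vectors are all hypothetical, and the adaptive object alignment module is not modeled.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class ObjectMemory:
    """Hypothetical per-object memory that grows as frames are processed."""

    def __init__(self):
        self.keys = []    # one matching feature vector per stored frame
        self.values = []  # one payload vector (e.g. a mask embedding) per stored frame

    def write(self, key, value):
        """Store the features of a processed frame."""
        self.keys.append(list(key))
        self.values.append(list(value))

    def read(self, query):
        """Return the attention-weighted average of stored values for a query."""
        scale = math.sqrt(len(query))
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in self.keys
        ]
        weights = softmax(scores)
        dim = len(self.values[0])
        return [
            sum(w * value[d] for w, value in zip(weights, self.values))
            for d in range(dim)
        ]


memory = ObjectMemory()
memory.write([1.0, 0.0], [1.0, 0.0])  # first (annotated) frame
memory.write([0.0, 1.0], [0.0, 1.0])  # a later frame showing a new view
readout = memory.read([0.0, 4.0])     # query resembles the later frame
```

Because the query is closer to the key written for the later frame, the readout is dominated by that frame's value. This is the intuition behind keeping more than the first annotated frame in memory.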

Keywords:
Computer science; Artificial intelligence; Computer vision; Object (grammar); Matching (statistics); Frame (networking); Object model; Feature (linguistics); Benchmark (surveying); Set (abstract data type); Segmentation; Video tracking; Pattern recognition (psychology); Mathematics

Metrics

Cited By: 25
FWCI (Field Weighted Citation Impact): 1.52
Refs: 59
Citation Normalized Percentile: 0.89

Topics

Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Visual Attention and Saliency Detection
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Adaptive Sparse Memory Networks for Efficient and Robust Video Object Segmentation

Jisheng Dang, Huicheng Zheng, Xiaohao Xu, Longguang Wang, Qingyong Hu, Yulan Guo

Journal: IEEE Transactions on Neural Networks and Learning Systems, Year: 2024, Vol: 36 (2), Pages: 3820-3833

JOURNAL ARTICLE

Towards Robust Video Object Segmentation with Adaptive Object Calibration

Xiaohao Xu, Jinglu Wang, Ming Xiang, Yan Lu

Journal: Proceedings of the 30th ACM International Conference on Multimedia, Year: 2022, Pages: 2709-2718

JOURNAL ARTICLE

Flow Adaptive Video Object Segmentation

Fanqing Lin, Yao Chou, Tony Martinez

Journal: Image and Vision Computing, Year: 2019, Vol: 94, Pages: 103864-103864