CONFERENCE PAPER

Bidirectionally Learning Dense Spatio-temporal Feature Propagation Network for Unsupervised Video Object Segmentation

Jiaqing Fan, Tiankang Su, Kaihua Zhang, Qingshan Liu

Year: 2022  Venue: Proceedings of the 30th ACM International Conference on Multimedia  Pages: 3646-3655

Abstract

Spatio-temporal feature representation is essential for accurate unsupervised video object segmentation, which needs an effective feature propagation paradigm for both appearance and motion features that can fully interchange information across frames. However, existing solutions mainly focus on forward feature propagation from the preceding frame to the current one, either using the former segmentation mask or motion propagation in a frame-by-frame manner. This ignores the bi-directional temporal feature interactions (including backward propagation from future frames to the current frame) across all frames, which can help enhance the spatio-temporal feature representation for segmentation prediction. To this end, this paper presents a novel Dense Bidirectional Spatio-temporal feature propagation Network (DBSNet) to fully integrate the forward and backward propagations across all frames. Specifically, a dense bi-ConvLSTM module is first developed to propagate the features across all frames in both a forward and a backward manner. This fully captures the multi-level spatio-temporal contextual information across all frames, producing an effective feature representation with a strong discriminative capability to separate targets from noisy backgrounds. Following this, a spatio-temporal Transformer refinement module is designed to further enhance the propagated features, which can effectively capture the spatio-temporal long-range dependencies among all frames. Afterwards, a Co-operative Direction-aware Graph Attention (Co-DGA) module is designed to integrate the propagated appearance-motion cues, yielding a strong spatio-temporal feature representation for segmentation mask prediction. The Co-DGA assigns proper attentional weights to neighboring points along the coordinate axes, enabling the segmentation model to selectively focus on the most relevant neighbors.
Extensive evaluations on four mainstream challenging benchmarks including DAVIS16, FBMS, DAVSOD, and MCL demonstrate that the proposed DBSNet achieves favorable performance against state-of-the-art methods in terms of all evaluation metrics.
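The core idea of the abstract, that each frame's feature should be informed by both past and future frames rather than by forward propagation alone, can be illustrated with a toy sketch. This is not the authors' DBSNet implementation (which uses dense bi-ConvLSTM layers over CNN feature maps); it is a minimal illustration assuming each frame's feature is a single scalar, with a simple exponential recurrent update standing in for the ConvLSTM cell:

```python
# Toy sketch of bidirectional temporal feature propagation.
# Illustrative only: DBSNet applies dense bi-ConvLSTMs to per-frame
# CNN feature maps; here each "feature" is just a scalar.

def propagate(features, decay=0.5):
    """Run one directional pass: each output blends the current frame's
    feature with a recurrent state carried from earlier frames."""
    state, out = 0.0, []
    for f in features:
        state = decay * state + (1.0 - decay) * f  # recurrent update
        out.append(state)
    return out

def bidirectional_fuse(features, decay=0.5):
    """Fuse a forward pass and a backward pass per frame, so every
    frame sees context from both past and future frames."""
    fwd = propagate(features, decay)
    bwd = propagate(features[::-1], decay)[::-1]
    return [(a + b) / 2.0 for a, b in zip(fwd, bwd)]

# A sequence where the object signal appears only at the two ends:
frames = [1.0, 0.0, 0.0, 0.0, 1.0]
fused = bidirectional_fuse(frames)
```

In this sketch the first frame receives information from the last frame (and vice versa), which a purely forward, frame-by-frame scheme cannot provide; that is the gap the paper's dense bidirectional propagation is designed to close.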

Keywords:
Computer vision, Video object segmentation, Unsupervised learning, Spatio-temporal feature learning, Feature propagation, Discriminative model, Pattern recognition

Metrics

Cited By: 7
FWCI (Field Weighted Citation Impact): 0.48
Refs: 39
Citation Normalized Percentile: 0.71

Topics

Visual Attention and Saliency Detection
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Video Surveillance and Tracking Methods
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Image Enhancement Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition