JOURNAL ARTICLE

Two-stream deep encoder-decoder architecture for fully automatic video object segmentation

Abstract

We propose a two-stream deep encoder-decoder architecture for fully automatic video object segmentation. The two streams, ImSeg-Stream (for static image segmentation) and MoSeg-Stream (for optical flow segmentation), have exactly the same encoder-decoder architecture. The encoder generates a low-resolution mask with accurate object locations and smooth boundaries, while the decoder refines the details of this initial mask and increases its resolution by progressively integrating lower-level features. Finally, the two streams learn to integrate their predictions for better results. Moreover, to address the shortage of video object segmentation datasets, we propose a seeking strategy to generate a large-scale handcrafted dataset for training. Experiments on two standard datasets demonstrate that the proposed method outperforms most state-of-the-art methods in both segmentation accuracy and run time.
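The data flow described in the abstract can be illustrated with a toy, pure-Python sketch. It is only a shape-level illustration under stated assumptions, not the authors' network: average pooling stands in for the learned encoder, nearest-neighbour upsampling plus a fixed blend stands in for the learned decoder refinement, and a fixed weighted average stands in for the learned fusion. All function names, `skip_weight`, and `alpha` are hypothetical.

```python
# Toy illustration of a two-stream encoder-decoder data flow.
# Every name and constant here is a hypothetical stand-in; the real
# model uses learned convolutional layers and a learned fusion.

def avg_pool2(x):
    """Halve resolution by averaging 2x2 blocks (stand-in for an encoder stage)."""
    h, w = len(x), len(x[0])
    return [[(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

def upsample2(x):
    """Double resolution by nearest-neighbour repetition."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def encode(x, depth):
    """Return [full-res, 1/2-res, ...]; the last entry plays the coarse mask."""
    pyramid = [x]
    for _ in range(depth):
        pyramid.append(avg_pool2(pyramid[-1]))
    return pyramid

def decode(pyramid, skip_weight=0.5):
    """Upsample the coarse mask step by step, blending in the
    lower-level map of matching resolution at each step."""
    mask = pyramid[-1]
    for skip in reversed(pyramid[:-1]):
        up = upsample2(mask)
        mask = [[(1 - skip_weight) * u + skip_weight * s
                 for u, s in zip(up_row, skip_row)]
                for up_row, skip_row in zip(up, skip)]
    return mask

def stream(x, depth=2):
    """One stream: encoder followed by decoder (side lengths must be
    divisible by 2**depth)."""
    return decode(encode(x, depth))

def fuse(im_mask, mo_mask, alpha=0.5):
    """Fixed convex combination of the two stream outputs."""
    return [[alpha * a + (1 - alpha) * b for a, b in zip(ra, rb)]
            for ra, rb in zip(im_mask, mo_mask)]

# Usage: an 8x8 "frame" and an 8x8 "optical flow magnitude" map.
frame = [[1.0 if 2 <= i < 6 and 2 <= j < 6 else 0.0 for j in range(8)]
         for i in range(8)]
flow = [[1.0 if 3 <= i < 7 and 2 <= j < 6 else 0.0 for j in range(8)]
        for i in range(8)]
fused = fuse(stream(frame), stream(flow))
```

Here `stream(frame)` plays the role of ImSeg-Stream and `stream(flow)` that of MoSeg-Stream. The fused output has the input resolution, which mirrors the decoder's role of restoring resolution from the coarse mask.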

Keywords:
Computer science; Encoder; Artificial intelligence; Segmentation; Computer vision; Image segmentation; Object (grammar); Decoding methods; Object detection; Pattern recognition (psychology); Algorithm

Metrics

Cited By: 5
FWCI (Field Weighted Citation Impact): 0.51
Refs: 20
Citation Normalized Percentile: 0.71

Topics

Visual Attention and Saliency Detection
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition