JOURNAL ARTICLE

Video semantic segmentation using deep multi-view representation learning

Abstract

In this paper, we propose a deep learning model based on deep multi-view representation learning to address the video object segmentation task. The proposed model emphasizes the inherent correlation between video frames and incorporates multi-view representation learning based on deep canonically correlated autoencoders. This multi-view representation learning provides an efficient mechanism for capturing inherent correlations by jointly extracting useful features and learning a better representation in a joint feature space, i.e., a shared representation. To increase the training data and the learning capacity, we train the proposed model with pairs of video frames, i.e., Fa and Fb. During the segmentation phase, the deep canonically correlated autoencoder model encodes useful features by processing multiple reference frames together, and these features are used to detect frequently reappearing objects. Our model improves on state-of-the-art deep learning-based methods that mainly focus on learning discriminative foreground representations over appearance and motion. Experimental results on two large benchmarks demonstrate that the proposed method outperforms competitive approaches and achieves good semantic segmentation performance.
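To make the idea of a correlated shared representation more concrete, the following is a minimal sketch of the kind of objective generally associated with deep canonically correlated autoencoders: a canonical-correlation term between the encoded features of the two frames plus a reconstruction term for each frame. The abstract does not give the authors' exact loss, so everything here is an assumption for illustration: the function names `total_correlation` and `dccae_loss`, the weight `lam`, and the regularizer `reg` are hypothetical, and the encoder and decoder networks are omitted (their outputs are passed in as plain arrays).

```python
import numpy as np


def inv_sqrt(mat):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def total_correlation(za, zb, reg=1e-4):
    """Sum of canonical correlations between two (n_samples, dim) feature sets.

    za and zb stand in for the encoded features of frames Fa and Fb.
    reg is a small ridge term (an assumption) that keeps the covariances invertible.
    """
    n = za.shape[0]
    za = za - za.mean(axis=0)
    zb = zb - zb.mean(axis=0)
    saa = za.T @ za / (n - 1) + reg * np.eye(za.shape[1])
    sbb = zb.T @ zb / (n - 1) + reg * np.eye(zb.shape[1])
    sab = za.T @ zb / (n - 1)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    whitened = inv_sqrt(saa) @ sab @ inv_sqrt(sbb)
    return np.linalg.svd(whitened, compute_uv=False).sum()


def dccae_loss(xa, xb, za, zb, ra, rb, lam=1.0):
    """Negative total correlation plus weighted reconstruction error.

    xa, xb: flattened input frames; ra, rb: their reconstructions.
    lam is a hypothetical trade-off weight, not a value from the paper.
    """
    recon = np.mean(np.sum((xa - ra) ** 2, axis=1))
    recon += np.mean(np.sum((xb - rb) ** 2, axis=1))
    return -total_correlation(za, zb) + lam * recon


rng = np.random.default_rng(0)
za = rng.normal(size=(500, 3))
mix = np.array([[2.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.1, 0.0, 1.5]])
zb_related = za @ mix                      # linearly related view
zb_unrelated = rng.normal(size=(500, 3))   # independent view

corr_related = total_correlation(za, zb_related)
corr_unrelated = total_correlation(za, zb_unrelated)
```

In this toy setup, `corr_related` should be close to the feature dimension (3), since every direction of one view is a linear function of the other, while `corr_unrelated` should be much smaller. In a trained model, minimizing such a loss would push the two encoders toward features that are both correlated across frames and sufficient to reconstruct each frame.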

Keywords:
Computer science; Artificial intelligence; Feature learning; Segmentation; Deep learning; Discriminative model; Representation (politics); Focus (optics); Pattern recognition (psychology); Encoder; Feature (linguistics); Autoencoder; Object (grammar); Multi-task learning; Frame (networking); Computer vision; Machine learning; Task (project management)

Metrics

Cited By: 6
FWCI (Field Weighted Citation Impact): 0.51
Refs: 53
Citation Normalized Percentile: 0.64

Topics

Visual Attention and Saliency Detection
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Video Surveillance and Tracking Methods
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Deep Multi-view Representation Learning for Video Anomaly Detection Using Spatiotemporal Autoencoders

K. Deepak, G. Srivathsan, SeyedEhsan Roshan, S. Chandrakala

Journal: Circuits Systems and Signal Processing Year: 2020 Vol: 40 (3) Pages: 1333-1349

JOURNAL ARTICLE

Improving deep learning based semantic segmentation with multi view outlier correction

Peters, Torben; Brenner, Claus; Song, M.

Journal: Institutional Repository of Leibniz Universität Hannover (Leibniz Universität Hannover) Year: 2020

JOURNAL ARTICLE

IMPROVING DEEP LEARNING BASED SEMANTIC SEGMENTATION WITH MULTI VIEW OUTLIER CORRECTION

Torben Peters, C. Brenner, Mingli Song

Journal: The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences Year: 2020 Vol: XLIII-B2-2020 Pages: 711-716

JOURNAL ARTICLE

Multi-Level Representation Learning with Semantic Alignment for Referring Video Object Segmentation

Dongming Wu, Xingping Dong, Ling Shao, Jianbing Shen

Journal:   2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Year: 2022