TransDSSL: Transformer Based Depth Estimation via Self-Supervised Learning

Daechan Han; Jeongmin Shin; Namil Kim; Soonmin Hwang; Yukyung Choi

doi:10.1109/lra.2022.3196781

JOURNAL ARTICLE

Year: 2022; Journal: IEEE Robotics and Automation Letters; Vol: 7 (4); Pages: 10969-10976; Publisher: Institute of Electrical and Electronics Engineers

Abstract

Recently, transformers have been widely adopted for various computer vision tasks and show promising results due to their ability to effectively encode long-range spatial dependencies in an image. However, very few studies have adopted transformers for self-supervised depth estimation. When the CNN architecture is replaced with a transformer in self-supervised learning of depth, several problems arise, such as a multi-scale photometric loss function that is problematic when used with transformers and an insufficient ability to capture local details. In this letter, we propose an attention-based decoder module, Pixel-Wise Skip Attention (PWSA), to enhance fine details in feature maps while keeping the global context from transformers. In addition, we propose utilizing a self-distillation loss with a single-scale photometric loss to alleviate the instability of transformer training by using correct training signals. We demonstrate that the proposed model performs accurate predictions on large objects and thin structures, which require global context and local details, respectively. Our model achieves state-of-the-art performance among self-supervised monocular depth estimation methods on the KITTI and DDAD benchmarks.
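As a rough illustration of how a single-scale photometric term and a self-distillation term might be combined, the following minimal Python sketch may help. It is not the formulation from the letter: the abstract gives no equations, so the L1-only photometric error, the function names, and the `distill_weight` parameter are all assumptions made for illustration. In general, a photometric loss compares a target frame with a view reconstructed from a neighboring frame, and a self-distillation loss uses one of the model's own predictions as a fixed training target.

```python
# Hypothetical sketch only: the L1-only error, all names, and distill_weight
# are illustrative assumptions, not the losses defined in the letter.

def mean_abs_diff(a, b):
    """Mean absolute difference between two same-sized 2D lists of floats."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += abs(x - y)
            count += 1
    return total / count


def photometric_loss(target_image, reconstructed_image):
    """Single-scale photometric error: computed once, at one resolution."""
    return mean_abs_diff(target_image, reconstructed_image)


def self_distillation_loss(student_depth, teacher_depth):
    """Treat the teacher depth as a fixed pseudo-label for the student depth."""
    return mean_abs_diff(student_depth, teacher_depth)


def total_loss(target_image, reconstructed_image,
               student_depth, teacher_depth, distill_weight=0.5):
    """Weighted sum of the two terms (the weighting scheme is assumed)."""
    return (photometric_loss(target_image, reconstructed_image)
            + distill_weight * self_distillation_loss(student_depth, teacher_depth))


if __name__ == "__main__":
    target = [[0.0, 1.0], [1.0, 0.0]]
    reconstructed = [[0.0, 0.5], [1.0, 0.5]]
    student = [[2.0, 4.0]]
    teacher = [[2.0, 2.0]]
    print(total_loss(target, reconstructed, student, teacher))
```

In a real training setup the reconstructed image would come from warping a neighboring frame with the predicted depth and camera pose, and the teacher depth would be excluded from gradient computation; both steps are omitted here.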

Keywords:

Computer science; Transformer; Artificial intelligence; Monocular; Pixel; Machine learning; Pattern recognition (psychology); Computer vision; Engineering; Voltage

Metrics

FWCI (Field Weighted Citation Impact): 4.09

Citation Normalized Percentile: 0.94

Topics

Advanced Vision and Imaging

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Image Processing Techniques and Applications

Physical Sciences → Engineering → Media Technology

Optical measurement and interference techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

TransIndoor: Transformer Based Self-supervised Indoor Depth Estimation

TinyDepth: Lightweight self-supervised monocular depth estimation based on transformer

Hybrid Transformer Based Feature Fusion for Self-Supervised Monocular Depth Estimation

Self-Supervised Monocular Depth Estimation Using Hybrid Transformer Encoder

Transformer-based Models for Supervised Monocular Depth Estimation