Andra Petrovai, Sergiu Nedevschi
We propose a novel solution for the task of video panoptic segmentation that simultaneously predicts pixel-level semantic and instance segmentation and generates clip-level instance tracks. Our network, named VPS-Transformer, is a hybrid architecture built on the state-of-the-art panoptic segmentation network Panoptic-DeepLab: it combines a convolutional architecture for single-frame panoptic segmentation with a novel video module based on an instantiation of the pure Transformer block. The Transformer, equipped with attention mechanisms, models spatio-temporal relations between backbone output features of current and past frames for more accurate and consistent panoptic estimates. As the pure Transformer block introduces a large computational overhead when processing high-resolution images, we propose a few design changes for more efficient computation. We study how to aggregate information more effectively over the space-time volume and compare several variants of the Transformer block with different attention schemes. Extensive experiments on the Cityscapes-VPS dataset demonstrate that our best model improves the temporal consistency and video panoptic quality by a margin of 2.2%, with little extra computation.
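The abstract describes a video module that applies Transformer attention across backbone features of the current and a past frame. The sketch below illustrates one way such a space-time fusion block could look; it is a minimal assumption-based example, not the authors' actual VPS-Transformer module, and all names (`SpaceTimeFusion`, `d_model`, `n_heads`) are hypothetical.

```python
# Minimal sketch of a space-time attention block: queries come from the
# current-frame feature map, keys/values from the concatenated space-time
# volume of current and past frames. Hypothetical, for illustration only.
import torch
import torch.nn as nn


class SpaceTimeFusion(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm_q = nn.LayerNorm(d_model)
        self.norm_kv = nn.LayerNorm(d_model)
        self.norm_ffn = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, feat_cur: torch.Tensor, feat_past: torch.Tensor) -> torch.Tensor:
        # feat_cur, feat_past: (B, C, H, W) backbone features of the current / past frame.
        b, c, h, w = feat_cur.shape
        # Flatten spatial dimensions into token sequences: (B, H*W, C).
        q = feat_cur.flatten(2).transpose(1, 2)
        # Space-time memory: tokens of both frames, (B, 2*H*W, C).
        kv = torch.cat([feat_cur.flatten(2), feat_past.flatten(2)], dim=2).transpose(1, 2)
        # Pre-norm attention and feed-forward with residual connections.
        attn_out, _ = self.attn(self.norm_q(q), self.norm_kv(kv), self.norm_kv(kv),
                                need_weights=False)
        x = q + attn_out
        x = x + self.ffn(self.norm_ffn(x))
        # Reshape back to a feature map for the downstream panoptic heads.
        return x.transpose(1, 2).reshape(b, c, h, w)


# Usage example (shapes only): fuse stride-16 backbone features of two frames.
cur = torch.randn(1, 256, 32, 64)
past = torch.randn(1, 256, 32, 64)
fused = SpaceTimeFusion()(cur, past)  # -> (1, 256, 32, 64)
```

Note that full attention over the `2*H*W` space-time tokens scales quadratically with resolution; the efficiency-oriented design changes and alternative attention schemes the abstract mentions (e.g., operating on downsampled or windowed features) would address exactly this cost, though their specifics are not given here.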