Spatial-Temporal Aggregated Shuffle Attention for Video Instance Segmentation of Traffic Scene

Chongren Zhao; Yinhui Zhang; Zifen HE; Yunnan Deng; Ying Huang; Guangchen CHEN

doi:10.1587/transinf.2022edp7147

JOURNAL ARTICLE

Year: 2023
Journal: IEICE Transactions on Information and Systems
Vol: E106.D (2)
Pages: 240-251
Publisher: Institute of Electronics, Information and Communication Engineers

Abstract

To address the dispersion and dislocation of spatial focus regions in feature pyramid networks, and the insufficient acquisition of feature dependencies in both the spatial and channel dimensions, this paper proposes spatial-temporal aggregated shuffle attention for video instance segmentation (STASA-VIS). First, a mixed subsampling (MS) module is designed to embed activating features from the low-level target area of the feature pyramid into the high-level layers, so as to aggregate spatial information on the target area. Next, taking advantage of the coherent information in video frames, STASA-VIS uses the first of every 5 video frames as the keyframe, propagates the keyframe feature maps of the pyramid layers forward in the time domain, and fuses them with the mixed-subsampled features of the non-keyframes to achieve temporally consistent feature aggregation. Finally, STASA-VIS embeds shuffle attention in the backbone to capture pixel-level pairwise relationships and dimensional dependencies among the channels while reducing computation. Experimental results show that the segmentation accuracy of STASA-VIS reaches 41.2% and the test speed reaches 34 FPS, which exceeds state-of-the-art one-stage video instance segmentation (VIS) methods in accuracy and achieves real-time segmentation.
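
The keyframe scheme described in the abstract can be illustrated with a minimal sketch. Only the interval of 5 frames and the idea of carrying keyframe features forward to the following non-keyframes come from the abstract. The function names, the flat-list feature representation, and the weighted-average fusion (with its `alpha` weight) are hypothetical stand-ins chosen for illustration and are not taken from the paper.

```python
from typing import List, Sequence

# From the abstract: the first of every 5 video frames is the keyframe.
KEYFRAME_INTERVAL = 5


def is_keyframe(frame_idx: int, interval: int = KEYFRAME_INTERVAL) -> bool:
    """True if the frame is the first frame of its group of `interval` frames."""
    return frame_idx % interval == 0


def keyframe_index(frame_idx: int, interval: int = KEYFRAME_INTERVAL) -> int:
    """Index of the keyframe whose features are propagated to `frame_idx`."""
    return (frame_idx // interval) * interval


def fuse(key_feat: Sequence[float], cur_feat: Sequence[float],
         alpha: float = 0.5) -> List[float]:
    """Hypothetical fusion: element-wise weighted average.

    The paper's actual fusion operator is not specified in the abstract;
    this average is an assumption used only to make the sketch runnable.
    """
    return [alpha * k + (1.0 - alpha) * c for k, c in zip(key_feat, cur_feat)]


def aggregate(features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Propagate each keyframe's features forward and fuse with non-keyframes."""
    out: List[List[float]] = []
    key_feat: List[float] = []
    for idx, feat in enumerate(features):
        if is_keyframe(idx):
            key_feat = list(feat)      # refresh the propagated keyframe features
            out.append(key_feat)
        else:
            out.append(fuse(key_feat, feat))
    return out
```

In this sketch, frames 0, 5, 10, and so on act as keyframes, and frames 1 to 4 are each fused with the features of frame 0.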

Keywords:

Computer science; Artificial intelligence; Pyramid (geometry); Segmentation; Feature (linguistics); Computer vision; Focus (optics); Pattern recognition (psychology); Spatial analysis; Pairwise comparison; Remote sensing; Geography; Mathematics

Metrics

FWCI (Field Weighted Citation Impact): 0.36

Citation Normalized Percentile: 0.49

Topics

Advanced Image Processing Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Visual Attention and Saliency Detection

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Image Enhancement Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Spatio-Temporal Attention Network for Video Instance Segmentation

STFormer: Spatial-Temporal-Aware Transformer for Video Instance Segmentation

Deformable VisTR: Spatio Temporal Deformable Attention for Video Instance Segmentation

Video Segmentation Based on Spatial-Temporal Attention Model

Video Instance Segmentation via Spatial Feature Enhancement and Temporal Fusion