Video instance segmentation (VIS) is a vision task that involves simultaneously detecting, classifying, segmenting, and tracking object instances in videos. In this study, we introduce dynamic anchor boxes and deformable attention for VIS (DAB-D-VIS), a novel transformer-based model for online VIS. To enhance multilayer transformer-based instance decoding for each video frame, the proposed model uses deformable attention, which attends to only a small set of key sampling points, and dynamic anchor boxes, which explicitly represent the regions of candidate instances. Both techniques have already proven effective for transformer-based object detection on images. Furthermore, to meet the constraints of online VIS, our model incorporates a robust inter-frame instance association method that leverages both the similarity of two instances in a contrastive embedding space and their positional difference in the image. Extensive experiments on the YouTube-VIS benchmark dataset validate the effectiveness of the proposed DAB-D-VIS model.
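The inter-frame association described above can be illustrated with a minimal sketch that combines embedding similarity with a positional term. This is an assumption-laden toy version, not the paper's formulation: the weight `alpha`, the use of cosine similarity, and the choice of box IoU as the positional term are all illustrative choices.

```python
import numpy as np

def box_iou(a, b):
    # Intersection-over-union of two [x1, y1, x2, y2] boxes,
    # used here as an illustrative positional-difference term.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def association_score(emb_prev, emb_cur, box_prev, box_cur, alpha=0.5):
    # Blend cosine similarity in the contrastive embedding space
    # with box overlap; alpha is a hypothetical mixing weight.
    sim = float(np.dot(emb_prev, emb_cur) /
                (np.linalg.norm(emb_prev) * np.linalg.norm(emb_cur)))
    return alpha * sim + (1.0 - alpha) * box_iou(box_prev, box_cur)
```

In an online tracker, each instance in the current frame would be matched to the previous-frame instance maximizing this score (e.g. greedily or via bipartite matching).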
Xiang Li, Jinglu Wang, Xiaoli Li, Yan Lu
Hao Ren, Xingsong Liu, Junjian Huang, Ru Wan, Jian Pu, Hong Lu
Ruihuang Li, Chenhang He, Yabin Zhang, Shuai Li, Liyi Chen, Lei Zhang