Towards VLM-based Hybrid Explainable Prompt Enhancement for Zero-Shot Industrial Anomaly Detection

Yuning Cui; Mingyu Liu; Wenqi Ren; Alois Knoll; Fei Yuan; Bob Zhang; Jie Wen

doi:10.24963/ijcai.2024/80

Year: 2024 Pages: 711-719

Abstract

Zero-Shot Industrial Anomaly Detection (ZSIAD) aims to identify and localize anomalies in industrial images from unseen categories. Owing to their powerful generalization capabilities, Vision-Language Models (VLMs) have attracted growing interest in ZSIAD. To guide the model toward understanding and localizing semantically complex industrial anomalies, existing VLM-based methods supply additional prompts through learnable text prompt templates. However, these zero-shot methods lack detailed descriptions of specific anomalies, which makes it difficult to classify and segment the diverse range of industrial anomalies accurately. To address this issue, we first propose a multi-stage prompt generation agent for ZSIAD. Specifically, we leverage a Multimodal Large Language Model (MLLM) to articulate the detailed differences between normal and test samples; after further refinement and an anti-false-alarm constraint, these descriptions provide detailed text prompts to the model. Moreover, we introduce a Visual Fundamental Model (VFM) to generate anomaly-related attention prompts for more accurate localization of anomalies of varying sizes and shapes. Extensive experiments on seven real-world industrial anomaly detection datasets show that the proposed method outperforms recent state-of-the-art (SOTA) methods and that its explainable prompts give the model a more intuitive basis for anomaly identification.
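
The abstract does not specify how text prompts and attention prompts are combined into an anomaly score. As a rough illustration of the general idea behind prompt-based zero-shot scoring in VLMs, the sketch below compares each image patch embedding against a "normal" and an "anomalous" text embedding and optionally reweights the result with per-patch attention weights. It is not the authors' method: the function names, the toy two-dimensional embeddings, the temperature value, and the multiplicative use of attention weights are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def anomaly_score(patch, normal_text, anomaly_text, temperature=0.07):
    """Two-way softmax over similarities to the 'normal' and 'anomalous'
    text embeddings; returns the probability mass on the anomalous one.
    The temperature value is an assumed, illustrative default."""
    s_n = cosine(patch, normal_text) / temperature
    s_a = cosine(patch, anomaly_text) / temperature
    m = max(s_n, s_a)  # subtract the max for numerical stability
    e_n, e_a = math.exp(s_n - m), math.exp(s_a - m)
    return e_a / (e_n + e_a)

def anomaly_map(patches, normal_text, anomaly_text, attention=None):
    """Score every patch; if attention weights in [0, 1] are given
    (a hypothetical stand-in for an attention prompt), multiply them in."""
    scores = [anomaly_score(p, normal_text, anomaly_text) for p in patches]
    if attention is not None:
        scores = [s * w for s, w in zip(scores, attention)]
    return scores

# Toy embeddings (hypothetical): axis 0 ~ "normal", axis 1 ~ "anomalous".
normal_text = [1.0, 0.0]
anomaly_text = [0.0, 1.0]
patches = [[0.9, 0.1], [0.1, 0.9]]

scores = anomaly_map(patches, normal_text, anomaly_text)
image_score = max(scores)  # image-level score taken as the max patch score
masked = anomaly_map(patches, normal_text, anomaly_text, attention=[1.0, 0.0])
```

In this toy setup the second patch lies close to the anomalous text embedding and receives the higher score, and a zero attention weight suppresses it entirely. In practice the embeddings would come from a pretrained VLM's image and text encoders rather than hand-written vectors.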

Keywords:

Modulation (music); Computer science; Frequency modulation; Image restoration; Image (mathematics); Telecommunications; Radio frequency; Computer vision; Image processing; Acoustics; Physics

Metrics

FWCI (Field Weighted Citation Impact): 2.95

Citation Normalized Percentile: 0.87

Topics

Optical Systems and Laser Technology

Physical Sciences → Engineering → Electrical and Electronic Engineering

Image and Signal Denoising Methods

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Infrared Target Detection Methodologies

Physical Sciences → Engineering → Aerospace Engineering

Related Documents

Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection

Zero-Shot Industrial Anomaly Segmentation with Image-Aware Prompt Generation

EZ-HOI: VLM Adaptation via Guided Prompt Learning for Zero-Shot HOI Detection

LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction