Zero-shot Building Attribute Extraction from Large-Scale Vision and Language Models

Fei Pan; Sangryul Jeon; Brian Wang; Frank McKenna; Stella X. Yu

doi:10.1109/wacv57701.2024.00845

JOURNAL ARTICLE

Year: 2024 Pages: 8632-8641

Abstract

Existing building recognition methods, exemplified by BRAILS, utilize supervised learning to extract information from satellite and street-view images for classification and segmentation. However, each task module requires human-annotated data, hindering the scalability and robustness to regional variations and annotation imbalances. In response, we propose a new zero-shot workflow for building attribute extraction that utilizes large-scale vision and language models to mitigate reliance on external annotations. The proposed workflow contains two key components: image-level captioning and segment-level captioning for the building images based on the vocabularies pertinent to structural and civil engineering. These two components generate descriptive captions by computing feature representations of the image and the vocabularies, and facilitating a semantic match between the visual and textual representations. Consequently, our framework offers a promising avenue to enhance AI-driven captioning for building attribute extraction in the structural and civil engineering domains, ultimately reducing reliance on human annotations while bolstering performance and adaptability.
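The semantic match described in the abstract can be illustrated with a minimal sketch. It assumes that the image and each vocabulary term have already been encoded into a shared embedding space by some vision-language model; the abstract does not name the model, so the encoder is left out here. The vectors, the vocabulary terms, and the function names `cosine` and `zero_shot_label` are hypothetical and chosen only for illustration. They are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_label(image_embedding, vocab_embeddings):
    """Return the vocabulary term whose embedding best matches the image.

    No labeled training data is used: the prediction is simply the
    term with the highest similarity to the image representation.
    """
    scores = {
        term: cosine(image_embedding, embedding)
        for term, embedding in vocab_embeddings.items()
    }
    best_term = max(scores, key=scores.get)
    return best_term, scores


# Hypothetical, hand-made 3-d embeddings standing in for the outputs of
# a real text encoder (vocabulary) and a real image encoder (image).
vocab = {
    "gable roof": [0.9, 0.1, 0.0],
    "flat roof": [0.1, 0.9, 0.0],
    "hip roof": [0.0, 0.2, 0.9],
}
image = [0.8, 0.3, 0.1]

best, scores = zero_shot_label(image, vocab)
print(best)
```

The same matching step could in principle be applied per segment rather than per image, by passing the embedding of a cropped or masked region instead of the whole image. How the paper actually forms segment-level representations is not stated in the abstract.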

Keywords:

Zero (linguistics); Computer science; Shot (pellet); Scale (ratio); Artificial intelligence; Extraction (chemistry); Computer vision; Ground zero; Physics; Geography; Linguistics; Cartography

Metrics

FWCI (Field Weighted Citation Impact): 6.62

Citation Normalized Percentile: 0.93

Topics

Geographic Information Systems Studies

Social Sciences → Social Sciences → Geography, Planning and Development

Remote-Sensing Image Classification

Physical Sciences → Engineering → Media Technology

Remote Sensing and Land Use

Physical Sciences → Earth and Planetary Sciences → Atmospheric Science

Related Documents

Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models

Zero-Shot Typhoon Damage Detection Using Large Vision-Language Models

Large Language Models for Zero-Shot Semantic Web Data Extraction

Zero-shot Image Caption Enhancement using Large-Scale Language Models