JOURNAL ARTICLE

ModalChorus: Visual Probing and Alignment of Multi-Modal Embeddings via Modal Fusion Map

Yilin Ye, Shishi Xiao, Xingchen Zeng, Wei Zeng

Year: 2024  Journal: IEEE Transactions on Visualization and Computer Graphics  Vol: 31 (1)  Pages: 294-304  Publisher: Institute of Electrical and Electronics Engineers

Abstract

Multi-modal embeddings form the foundation for vision-language models, such as CLIP embeddings, the most widely used text-image embeddings. However, these embeddings are vulnerable to subtle misalignment of cross-modal features, resulting in decreased model performance and diminished generalization. To address this problem, we design ModalChorus, an interactive system for visual probing and alignment of multi-modal embeddings. ModalChorus primarily offers a two-stage process: 1) embedding probing with Modal Fusion Map (MFM), a novel parametric dimensionality reduction method that integrates both metric and nonmetric objectives to enhance modality fusion; and 2) embedding alignment that allows users to interactively articulate intentions for both point-set and set-set alignments. Quantitative and qualitative comparisons for CLIP embeddings with existing dimensionality reduction (e.g., t-SNE and MDS) and data fusion (e.g., data context map) methods demonstrate the advantages of MFM in showcasing cross-modal features over common vision-language datasets. Case studies reveal that ModalChorus can facilitate intuitive discovery of misalignment and efficient re-alignment in scenarios ranging from zero-shot classification to cross-modal retrieval and generation.
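The abstract describes MFM only at a high level: a parametric dimensionality reduction trained against a combination of metric (distance-preserving, MDS-style) and nonmetric (rank-based) objectives. The paper's actual formulation is not given here, so the toy sketch below is an illustration of that general idea only: a linear projection fitted to a combined stress-plus-triplet loss on synthetic data, with the data, loss weights, and finite-difference optimizer all being assumptions of this sketch, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for embeddings from two groups (e.g., two modalities/classes):
# 20 points in 8-D, with the second group shifted so structure exists to recover.
X = rng.normal(size=(20, 8))
labels = np.array([0] * 10 + [1] * 10)
X[labels == 1] += 2.0

D_high = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # high-dim distances

def metric_loss(W):
    """MDS-style stress: preserve pairwise distances after projection."""
    Y = X @ W
    D_low = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
    return np.mean((D_high - D_low) ** 2)

def nonmetric_loss(W, margin=1.0):
    """Rank-style hinge: same-group pairs should sit closer than cross-group pairs."""
    Y = X @ W
    D = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    pos = D[same & ~np.eye(len(X), dtype=bool)].mean()
    neg = D[~same].mean()
    return max(0.0, margin + pos - neg)

def total_loss(W, lam=0.5):
    """Combined metric + nonmetric objective (lam is an assumed trade-off weight)."""
    return metric_loss(W) + lam * nonmetric_loss(W)

# Fit the 8x2 parametric projection by plain gradient descent,
# using central finite differences to keep the sketch dependency-free.
W = rng.normal(size=(8, 2)) * 0.1
loss0 = total_loss(W)
eps, lr = 1e-4, 1e-3
for _ in range(200):
    grad = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            Wp = W.copy(); Wp[i, j] += eps
            Wm = W.copy(); Wm[i, j] -= eps
            grad[i, j] = (total_loss(Wp) - total_loss(Wm)) / (2 * eps)
    W -= lr * grad

final = total_loss(W)
print(f"combined loss: {loss0:.3f} -> {final:.3f}")
```

Because the projection is parametric (a reusable map `W` rather than per-point coordinates), new embeddings can be projected without refitting, which is what makes interactive probing of a frozen embedding space practical.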

Keywords:
Modal; Computer science; Fusion; Computer vision; Artificial intelligence; Computer graphics (images); Visualization; Sensor fusion; Data visualization

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 1.06
Refs: 75
Citation Normalized Percentile: 0.68

Topics

Advanced Image and Video Retrieval Techniques
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Multimodal Machine Learning Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Industrial Vision Systems and Defect Detection
Physical Sciences → Engineering → Industrial and Manufacturing Engineering

Related Documents

BOOK-CHAPTER

Multi-modal Data Fusion based on Embeddings

Steffen Thoma

Studies on the Semantic Web  Year: 2019
CONFERENCE PAPER

Token Embeddings Alignment for Cross-Modal Retrieval

Chen-Wei Xie, Jianmin Wu, Yun Zheng, Pan Pan, Xian-Sheng Hua

Published in: Proceedings of the 30th ACM International Conference on Multimedia  Year: 2022  Pages: 4555-4563
JOURNAL ARTICLE

Multi-modal Fusion

Huaping Liu, Amir Hussain, Shuliang Wang

Journal: Information Sciences  Year: 2018  Vol: 432  Pages: 462-462
JOURNAL ARTICLE

LiDAR-BIND: Multi-Modal Sensor Fusion Through Shared Latent Embeddings

Niels Balemans, Ali Anwar, Jan Steckel, Siegfried Mercelis

Journal: IEEE Robotics and Automation Letters  Year: 2024  Vol: 9 (11)  Pages: 9159-9166