The surprisingly fluent predictive performance of Large Language Models (LLMs), together with the high-quality, photo-realistic rendering of Diffusion Models, has heralded a new era in Generative AI. Such deep learning-based models, with billions of parameters and pre-trained on massive-scale datasets, are also called Large Foundation Models (LFMs). These models have not only caught the public imagination but have also led to an unprecedented surge of interest in their applications. Instead of the previous approach of developing AI models for specific tasks, more and more researchers are developing large task-agnostic models pre-trained on massive data, which can then be adapted to a variety of downstream tasks via fine-tuning, few-shot learning, or zero-shot learning. Some examples are ChatGPT, LLaMA, GPT-4, Flamingo, MidJourney, Stable-Diffusion, and DALL-E. Some of these can handle text (e.g., ChatGPT, LLaMA), while others (e.g., GPT-4 and Flamingo) can utilize multimodal data and can hence be considered Multimodal Large Foundation Models (MLFMs).
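To make the distinction between zero-shot and few-shot adaptation concrete, the following is a minimal sketch of how a single pre-trained model can be steered to a new task purely through its prompt, without any weight updates. The function name and prompt layout are illustrative assumptions, not the interface of any particular model.

```python
def build_prompt(task_description, query, examples=None):
    """Assemble a prompt for a generic pre-trained LLM.

    With no examples, the model must rely entirely on its pre-trained
    knowledge (zero-shot). Prepending a handful of labeled input/output
    pairs lets the model adapt in-context (few-shot), again without
    changing any parameters.
    """
    parts = [task_description]
    # Few-shot demonstrations, if any, go before the actual query.
    for example_input, example_output in (examples or []):
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    # The query itself, with the output left for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


# Zero-shot: task description and query only.
zero_shot = build_prompt(
    "Classify the sentiment of the review as positive or negative.",
    "A dazzling, heartfelt film.",
)

# Few-shot: the same task, now with in-context demonstrations.
few_shot = build_prompt(
    "Classify the sentiment of the review as positive or negative.",
    "A dazzling, heartfelt film.",
    examples=[("Dull and far too long.", "negative"),
              ("An instant classic.", "positive")],
)
```

The same mechanism scales from text-only models to multimodal ones, where the demonstrations may interleave images and text rather than strings alone.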