The surprisingly fluent predictive performance of Large Language Models (LLMs), together with the high-quality, photo-realistic rendering of Diffusion Models, has heralded a new era in Generative AI. Such deep learning-based models, with billions of parameters and pre-trained on massive-scale datasets, are also called Large Foundation Models (LFMs). These models have not only caught the public imagination but have also led to an unprecedented surge of interest in their applications. Instead of the previous approach of developing AI models for specific tasks, more and more researchers are developing large task-agnostic models pre-trained on massive data, which can then be adapted to a variety of downstream tasks via fine-tuning, few-shot learning, or zero-shot learning. Some examples are ChatGPT, LLaMA, GPT-4, Flamingo, MidJourney, Stable-Diffusion, and DALL-E. Some of these can handle text (e.g., ChatGPT, LLaMA), while others (e.g., GPT-4 and Flamingo) can utilize multimodal data and can hence be considered Multimodal Large Foundation Models (MLFMs).
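To make the distinction between zero-shot and few-shot adaptation concrete, the following is a minimal sketch of how a single pre-trained model can be steered to a new task purely through its prompt, without any weight updates. The function name and prompt layout are illustrative assumptions, not the interface of any particular model.

```python
def build_prompt(task_description, query, examples=None):
    """Assemble a prompt for a generic pre-trained LLM.

    With no examples, the model must rely entirely on its pre-trained
    knowledge (zero-shot). Prepending a handful of labeled input/output
    pairs lets the model adapt in-context (few-shot), again without
    changing any parameters.
    """
    parts = [task_description]
    # Few-shot demonstrations, if any, go before the actual query.
    for example_input, example_output in (examples or []):
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    # The query itself, with the output left for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


# Zero-shot: task description and query only.
zero_shot = build_prompt(
    "Classify the sentiment of the review as positive or negative.",
    "A dazzling, heartfelt film.",
)

# Few-shot: the same task, now with in-context demonstrations.
few_shot = build_prompt(
    "Classify the sentiment of the review as positive or negative.",
    "A dazzling, heartfelt film.",
    examples=[("Dull and far too long.", "negative"),
              ("An instant classic.", "positive")],
)
```

The same mechanism scales from text-only models to multimodal ones, where the demonstrations may interleave images and text rather than strings alone.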