In this paper, we propose MIC, a framework for image classification that takes multi-view images as input, such as RGB-T (RGB and thermal) images for surveillance. We combine auto-encoder and generative adversarial network architectures to embed the different views into a common latent space, and the resulting features are then fed to the classification stage. The proposed framework jointly trains the multi-view embedding model to find a shared latent representation of the views, performs data imputation (generating the missing views), and carries out classification by predicting the labels. Experiments on the MNIST dataset with a variety of classifiers and several missing-view ratios demonstrate the effectiveness of our solution.
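The abstract does not say how missing views are simulated or how imputation is wired to the shared latent space. The following is therefore a minimal, standard-library-only sketch under our own assumptions, not the MIC implementation. A "missingness ratio" is taken to mean the fraction of two-view samples that have exactly one view hidden. Imputation encodes the observed view(s) into the shared latent code (averaging codes when both views are present) and decodes the missing view from it. `drop_views` and `impute` are hypothetical names, the toy lambdas stand in for trained encoders and decoders, and the adversarial and classification stages are omitted.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
# A two-view sample, e.g. (RGB, thermal); None marks a missing view.
Sample = Tuple[Optional[Vector], Optional[Vector]]


def drop_views(samples: Sequence[Sample], missing_ratio: float,
               seed: int = 0) -> List[Sample]:
    """Hide exactly one randomly chosen view in round(missing_ratio * n) samples.

    Assumption: every sample keeps at least one observed view.
    """
    if not 0.0 <= missing_ratio <= 1.0:
        raise ValueError("missing_ratio must lie in [0, 1]")
    rng = random.Random(seed)
    n = len(samples)
    # Indices of the samples that will lose one view.
    hit = set(rng.sample(range(n), round(missing_ratio * n)))
    out: List[Sample] = []
    for i, (v0, v1) in enumerate(samples):
        if i in hit:
            # Drop view 0 or view 1 with equal probability.
            if rng.random() < 0.5:
                v0 = None
            else:
                v1 = None
        out.append((v0, v1))
    return out


def impute(sample: Sample,
           encoders: Sequence[Callable[[Vector], Vector]],
           decoders: Sequence[Callable[[Vector], Vector]]) -> Tuple[Vector, Vector]:
    """Fill missing views by decoding them from the shared latent code.

    Assumption: when several views are observed, their codes are averaged.
    """
    codes = [enc(v) for enc, v in zip(encoders, sample) if v is not None]
    if not codes:
        raise ValueError("at least one view must be observed")
    # Element-wise mean of the available latent codes.
    z = [sum(c) / len(codes) for c in zip(*codes)]
    # Keep observed views as-is; decode only the missing ones.
    filled = [v if v is not None else dec(z) for v, dec in zip(sample, decoders)]
    return filled[0], filled[1]


# Toy usage: view 0 equals the latent code, view 1 is twice the latent code.
enc = [lambda v: v, lambda v: [x / 2 for x in v]]
dec = [lambda z: z, lambda z: [2 * x for x in z]]
print(impute(([1.0, 2.0], None), enc, dec))  # ([1.0, 2.0], [2.0, 4.0])
```

Averaging latent codes is only one simple fusion choice. It works here because the toy encoders map both views to the same code, which is the property the common latent space described above is meant to provide.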