Convolutional neural networks (CNNs) have shown great success in single-label image classification, but real-world images generally carry multiple labels. In this paper, we use a long short-term memory (LSTM) network as the "decoder" to generate multiple labels for an image. To reduce the "semantic gap" between visual features and the richness of human semantics, we also propose a label embedding approach to generate a semantic label for an image. Experimental results demonstrate that the proposed architecture achieves good performance on multi-label image classification.
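The decoding idea summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: the LSTM is replaced by a hypothetical stand-in function (`toy_step`), the label embeddings are made-up three-dimensional vectors, and the choice of cosine-similarity lookup and of an `<end>` stop token are assumptions, since the abstract does not specify them. The sketch only shows how a sequential decoder could emit one label per step by matching its output against a label embedding space.

```python
import math

# Hypothetical toy label embeddings (assumed values, not taken from the paper).
LABEL_EMBEDDINGS = {
    "dog": [1.0, 0.0, 0.0],
    "grass": [0.0, 1.0, 0.0],
    "ball": [0.0, 0.0, 1.0],
    "<end>": [-1.0, -1.0, -1.0],  # assumed stop token
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_label(vector, exclude):
    """Return the not-yet-emitted label whose embedding is closest to vector."""
    candidates = [name for name in LABEL_EMBEDDINGS if name not in exclude]
    return max(candidates, key=lambda name: cosine(vector, LABEL_EMBEDDINGS[name]))


def decode_labels(step_fn, image_feature, max_labels=5):
    """Greedy decoding: one label per step until <end> or max_labels."""
    state, prev, labels = image_feature, None, []
    for _ in range(max_labels):
        predicted, state = step_fn(state, prev)
        label = nearest_label(predicted, exclude=labels)
        if label == "<end>":
            break
        labels.append(label)
        prev = LABEL_EMBEDDINGS[label]
    return labels


def toy_step(state, prev):
    """Stand-in for one LSTM step (assumption): subtract the previously
    emitted label embedding from the state, and predict <end> once little
    evidence remains."""
    if prev is not None:
        state = [s - p for s, p in zip(state, prev)]
    if max(state) > 0.3:
        return state, state
    return LABEL_EMBEDDINGS["<end>"], state


# The feature vector is assumed to live in the same space as the label embeddings.
labels = decode_labels(toy_step, [0.9, 0.6, 0.05])
print(labels)  # ['dog', 'grass']
```

In a real system, the CNN feature would initialize a trained LSTM, and `toy_step` would be replaced by that network's recurrent step.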
Lu Jiang, Jihua Ye, Shunjie Xiao, Yi Zong, Aiwen Jiang
Haiying Zhao, Wei Zhou, Xiaogang Hou, Hui Zhu
Wei Zhou, Zhiwu Xia, Peng Dou, Tao Su, Haifeng Hu