Jianwei Zhang, Feng De-wang, Zilin Wu, Hengwei Liu
The impact of garbage on human health, the environment, and economic development is widely recognized. The rapid development and wide application of deep learning in computer vision have made automatic garbage classification with AI feasible. Most previous work on garbage image classification relies on CNN-based methods. However, because of their limited receptive fields, these methods cannot fully exploit contextual information when capturing features. This makes it difficult to distinguish garbage images with similar features, such as cups of the same shape and color made of different materials. To address this problem, we use the vision transformer (ViT) as the backbone network and combine it with fine-grained classification to classify garbage images. We propose a novel pure transformer-based framework, the Dual-branch Feature Fusion Vision Transformer (DBFF). We design a dual-branch network structure (DBN) to mitigate the loss of key feature information that occurs when the input image is divided into patches, and we propose a hierarchical feature extraction module (HFE) that extracts important features from each layer to supplement the output with low- and mid-level feature information. We evaluate DBFF on three garbage datasets to verify its effectiveness. The experimental results show that DBFF improves on ViT and outperforms most current convolutional neural networks.
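The abstract gives no implementation details for DBN or HFE. The pure-Python sketch below only illustrates the two generic ideas it mentions: how a ViT cuts an image into fixed-size patches, and what "extracting important features in each layer" could look like. The function names `split_into_patches` and `select_hierarchical_tokens` are hypothetical; the overlapping-stride option and the ranking of tokens by a per-token score (for example, class-token attention) are assumptions for illustration, not the authors' DBFF design.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def split_into_patches(image: Matrix, patch: int, stride: int) -> List[List[float]]:
    """Cut a single-channel H x W image into flattened patch x patch tiles.

    stride == patch reproduces the standard non-overlapping ViT split.
    stride < patch yields overlapping tiles; this is an ASSUMED variant,
    shown only as one way to keep content that straddles patch borders.
    """
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            # Row-major flattening of one tile.
            tile = [image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)]
            patches.append(tile)
    return patches


def select_hierarchical_tokens(layer_tokens: Sequence[Sequence[List[float]]],
                               layer_scores: Sequence[Sequence[float]],
                               k: int) -> List[List[float]]:
    """HYPOTHETICAL per-layer selection: keep the k highest-scoring tokens
    of every encoder layer and stack them, so low- and mid-level layers
    contribute features alongside the final layer."""
    selected = []
    for tokens, scores in zip(layer_tokens, layer_scores):
        # Indices of this layer's tokens, best score first.
        ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
        selected.extend(tokens[i] for i in ranked[:k])
    return selected


if __name__ == "__main__":
    img = [[0.0] * 224 for _ in range(224)]
    # 224 x 224 image, 16 x 16 patches: 14 * 14 = 196 non-overlapping patches.
    print(len(split_into_patches(img, 16, 16)))  # 196
    # Stride 12: (224 - 16) // 12 + 1 = 18 tiles per side, 18 * 18 = 324.
    print(len(split_into_patches(img, 16, 12)))  # 324
```

The overlapping split trades more tokens (324 instead of 196 here) for less fragmentation at patch borders, which is the kind of information loss the abstract attributes to patch division.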
Xiaohui He, Nan Yang, Panle Li, Mengjia Qiao, Xijie Cheng, Liyang Zhang, Jiandong Shang
Lanxue Dang, Libo Weng, Yane Hou, Xianyu Zuo, Yang Liu
Meitong Liu, Fei Yu, Zhenya Diao, Zheming Huang, Hongrun Wu, Yingpin Chen
Saravanan Elumalai, Surendran Rajendran, Majdi Khalid