Gencheng Wang, Meiyan Yang, Rong Chen
Image manipulation detection plays an essential role in digital image processing. However, existing convolutional neural network (CNN)-based methods often rely on local perception, which makes it difficult to capture long-range dependencies in images. This limitation degrades performance when detecting subtle forgery traces or handling complex backgrounds. To address this issue, this paper proposes a multi-stream feature-enhanced weakly supervised image manipulation detection network, named WSMD-Net. First, we propose a stream module that leverages the global perception capability of Vision Transformers (ViT) to overcome the local perception limitations of CNNs, enabling effective capture of long-range dependencies and subtle forgery traces. Second, we propose an SR-CA stream module that integrates Steganalysis Rich Model (SRM) convolution to enhance the model’s ability to extract weak features in forged regions, while improving stability and generalization. Finally, WSMD-Net fuses the multi-stream features to strengthen detection across diverse feature dimensions, thereby improving detection accuracy and robustness. Experimental results demonstrate that WSMD-Net achieves superior accuracy and adaptability on four challenging public image manipulation datasets. Specifically, compared with other state-of-the-art weakly supervised methods, it improves the average image-level F1 score (I-F1) by 7.4%, and achieves consistent pixel-level gains of 2.2% and 1.5% in P-F1 and C-F1, respectively.
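To illustrate the SRM convolution mentioned above, the following is a minimal sketch of SRM-style high-pass filtering, which suppresses image content and keeps the noise residual where forgery traces tend to appear. It assumes the three 5x5 kernels commonly used in SRM-based forensics work; the abstract does not specify which kernels or what attention design the SR-CA stream uses, so the names `SRM_KERNELS` and `srm_residuals` are hypothetical and this is not the authors' implementation.

```python
import numpy as np

# Three 5x5 high-pass kernels commonly used for SRM-style noise residuals.
# Assumption: the paper's SR-CA stream may use a different or larger filter set.
SRM_KERNELS = np.stack([
    np.array([[0,  0,  0,  0, 0],
              [0, -1,  2, -1, 0],
              [0,  2, -4,  2, 0],
              [0, -1,  2, -1, 0],
              [0,  0,  0,  0, 0]], dtype=np.float64) / 4.0,
    np.array([[-1,  2,  -2,  2, -1],
              [ 2, -6,   8, -6,  2],
              [-2,  8, -12,  8, -2],
              [ 2, -6,   8, -6,  2],
              [-1,  2,  -2,  2, -1]], dtype=np.float64) / 12.0,
    np.array([[0, 0,  0, 0, 0],
              [0, 0,  0, 0, 0],
              [0, 1, -2, 1, 0],
              [0, 0,  0, 0, 0],
              [0, 0,  0, 0, 0]], dtype=np.float64) / 2.0,
])


def srm_residuals(gray):
    """Return a (3, H, W) stack of noise residuals for a 2-D grayscale image."""
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    # Edge padding keeps the output the same size as the input.
    padded = np.pad(gray, 2, mode="edge")
    out = np.zeros((len(SRM_KERNELS), h, w))
    for k, kernel in enumerate(SRM_KERNELS):
        # Accumulate the shifted, weighted copies of the image (2-D correlation).
        for dy in range(5):
            for dx in range(5):
                out[k] += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


# A flat image with one altered pixel: the residual is near zero except around it.
image = np.full((8, 8), 100.0)
image[4, 4] += 10.0
residual = srm_residuals(image)
```

Because every kernel sums to zero, smooth regions map to values near zero while local inconsistencies stand out. In a network, the same kernels would typically serve as fixed weights of a convolution layer whose output feeds the stream that learns forgery features.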
Liyun Dou, Meng Chen, Jiaqing Qiu, Jin Wang
Weizhuo Zuo, Bin Gao, Shutian Liu, Zhengjun Liu
Lichao Su, Chenwei Dai, Hao Yu, Yun Chen