A Weakly Supervised Multi-Stream Feature Enhancement Network for Image Manipulation Detection and Localization

Gencheng Wang; Meiyan Yang; Rong Chen

doi:10.1109/access.2025.3607317

JOURNAL ARTICLE

Year: 2025
Journal: IEEE Access
Vol: 13
Pages: 157111-157125
Publisher: Institute of Electrical and Electronics Engineers

Abstract

Image manipulation detection plays an essential role in digital image processing. However, existing methods based on convolutional neural networks (CNNs) often rely on local perception, which makes it difficult to capture long-range dependencies in images. This limitation degrades performance when forgery traces are subtle or backgrounds are complex. To address this issue, this paper proposes WSMD-Net, a weakly supervised image manipulation detection network with multi-stream feature enhancement. First, we propose a stream module that leverages the global perception capability of Vision Transformers (ViT) to overcome the local perception limitations of CNNs, enabling effective capture of long-range dependencies and subtle forgery traces. Second, we propose an SR-CA stream module that integrates Steganalysis Rich Model (SRM) convolution to strengthen the extraction of weak features in forged regions while improving stability and generalization. Finally, WSMD-Net fuses the multi-stream features to strengthen detection across diverse feature dimensions, improving detection accuracy and robustness. Experimental results on four challenging public image manipulation datasets show that WSMD-Net achieves superior accuracy and adaptability. Specifically, compared with other state-of-the-art weakly supervised methods, it improves the average image-level I-F1 by 7.4% and achieves consistent pixel-level gains of 2.2% in P-F1 and 1.5% in C-F1.
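As a rough illustration of why SRM-style convolution helps expose weak forgery traces, the sketch below applies a single fixed high-pass kernel of the kind commonly used in SRM-based forensics work to a tiny grayscale patch. It is not the authors' SR-CA stream: the function name `srm_residual`, the choice of this one 3x3 kernel, and the toy 5x5 patches are all assumptions made for illustration. Because the kernel weights sum to zero, smooth image content is suppressed and only local inconsistencies remain in the residual.

```python
# Illustrative SRM-style high-pass filtering (an assumption-laden sketch,
# not the SR-CA module described in the abstract).

# Second-order 3x3 residual kernel; its weights sum to zero, so constant
# regions produce a zero response.
SRM_KERNEL_3X3 = [
    [-1.0,  2.0, -1.0],
    [ 2.0, -4.0,  2.0],
    [-1.0,  2.0, -1.0],
]


def srm_residual(image, kernel=SRM_KERNEL_3X3, scale=0.25):
    """Return the 'valid' high-pass residual of a 2-D grayscale image.

    `image` is a list of equal-length rows of floats. The output shrinks by
    (kernel size - 1) in each dimension because no padding is applied.
    """
    k = len(kernel)
    height, width = len(image), len(image[0])
    residual = []
    for y in range(height - k + 1):
        row = []
        for x in range(width - k + 1):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    acc += kernel[dy][dx] * image[y + dy][x + dx]
            row.append(scale * acc)
        residual.append(row)
    return residual


# Toy data (hypothetical): a flat patch and a copy with one altered pixel.
flat = [[10.0] * 5 for _ in range(5)]
altered = [row[:] for row in flat]
altered[2][2] = 14.0

flat_res = srm_residual(flat)        # every entry is 0.0
altered_res = srm_residual(altered)  # non-zero only around the altered pixel
```

In a network, a residual map like this would typically be fed to a learned stream alongside the RGB input, so that the model sees both image content and noise-level inconsistencies.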

Topics

Image Processing Techniques and Applications

Physical Sciences → Engineering → Media Technology

Digital Media Forensic Detection

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Advanced Image Processing Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Weakly-supervised cross-contrastive learning network for image manipulation detection and localization

MSF-Net: Multi-stream fusion network for image manipulation detection and localization

Temporal Feature Enhancement Dilated Convolution Network for Weakly-supervised Temporal Action Localization

Multi-Scale Supervised Spatio-Channel Aggregation Network for Image Manipulation Detection and Localization

IFE-Net: Integrated feature enhancement network for image manipulation localization