JOURNAL ARTICLE

TVNet: Multimodal medical image fusion by dual-branch network with vision transformer and one-shot aggregation

Jianguo Wang, Wenran Jia, Yuxing Liu, Pengfei Wu, Peng Geng, Xuguang Meng

Year: 2025   Journal: Science Progress   Vol: 108 (4)   Article: 368504251375188   Publisher: SAGE Publishing

Abstract

The task of medical image fusion involves synthesizing complementary information from medical images of different modalities, which is of great significance for clinical diagnosis. Existing medical image fusion algorithms rely heavily on convolution operations and cannot establish long-range dependencies across the source images, which can lead to edge blurring and loss of detail in the fused images. Because the Transformer can effectively model long-range dependencies through self-attention, a novel and effective dual-branch feature enhancement network called TVNet is proposed to fuse multimodal medical images. This network combines a Vision Transformer and a Convolutional Neural Network to extract global context and local information, preserving detailed textures and highlighting the structural characteristics of the source images. Furthermore, to extract multiscale information, an enhancement module is used to obtain multiscale characterization information while the information from the two branches is efficiently aggregated. In addition, a hybrid loss function is designed to optimize the fusion results at three levels: structure, feature, and gradient. Experimental results show that the proposed fusion network outperforms seven state-of-the-art methods in both subjective visual quality and objective metrics. Our code is available at https://github.com/sineagles/TVNet.
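The abstract describes a hybrid loss that supervises the fused image at three levels: structure, feature, and gradient. The exact terms are defined in the paper, not here; the following is a minimal NumPy sketch under assumed definitions: a correlation-based structure term as a stand-in for SSIM, an L1 intensity term against the element-wise maximum of the sources, and a Sobel-based gradient term. All function names and weights are illustrative, not taken from TVNet.

```python
import numpy as np

def sobel_gradient(img):
    """Sobel gradient magnitude (|gx| + |gy|); a hypothetical stand-in
    for whatever gradient operator the paper actually uses."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(img, 1, mode="edge")
    h, w = img.shape
    gx = np.zeros((h, w)); gy = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.abs(gx) + np.abs(gy)

def hybrid_loss(fused, src_a, src_b, w_struct=1.0, w_feat=1.0, w_grad=1.0):
    """Sketch of a three-term hybrid fusion loss (weights are assumptions)."""
    # Structure term: 1 - zero-mean correlation, a simplified SSIM surrogate.
    def struct(x, y):
        xc, yc = x - x.mean(), y - y.mean()
        denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum()) + 1e-8
        return 1.0 - (xc * yc).sum() / denom
    l_struct = 0.5 * (struct(fused, src_a) + struct(fused, src_b))

    # Feature/intensity term: L1 distance to the element-wise max of sources.
    l_feat = np.abs(fused - np.maximum(src_a, src_b)).mean()

    # Gradient term: fused gradients should match the stronger source gradient.
    g_target = np.maximum(sobel_gradient(src_a), sobel_gradient(src_b))
    l_grad = np.abs(sobel_gradient(fused) - g_target).mean()

    return w_struct * l_struct + w_feat * l_feat + w_grad * l_grad
```

With identical inputs all three terms vanish, so the loss is (numerically) zero; mismatched images are penalized on all three levels, which is the intuition the abstract attributes to the hybrid loss.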

