JOURNAL ARTICLE

TVNet: Multimodal medical image fusion by dual-branch network with vision transformer and one-shot aggregation

Jianguo Wang, Wenran Jia, Yuxing Liu, Pengfei Wu, Peng Geng, Xuguang Meng

Year: 2025   Journal: Science Progress   Vol: 108 (4)   Article: 368504251375188   Publisher: SAGE Publishing

Abstract

The task of medical image fusion involves synthesizing complementary information from medical images of different modalities, which is of great significance for clinical diagnosis. Existing medical image fusion algorithms rely heavily on convolution operations and cannot establish long-range dependencies across the source images, which can lead to edge blurring and loss of detail in the fused images. Because the Transformer can effectively model long-range dependencies through self-attention, a novel and effective dual-branch feature enhancement network called TVNet is proposed to fuse multimodal medical images. This network combines a Vision Transformer and a Convolutional Neural Network to extract global context and local information, preserving detailed textures and highlighting the structural characteristics of the source images. Furthermore, to extract multiscale information, an enhancement module is used to obtain multiscale characterization information while the information from the two branches is efficiently aggregated. In addition, a hybrid loss function is designed to optimize the fusion results at three levels: structure, feature, and gradient. Experimental results show that the proposed fusion network outperforms seven state-of-the-art methods in both subjective visual quality and objective metrics. Our code is available at https://github.com/sineagles/TVNet.
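The abstract describes a hybrid loss that supervises the fused image at three levels: structure, feature, and gradient. The exact terms are defined in the paper, not here; the following is a minimal NumPy sketch under assumed definitions: a correlation-based structure term as a stand-in for SSIM, an L1 intensity term against the element-wise maximum of the sources, and a Sobel-based gradient term. All function names and weights are illustrative, not taken from TVNet.

```python
import numpy as np

def sobel_gradient(img):
    """Sobel gradient magnitude (|gx| + |gy|); a hypothetical stand-in
    for whatever gradient operator the paper actually uses."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(img, 1, mode="edge")
    h, w = img.shape
    gx = np.zeros((h, w)); gy = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.abs(gx) + np.abs(gy)

def hybrid_loss(fused, src_a, src_b, w_struct=1.0, w_feat=1.0, w_grad=1.0):
    """Sketch of a three-term hybrid fusion loss (weights are assumptions)."""
    # Structure term: 1 - zero-mean correlation, a simplified SSIM surrogate.
    def struct(x, y):
        xc, yc = x - x.mean(), y - y.mean()
        denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum()) + 1e-8
        return 1.0 - (xc * yc).sum() / denom
    l_struct = 0.5 * (struct(fused, src_a) + struct(fused, src_b))

    # Feature/intensity term: L1 distance to the element-wise max of sources.
    l_feat = np.abs(fused - np.maximum(src_a, src_b)).mean()

    # Gradient term: fused gradients should match the stronger source gradient.
    g_target = np.maximum(sobel_gradient(src_a), sobel_gradient(src_b))
    l_grad = np.abs(sobel_gradient(fused) - g_target).mean()

    return w_struct * l_struct + w_feat * l_feat + w_grad * l_grad
```

With identical inputs all three terms vanish, so the loss is (numerically) zero; mismatched images are penalized on all three levels, which is the intuition the abstract attributes to the hybrid loss.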

