JOURNAL ARTICLE

MM-HiFuse: multi-modal multi-task hierarchical feature fusion for esophagus cancer staging and differentiation classification

Xiangzuo Huo, Shengwei Tian, Long Yu, Wendong Zhang, Aolun Li, Qimeng Yang, Jinmiao Song

Journal: Complex & Intelligent Systems; Year: 2025; Vol: 11 (1); Publisher: Springer Science+Business Media

Abstract

Esophageal cancer is a globally significant but understudied cancer with a high mortality rate. Its stage and differentiation level are crucial factors in determining a patient's prognosis and surgical treatment plan, and in improving the chances of survival. Endoscopy and histopathological examination are considered the gold standard for esophageal cancer diagnosis. Previous studies have applied deep learning-based methods to esophageal cancer analysis, but these methods are limited to single-modal features, which leads to inadequate classification results. In response to this limitation, multi-modal learning has emerged as a promising alternative for medical image analysis tasks. In this paper, we propose MM-HiFuse, a hierarchical feature fusion network for multi-modal multitask learning that improves the classification accuracy of esophageal cancer staging and differentiation level. The proposed architecture combines low-level to deep-level features of both pathological and endoscopic images to achieve accurate classification. The key characteristics of MM-HiFuse are: (i) a parallel hierarchy of convolution and self-attention layers designed for pathological and endoscopic image features; and (ii) a multi-modal hierarchical feature fusion (MHF) module together with a new multitask weighted combination loss function. These components extract multi-modal representations at different semantic scales and let the two tasks complement each other, which improves classification performance. Experimental results show that MM-HiFuse outperforms single-modal methods in esophageal cancer staging and differentiation classification. Our findings support the early diagnosis and accurate staging of esophageal cancer and offer a new direction for applying multi-modal multitask learning to medical image analysis.
Code is available at https://github.com/huoxiangzuo/MM-HiFuse.
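The abstract names a multitask weighted combination loss but does not give its form. The snippet below is a minimal sketch that assumes the simplest such loss: a fixed-weight sum of the cross-entropy losses of a staging head and a differentiation head. The function names and the weights `w_stage` and `w_diff` are hypothetical illustrations, not values or code from the paper; the actual weighting scheme may differ and should be checked against the repository above.

```python
import math
from typing import Sequence


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross-entropy of one example, given raw class logits and an integer label."""
    # Subtract the largest logit before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # -log softmax(logits)[target] == logsumexp(logits) - logits[target]
    return log_sum_exp - logits[target]


def weighted_multitask_loss(
    stage_logits: Sequence[float],
    stage_label: int,
    diff_logits: Sequence[float],
    diff_label: int,
    w_stage: float = 0.5,  # hypothetical weight for the staging task
    w_diff: float = 0.5,   # hypothetical weight for the differentiation task
) -> float:
    """Assumed form of a weighted combination loss over the two classification tasks."""
    loss_stage = softmax_cross_entropy(stage_logits, stage_label)
    loss_diff = softmax_cross_entropy(diff_logits, diff_label)
    return w_stage * loss_stage + w_diff * loss_diff
```

For example, with uniform logits for a three-class staging head and a two-class differentiation head, `weighted_multitask_loss([0.0, 0.0, 0.0], 0, [0.0, 0.0], 1)` equals 0.5 × (ln 3 + ln 2). Fixed weights keep the sketch simple; a real training setup might tune or learn them so that neither task dominates the shared features.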
