CTransCNN: Combining Transformer and CNN in Multi-Label
Medical Image Classification

Shengke Li1
Shihan Qiu1
Qichao Liu1
Yuangang Ma1
Shuangsheng Zhang3
1Wuyi University   2Victoria University   3Jiangmen Central Hospital

🎉🎉🎉 Accepted by KBS 2023 🎉🎉🎉


✨ Highlights


  • Parallel Hybrid Architecture: We introduce a novel parallel hybrid architecture that combines a CNN and a transformer in parallel branches.
  • Cross-Branch Interaction via IIM: We incorporate an information interaction module (IIM) that enables cross-branch communication and facilitates the exploration of correlations between labels (a minimal sketch of such bridges follows this list).
  • Comprehensive Evaluation and Performance: We extensively evaluate our approach on three distinct datasets: ChestX-ray11, NIH ChestX-ray14, and our in-house TCMTD.
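As a rough illustration of the cross-branch interaction idea, the following PyTorch-style sketch shows what C2T (CNN-to-transformer) and T2C (transformer-to-CNN) bridges could look like. The projection and normalization layers here are our assumptions for illustration, not the exact modules from the paper.

```python
import torch
import torch.nn as nn

class C2T(nn.Module):
    """CNN-to-transformer bridge (sketch): project a CNN feature map
    into token space and add it to the transformer tokens. The 1x1
    convolution and LayerNorm are illustrative assumptions."""
    def __init__(self, cnn_channels: int, embed_dim: int):
        super().__init__()
        self.proj = nn.Conv2d(cnn_channels, embed_dim, kernel_size=1)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, feat_map: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) -> (B, H*W, D): one token per spatial location
        t = self.proj(feat_map).flatten(2).transpose(1, 2)
        return tokens + self.norm(t)  # assumes tokens has H*W entries

class T2C(nn.Module):
    """Transformer-to-CNN bridge (sketch): reshape tokens back to a
    spatial map and add them to the CNN feature map."""
    def __init__(self, embed_dim: int, cnn_channels: int):
        super().__init__()
        self.proj = nn.Conv2d(embed_dim, cnn_channels, kernel_size=1)
        self.bn = nn.BatchNorm2d(cnn_channels)

    def forward(self, tokens: torch.Tensor, feat_map: torch.Tensor) -> torch.Tensor:
        b, n, d = tokens.shape
        _, _, h, w = feat_map.shape
        # (B, N, D) -> (B, D, H, W), assuming N == H * W
        m = tokens.transpose(1, 2).reshape(b, d, h, w)
        return feat_map + self.bn(self.proj(m))
```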


Abstract

Multilabel image classification aims to assign an image to multiple possible labels. Because each image may be associated with several labels at once, the task is more challenging than single-label classification. Convolutional neural networks (CNNs) alone struggle to exploit the statistical dependencies between labels that this task requires. In addition, data imbalance, a common problem in machine learning, must be accounted for in multilabel medical image classification. Furthermore, simply concatenating a CNN and a transformer lacks direct interaction and information exchange between the two models. To address these issues, we propose a novel hybrid deep learning model called CTransCNN. This model comprises three main components spanning the CNN and transformer branches: a multilabel multihead attention enhanced feature module (MMAEF), a multibranch residual module (MBR), and an information interaction module (IIM). The MMAEF explores implicit correlations between labels, the MBR facilitates model optimization, and the IIM enhances feature transmission and increases nonlinearity between the two branches, helping to accomplish the multilabel medical image classification task. We evaluated our approach on the publicly available ChestX-ray11 and NIH ChestX-ray14 datasets, along with our self-constructed traditional Chinese medicine tongue dataset (TCMTD). Extensive multilabel image classification experiments compared our approach with state-of-the-art methods. The results demonstrate that our framework is highly competitive with prior work, and its strong generalization ability makes it applicable to other medical multilabel image classification tasks.
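To make the label-correlation idea behind the MMAEF concrete, here is a minimal, hypothetical sketch in which learnable per-label query embeddings attend over the image tokens through multihead attention. The specific layers (a single `nn.MultiheadAttention` plus a linear scoring head) are assumptions for illustration only, not the paper's exact MMAEF definition.

```python
import torch
import torch.nn as nn

class LabelAttention(nn.Module):
    """Sketch of label-aware attention: each learnable label embedding
    queries the image tokens, gathering the evidence relevant to that
    label, so inter-label structure can be learned through shared keys."""
    def __init__(self, num_labels: int, embed_dim: int, num_heads: int = 8):
        super().__init__()
        self.label_embed = nn.Parameter(torch.randn(num_labels, embed_dim))
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.score = nn.Linear(embed_dim, 1)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D) image tokens from the transformer branch
        b = tokens.size(0)
        q = self.label_embed.unsqueeze(0).expand(b, -1, -1)  # (B, L, D)
        out, _ = self.attn(q, tokens, tokens)                # per-label features
        return self.score(out).squeeze(-1)                   # (B, L) logits
```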


Methodology


[Figure: overview of the proposed CTransCNN framework]

The proposed approach for multilabel medical image classification consists of three main stages. In the first stage, a Conv module extracts the initial features, and two copies are sent to the transformer branch and the CNN branch, respectively. In the second stage, the transformer branch adopts the label embedding and the MSS block of the MMAEF, while the CNN branch utilizes the MBR with nested inner and outer branches. The MMAEF and MBR are stacked to the same depth as a vanilla transformer, denoted as N. Keeping our model's number of layers on par with the original architecture enhances structural reusability; notably, the widely recognized ViT model, which applies the transformer architecture to computer vision tasks, also uses a base version with 12 layers. Meanwhile, the IIM consists of the C2T and T2C components, which progressively fuse the feature maps of the two branches in an interactive manner. Finally, after obtaining the T features and C features from the two branches, we investigate three fusion methods for classification: direct addition of the branch scores, weighted addition of the branch scores (with weight coefficients ranging from 0 to 1), and classification based on the concatenation of the final feature maps from the two branches. A sketch of these fusion variants is given below.
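The three fusion variants are simple to express. Below is a minimal PyTorch sketch of them; the names `fuse_scores`, `t_logits`, `c_logits`, `alpha`, and the `ConcatHead` helper are hypothetical, introduced only for illustration.

```python
import torch
import torch.nn as nn

def fuse_scores(t_logits: torch.Tensor, c_logits: torch.Tensor,
                mode: str = "add", alpha: float = 0.5) -> torch.Tensor:
    """Fuse per-label scores from the transformer branch (t_logits) and
    the CNN branch (c_logits). `alpha` is the weight coefficient in
    [0, 1] used by the weighted-addition variant."""
    if mode == "add":        # variant 1: direct addition of branch scores
        return t_logits + c_logits
    if mode == "weighted":   # variant 2: weighted addition of branch scores
        return alpha * t_logits + (1.0 - alpha) * c_logits
    raise ValueError(f"unknown fusion mode: {mode}")

class ConcatHead(nn.Module):
    """Variant 3 (sketch): classify from the concatenation of the final
    feature vectors of the two branches."""
    def __init__(self, t_dim: int, c_dim: int, num_labels: int):
        super().__init__()
        self.fc = nn.Linear(t_dim + c_dim, num_labels)

    def forward(self, t_feat: torch.Tensor, c_feat: torch.Tensor) -> torch.Tensor:
        return self.fc(torch.cat([t_feat, c_feat], dim=-1))
```

In the multilabel setting, the fused scores would then typically pass through a per-label sigmoid (e.g., trained with `BCEWithLogitsLoss`) rather than a softmax.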


Demo



More Results


[Figures: additional experimental results]

Citation

@article{WU2023111030,
  title = {CTransCNN: Combining transformer and CNN in multilabel medical image classification},
  journal = {Knowledge-Based Systems},
  pages = {111030},
  year = {2023},
  issn = {0950-7051},
  doi = {10.1016/j.knosys.2023.111030},
  url = {https://www.sciencedirect.com/science/article/pii/S0950705123007803},
  author = {Xin Wu and Yue Feng and Hong Xu and Zhuosheng Lin and Tao Chen and Shengke Li and Shihan Qiu and Qichao Liu and Yuangang Ma and Shuangsheng Zhang}
}