Medical Image Analysis

Medical Image Computing

nnFormer: Volumetric Medical Image Segmentation via a 3D Transformer

Hong-Yu Zhou, Jiansen Guo, Yinghao Zhang, Xiaoguang Han, Lequan Yu, Liansheng Wang, and Yizhou Yu
IEEE Transactions on Image Processing (TIP), Vol 32, 2023 (Code and Model Release)

Transformer, the model of choice for natural language processing, has drawn scant attention from the medical imaging community. Given their ability to exploit long-term dependencies, transformers are promising for helping typical convolutional neural networks learn more contextualized visual representations. However, most recently proposed transformer-based segmentation approaches simply treat transformers as auxiliary modules that help encode global context into convolutional representations. To address this issue, we introduce nnFormer (i.e., not-another transFormer), a 3D transformer for volumetric medical image segmentation. nnFormer not only combines interleaved convolution and self-attention operations, but also introduces local and global volume-based self-attention mechanisms to learn volume representations. Moreover, nnFormer uses skip attention to replace the traditional concatenation/summation operations in the skip connections of U-Net-like architectures. Experiments show that nnFormer outperforms previous transformer-based counterparts by large margins on three public datasets. Compared to nnUNet, the most widely recognized convnet-based 3D medical segmentation model, nnFormer produces significantly lower HD95 and is much more computationally efficient. Furthermore, we show that nnFormer and nnUNet are highly complementary to each other in model ensembling. Code and models of nnFormer are available at
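The "local volume-based self-attention" idea can be illustrated by splitting a 3D feature volume into non-overlapping windows and running attention only inside each window. The following is a minimal NumPy sketch under stated assumptions: a single head, no relative position bias, and random matrices `wq`, `wk`, `wv` as hypothetical stand-ins for learned projections. It is not the released nnFormer code.

```python
import numpy as np


def window_partition(x, ws):
    """Split a (D, H, W, C) volume into non-overlapping (wd, wh, ww) windows.

    Returns an array of shape (num_windows, wd * wh * ww, C).
    """
    D, H, W, C = x.shape
    wd, wh, ww = ws
    assert D % wd == 0 and H % wh == 0 and W % ww == 0, "window must tile the volume"
    x = x.reshape(D // wd, wd, H // wh, wh, W // ww, ww, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)  # window-grid axes first, then in-window axes
    return x.reshape(-1, wd * wh * ww, C)


def window_reverse(windows, ws, shape):
    """Inverse of window_partition: reassemble windows into a (D, H, W, C) volume."""
    D, H, W, C = shape
    wd, wh, ww = ws
    x = windows.reshape(D // wd, H // wh, W // ww, wd, wh, ww, C)
    x = x.transpose(0, 3, 1, 4, 2, 5, 6)
    return x.reshape(D, H, W, C)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def local_volume_attention(x, ws, wq, wk, wv):
    """Single-head scaled dot-product self-attention restricted to each 3D window."""
    tokens = window_partition(x, ws)  # (num_windows, voxels_per_window, C)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    out = softmax(scores) @ v  # voxels never attend across window borders
    return window_reverse(out, ws, x.shape[:3] + (v.shape[-1],))


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8, 16))
wq, wk, wv = (0.1 * rng.standard_normal((16, 16)) for _ in range(3))
y = local_volume_attention(x, (2, 4, 4), wq, wk, wv)
print(y.shape)  # (4, 8, 8, 16)
```

Setting the window size equal to the whole volume turns the same function into global attention over all voxels, which is one simple way to see how local and global variants relate.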

Advancing Radiograph Representation Learning with Masked Record Modeling

Hong-Yu Zhou, Chenyu Lian, Liansheng Wang, and Yizhou Yu
International Conference on Learning Representations (ICLR), 2023 (Code and Model Release)

Modern studies in radiograph representation learning (R2L) rely on either self-supervision to encode invariant semantics or associated radiology reports to incorporate medical expertise, while the complementarity between them is barely noticed. To explore this, we formulate self- and report-completion as two complementary objectives and present a unified framework based on masked record modeling (MRM). In practice, MRM reconstructs masked image patches and masked report tokens following a multi-task scheme to learn knowledge-enhanced semantic representations. MRM pre-training yields models that transfer well to various radiography tasks. In particular, MRM offers superior performance in label-efficient fine-tuning. For instance, MRM achieves 88.5% mean AUC on CheXpert using 1% labeled data, outperforming previous R2L methods with 100% labels. On NIH ChestX-ray, MRM outperforms the best-performing counterpart by about 3% under small labeling ratios. Besides, MRM surpasses self- and report-supervised pre-training in identifying the pneumonia type and the pneumothorax area, sometimes by large margins. Code and models are available at
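The multi-task masked-record objective can be sketched as two losses computed only on masked positions: a reconstruction error on masked image patches and a cross-entropy on masked report tokens. This is a minimal NumPy sketch; the masking ratios, the weight `lam`, and the function names are hypothetical choices for illustration, not MRM's actual settings or code.

```python
import numpy as np


def random_mask(n, ratio, rng):
    """Boolean mask of length n with round(n * ratio) randomly chosen True entries."""
    k = int(round(n * ratio))
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=k, replace=False)] = True
    return mask


def masked_mse(pred, target, mask):
    """Image-completion term: reconstruction error averaged over masked patches only."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))


def masked_token_ce(logits, token_ids, mask):
    """Report-completion term: cross-entropy over masked report tokens only."""
    z = logits[mask]
    z = z - z.max(axis=1, keepdims=True)  # stabilize the log-softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(z)), token_ids[mask]]))


def mrm_style_loss(img_pred, img_target, img_mask, txt_logits, txt_ids, txt_mask, lam=1.0):
    """Multi-task sum of both completion terms; lam is a hypothetical weighting."""
    return masked_mse(img_pred, img_target, img_mask) + lam * masked_token_ce(
        txt_logits, txt_ids, txt_mask
    )


rng = np.random.default_rng(0)
img_mask = random_mask(196, 0.75, rng)  # e.g. 14 x 14 patches; ratio is an assumption
txt_mask = random_mask(32, 0.5, rng)  # report tokens; ratio is an assumption
patches = rng.standard_normal((196, 768))
ids = rng.integers(0, 1000, size=32)
# Perfect image reconstruction plus uniform token logits: the loss reduces to log(1000).
loss = mrm_style_loss(patches, patches, img_mask, np.zeros((32, 1000)), ids, txt_mask)
```

Restricting both terms to masked positions is what makes each objective a completion task rather than an identity mapping.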

Act Like a Radiologist: Towards Reliable Multi-view Correspondence Reasoning for Mammogram Mass Detection

Yuhang Liu#, Fandong Zhang#, Chaoqi Chen, Siwen Wang, Yizhou Wang, and Yizhou Yu
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol 44, No 10, 2022

Mammogram mass detection is crucial for diagnosing and preventing breast cancer in clinical practice. The complementary information in multi-view mammogram images reflects the anatomical prior structure of the breast and is of great significance in digital mammography interpretation. Radiologists naturally reason across multiple mammographic views to identify masses; endowing existing object detection models with this multi-view reasoning capability is vital for clinical decision-making but remains largely unexplored. In this paper, we propose an Anatomy-aware Graph convolutional Network (AGN), which is tailored for mammogram mass detection and endows existing detection methods with multi-view reasoning ability. The proposed AGN consists of three steps. First, we introduce a Bipartite Graph convolutional Network (BGN) to model the intrinsic geometric and semantic relations of ipsilateral views. Second, since the visual asymmetry of bilateral views is widely used in clinical practice to assist the diagnosis of breast lesions, we propose an Inception Graph convolutional Network (IGN) to model the structural similarities of bilateral views. Finally, based on the constructed graphs, multi-view information is propagated systematically through the nodes, which equips the features learned from the examined view with multi-view reasoning ability. Experiments on two standard benchmarks show that AGN significantly exceeds the state-of-the-art performance. Visualization results show that AGN provides interpretable visual cues for clinical diagnosis.
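The propagation step between ipsilateral views can be pictured as one round of message passing on a bipartite graph whose two node sets come from the craniocaudal (CC) and mediolateral oblique (MLO) views. This is a generic NumPy sketch, not AGN's BGN: how nodes are formed, the edge weights `adj`, and the shared projection `w` are hypothetical placeholders.

```python
import numpy as np


def row_normalize(a):
    """Scale each row to sum to 1 (rows with no edges stay zero)."""
    s = a.sum(axis=1, keepdims=True)
    return a / np.where(s == 0, 1.0, s)


def bipartite_graph_conv(h_cc, h_mlo, adj, w):
    """One round of message passing between the nodes of two ipsilateral views.

    h_cc:  (n_cc, d) node features from the CC view
    h_mlo: (n_mlo, d) node features from the MLO view
    adj:   (n_cc, n_mlo) non-negative cross-view edge weights (hypothetical)
    w:     (d, d) projection shared by both directions (hypothetical)
    """
    to_cc = row_normalize(adj) @ h_mlo @ w  # CC nodes aggregate MLO evidence
    to_mlo = row_normalize(adj.T) @ h_cc @ w  # MLO nodes aggregate CC evidence
    # Residual update keeps each view's own features, then applies ReLU.
    return np.maximum(h_cc + to_cc, 0.0), np.maximum(h_mlo + to_mlo, 0.0)
```

With an all-zero adjacency the update leaves each view unchanged (up to the ReLU), so every cross-view effect comes from the edges.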

Diagnose Like a Radiologist: Hybrid Neuro-Probabilistic Reasoning for Attribute-Based Medical Image Diagnosis

Gangming Zhao, Quanlong Feng, Chaoqi Chen, Zhen Zhou, and Yizhou Yu
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol 44, No 11, 2022

During clinical practice, radiologists often use attributes, e.g. morphological and appearance characteristics of a lesion, to aid disease diagnosis. Effectively modeling attributes as well as all relationships involving attributes could boost the generalization ability and verifiability of medical image diagnosis algorithms. In this paper, we introduce a hybrid neuro-probabilistic reasoning algorithm for verifiable attribute-based medical image diagnosis. There are two parallel branches in our hybrid algorithm, a Bayesian network branch performing probabilistic causal relationship reasoning and a graph convolutional network branch performing more generic relational modeling and reasoning using a feature representation. Tight coupling between these two branches is achieved via a cross-network attention mechanism and the fusion of their classification results. We have successfully applied our hybrid reasoning algorithm to two challenging medical image diagnosis tasks. On the LIDC-IDRI benchmark dataset for benign-malignant classification of pulmonary nodules in CT images, our method achieves a new state-of-the-art accuracy of 95.36% and an AUC of 96.54%. Our method also achieves a 3.24% accuracy improvement on an in-house chest X-ray image dataset for tuberculosis diagnosis. Our ablation study indicates that our hybrid algorithm achieves a much better generalization performance than a pure neural network architecture under very limited training data.

Development and validation of an abnormality-derived deep-learning diagnostic system for major respiratory diseases

Chengdi Wang, Jiechao Ma, Shu Zhang, Jun Shao, Yanyan Wang, Hong-Yu Zhou, Lujia Song, Jie Zheng, Yizhou Yu, and Weimin Li
npj Digital Medicine, Vol 5, Article 124, 2022

Respiratory diseases impose a tremendous global health burden on large patient populations. In this study, we aimed to develop DeepMRD, a deep learning-based medical image interpretation system for the diagnosis of major respiratory diseases based on the automated identification of a wide range of radiological abnormalities in computed tomography (CT) and chest X-ray (CXR) images from real-world, large-scale datasets. DeepMRD comprises four networks (two CT-Nets and two CXR-Nets) that exploit contrastive learning to generate pre-training parameters, which are then fine-tuned on a retrospective dataset collected from a single institution. The performance of DeepMRD was evaluated for abnormality identification and disease diagnosis on data from two different institutions: an internal testing dataset from the same institution as the training data, and a dataset collected from an external institution to evaluate the model's generalizability and robustness on an unrelated population. In this difficult multi-class diagnosis task, our system achieved average areas under the receiver operating characteristic curve (AUCs) of 0.856 (95% confidence interval (CI): 0.843–0.868) and 0.841 (95% CI: 0.832–0.887) for abnormality identification, and 0.900 (95% CI: 0.872–0.958) and 0.866 (95% CI: 0.832–0.887) for the diagnosis of major respiratory diseases on the CT and CXR datasets, respectively. Furthermore, to achieve a clinically actionable diagnosis, we deployed a preliminary version of DeepMRD into the clinical workflow, where it performed on par with senior experts in disease diagnosis, with an AUC of 0.890 and a Cohen's κ of 0.746–0.877 within a reasonable timescale. These findings demonstrate its potential to accelerate the medical workflow and facilitate early diagnosis as a triage tool for respiratory diseases, supporting improved clinical diagnoses and decision-making.
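The reported metrics (AUC with a 95% CI, and Cohen's κ for agreement) can be computed as sketched below in pure Python. The abstract does not state how the confidence intervals were obtained; the percentile bootstrap used here is an assumption, chosen because it is a common option, and `n_boot` and `seed` are illustrative defaults.

```python
import random


def auc(labels, scores):
    """ROC AUC as the Mann-Whitney probability that a positive outscores a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile-bootstrap (1 - alpha) confidence interval."""
    if not 0 < sum(labels) < len(labels):
        raise ValueError("need both positive and negative cases")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        sample = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lo = stats[int(n_boot * alpha / 2)]
    hi = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return auc(labels, scores), lo, hi


def cohen_kappa(a, b):
    """Cohen's kappa: agreement between two raters corrected for chance agreement."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    p_e = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    return (p_o - p_e) / (1 - p_e)  # undefined when p_e == 1 (both raters constant)
```

For example, `auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` is 0.75, because three of the four positive/negative pairs are ranked correctly.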

M3Net: A Multi-Scale Multi-View Framework for Multi-Phase Pancreas Segmentation Based on Cross-Phase Non-Local Attention

T Qu, X Wang, C Fang, L Mao, J Li, P Li, J Qu, X Li, H Xue, Y Yu, and Z Jin
Medical Image Analysis (MIA), Vol 75, Article 102232, 2022

Complementary visual information from the arterial and venous phases of CT scans can help better distinguish the pancreas from its surrounding structures. However, the exploitation of cross-phase contextual information remains underexplored in computer-aided pancreas segmentation. This paper presents M3Net, a framework that integrates multi-scale multi-view information for multi-phase pancreas segmentation. The core of M3Net is a dual-path network with an individual branch for each of the two phases. Cross-phase interactive connections bridging the two branches are introduced to interleave and integrate dual-phase complementary visual information. In addition, we devise two types of non-local attention modules to enhance the high-level feature representation across phases. First, a location attention module generates reliable cross-phase feature correlations to suppress misaligned regions. Second, a depth-wise attention module captures channel dependencies to strengthen feature representations. The experimental data consist of 224 internal CTs (106 normal and 118 abnormal) with 1 mm slice thickness and 66 external CTs (29 normal and 37 abnormal) with 5 mm slice thickness. We achieve new state-of-the-art performance with an average DSC of 91.19% on internal data and a promising average DSC of 86.34% on external data.
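Cross-phase non-local attention can be sketched as every position of one phase attending to all positions of the other phase, with the attended context added back residually; the Dice similarity coefficient (DSC) used for evaluation is included for reference. This NumPy sketch omits learned query/key/value projections and M3Net's specific location and depth-wise designs, so it shows the generic mechanism only.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_phase_attention(f_src, f_other):
    """Non-local attention in which every position of one phase attends to all
    positions of the other phase.

    f_src, f_other: (N, C) flattened feature maps (N positions, C channels).
    Returns f_src enriched with attended cross-phase context (residual sum).
    """
    affinity = f_src @ f_other.T / np.sqrt(f_src.shape[1])  # (N, N) cross-phase similarities
    return f_src + softmax(affinity, axis=1) @ f_other


def dice(pred, gt):
    """Dice similarity coefficient (DSC) between two binary masks."""
    pred, gt = np.asarray(pred, dtype=bool), np.asarray(gt, dtype=bool)
    denom = pred.sum() + gt.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(pred, gt).sum() / denom)


rng = np.random.default_rng(0)
arterial = rng.standard_normal((64, 32))  # e.g. an 8 x 8 map with 32 channels
venous = rng.standard_normal((64, 32))
enhanced = cross_phase_attention(arterial, venous)
print(enhanced.shape)  # (64, 32)
```

Because each softmax row sums to one, a spatially constant venous map simply shifts the arterial features by that constant; informative attention arises only when venous positions differ.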

SSMD: Semi-Supervised Medical Image Detection with Adaptive Consistency and Heterogeneous Perturbation

H-Y Zhou, C Wang, H Li, G Wang, S Zhang, W Li, and Y Yu
Medical Image Analysis (MIA), Vol 72, Article 102117, 2021

Semi-supervised classification and segmentation methods have been widely investigated in medical image analysis. Both can improve on fully-supervised methods by exploiting additional unlabeled data. However, as a fundamental task, semi-supervised object detection has received little attention in the field of medical image analysis. In this paper, we propose a novel Semi-Supervised Medical image Detector (SSMD). The motivation behind SSMD is to provide free yet effective supervision for unlabeled data by regularizing the predictions at each position to be consistent. To achieve this, we develop a novel adaptive consistency cost function to regularize different components of the predictions. Moreover, we introduce heterogeneous perturbation strategies that work in both feature space and image space, so that the proposed detector can produce powerful image representations and robust predictions. Extensive experimental results show that SSMD achieves state-of-the-art performance in a wide range of settings. We also demonstrate the strength of each proposed module with comprehensive ablation studies.
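Position-wise consistency regularization can be sketched as comparing the class probabilities predicted at each position (for example, each anchor) for two perturbed versions of the same unlabeled image. In this NumPy sketch, the optional per-position `weights` are a placeholder for an adaptive scheme and are not SSMD's actual cost function; the two perturbations (Gaussian noise in image space, dropout in feature space) are likewise illustrative assumptions.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def consistency_loss(logits_a, logits_b, weights=None):
    """Squared difference between the class probabilities predicted at each
    position for two perturbed versions of the same unlabeled image.

    logits_a, logits_b: (P, K) logits at P positions (e.g. anchors) over K classes.
    weights: optional (P,) non-negative per-position weights (uniform if None);
             a placeholder for an adaptive scheme, not SSMD's actual cost.
    """
    per_position = ((softmax(logits_a) - softmax(logits_b)) ** 2).sum(axis=1)
    if weights is None:
        weights = np.ones_like(per_position)
    return float((weights * per_position).sum() / weights.sum())


def perturb_image(img, rng, sigma=0.05):
    """Image-space perturbation: additive Gaussian noise (one simple choice)."""
    return img + sigma * rng.standard_normal(img.shape)


def perturb_features(feat, rng, drop=0.1):
    """Feature-space perturbation: inverted dropout on an intermediate feature map."""
    keep = rng.random(feat.shape) >= drop
    return feat * keep / (1.0 - drop)
```

No labels appear anywhere in the loss, which is what lets it supervise unlabeled images.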

Preservational Learning Improves Self-supervised Medical Image Models by Reconstructing Diverse Contexts

Hong-Yu Zhou, Chixiang Lu, Sibei Yang, Xiaoguang Han, and Yizhou Yu
IEEE International Conference on Computer Vision (ICCV), 2021

Preserving maximal information is one of the principles of designing self-supervised learning methodologies. To reach this goal, contrastive learning adopts an implicit approach: contrasting image pairs. However, we believe that simply using contrastive estimation for preservation is not fully optimal, and that an explicit solution is necessary and complementary for preserving more information. From this perspective, we introduce Preservational Learning, which reconstructs diverse image contexts in order to preserve more information in the learned representations. Together with the contrastive loss, we present Preservational Contrastive Representation Learning (PCRL) for learning self-supervised medical representations. PCRL provides very competitive results under the pretraining-finetuning protocol, substantially outperforming both self-supervised and supervised counterparts in 5 classification/segmentation tasks. Code is available at
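The contrast between implicit and explicit preservation can be made concrete with a loss that adds a pixel reconstruction term to a contrastive term. This NumPy sketch uses a simplified in-batch InfoNCE, an assumption standing in for PCRL's actual contrastive setup, with a hypothetical temperature `tau` and weight `lam`; it is not the authors' implementation.

```python
import numpy as np


def info_nce(z1, z2, tau=0.2):
    """In-batch InfoNCE: row i of z1 should pick out row i of z2 among all rows."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau  # cosine similarities divided by the temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))  # positive pairs sit on the diagonal


def contrastive_plus_reconstruction(z1, z2, recon, target, tau=0.2, lam=1.0):
    """Implicit preservation (contrastive term) plus explicit preservation
    (pixel reconstruction of the image context); lam is a hypothetical weight."""
    return info_nce(z1, z2, tau) + lam * float(np.mean((recon - target) ** 2))
```

The contrastive term only asks matching views to be distinguishable from other images, while the reconstruction term forces the representation to retain enough detail to redraw the input.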

A Structure-Aware Relation Network for Thoracic Diseases Detection and Segmentation

Jie Lian, Jingyu Liu, Shu Zhang, Kai Gao, Xiaoqing Liu, Dingwen Zhang, and Yizhou Yu
IEEE Transactions on Medical Imaging (TMI), Vol 40, No 8, 2021

Instance-level detection and segmentation of thoracic diseases or abnormalities are crucial for automatic diagnosis in chest X-ray images. Leveraging the constant structure and disease relations extracted from domain knowledge, we propose a structure-aware relation network (SAR-Net) that extends Mask R-CNN. SAR-Net consists of three relation modules: (1) an anatomical structure relation module that encodes spatial relations between diseases and anatomical parts; (2) a contextual relation module that aggregates clues based on query-key pairs of disease RoIs and lung fields; and (3) a disease relation module that propagates co-occurrence and causal relations into disease proposals. Towards a practical system, we also provide ChestX-Det, a chest X-ray dataset with instance-level annotations (boxes and masks). ChestX-Det is a subset of the public NIH ChestX-ray14 dataset. It contains ~3,500 images of 13 common disease categories labeled by three board-certified radiologists. We evaluate SAR-Net on ChestX-Det and on another dataset, DR-Private. Experimental results show that it significantly improves on the strong Mask R-CNN baseline.

Contralaterally Enhanced Networks for Thoracic Disease Detection

Gangming Zhao#, Chaowei Fang#, Guanbin Li, Licheng Jiao, and Yizhou Yu
IEEE Transactions on Medical Imaging (TMI), Vol 40, No 9, 2021

Identifying and locating diseases in chest X-rays is very challenging due to the low visual contrast between normal and abnormal regions and the distortions caused by other overlapping tissues. Interestingly, there are many similar structures in the left and right parts of the chest, such as ribs, lung fields, and bronchial tubes. According to the experience of board-certified radiologists, such similarities can be used to identify diseases in chest X-rays. Aiming to improve the performance of existing detection methods, we propose a deep end-to-end module that exploits contralateral context information to enhance the feature representations of disease proposals. First, under the guidance of the spine line, a spatial transformer network is employed to extract local contralateral patches, which provide valuable context information for disease proposals. Then, we build a dedicated module, based on both additive and subtractive operations, to fuse the features of the disease proposal and the contralateral patch. Our method can be integrated into both fully and weakly supervised disease detection frameworks. It achieves 33.17 AP50 on a carefully annotated chest X-ray dataset containing 31,000 images. Experiments on the NIH chest X-ray dataset indicate that our method achieves state-of-the-art performance in weakly supervised disease localization.
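The contralateral idea can be sketched geometrically: mirror a proposal box about the spine line, crop the mirrored region, flip it so its columns line up with the proposal, and fuse the two with additive and subtractive terms. This NumPy sketch assumes a perfectly vertical spine line at integer column `spine_x` and uses plain cropping in place of the paper's spatial transformer network; the function names are hypothetical.

```python
import numpy as np


def contralateral_box(box, spine_x):
    """Mirror an (x0, y0, x1, y1) box about a vertical spine line at x = spine_x."""
    x0, y0, x1, y1 = box
    return (2 * spine_x - x1, y0, 2 * spine_x - x0, y1)


def crop(feat, box):
    """Crop a (C, H, W) feature map with an integer (x0, y0, x1, y1) box."""
    x0, y0, x1, y1 = box
    return feat[:, y0:y1, x0:x1]


def contralateral_patch(feat, box, spine_x):
    """Crop the mirrored region and flip it so its columns align with the proposal."""
    cb = contralateral_box(box, spine_x)
    if cb[0] < 0 or cb[2] > feat.shape[2]:
        raise ValueError("mirrored box falls outside the feature map")
    return crop(feat, cb)[:, :, ::-1]


def fuse(f_prop, f_contra):
    """Stack additive (shared context) and subtractive (asymmetry) interactions."""
    return np.concatenate([f_prop + f_contra, f_prop - f_contra], axis=0)
```

On a left-right symmetric map the subtractive channels are exactly zero, so any nonzero response there signals asymmetry between the proposal and its contralateral counterpart.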