Ambiguous Medical Image Segmentation using Diffusion Models
Aimon Rahman1
Jeya Maria Jose Valanarasu1
Ilker Hacihaliloglu2
Vishal M. Patel1

1Johns Hopkins University
2University of British Columbia

[CVPR 2023]


*These AI radiologists do not exist. Generated by Stable Diffusion for visualization purposes.


Collective insights from a group of experts have consistently proven to outperform an individual's best diagnosis in clinical tasks. For the task of medical image segmentation, however, existing research on AI-based alternatives focuses on developing models that imitate the best individual rather than harnessing the power of expert groups. In this paper, we introduce a single diffusion model-based approach that produces multiple plausible outputs by learning a distribution over group insights. Our proposed model generates a distribution of segmentation masks by leveraging the inherent stochastic sampling process of diffusion, requiring only minimal additional learning. We demonstrate on three different medical imaging modalities (CT, ultrasound, and MRI) that our model is capable of producing several possible variants while capturing the frequencies of their occurrences. Comprehensive results show that our proposed approach outperforms existing state-of-the-art ambiguous segmentation networks in terms of accuracy while preserving naturally occurring variation. We also propose a new metric that evaluates both the diversity and the accuracy of segmentation predictions, aligning with the clinical practice of drawing on collective insights.

a) Deterministic networks produce a single output for an input image. b) c-VAE-based methods encode prior information about the input image in a separate network, sample latent variables from it, and inject them into the deterministic segmentation network to produce stochastic segmentation masks. c) In our method, the diffusion model learns the latent structure of the segmentation as well as the ambiguity of the dataset by modeling how input images are diffused through the latent space. Hence, our method does not need an additional prior encoder to provide latent variables for multiple plausible annotations.
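Because each reverse-diffusion run starts from fresh Gaussian noise, repeating the sampling process for the same image yields different plausible masks. The following toy sketch illustrates this idea only; `denoise_step` is a hypothetical stand-in for the trained diffusion network (assumed here to map a noisy mask, the image, and a timestep to a slightly less noisy mask), not the paper's actual model.

```python
import random

def sample_masks(denoise_step, image, n_samples=4, n_steps=10, size=16):
    """Toy sketch: draw several plausible masks for one image by
    restarting the stochastic reverse-diffusion process from new noise.

    `denoise_step(noisy_mask, image, t)` is an assumed interface for
    the trained network; here masks are flat lists for simplicity.
    """
    masks = []
    for _ in range(n_samples):
        # Each sample starts from an independent Gaussian noise seed,
        # which is the source of the output diversity.
        x = [random.gauss(0.0, 1.0) for _ in range(size)]
        for t in reversed(range(n_steps)):
            x = denoise_step(x, image, t)
        # Threshold the denoised result into a binary segmentation mask.
        masks.append([1 if v > 0 else 0 for v in x])
    return masks
```

With a real trained network in place of `denoise_step`, the frequency of each variant across samples reflects the distribution of annotator opinions learned from the data.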


The proposed technique incorporates two components at each step of the diffusion process: an Ambiguity Modeling Network (AMN) and an Ambiguity Controlling Network (ACN), both built on axis-aligned Gaussian encoders. The AMN takes into account both the input image and its corresponding ground-truth mask distribution, while the ACN takes in the input image and the predicted distribution at each step. By minimizing the Kullback-Leibler divergence between these two distributions, the network is trained to generate multiple outputs from a single input.
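For axis-aligned (diagonal-covariance) Gaussians, the KL divergence used in this objective has a simple closed form. The sketch below is an illustration of that formula, not the paper's implementation; mapping `p` to the AMN distribution and `q` to the ACN distribution is our assumption about how the terms line up.

```python
import math

def kl_diag_gaussians(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(p || q) between two axis-aligned Gaussians,
    given per-dimension means and standard deviations as lists.

    KL = sum_i [ log(sq_i/sp_i) + (sp_i^2 + (mp_i - mq_i)^2) / (2 sq_i^2) - 1/2 ]
    """
    kl = 0.0
    for mp, sp, mq, sq in zip(mu_p, sigma_p, mu_q, sigma_q):
        kl += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2 * sq ** 2) - 0.5
    return kl
```

Minimizing this quantity drives the two diagonal Gaussians toward each other; it is zero exactly when their means and standard deviations coincide.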


Paper and Supplementary Material

Aimon Rahman, Jeya Maria Jose Valanarasu, Ilker Hacihaliloglu, Vishal M. Patel
Ambiguous Medical Image Segmentation using Diffusion Models.
CVPR, 2023.
(hosted on arXiv)



The content and images provided on this project website are not intended to replace professional medical or health advice. They are offered in good faith for general informational, research, and educational purposes only. The authors do not guarantee the accuracy, validity, reliability, or completeness of any information on the site. Therefore, before making any decisions or taking any actions based on the information provided, we strongly advise you to consult with qualified medical professionals. The use or reliance on any information presented on the site is solely at your own risk.

This template was originally made by Phillip Isola and Richard Zhang for a colorful ECCV project; the code can be found here.