Dataset: MedMNIST-C: 12 corrupted benchmark datasets and augmentation APIs for robust medical image classification
We introduce MedMNIST-C, a comprehensive robustness benchmark based on the MedMNIST+ dataset collection for medical image classification. The benchmark covers 12 2D datasets and 9 imaging modalities, and provides modality-specific image corruptions at five severity levels to simulate realistic artifacts and distribution shifts encountered in medical imaging applications. In addition to the benchmark datasets themselves, MedMNIST-C includes software APIs for data augmentation, facilitating both robustness assessment and robustness-oriented model development.
The following is a summary of the publicly available dataset and accompanying codebase. For further details, please refer to the preprint, the Zenodo release, and the README of the current repository.
Corrupted datasets derived from MedMNIST+
- Benchmark datasets derived from the MedMNIST+ collection at resolution 224x224 [link]
- Coverage of 12 2D datasets spanning 9 imaging modalities
- Corruptions designed to reflect modality-specific imaging artifacts
- Five predefined corruption severity levels for controlled robustness evaluation
- Public release of the corrupted datasets via Zenodo
Corruption types
MedMNIST-C organizes corruptions into five main categories:
- digital corruptions, such as JPEG compression and pixelation
- noise corruptions, including Gaussian, speckle, impulse, and shot noise
- blur corruptions, including Gaussian, defocus, motion, and zoom blur
- color corruptions, including brightness, contrast, saturation, and gamma shifts
- task-specific corruptions, including stain deposits, bubbles, black corners, and acquisition overlays
Each corruption is applied at five increasing severity levels, enabling systematic robustness analysis across a broad range of medical imaging tasks.
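To make the idea of severity levels concrete, the following is a minimal sketch of one noise corruption parameterized by severity. It is an illustration only: the function name `gaussian_noise` and the values in `GAUSSIAN_NOISE_SIGMA` are assumptions chosen for this example and are not taken from the MedMNIST-C codebase.

```python
import numpy as np

# Hypothetical severity table: noise standard deviation (on a 0-1 intensity
# scale) for severity levels 1 to 5. Illustrative values, not the settings
# used by MedMNIST-C.
GAUSSIAN_NOISE_SIGMA = {1: 0.04, 2: 0.08, 3: 0.12, 4: 0.18, 5: 0.26}


def gaussian_noise(image, severity, rng=None):
    """Add zero-mean Gaussian noise to a uint8 image at a given severity."""
    if severity not in GAUSSIAN_NOISE_SIGMA:
        raise ValueError("severity must be an integer from 1 to 5")
    rng = np.random.default_rng() if rng is None else rng
    sigma = GAUSSIAN_NOISE_SIGMA[severity]

    # Work in floating point on a 0-1 scale, then convert back to uint8.
    x = image.astype(np.float32) / 255.0
    noisy = x + rng.normal(0.0, sigma, size=x.shape)
    return (np.clip(noisy, 0.0, 1.0) * 255.0).round().astype(np.uint8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Mid-gray stand-in for a 224x224 RGB image.
    image = np.full((224, 224, 3), 128, dtype=np.uint8)
    for severity in range(1, 6):
        corrupted = gaussian_noise(image, severity, rng)
        error = np.abs(corrupted.astype(np.float32) - image).mean()
        print(f"severity {severity}: mean absolute pixel change {error:.1f}")
```

Under these assumed settings, the mean pixel change grows with the severity level, which is the property a controlled robustness evaluation relies on.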
Code and APIs
The repository provides the main components required to create, use, and evaluate the benchmark:
- corruption registry with predefined intensity settings
- dataset manager for generating corrupted datasets
- dataset loaders for the corrupted benchmarks
- visualization tools for inspecting corruption effects
- augmentation APIs for corruption-based training (termed targeted augmentations)
- PyTorch evaluation utilities for robustness experiments
- normalization baselines for model evaluation
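The interplay between a corruption registry and corruption-based (targeted) augmentation can be sketched as follows. This is not the repository's API: `REGISTRY`, `RandomCorruption`, `brightness`, `pixelate`, and all intensity values are hypothetical names and settings introduced for illustration. Please refer to the repository README for the actual interfaces.

```python
import numpy as np


def brightness(image, severity):
    """Brighten a uint8 image; shift values are illustrative assumptions."""
    shift = (0.05, 0.10, 0.15, 0.20, 0.30)[severity - 1]
    x = image.astype(np.float32) / 255.0 + shift
    return (np.clip(x, 0.0, 1.0) * 255.0).round().astype(np.uint8)


def pixelate(image, severity):
    """Subsample and repeat pixels; factors are illustrative assumptions."""
    factor = (2, 3, 4, 6, 8)[severity - 1]
    h, w = image.shape[:2]
    small = image[::factor, ::factor]
    enlarged = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
    return enlarged[:h, :w]  # crop back to the original size


# Hypothetical registry: corruption name -> function(image, severity).
REGISTRY = {"brightness": brightness, "pixelate": pixelate}


class RandomCorruption:
    """With probability p, apply one random corruption at a random severity."""

    def __init__(self, registry, p=0.5, max_severity=5, seed=None):
        self.registry = registry
        self.names = sorted(registry)
        self.p = p
        self.max_severity = max_severity
        self.rng = np.random.default_rng(seed)

    def __call__(self, image):
        if self.rng.random() >= self.p:
            return image  # leave the sample unchanged
        name = self.names[int(self.rng.integers(len(self.names)))]
        severity = int(self.rng.integers(1, self.max_severity + 1))
        return self.registry[name](image, severity)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
    augment = RandomCorruption(REGISTRY, p=1.0, seed=0)
    augmented = augment(image)
    print(augmented.shape, augmented.dtype)
```

A callable of this shape can be placed in a training data pipeline in front of tensor conversion, so that each training sample is corrupted on the fly while the test-time corrupted benchmarks stay fixed.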
License
- Code: Apache-2.0
- Dataset: CC BY 4.0, except for DermaMNIST-C: CC BY-NC 4.0
Citation
F. Di Salvo, S. Doerrich, and C. Ledig, “MedMNIST-C: Comprehensive benchmark and improved classifier robustness by simulating realistic image corruptions”, arXiv, 2024. [preprint] [code] [bib] [zenodo]
Please also cite MedMNIST, the underlying source datasets, and ImageNet-C, as recommended by the repository.
