Dataset: MedMNIST-C: 12 corrupted benchmark datasets and augmentation APIs for robust medical image classification

We introduce MedMNIST-C, a comprehensive robustness benchmark based on the MedMNIST+ dataset collection for medical image classification. The benchmark covers 12 2D datasets and 9 imaging modalities, and provides modality-specific image corruptions at five severity levels to simulate realistic artifacts and distribution shifts encountered in medical imaging applications. In addition to the benchmark datasets themselves, MedMNIST-C includes software APIs for data augmentation, supporting both robustness assessment and robustness-oriented model development.

The following is a summary of the publicly available dataset and accompanying codebase. For further details, please refer to the preprint, the Zenodo release, and the README of the current repository.

Corrupted datasets derived from MedMNIST+

Benchmark datasets derived from the MedMNIST+ collection at resolution 224x224 [link]
Coverage of 12 2D datasets spanning 9 imaging modalities
Corruptions designed to reflect modality-specific imaging artifacts
Five predefined corruption severity levels for controlled robustness evaluation
Public release of the corrupted datasets via Zenodo
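
As an illustration of how the five severity levels support controlled evaluation, the following minimal sketch aggregates a classifier's error over the severities of one corruption. It assumes a plain error rate and toy labels. The function names `error_rate` and `mean_error_over_severities` are hypothetical and not part of the MedMNIST-C codebase, and the benchmark's own metric and normalization may differ.

```python
import numpy as np

def error_rate(y_true, y_pred):
    """Fraction of misclassified samples (plain error rate, an assumption)."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))

def mean_error_over_severities(y_true, preds_by_severity):
    """Return the per-severity errors and their unweighted mean."""
    errors = {s: error_rate(y_true, p) for s, p in sorted(preds_by_severity.items())}
    return errors, float(np.mean(list(errors.values())))

# Toy labels and predictions for one corruption at severities 1..5.
y_true = [0, 1, 1, 0]
preds_by_severity = {
    1: [0, 1, 1, 0],
    2: [0, 1, 0, 0],
    3: [1, 1, 1, 0],
    4: [1, 0, 1, 0],
    5: [1, 0, 0, 0],
}
per_severity, mean_error = mean_error_over_severities(y_true, preds_by_severity)
print(per_severity, mean_error)
```

Reporting the per-severity errors alongside the mean shows whether performance degrades gradually or collapses at a particular level.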

Corruption types

MedMNIST-C organizes corruptions into five main categories:

digital corruptions, such as JPEG compression and pixelation
noise corruptions, including Gaussian, speckle, impulse, and shot noise
blur corruptions, including Gaussian, defocus, motion, and zoom blur
color corruptions, including brightness, contrast, saturation, and gamma shifts
task-specific corruptions, including stain deposits, bubbles, black corners, and acquisition overlays

Each corruption is applied at five increasing severity levels, enabling systematic robustness analysis across a broad range of medical imaging tasks.
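
To make the notion of severity levels concrete, here is a minimal NumPy sketch of one noise corruption parameterized by severity. The standard deviations in `GAUSSIAN_SIGMA` are illustrative assumptions, not the settings used by MedMNIST-C, and `gaussian_noise` is a hypothetical helper rather than a function from the repository.

```python
from typing import Optional

import numpy as np

# Hypothetical noise standard deviations for severities 1..5
# (illustrative values, not the benchmark's settings).
GAUSSIAN_SIGMA = (0.04, 0.08, 0.12, 0.18, 0.26)

def gaussian_noise(image: np.ndarray, severity: int, seed: Optional[int] = None) -> np.ndarray:
    """Add zero-mean Gaussian noise to a uint8 image at severity 1..5."""
    if not 1 <= severity <= len(GAUSSIAN_SIGMA):
        raise ValueError("severity must be in 1..5")
    rng = np.random.default_rng(seed)
    scaled = image.astype(np.float32) / 255.0  # work in [0, 1]
    noisy = scaled + rng.normal(0.0, GAUSSIAN_SIGMA[severity - 1], size=scaled.shape)
    return (np.clip(noisy, 0.0, 1.0) * 255.0).round().astype(np.uint8)

# Apply all five severities to a uniform gray 224x224 RGB image.
image = np.full((224, 224, 3), 128, dtype=np.uint8)
corrupted = {s: gaussian_noise(image, s, seed=0) for s in range(1, 6)}
```

Keeping the per-severity parameters in a single tuple makes the mapping from level to intensity explicit and easy to audit.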

Code and APIs

The repository provides the main components required to create, use, and evaluate the benchmark:

corruption registry with predefined intensity settings
dataset manager for generating corrupted datasets
dataset loaders for the corrupted benchmarks
visualization tools for inspecting corruption effects
augmentation APIs for corruption-based training (termed targeted augmentations)
PyTorch evaluation utilities for robustness experiments
normalization baselines for model evaluation
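
The idea behind a corruption registry combined with corruption-based augmentation can be sketched as follows. This is a simplified illustration under stated assumptions, not the repository's API: `CORRUPTIONS`, `brightness`, `pixelate`, and `RandomCorruption` are hypothetical names, and the intensity values are made up for the example.

```python
import numpy as np

def brightness(image, severity):
    # Hypothetical additive brightness shifts for severities 1..5.
    shift = (0.05, 0.10, 0.15, 0.20, 0.30)[severity - 1]
    out = image.astype(np.float32) / 255.0 + shift
    return (np.clip(out, 0.0, 1.0) * 255.0).round().astype(np.uint8)

def pixelate(image, severity):
    # Keep every k-th pixel, repeat it k times, then crop to the input size.
    factor = (2, 3, 4, 6, 8)[severity - 1]  # hypothetical factors
    small = image[::factor, ::factor]
    big = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
    return big[: image.shape[0], : image.shape[1]]

# Hypothetical registry mapping corruption names to functions.
CORRUPTIONS = {"brightness": brightness, "pixelate": pixelate}

class RandomCorruption:
    """With probability p, apply one random corruption at a random severity."""

    def __init__(self, registry, p=0.5, max_severity=5, seed=None):
        self.registry = registry
        self.names = sorted(registry)
        self.p = p
        self.max_severity = max_severity
        self.rng = np.random.default_rng(seed)

    def __call__(self, image):
        if self.rng.random() >= self.p:
            return image  # leave the sample unchanged
        name = self.names[int(self.rng.integers(len(self.names)))]
        severity = int(self.rng.integers(1, self.max_severity + 1))
        return self.registry[name](image, severity)

# Usage: augment a single uint8 image during training.
image = np.zeros((224, 224, 3), dtype=np.uint8)
augment = RandomCorruption(CORRUPTIONS, p=0.5, seed=0)
augmented = augment(image)
```

Because the class is a plain callable on NumPy arrays, a transform of this shape could be placed ahead of tensor conversion in a typical training pipeline.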

License

Code: Apache-2.0
Dataset: CC BY 4.0, except for DermaMNIST-C: CC BY-NC 4.0

Citation

F. Di Salvo, S. Doerrich, and C. Ledig, “MedMNIST-C: Comprehensive benchmark and improved classifier robustness by simulating realistic image corruptions”, arXiv, 2024. [preprint] [code] [bib] [zenodo]

Please also cite MedMNIST, the underlying source datasets, and ImageNet-C, as recommended by the repository.