Pixel-Based Skin Tone Estimation on Dermoscopy: A Dual-Rater MST Benchmark and Feasibility Study
Skin-tone labels are absent from public dermoscopy benchmarks such as the International Skin Imaging Collaboration (ISIC), making it impossible to audit whether clinical AI performs equitably across skin tones. While several recent works estimate skin tone automatically from clinical photography and selfies, we ask whether this approach is feasible on dermoscopy, the primary imaging modality of these benchmarks. To answer this, we make three main contributions. First, we release MST-Derm, a dual-rater Monk Skin Tone (MST) annotation benchmark on 500 ISIC 2018 images. Raters were given an explicit unrateable option for crops where the skin surrounding the lesion was too occluded to label conf