The Dataset

The paper uses a new dermatologist-labelled dataset of 129,450 clinical images, including 3,374 dermoscopic images. The images come from 18 clinician-curated, open-access online repositories, as well as from clinical data at Stanford University Medical Center, and span 2,032 diseases. The data is split into 127,463 training and validation images and 1,942 biopsy-labelled test images.

The Network Architecture

A pre-trained GoogLeNet Inception v3 CNN architecture was used. The network was pre-trained on approximately 1.28 million images from 1,000 object categories in the 2014 ILSVRC (ImageNet Large Scale Visual Recognition Challenge), and then fine-tuned on the aforementioned new dataset using transfer learning.

Taxonomy

The 2,032 diseases were arranged in a tree structure with three root nodes representing the most general disease classes: benign lesions, malignant lesions, and non-neoplastic lesions. The tree was derived by dermatologists using a bottom-up procedure.
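The three-rooted tree can be pictured with a small sketch. The structure below is illustrative only: the intermediate categories and leaf disease names are hypothetical placeholders, not the paper's actual 2,032-node taxonomy.

```python
# Illustrative sketch of the three-rooted disease taxonomy as nested dicts.
# Internal nodes are dicts; leaf nodes are lists of individual diseases.
taxonomy = {
    "benign": {
        "melanocytic": ["blue nevus", "halo nevus"],
        "epidermal": ["seborrheic keratosis"],
    },
    "malignant": {
        "melanocytic": ["melanoma"],
        "epidermal": ["basal cell carcinoma", "squamous cell carcinoma"],
    },
    "non-neoplastic": {
        "inflammatory": ["psoriasis", "eczema"],
    },
}

def count_leaves(node):
    """Count the individual diseases (leaves) under a taxonomy node."""
    if isinstance(node, list):
        return len(node)
    return sum(count_leaves(child) for child in node.values())

total = sum(count_leaves(subtree) for subtree in taxonomy.values())
```

In the paper, the analogous tree holds 2,032 leaves under the same three roots.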

Data Preparation

Blurry images and “far-away” images were removed from the test and validation sets, although they were retained for training. Lesion image duplication (images of the same lesion taken from different viewpoints, or multiple images of similar lesions on the same person) was handled by using image EXIF metadata, repository-specific information, and nearest-neighbor image retrieval with CNN features to create an undirected graph connecting any pair of images estimated to be similar. Connected components of this graph were randomly assigned to either the train or validation set. The test set consisted of images from independent, high-quality repositories of biopsy-proven images: the Stanford Hospital, the University of Edinburgh Dermofit Image Library, and the ISIC Dermoscopic Archive. As a result, there was no overlap between the training/validation data and the test images.
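The component-wise split described above can be sketched as follows. This is a minimal re-implementation of the idea (similarity edges in, leakage-free split out), not the paper's actual code; the function name and edge format are assumptions.

```python
import random
from collections import defaultdict

def split_by_components(num_images, similar_pairs, val_frac=0.1, seed=0):
    """Assign images to train/validation so that similar images
    (connected in the similarity graph) never straddle the split.
    similar_pairs: iterable of (i, j) edges between image indices."""
    # Build the adjacency list of the undirected similarity graph.
    adj = defaultdict(set)
    for i, j in similar_pairs:
        adj[i].add(j)
        adj[j].add(i)

    # Find connected components with an iterative DFS.
    seen, components = set(), []
    for start in range(num_images):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.append(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        components.append(comp)

    # Randomly assign whole components to train or validation.
    rng = random.Random(seed)
    train, val = [], []
    for comp in components:
        (val if rng.random() < val_frac else train).extend(comp)
    return train, val
```

Assigning entire components, rather than individual images, is what guarantees that two views of the same lesion cannot end up on opposite sides of the split.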

Disease Partitioning Algorithm

A recursive algorithm was used to partition the individual diseases into training classes. The algorithm leverages the taxonomy to generate classes whose constituent diseases are clinically and visually similar, under the constraint that the average generated training class size stays slightly below the allowable ‘maxClassSize’. The algorithm thus strikes a trade-off between overly fine-grained, sparsely populated classes and overly coarse, abundantly populated ones. With the hyperparameter ‘maxClassSize’ set to 1,000, the algorithm generates a disease partition of 757 classes.
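The recursive idea can be sketched as follows. This is a simplified reconstruction under stated assumptions (the paper's actual procedure differs in details, and the taxonomy encoding here is hypothetical): a subtree whose total image count fits within the size limit becomes one training class; otherwise the recursion descends into its children.

```python
def total_images(node):
    """Total image count under a taxonomy node.
    node: an int (leaf image count) or a dict of child name -> subtree."""
    if isinstance(node, int):
        return node
    return sum(total_images(child) for child in node.values())

def partition(node, name, max_class_size):
    """Recursively partition the taxonomy into (class_name, size) pairs."""
    n = total_images(node)
    if isinstance(node, int) or n <= max_class_size:
        return [(name, n)]  # small enough (or an unsplittable leaf): one class
    classes = []            # too large: recurse into the children
    for child_name, child in node.items():
        classes.extend(partition(child, child_name, max_class_size))
    return classes
```

Because splitting stops as soon as a subtree fits under the limit, the generated classes remain as taxonomically coarse, and hence as well-populated, as the constraint allows.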

Training Details

TensorFlow was used to implement the network. The final classification layer of the Inception v3 CNN was replaced and retrained on this dataset. Each image was resized to 299 x 299 in accordance with the network architecture. All layers were fine-tuned with the same global learning rate of 0.001, decayed by a factor of 16 every 30 epochs, using RMSProp with a decay of 0.9, momentum of 0.9, and epsilon of 0.1. Images were augmented by a factor of 720 by randomly rotating each image between $0^\text{o}$ and $360^\text{o}$, cropping the largest inscribed rectangle from the image, and then vertically flipping it with a probability of 0.5.
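The learning-rate schedule above amounts to a simple step decay, which can be written as a one-line function (a sketch of the stated hyperparameters, not the paper's actual training code):

```python
def learning_rate(epoch, base_lr=0.001, decay=16, decay_epochs=30):
    """Step-decay schedule: divide the base learning rate by `decay`
    once every `decay_epochs` epochs (values taken from the summary)."""
    return base_lr / (decay ** (epoch // decay_epochs))
```

For example, epochs 0-29 train at 0.001, epochs 30-59 at 0.001/16, and so on.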

Inference

Given an input image, the output of the CNN is a probability distribution over the training nodes. The probability of any inference node is therefore the sum of the probabilities of all its descendant training nodes.
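This aggregation step can be sketched directly from the definition. The node names below are illustrative placeholders; the function recursively sums the CNN's predicted probabilities over all training classes beneath an inference node.

```python
def node_probability(node, class_probs, children):
    """Probability of an inference node = sum of the predicted
    probabilities of its descendant training classes.
    class_probs: dict mapping training-class name -> CNN softmax output.
    children: dict mapping internal node name -> list of child node names."""
    if node in class_probs:            # the node is itself a training class
        return class_probs[node]
    return sum(node_probability(child, class_probs, children)
               for child in children.get(node, []))
```

For instance, the probability assigned to a coarse node such as "malignant" is simply the sum over the malignant training classes' softmax outputs.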

This summary was written in Fall 2018 as a part of the CMPT 880 Special Topics in AI: Medical Imaging Meets Machine Learning course.