The paper proposes using Generative Adversarial Networks (GANs) to augment a dataset with high-quality synthetic liver lesion images in order to improve CNN classification performance for medical image classification. The authors use a limited dataset of computed tomography (CT) images of 182 liver lesions (53 cysts, 64 metastases, and 65 hemangiomas). The liver lesions vary considerably in shape, contrast, and size, and also exhibit intra-class variability.
The inputs to the classification model are ROIs of lesions cropped from the CT scans using the radiologists' annotations. Since each ROI was extracted to capture the lesion together with surrounding tissue proportional to the lesion's size, the ROIs varied in size.
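A minimal sketch of this variable-size ROI extraction, assuming a hypothetical `(row, col, height, width)` bounding-box annotation and a margin factor (the annotation format and margin value are assumptions, not from the paper):

```python
import numpy as np

def extract_roi(ct_slice, bbox, margin=0.3):
    """Crop a lesion ROI from a 2-D CT slice, keeping surrounding tissue.

    `bbox` is a hypothetical (row, col, height, width) annotation; the
    margin is scaled relative to the lesion size, so ROI sizes vary with
    the lesion.
    """
    r, c, h, w = bbox
    dr, dc = int(h * margin), int(w * margin)
    r0, r1 = max(r - dr, 0), min(r + h + dr, ct_slice.shape[0])
    c0, c1 = max(c - dc, 0), min(c + w + dc, ct_slice.shape[1])
    return ct_slice[r0:r1, c0:c1]
```

Because the crop size depends on the lesion size, the ROIs are only later resized to the fixed input resolution the CNN expects.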
The classification CNN takes fixed-size ROIs of $64 \times 64$ pixels, whose intensities are scaled to the range $[0,1]$. The architecture consists of three pairs of convolution layers, where each convolution layer is followed by a max-pooling layer, and two dense fully-connected layers. The final output is passed to a softmax layer to determine the network predictions for the three classes. Rectified Linear Unit (ReLU) non-linearities were used throughout the network, and a dropout layer (with a probability of $0.5$) was added during training to reduce overfitting.
The training images were mean-centered using the mean from all the training images. The training was performed with a batch size of 64 and a learning rate of 1e-3 for 150 epochs.
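The described architecture can be sketched as follows (a PyTorch sketch, not the authors' implementation; the channel widths 32/64/128 and the hidden size 256 are assumptions, since the summary specifies only the layer pattern):

```python
import torch
import torch.nn as nn

class LesionCNN(nn.Module):
    """Three conv + max-pool pairs, two fully-connected layers, and a
    softmax output over three lesion classes (cyst/metastasis/hemangioma).
    Channel and hidden widths are illustrative assumptions."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, 256), nn.ReLU(),
            nn.Dropout(0.5),               # dropout p = 0.5, as in the paper
            nn.Linear(256, n_classes),     # softmax applied at inference time
        )

    def forward(self, x):  # x: (batch, 1, 64, 64), intensities in [0, 1]
        return self.classifier(self.features(x))

model = LesionCNN()
# Training settings from the summary: batch size 64, learning rate 1e-3.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
```

The optimizer choice (SGD) is also an assumption; the summary states only the batch size, learning rate, and number of epochs.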
Two forms of data augmentation were used.
Classic Data Augmentation: Each input lesion ROI was rotated $N_{rot}$ times at random angles between $0^\circ$ and $180^\circ$, followed by a flip $N_{flip}$ times, a translation $N_{trans}$ times, and a scaling $N_{scale}$ times. Therefore, the total number of augmentations was $$ N = N_{rot} \times (1 + N_{flip} + N_{trans} + N_{scale}) $$
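The count above can be expressed directly; each rotated copy contributes itself plus its flipped, translated, and scaled variants, which is where the $(1 + \dots)$ factor comes from (the example counts are illustrative, not the paper's settings):

```python
def total_augmentations(n_rot, n_flip, n_trans, n_scale):
    # Each of the n_rot rotated copies is kept as-is (the "1") and is
    # additionally flipped n_flip times, translated n_trans times, and
    # scaled n_scale times.
    return n_rot * (1 + n_flip + n_trans + n_scale)

total_augmentations(6, 1, 2, 1)  # → 30 augmented examples per ROI
```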
Augmentation using GANs: Deep Convolutional GAN (DCGAN) and Auxiliary Classifier GAN (ACGAN) were used to augment the training data by generating liver lesion images. The authors note that DCGAN performs better than ACGAN for the augmentation task, although they test both of them.
The training of adversarial networks, including GANs, is performed by optimizing the loss function of a two-player minimax game, in which the Generator (G) and the Discriminator (D) undergo adversarial training.
$$\min_G \ \max_D \ \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_{z}}[\log(1 - D(G(z)))]$$
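A minimal sketch of one adversarial update under this objective, using placeholder MLPs rather than the DCGAN/ACGAN architectures from the paper (network shapes, learning rates, and batch size are all illustrative assumptions):

```python
import torch
import torch.nn as nn

latent_dim, img_dim = 16, 64 * 64
# Placeholder generator and discriminator; D outputs the probability
# that its input is a real image.
G = nn.Sequential(nn.Linear(latent_dim, img_dim), nn.Sigmoid())
D = nn.Sequential(nn.Linear(img_dim, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.rand(8, img_dim)        # stand-in for real lesion ROIs
z = torch.randn(8, latent_dim)       # latent samples z ~ p_z

# D ascends log D(x) + log(1 - D(G(z))); we descend the negative.
d_loss = -(torch.log(D(real)).mean()
           + torch.log(1 - D(G(z).detach())).mean())
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# G descends log(1 - D(G(z))), matching the minimax objective above.
# (In practice the non-saturating form -log D(G(z)) is often preferred.)
g_loss = torch.log(1 - D(G(z))).mean()
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

Alternating these two updates drives the generator toward samples the discriminator cannot distinguish from real lesion ROIs, which are then added to the training set.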
The authors report a substantial improvement in performance when using GAN-based augmentation versus classical image-transformation (rotation, translation, flipping, and scaling) augmentation. More specifically, while the CNN classifier achieved 78.6% sensitivity and 88.4% specificity with classical augmentation, the GAN-based augmentation method achieved 85.7% sensitivity and 92.4% specificity.
This summary was written in Fall 2018 as a part of the CMPT 880 Special Topics in AI: Medical Imaging Meets Machine Learning course.