Drawing inspiration from the popular VGG networks, the paper proposes a deep convolutional neural network architecture with small convolutional kernels for segmentation of gliomas in MRI images. The authors discuss the relative advantages of small kernels, and also explore the use of intensity normalization as a pre-processing step, which was unconventional in CNN-based segmentation methods. The proposed algorithm obtained first position for the complete, core, and enhancing regions in the Dice Similarity Coefficient metric on the Brain Tumor Segmentation Challenge 2013 (BraTS 2013) database.
Instead of using the N4ITK method to correct for the bias field distortion that MR images suffer from, the authors use an intensity normalization method proposed by Nyúl et al., where a set of intensity landmarks $I_L = \{pc_1, i_{p10}, i_{p20}, \cdots, i_{p90}, pc_2\}$ is learned from each training set sequence. After training, intensity normalization is done by performing a linear transformation from the original intensities between two landmarks to the corresponding learned landmarks, thereby ensuring that the histogram of each sequence is more similar across subjects. After normalizing the MR images, the patches in each sequence are normalized to have zero mean and unit variance.
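The two-step normalization above can be sketched in NumPy as a piecewise linear mapping between landmark pairs followed by patch standardization. This is a minimal illustration, not the authors' implementation: the function names and the exact percentile choices for the landmarks are assumptions.

```python
import numpy as np

def piecewise_linear_normalize(volume, learned_landmarks,
                               percentiles=(1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 99)):
    """Map this volume's intensity landmarks onto the landmarks learned
    from the training set, using one linear segment between consecutive
    landmark pairs (the percentile grid here is illustrative)."""
    src = np.percentile(volume, percentiles)        # this image's landmarks
    # np.interp applies the piecewise linear transform between landmark pairs;
    # intensities outside the outer landmarks are clipped to the endpoints.
    return np.interp(volume, src, learned_landmarks)

def normalize_patch(patch):
    """Standardize an extracted patch to zero mean and unit variance."""
    return (patch - patch.mean()) / (patch.std() + 1e-8)
```

Because `np.interp` is monotone for increasing landmark sequences, the relative ordering of intensities within a sequence is preserved while the histograms are pulled toward a common shape across subjects.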
Two similar network architectures (but with different numbers of layers) were used for each tumor grade: Low Grade Glioma (LGG) and High Grade Glioma (HGG). The architecture used for HGG was deeper than the one used for LGG. A deeper network would have been particularly disadvantageous for LGG because, given the smaller size of the LGG training data set, the model would have overfit to the training data. The HGG model had 2.1 million parameters, while the LGG model had 1.9 million parameters.
Smaller kernels were chosen because it is possible to stack more convolutional layers while retaining the receptive field of bigger kernels and keeping the number of parameters lower. Both networks used a $3 \times 3$ kernel size with a $1 \times 1$ stride for the convolution operation, and a $3 \times 3$ kernel size with a $2 \times 2$ stride for the max pooling operation. Xavier initialization was used because it promotes faster convergence and helps avoid exploding or vanishing gradients. While Rectified Linear Unit (ReLU) non-linearities achieve better results than the classical sigmoid or hyperbolic tangent, they can 'die' during training if their gradient ever becomes zero. A better solution is to use Leaky ReLUs, which overcome this problem by introducing a small slope on the negative part of the function. The Dropout probability for the fully connected layers was higher for the LGG network ($p=0.5$ for LGG and $p=0.1$ for HGG), because the LGG training set is smaller.
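The parameter-saving argument for small kernels can be checked with simple arithmetic: two stacked $3 \times 3$ layers cover the same $5 \times 5$ receptive field as one $5 \times 5$ layer, but with fewer weights. The sketch below illustrates this, along with the Leaky ReLU; the channel count and negative slope are illustrative values, not the paper's configuration.

```python
import numpy as np

def conv_params(k, c_in, c_out):
    """Weights plus biases for a 2-D convolution with k x k kernels."""
    return k * k * c_in * c_out + c_out

# Two stacked 3x3 layers vs. one 5x5 layer, same 5x5 receptive field
# (64 channels chosen only for illustration).
c = 64
stacked = conv_params(3, c, c) + conv_params(3, c, c)   # 2 * (9 * c^2 + c)
single = conv_params(5, c, c)                           # 25 * c^2 + c

def leaky_relu(x, alpha=0.01):
    """Small negative slope keeps the gradient nonzero for negative inputs,
    avoiding the 'dying ReLU' problem (alpha is an illustrative value)."""
    return np.where(x > 0, x, alpha * x)
```

Stacking also interleaves extra non-linearities between the small convolutions, which is a second advantage over a single large kernel.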
For data augmentation, only rotation-based operations were used, since the patch label is obtained from the central voxel and other transformations can result in assigning the wrong label to the patch. Angle rotations in multiples of $90^\circ$ ($90^\circ$, $180^\circ$, and $270^\circ$) were used to augment the dataset. Moreover, the authors also experimented with rotations of step size $\left(\frac{90}{16}\right)^\circ$ and observed that this gave the algorithm a performance improvement.
Stochastic gradient descent was used as the optimization algorithm, along with Nesterov's Accelerated Momentum. Categorical cross entropy was used as the loss function, and the softmax function was applied to the output of the last layer of the network.
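A single SGD step with Nesterov momentum can be written as evaluating the gradient at a look-ahead point before updating the velocity. This is a generic textbook sketch under assumed hyperparameters, not the paper's training loop.

```python
def nesterov_sgd_step(w, v, grad_fn, lr=0.01, momentum=0.9):
    """One SGD step with Nesterov accelerated momentum: the gradient is
    evaluated at the look-ahead point w + momentum * v rather than at w
    (learning rate and momentum values here are illustrative)."""
    g = grad_fn(w + momentum * v)   # look-ahead gradient
    v = momentum * v - lr * g       # velocity update
    return w + v, v                 # parameter update
```

Evaluating the gradient at the look-ahead point gives a partial correction to the momentum direction, which typically damps oscillations compared with classical momentum.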
Since the BraTS 2014 database was not available to the authors, they tested the method on the 2013 and the 2015 datasets. For each patient, four MRI sequences (T1-weighted, T1 with gadolinium enhancing contrast, T2-weighted, and FLAIR) were available. The sequences were already aligned with T1c and were skull-stripped. The segmentation was evaluated on three metrics: Dice Similarity Coefficient (DSC), Positive Predictive Value (PPV), and Sensitivity.
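All three metrics follow from the overlap between the predicted and ground-truth binary masks, as in this minimal sketch (the function name is an assumption; zero-denominator handling is omitted for brevity):

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """DSC, PPV, and Sensitivity from binary prediction/ground-truth masks.
    Assumes both masks are non-empty (no zero-denominator handling)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()          # true positives
    dsc = 2 * tp / (pred.sum() + truth.sum())       # overlap measure
    ppv = tp / pred.sum()                           # precision
    sensitivity = tp / truth.sum()                  # recall
    return dsc, ppv, sensitivity
```

DSC balances PPV and Sensitivity (it is their harmonic mean), which is why a method can trade false positives against false negatives while the DSC captures the overall overlap.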
This summary was written in Fall 2018 as a part of the CMPT 880 Special Topics in AI: Medical Imaging Meets Machine Learning course.