ImageNet: A Large-Scale Hierarchical Image Database

The availability of large volumes of data is a key requirement for developing efficient, robust, and advanced machine learning-based prediction models. This paper introduces the ImageNet database, “a large scale ontology of images” built upon the hierarchical structure of WordNet, an online lexical database of meaningful concepts. These concepts, each described by one or more words or word phrases, are known as synonym sets or synsets. The ImageNet dataset contains 3.2 million labeled images organized into 12 subtrees spanning 5247 synsets, with an average of 600 images per synset, making it one of the largest publicly available image datasets and one that stands out for the number and diversity of its images, the accuracy of its labels, and its hierarchical organization. ...
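
For readers unfamiliar with WordNet synsets, the short sketch below uses NLTK's WordNet interface purely for illustration (it is not part of the paper) to show what a synset and its hypernym chain look like; ImageNet attaches images to nodes of exactly this kind of hierarchy.

```python
# Requires NLTK and its WordNet corpus: pip install nltk; nltk.download('wordnet')
from nltk.corpus import wordnet as wn

synset = wn.synset('dog.n.01')     # one "synonym set" (synset)
print(synset.lemma_names())        # words/word phrases describing the concept
print(synset.definition())

# The hypernym chain gives the hierarchical path from the root concept down
# to this synset; in ImageNet, such nodes are populated with ~600 images each.
for path in synset.hypernym_paths():
    print(' -> '.join(s.name() for s in path))
```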

September 21, 2020 · 4 min · Kumar Abhishek

Multi-Scale Context Aggregation By Dilated Convolutions

The semantic segmentation task in computer vision involves partitioning an image into a set of non-overlapping, semantically interpretable regions. This entails assigning a class label to every pixel in the image, making it a dense prediction task. Owing to the large improvements in image classification performance achieved by CNNs in recent years, several works have successfully repurposed popular image classification CNN architectures for dense prediction tasks. This paper questions that approach and instead investigates whether modules specifically designed for dense prediction can improve segmentation performance even further. Unlike image classification networks, which aggregate multi-scale contextual information through successive downsampling operations to obtain a global prediction, a dense prediction task like semantic segmentation requires “multi-scale contextual reasoning in combination with full-resolution output”. However, increasing the receptive field of the convolution operator ordinarily comes at the cost of more parameters, so the authors propose using the dilated convolution operator to address this. To this end, the paper makes three contributions: (a) a generalized form of the convolution operator that accounts for dilation, (b) a multi-scale context aggregation module built on dilated convolutions, and (c) a simplified front-end module that gets rid of “vestigial components” carried over from image classification networks. ...
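
As a rough illustration of the idea (not the paper's code), the PyTorch sketch below shows how a 3×3 convolution with dilation enlarges the receptive field without adding parameters or reducing resolution, and how stacking layers with exponentially increasing dilation gives a toy context-aggregation module; the channel widths and dilation schedule here are illustrative placeholders.

```python
import torch
import torch.nn as nn

# Two 3x3 convolutions with the same number of parameters: the dilated one
# covers a 5x5 receptive field while still producing a full-resolution output.
standard = nn.Conv2d(64, 64, kernel_size=3, padding=1, dilation=1)
dilated = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)

x = torch.randn(1, 64, 128, 128)
print(standard(x).shape)  # torch.Size([1, 64, 128, 128])
print(dilated(x).shape)   # torch.Size([1, 64, 128, 128]) -- resolution preserved

# A toy context-aggregation stack with exponentially increasing dilations
# (1, 2, 4, 8), in the spirit of the paper's multi-scale context module.
context = nn.Sequential(*[
    nn.Sequential(nn.Conv2d(64, 64, 3, padding=d, dilation=d), nn.ReLU(inplace=True))
    for d in (1, 2, 4, 8)
])
print(context(x).shape)   # torch.Size([1, 64, 128, 128])
```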

September 21, 2020 · 3 min · Kumar Abhishek

Augmenting Data by Learning Spatial and Appearance Transformations

Here are some slides I made to present this CVPR 2019 paper in our reading group:

June 20, 2019 · 1 min · Kumar Abhishek

Deep NNs for Segmentation

November 28, 2018 · 0 min · Kumar Abhishek

GAN-based Synthetic Medical Image Augmentation

The paper proposes using Generative Adversarial Networks (GANs) to augment the dataset with high-quality synthetic liver lesion images in order to improve CNN classification performance on medical images. The authors use a limited dataset of computed tomography (CT) images of 182 liver lesions (53 cysts, 64 metastases, and 65 hemangiomas). The liver lesions vary considerably in shape, contrast, and size, and also exhibit intra-class variability. ...
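
As a purely illustrative sketch (the architecture, layer sizes, and patch resolution below are assumptions, not the paper's exact configuration), a DCGAN-style generator/discriminator pair for single-channel lesion patches might look like the following in PyTorch; synthetic samples from the trained generator are then appended to the real lesion patches to augment classifier training.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            # project a latent vector to a 4x4 feature map, then upsample to 32x32
            nn.ConvTranspose2d(z_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, 1, 4, 2, 1), nn.Tanh(),  # single-channel CT-like patch
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2, True),
            nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2, True),
            nn.Conv2d(256, 1, 4, 1, 0),  # real/fake score
        )

    def forward(self, x):
        return self.net(x).view(-1)

# Synthetic patches from the trained generator can be added to the small
# real lesion dataset before training the CNN classifier.
G = Generator()
fake = G(torch.randn(8, 100))
print(fake.shape)  # torch.Size([8, 1, 32, 32])
```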

November 21, 2018 · 3 min · Kumar Abhishek