Image Classification with Amazon SageMaker and ResNet
ResNet won first place in the 2015 Large Scale Visual Recognition Challenge [1], where its 152-layer network was tested on the ImageNet [2] data set. According to www.image-net.org:
"ImageNet is an image dataset organized according to the WordNet hierarchy. Each meaningful concept in WordNet[3], possibly described by multiple words or word phrases, is called a "synonym set" or "synset". There are more than 100,000 synsets in WordNet, majority of them are nouns (80,000+). In ImageNet, [they] aim to provide on average 1000 images to illustrate each synset. Images of each concept are quality-controlled and human-annotated. In its completion, [they] hope ImageNet will offer tens of millions of cleanly sorted images for most of the concepts in the WordNet hierarchy... [The Large Scale Visual Recognition Challenge(LSVRC)] evaluates algorithms for object localization/detection and image/scene classification from images and videos at [a] large scale."
To create a unique image classifier, Amazon Web Services' SageMaker image classification algorithm relies on the ResNet neural network topology, which can achieve high levels of accuracy in image classification.[4] How does the ResNet topology work? Samir Araújo, an AI Solutions Architect at AWS, explains,
"Topologies like Resnet are called a convolutional neural network (CNN) because the network’s input layers execute convolution operations on the input image. A convolution is a mathematical function that emulates the visual cortex of an animal. A CNN has several convolution layers that learn image filters. These filters extract features from the input images such as edges, parts, and bodies. These features are then routed through the hidden or inner layers to the output layer. In the context of image classification, the output layer has one output per category.Consider a trained neural network capable of classifying a dog... The convolution layers extract some features from the dog image[pointy ears, tail, dog nose], and the rest of the layers route these features to the correct output of the last layer with a high confidence."[5]
Garrett and Nick collected some visual data this week of Garrett's brother playing lacrosse, and my next step is to prepare the data. How does one prepare a data set? Samir Araújo elaborates,
"The Amazon SageMaker built-in Image Classification algorithm requires that the dataset be formatted in RecordIO. RecordIO is an efficient file format that feeds images to the NN as a stream. Since Fashion MNIST comes formatted in IDX, you need to extract the raw images to the file system. Then you convert the raw images to RecordIO, and finally you upload them to Amazon Simple Storage Service (Amazon S3).
To prepare the dataset:
1. Download the dataset.
2. Unpack the images ...to raw JPEG grayscale images of 28×28 pixels.
3. Organize the images into ...distinct directories, one per category.
4. Create two .lst files using a RecordIO tool (im2rec). One file is for the training portion of the dataset (70%). The other is for testing (30%).
5. Generate both .rec files from the .lst
6. Copy both .rec files to an Amazon S3 bucket."[6]
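For our lacrosse data set, the images are already JPEGs, so steps 4 through 6 are the relevant ones. The sketch below shows roughly how I expect those steps to look, assuming im2rec.py (shipped with MXNet under tools/) is available locally and the images are already sorted one-directory-per-category; the directory, prefix, and bucket names are hypothetical placeholders, not our final setup.

```python
import subprocess
import boto3

DATA_DIR = "images"           # images/<category>/<file>.jpg
PREFIX = "lacrosse"           # im2rec names its output files after this prefix
BUCKET = "my-sagemaker-data"  # hypothetical S3 bucket

# Step 4: build the .lst files, splitting 70% train / 30% validation.
subprocess.run(["python", "im2rec.py", "--list", "--recursive",
                "--train-ratio", "0.7", PREFIX, DATA_DIR], check=True)

# Step 5: generate the .rec files from the .lst files just created.
subprocess.run(["python", "im2rec.py", "--resize", "224", "--quality", "95",
                "--num-thread", "4", PREFIX, DATA_DIR], check=True)

# Step 6: copy both .rec files to S3, where SageMaker reads its data channels.
s3 = boto3.client("s3")
s3.upload_file(f"{PREFIX}_train.rec", BUCKET, "train/train.rec")
s3.upload_file(f"{PREFIX}_val.rec", BUCKET, "validation/val.rec")
```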
After setting up the environment, it is time to train the model. Samir says, "Amazon SageMaker is a platform based on Docker containers. Every built-in algorithm is a Docker image prepared with all of the libraries, frameworks, and binaries along with the algorithm itself. It turns the platform as flexible as possible."[7] One of my next steps this week is to download Windows Enterprise or Pro, which is required to run Docker Desktop.[8]
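Once the .rec files are in S3, a training job can be launched against the built-in image-classification container with the SageMaker Python SDK. This is only a rough sketch of what that job might look like; the IAM role, bucket name, and hyperparameter values are placeholders rather than our project's final settings.

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerRole"  # hypothetical IAM role
bucket = "my-sagemaker-data"                           # hypothetical bucket

# Resolve the Docker image for the built-in algorithm in the current region.
container = image_uris.retrieve("image-classification", session.boto_region_name)

estimator = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type="ml.p2.xlarge",   # GPU instance; adjust to what is available
    output_path=f"s3://{bucket}/output",
    sagemaker_session=session,
)

# num_layers=152 would match the full ResNet mentioned above; smaller values train faster.
estimator.set_hyperparameters(
    num_layers=18,
    image_shape="3,224,224",
    num_classes=2,
    num_training_samples=1000,  # placeholder; set to the real training-set size
    epochs=10,
)

estimator.fit({
    "train": f"s3://{bucket}/train",
    "validation": f"s3://{bucket}/validation",
})
```

Because the algorithm ships as a Docker image, the same job definition runs unchanged whether it is launched from a laptop or a notebook instance, which is the flexibility Samir describes.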
[1]http://image-net.org/challenges/LSVRC/2015/results
[2]http://image-net.org/
[3]https://wordnet.princeton.edu/
[4,5,6,7]https://aws.amazon.com/blogs/machine-learning/classify-your-own-images-using-amazon-sagemaker/
[8]https://www.docker.com/products/docker-desktop