AWS Machine Learning Blog

Run image classification with Amazon SageMaker JumpStart

Last year, AWS announced the general availability of Amazon SageMaker JumpStart, a capability of Amazon SageMaker that helps you quickly and easily get started with machine learning (ML).

JumpStart hosts 196 computer vision models, 64 natural language processing (NLP) models, 18 pre-built end-to-end solutions, and 19 example notebooks to help you get started with SageMaker. The models are pre-trained, open-source models from PyTorch Hub and TensorFlow Hub that you can deploy quickly. They solve common ML tasks such as image classification, object detection, text classification, sentence pair classification, and question answering. The example notebooks show you how to use the 17 SageMaker built-in algorithms and other features of SageMaker. Additionally, JumpStart provides 28 blog posts and 28 videos, all searchable from within JumpStart, for an easy and quick way to get started with SageMaker.

This is part 1 of a series of posts, and covers image classification tasks. With a step-by-step walkthrough, we show how to use a pre-trained image classification model for classifying images of various objects. Furthermore, if you want to classify new object types that the model wasn’t pre-trained on, we show how to fine-tune the model on your own dataset and use the fine-tuned model to classify the images of those new object types.

JumpStart overview

To help you get started quickly with ML on SageMaker, JumpStart enables you to do the following:

  • Deploy ready-to-go, pre-trained models – In the world of ML, a great deal of research and effort has been devoted to building and training state-of-the-art models for a set of common tasks, including image classification, image feature generation, object detection, text classification, sentence pair classification, text generation, and question answering. Recent developments allow machines to perform these tasks at a level of competency that makes them practical for everyday use. The open-source community has made state-of-the-art, pre-trained models publicly available. JumpStart hosts a collection of these models from TensorFlow Hub, PyTorch Hub, and Hugging Face, and makes them easy for beginners to use through a graphical user interface.
  • Fine-tune pre-trained models – In ML, the ability to transfer the learning from one domain to another is called transfer learning. You can use transfer learning to produce accurate models from your smaller datasets, with less training time than was required to train the large dataset that seeded the initial model. Fine-tuning trains part of a neural network model, which has typically already been pre-trained on a very large corpus of data, on a new dataset. This allows you to customize the model to your specific use case or dataset without training from scratch, which saves cost and time and lets you reuse the pre-trained model that was trained on the initial large corpus of data. JumpStart allows you to fine-tune these pre-trained models on your datasets to get the best predictive performance.
  • Use pre-built solutions – JumpStart provides a set of 18 solutions for the most common use cases, which you can deploy with just a few clicks. The solutions are fully customizable and showcase the use of AWS CloudFormation templates and reference architectures so you can accelerate your ML journey. These solutions cover areas from demand forecasting to churn prediction, from industrial to financial applications.
  • Use examples from SageMaker algorithms – SageMaker provides a suite of built-in algorithms to help data scientists and ML practitioners get started on training and deploying ML models quickly. You can use these algorithms for supervised learning such as classification or regression tasks, and unsupervised learning such as clustering, pattern recognition, and anomaly detection tasks.
  • Obtain training for ML using videos and blogs – We provide numerous blog posts and videos for quick and easy training.

Image classification in JumpStart

JumpStart offers more than 80 state-of-the-art pre-trained image classification models from TensorFlow Hub and PyTorch Hub. This post is split into two parts:

  • Deploy a pre-trained image classification model for running inference – A deployed model is used for classifying images of the various object types that were included in the dataset that the model was pre-trained on. We consider a use case in which we want to classify images of a cat and a dog. We can deploy our pre-trained model in less than 5 minutes, and invoke the deployed model at any time to run inference on any image. Most of the JumpStart image classification models are pre-trained on ImageNet (ILSVRC-2012-CLS), which comprises images of 1,000 different classes. A list of all the class labels is available at ImageNetLabels. You can deploy these models as is to classify objects belonging to those labels.
  • Fine-tune a pre-trained image classification model on your own dataset – Fine-tuning is needed when you want to classify new object types that the model wasn’t pre-trained on. We consider a use case in which you want to correctly classify images of flowers, using a model pre-trained on a dataset that didn’t include those flowers. We can fine-tune the pre-trained model on a dataset consisting of five types of flowers in less than 10 minutes, and deploy the fine-tuned model to run inference on any image. We show that the fine-tuned model can correctly classify images of these new flowers.

We show both these use cases via a step-by-step walkthrough that requires no coding.

Deploy a pre-trained image classification model for running inference

In this section, we locate a desired pre-trained model in JumpStart and show how to run inference on the deployed endpoint.

Let’s start from the Amazon SageMaker Studio Launcher.

  1. On the Studio Launcher, choose Browse JumpStart.

The JumpStart landing page has carousels for solutions, text models, and vision models, as well as a search bar.

The search bar at the top allows you to search for content in JumpStart that matches your search term.

  2. In the search bar, enter MobileNet V2.

The MobileNet V2 search gives the following results. The first model card is for the MobileNet V2 image classification model from TensorFlow Hub, and the third card is for the same model from PyTorch Hub. The second MobileNet V2 card is for a model that extracts features from an input image. The other MobileNet V2 cards are different (smaller) versions of the original MobileNet V2 model.

  3. Choose the MobileNet V2 card for image classification from TensorFlow Hub.

You’re redirected to the landing page for the MobileNet V2 model. You can choose to deploy the existing model right away by choosing Deploy, or scroll down and choose Fine-tune to fine-tune the model on the default dataset or your own dataset.

Scrolling further down the page brings you to the model description. The page has a link to the MobileNet V2 model on TensorFlow Hub, and gives details of the dataset on which the model was pre-trained and the labels of the object types in that dataset. In particular, MobileNet V2 is pre-trained on ImageNet (ILSVRC-2012-CLS), which comprises images of 1,000 different classes, plus an additional class for background. A list of all the class labels is available at ImageNetLabels.

The page also explains how to use the deployed model for running inference. It shows what ML task you can solve by deploying the model. In particular, it shows two example input images and model predictions on them. We see that the model correctly classifies a cat image as a cat, and the top-5 model predictions include an Egyptian cat in the first position. The model also correctly classifies a dog image as a dog in its top-5 predictions.

  4. Choose Deploy to deploy a SageMaker endpoint for your MobileNet V2 model.

The status first shows as “Preparing your Model,” which usually takes a few minutes depending on the size of the model. The endpoint is given a name that indicates which model it is.

After the model is prepared, an endpoint is launched on the instance type that you chose when deploying the model. In our case, we kept the default instance type; you can change it before you deploy. Creating an endpoint may take 5–10 minutes.
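If you prefer to monitor the endpoint programmatically rather than from the Studio page, you can poll its status with the SageMaker API. The following is a minimal sketch; the endpoint name is hypothetical, and yours will be the name shown in Studio.

import boto3

# Poll the endpoint status; the endpoint name below is hypothetical.
sm_client = boto3.client("sagemaker")
response = sm_client.describe_endpoint(EndpointName="jumpstart-mobilenet-v2-endpoint")
print(response["EndpointStatus"])  # "Creating" while launching, "InService" when ready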

When the deployed endpoint is ready for running inference, the page shows two buttons:

  • Open Notebook – Provides a piece of Python code that you can run as is to invoke the endpoint for running inference on a couple of example images
  • Delete – Deletes the endpoint and associated assets on SageMaker

  5. Choose Open Notebook to open a notebook with Python code that you can run without modification to run inference on the deployed endpoint for example cat and dog images.

When the first cell of the notebook is run, it downloads example cat and dog images from an Amazon Simple Storage Service (Amazon S3) bucket and displays them. It also downloads a class-ID-to-label mapping that is used to convert the model predictions into labels.

When you run the Query endpoint that you have created cell, the notebook invokes the endpoint with the cat and dog images one by one, maps the model predictions to the class labels, and displays the top-5 model predictions. The model outputs a logit for each class. Logits, or log-odds, measure how likely it is that the input belongs to a given class: the larger the logit, the higher the probability that the input image belongs to that class.
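To make the mapping from logits to labels concrete, the following is a minimal sketch of such a query, assuming a deployed endpoint that accepts raw image bytes and returns a JSON list of logits; the endpoint name and file name are hypothetical, and the actual notebook code may differ.

import json

import boto3
import numpy as np

runtime = boto3.client("sagemaker-runtime")

# Send raw image bytes to the endpoint; the endpoint name is hypothetical.
with open("cat.jpg", "rb") as f:
    response = runtime.invoke_endpoint(
        EndpointName="jumpstart-mobilenet-v2-endpoint",
        ContentType="application/x-image",
        Body=f.read(),
    )

# Assume the model returns a JSON list of per-class logits.
logits = np.array(json.loads(response["Body"].read()))

# A softmax converts logits to probabilities; larger logits give larger probabilities.
probabilities = np.exp(logits - logits.max())
probabilities /= probabilities.sum()

# Indexes of the top-5 classes, highest probability first; map them to labels
# with the class-ID-to-label mapping downloaded in the first notebook cell.
top5 = np.argsort(probabilities)[::-1][:5]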

Fine-tune a model on your own dataset using a pre-trained image classification model

You can deploy a pre-trained image classification model as is to run inference when you need to classify image types that were included in the dataset that the model was pre-trained on. As mentioned in the previous section, most of the JumpStart image classification models are pre-trained on ImageNet (ILSVRC-2012-CLS), which comprises images of 1,000 different classes. A list of all the class labels is available at ImageNetLabels. You can deploy these models as is to classify objects belonging to those labels.

However, if you need to classify objects that don’t belong to the dataset that the model was pre-trained on, you must fine-tune the model on a new dataset comprising the image types that you want the model to classify. Without fine-tuning, the pre-trained model can’t classify the new image types. For example, if you need to classify images of flowers such as roses and sunflowers and run inference on a deployed pre-trained MobileNet model, the model doesn’t recognize the new image types and therefore classifies them incorrectly. This is because MobileNet is pre-trained on ImageNet data, which doesn’t include roses and sunflowers. We demonstrate this in the following screenshot by running inference on the pre-trained MobileNet model deployed as is, as explained in the previous section.

The deployed pre-trained model classifies both new image types of roses and sunflowers incorrectly.

Consequently, if you need to classify objects that weren’t included in the dataset the model was pre-trained on, you first fine-tune the model on a dataset that includes the new images and image labels that you want the model to learn. After you fine-tune the model on the new dataset, you deploy the fine-tuned model as explained in the previous section, and can then run inference on images of the new flower types.

Transfer learning

Fine-tuning a pre-trained model on a new dataset is a standard ML technique that relies on the concept of transfer learning. Transfer learning allows us to reuse what the model learned from the pre-training dataset and merely fine-tune the model on the new objects, rather than retrain it in its entirety.

The main advantage of fine-tuning on a new dataset starting from a pre-trained model, as opposed to training from scratch, is that it requires much less data. During fine-tuning, most model parameters are frozen, except for the very small percentage of parameters that make up the classification layer; the fine-tuning algorithm prevents the frozen weights from being updated by backpropagation.

Because only the classification layer is updated, you can efficiently train on a much smaller dataset without overfitting. For example, the ImageNet (ILSVRC-2012-CLS) dataset, on which most of the image classification models are pre-trained, comprises approximately 1.2 million images across 1,000 object types. The flowers dataset that we fine-tune the model on in the following demonstration comprises only 3,670 images across five types of flowers.

Fine-tuning for new flower types

In this section, we demonstrate how easy it is to fine-tune a pre-trained model to classify new flowers, using JumpStart.

We take the MobileNet model trained on ImageNet and fine-tune it on a flowers dataset. The flowers dataset comprises five types of flowers that don’t exist in the ImageNet dataset, so we need to fine-tune the pre-trained model on the flowers dataset for it to accurately predict the flower types. For more information about preprocessing images, see Load and preprocess images.

The model page of MobileNet V2, as explained in the previous section, gives a description of how the pre-trained model is fine-tuned.

You can fine-tune the model to any given dataset comprising images belonging to any number of classes.

The model available for fine-tuning attaches a classification layer to the corresponding feature extractor model available on TensorFlow Hub, and initializes the layer parameters to random values. The output dimension of the classification layer is determined by the number of classes in the input data. The fine-tuning step adjusts the classification layer parameters while keeping the parameters of the feature extractor model frozen, and returns the fine-tuned model. The objective is to minimize prediction error on the input data. You can then deploy the model returned by fine-tuning for inference.
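To illustrate what this setup looks like in code, the following is a minimal TensorFlow sketch, not the exact JumpStart training script: it attaches a randomly initialized classification layer to the frozen MobileNet V2 feature extractor from TensorFlow Hub, with the output dimension set by the number of classes.

import tensorflow as tf
import tensorflow_hub as hub

num_classes = 5  # determined by the number of class folders in the input data

model = tf.keras.Sequential([
    # Frozen feature extractor; its parameters are not updated during training.
    hub.KerasLayer(
        "https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/4",
        trainable=False,
        input_shape=(224, 224, 3),
    ),
    # New classification layer with randomly initialized parameters.
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])

# Minimize prediction error on the input data; only the Dense layer is trained.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_dataset, epochs=3)  # train_dataset built from the class folders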

The following instructions describe how the training data should be formatted for input to the model. The input directory contains as many subdirectories as there are classes, and each subdirectory holds the images belonging to that class in .jpg format. Our input directory, the tf_flowers dataset, contains images from five classes and looks like the following structure. The S3 path should look like s3://bucket_name/input_directory/. Note that the trailing / is required. The folder names and the .jpg file names can be anything.

flowers_photos/
  |--daisy/
    |--daisy1.jpg
    |--daisy2.jpg
  |--dandelion/
  |--roses/
  |--sunflowers/
  |--tulips/

The output is a trained model that you can deploy for inference. A label mapping file, saved along with the trained model in the S3 bucket, maps the folder names to the indexes in the list of class probabilities that the model outputs. The mapping follows alphabetical ordering of the folder names. In the preceding structure, index 0 in the model output list corresponds to daisy and index 1 corresponds to dandelion.
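As a sketch of how you might stage such a dataset and reproduce the label mapping, assuming the SageMaker Python SDK, a local flowers_photos/ directory, and an illustrative prefix name:

import sagemaker

# Upload the local class folders to S3; the prefix name is illustrative.
s3_path = sagemaker.Session().upload_data(path="flowers_photos", key_prefix="input_directory")
print(s3_path)  # s3://<default-bucket>/input_directory

# The label mapping follows alphabetical ordering of the folder names.
class_id_to_label = sorted(["daisy", "dandelion", "roses", "sunflowers", "tulips"])
# index 0 -> "daisy", index 1 -> "dandelion", ...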

You can fine-tune the pre-trained MobileNet model by choosing Train, which uses the default dataset tf_flowers, the default SageMaker training instance, and the default training hyperparameters. Optionally, you can fine-tune the model on your own dataset hosted in an S3 bucket, and you can change the training instance, the name of the trained model, and the training hyperparameters.

The image classification model provides three training hyperparameters that you can set according to your fine-tuning dataset: epochs, learning rate, and batch size.
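If you work from a notebook instead of the Studio UI, the SageMaker Python SDK can retrieve the default values of these hyperparameters for a given JumpStart model. The following is a sketch; the model ID and the exact hyperparameter key names are assumptions you should verify against your chosen model.

from sagemaker import hyperparameters

# Model ID is illustrative; look up the exact ID of your chosen JumpStart model.
model_id = "tensorflow-ic-imagenet-mobilenet-v2-100-224-classification-4"

hparams = hyperparameters.retrieve_default(model_id=model_id, model_version="*")
print(hparams)  # expected to include epochs, learning_rate, and batch_size

# Override a tunable value before launching the training job.
hparams["epochs"] = "5"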

When you choose Train, JumpStart launches a training job on SageMaker, which fine-tunes the pre-trained MobileNet model on the dataset you provided on the selected SageMaker instance, with the chosen training hyperparameters. The page shows the training status.

When the training is complete, the page updates the training status to “Complete” and the Deploy button appears, which allows you to deploy the fine-tuned model for running inference, as we explained in the previous section for the pre-trained model.

Deploying the trained model opens a new tab that shows the endpoint status. When the endpoint is ready for running inference, you can either choose Open Notebook to open the notebook or Delete to delete the endpoint.

Opening the notebook opens the same notebook that you opened when deploying the pre-trained model. You need to upload your own images of flowers to run inference on them. You also need to create a class_id_to_label list to correctly map the model’s predicted class indexes to labels. As explained earlier, the model orders the class labels alphabetically, using the folder names in the training data as the class names. With these modifications, you can use the notebook to run inference on your own images. In the following screenshot, the fine-tuned model correctly classifies roses.jpeg and sunflowers.jpeg, returning the respective flower labels as the first of the top-5 predictions.
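For the five-class flowers model, these modifications might look like the following minimal sketch; the image file and endpoint name are hypothetical, and the response format is assumed to be a JSON list of logits.

import json

import boto3

# Alphabetical folder-name ordering, as described above.
class_id_to_label = ["daisy", "dandelion", "roses", "sunflowers", "tulips"]

runtime = boto3.client("sagemaker-runtime")
with open("roses.jpeg", "rb") as f:  # your own uploaded image
    response = runtime.invoke_endpoint(
        EndpointName="jumpstart-finetuned-mobilenet-v2-endpoint",  # hypothetical
        ContentType="application/x-image",
        Body=f.read(),
    )

logits = json.loads(response["Body"].read())
print(class_id_to_label[logits.index(max(logits))])  # expected: "roses"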

Conclusion

SageMaker JumpStart is a capability in SageMaker that allows you to quickly get started with ML. JumpStart uses open-source pre-trained models to solve common ML problems like image classification, object detection, text classification, sentence pair classification, and question answering.

In this post, we showed you how to deploy a pre-trained image classification model for running inference. We then showed you how to fine-tune a pre-trained image classification model on your own dataset. With JumpStart, you can complete both of these image classification tasks code-free. Try out the solution on your own and let us know how it goes in the comments. To learn more about JumpStart, see SageMaker JumpStart or check out this AWS re:Invent 2020 video on JumpStart.


About the Authors

Dr. Ali Arsanjani is TechSector Leader for AI/ML Specialist Solution Architecture and a Principal AI/ML Specialist Solutions Architect. Previously, Ali was an IBM Distinguished Engineer, CTO for Analytics and Machine Learning, and IBM Master Inventor. Ali is adjunct faculty in the Master of Science in Data Science program at San Jose State University, where he teaches and advises master’s projects.

 

Dr. Li Zhang is a Principal Product Manager-Technical for Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms, a service that helps data scientists and machine learning practitioners get started with training and deploying their models, and use reinforcement learning with Amazon SageMaker. His past work as a principal research staff member and master inventor at IBM Research won the Test of Time paper award at IEEE INFOCOM.

 

Dr. Ashish Khetan is an Applied Scientist with Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He is also an active researcher in machine learning and statistical inference, and has published many papers at venues such as NeurIPS, ICML, ICLR, JMLR, and ACL.