Train Network in the Cloud Using Automatic Parallel Support

This example shows how to train a convolutional neural network using MATLAB® automatic support for parallel training.

Deep learning training often takes hours or days. With parallel computing, you can speed up training using multiple graphics processing units (GPUs) locally or in a cluster in the cloud. If you have access to a machine with multiple GPUs, then you can complete this example on a local copy of the data. If you want to use more resources, then you can scale up deep learning training to the cloud. To learn more about your options for parallel training, see Scale Up Deep Learning in Parallel, on GPUs, and in the Cloud. This example guides you through the steps to train a deep learning network on a cluster in the cloud using MATLAB automatic parallel support.

Requirements

Before you can run the example, you need to configure a cluster and upload data to the cloud. In MATLAB, you can create clusters in the cloud directly from the MATLAB Desktop. On the Home tab, in the Environment area, select Parallel > Create and Manage Clusters. In the Cluster Profile Manager, click Create Cloud Cluster. Alternatively, you can use MathWorks® Cloud Center to create and access compute clusters. For more information, see Getting Started with Cloud Center. After you create the cluster, upload your data to an Amazon® S3 bucket and access it directly from MATLAB. For instructions, see Work with Deep Learning Data in AWS. This example uses a copy of the CIFAR-10 data set that is already stored in Amazon S3.

Set Up Cluster

Select your cloud cluster and start a parallel pool. Set the number of workers to the number of GPUs in your cluster. If you specify more workers than GPUs, then the remaining workers are idle.

numberOfGPUs = 4;
cluster = parcluster("MyClusterInTheCloud");
pool = parpool(cluster,numberOfGPUs);

Starting parallel pool (parpool) using the 'MyClusterInTheCloud' profile ...
Connected to parallel pool with 4 workers.

If you do not specify a cluster, the default cluster profile is used. To check the default cluster profile, on the MATLAB Home tab, in the Environment area, select Parallel > Create and Manage Clusters.
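
If a parallel pool might already be running in your session, a minimal sketch such as the following avoids starting a second one. It assumes the same placeholder profile name, MyClusterInTheCloud, and the same worker count as the code above. Adapt both to your own cluster.

```matlab
numberOfGPUs = 4;   % assumed number of GPUs in the cluster

% Query the current pool without creating a new one.
% gcp("nocreate") returns an empty array when no pool is running.
pool = gcp("nocreate");

if isempty(pool)
    % No pool is running, so start one on the cloud cluster.
    cluster = parcluster("MyClusterInTheCloud");
    pool = parpool(cluster,numberOfGPUs);
end

% Report how many workers the pool provides.
fprintf("Pool has %d workers.\n",pool.NumWorkers);
```

If the reported number of workers differs from the number of GPUs, consider deleting the pool and starting a new one with the intended size.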

Load Data Set from the Cloud

Load the training and test data sets from the cloud using imageDatastore. In this example, you use a copy of the CIFAR-10 data set stored in Amazon S3. To ensure that the workers have access to the datastore in the cloud, make sure that the environment variables for the AWS credentials are set correctly. See Work with Deep Learning Data in AWS.

imdsTrain = imageDatastore("s3://cifar10cloud/cifar10/train", ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames");

imdsTest = imageDatastore("s3://cifar10cloud/cifar10/test", ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames");
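
As a rough check on the credentials mentioned above, the following sketch warns about any AWS environment variable that is not set in the client session. The list of variable names is an assumption based on the usual AWS setup. Temporary credentials typically also need AWS_SESSION_TOKEN.

```matlab
% Names of the environment variables to check (assumed list).
requiredVars = ["AWS_ACCESS_KEY_ID","AWS_SECRET_ACCESS_KEY","AWS_DEFAULT_REGION"];

% getenv returns an empty value for a variable that is not set.
isMissing = false(size(requiredVars));
for k = 1:numel(requiredVars)
    isMissing(k) = isempty(getenv(requiredVars(k)));
    if isMissing(k)
        warning("Environment variable %s is not set on the client.",requiredVars(k));
    end
end
```

This check covers only the client. The workers need the same variables, for example by listing the names in the EnvironmentVariables option of parpool when you start the pool, or in the cluster profile.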

Calculate the number of classes and the number of images in each category. labelCount is a table that contains the labels and the number of images having each label. The training datastore contains 5000 images for each class, for a total of 50000 images. You can specify the number of classes in the last fully connected layer of your neural network as the OutputSize argument.

classes = categories(imdsTrain.Labels);
numClasses = numel(classes);
labelCount = countEachLabel(imdsTrain)

labelCount=10×2 table
      Label       Count
    __________    _____

    airplane      5000 
    automobile    5000 
    bird          5000 
    cat           5000 
    deer          5000 
    dog           5000 
    frog          5000 
    horse         5000 
    ship          5000 
    truck         5000

Train the network with augmented image data by creating an augmentedImageDatastore object. Use random translations and horizontal reflections. Data augmentation helps prevent the network from overfitting and memorizing the exact details of the training images.

imageSize = [32 32 3];
pixelRange = [-4 4];
imageAugmenter = imageDataAugmenter( ...
    RandXReflection=true, ...
    RandXTranslation=pixelRange, ...
    RandYTranslation=pixelRange);
augmentedImdsTrain = augmentedImageDatastore(imageSize,imdsTrain, ...
    DataAugmentation=imageAugmenter, ...
    OutputSizeMode="randcrop");
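
To see what the augmentation does, you can preview a few augmented images before training. This sketch assumes that the augmentedImdsTrain datastore created above is in the workspace and that the preview table stores the images in a variable named input.

```matlab
% Read the first few augmented observations as a table.
minibatch = preview(augmentedImdsTrain);

% Tile the augmented images into a single figure.
figure
imshow(imtile(minibatch.input))
```

Because the translations and reflections are random, the images differ each time you read from the datastore.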

Define Network Architecture and Training Options

Define a network architecture for the CIFAR-10 data set. To simplify the code, use convolutional blocks that convolve the input. The supporting function convolutionalBlock is provided at the end of this example and creates repeating blocks of layers, each containing a convolution layer, a batch normalization layer, and a ReLU layer. The pooling layers downsample the spatial dimensions.

blockDepth = 4;
netWidth = 32;

layers = [
    imageInputLayer(imageSize) 
    
    convolutionalBlock(netWidth,blockDepth)
    maxPooling2dLayer(2,Stride=2)
    convolutionalBlock(2*netWidth,blockDepth)
    maxPooling2dLayer(2,Stride=2)    
    convolutionalBlock(4*netWidth,blockDepth)
    averagePooling2dLayer(8) 
    
    fullyConnectedLayer(numClasses)
    softmaxLayer
];
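
The choice of an 8-by-8 average pooling layer follows from the spatial size of the activations. The following arithmetic sketch traces that size through the network, assuming that the convolution layers use "same" padding (as in the supporting function) and that the average pooling layer uses its default stride of 1 with no padding.

```matlab
inputSize = 32;                      % CIFAR-10 images are 32-by-32

% Convolutions with "same" padding keep the spatial size, so only
% the pooling layers change it.
sizeAfterPool1 = inputSize/2;        % 2-by-2 max pooling, stride 2
sizeAfterPool2 = sizeAfterPool1/2;   % 2-by-2 max pooling, stride 2

% 8-by-8 average pooling with stride 1 and no padding.
poolSize = 8;
finalSize = sizeAfterPool2 - poolSize + 1;

fprintf("Spatial size: %d -> %d -> %d -> %d\n", ...
    inputSize,sizeAfterPool1,sizeAfterPool2,finalSize);
```

The final spatial size is 1, so the average pooling layer reduces each feature map to a single value before the fully connected layer. To inspect the actual layer sizes, you can also pass the layer array to analyzeNetwork.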

When you use multiple GPUs, you increase the available computational resources. Scale up the mini-batch size with the number of GPUs to keep the workload on each GPU constant, and scale the learning rate in proportion to the mini-batch size.

miniBatchSize = 256 * numberOfGPUs;
initialLearnRate = 1e-1 * miniBatchSize/256;
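
As a worked example of this scaling, the following sketch evaluates the values for four GPUs and the 50000 training images. The variable names perGPUBatchSize and iterationsPerEpoch are introduced here for illustration only.

```matlab
numberOfGPUs = 4;
numTrainImages = 50000;   % 10 classes with 5000 images each

% Mini-batch size grows with the number of GPUs.
miniBatchSize = 256*numberOfGPUs;

% Each GPU still processes the same number of images per iteration.
perGPUBatchSize = miniBatchSize/numberOfGPUs;

% Learning rate scales linearly with the mini-batch size.
initialLearnRate = 1e-1*miniBatchSize/256;

% Number of complete mini-batches in one pass over the training data.
iterationsPerEpoch = floor(numTrainImages/miniBatchSize);

fprintf("Mini-batch size: %d (%d per GPU)\n",miniBatchSize,perGPUBatchSize);
fprintf("Initial learning rate: %g\n",initialLearnRate);
fprintf("Iterations per epoch: %d\n",iterationsPerEpoch);
```

With these settings, the mini-batch size is 1024, the initial learning rate is 0.4, and each epoch contains 48 iterations.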

Specify the training options:

Train a network using the SGDM solver for 50 epochs.
Train the network in parallel using the current cluster by setting the execution environment to parallel-auto. If the pool has access to GPUs, then workers with a GPU perform training computation. Otherwise, training takes place on all available CPU workers. For information on supported GPU devices, see GPU Computing Requirements (Parallel Computing Toolbox).
Use a learning rate schedule to drop the learning rate as the training progresses.
Use L2 regularization to prevent overfitting.
Set the mini-batch size and shuffle the data every epoch.
Validate the network once per epoch, using the test data as validation data.
Turn on the training progress plot to obtain visual feedback during training.
Track the accuracy of the network.
Disable the verbose output.

options = trainingOptions("sgdm", ...
    MaxEpochs=50, ...
    ExecutionEnvironment="parallel-auto", ...
    InitialLearnRate=initialLearnRate, ...
    LearnRateSchedule="piecewise", ...
    LearnRateDropFactor=0.1, ...
    LearnRateDropPeriod=45, ...
    L2Regularization=1e-10, ...
    MiniBatchSize=miniBatchSize, ...
    Shuffle="every-epoch", ...
    ValidationData=imdsTest, ...
    ValidationFrequency=floor(numel(imdsTrain.Files)/miniBatchSize), ...
    Plots="training-progress", ...
    Metrics="accuracy", ...
    Verbose=false);
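
To illustrate the learning rate schedule, the following sketch computes the learning rate for each epoch. It assumes that the piecewise schedule multiplies the learning rate by the drop factor after every LearnRateDropPeriod epochs, and it uses the initial learning rate that results from four GPUs.

```matlab
initialLearnRate = 0.4;   % 1e-1*1024/256 for four GPUs
dropFactor = 0.1;
dropPeriod = 45;
maxEpochs = 50;

% Learning rate in effect during each epoch.
epochs = 1:maxEpochs;
learnRate = initialLearnRate*dropFactor.^floor((epochs-1)/dropPeriod);

fprintf("Epochs 1 to %d: learning rate %g\n",dropPeriod,learnRate(dropPeriod));
fprintf("Epochs %d to %d: learning rate %g\n",dropPeriod+1,maxEpochs,learnRate(end));
```

Under this assumption, training uses a learning rate of 0.4 for the first 45 epochs and 0.04 for the last 5 epochs.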

Train Network and Use for Classification

Train the network in the cluster. During training, the plot displays the progress.

net = trainnet(augmentedImdsTrain,layers,"crossentropy",options);

Use the trained network to classify the test images on your local machine, then compare the predicted labels to the actual labels. To make predictions with multiple observations, use the minibatchpredict function. To convert the prediction scores to labels, use the scores2label function. The minibatchpredict function automatically uses a GPU if one is available.

scores = minibatchpredict(net,imdsTest);
YTest = scores2label(scores,classes);
accuracy = sum(YTest == imdsTest.Labels)/numel(imdsTest.Labels)

accuracy = 0.8938
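
A single accuracy value hides which classes the network confuses. As a possible next step, the following sketch plots a confusion chart, assuming that YTest and imdsTest from the code above are in the workspace. The title text is illustrative.

```matlab
% Compare the true labels with the predicted labels for each class.
figure
cm = confusionchart(imdsTest.Labels,YTest, ...
    RowSummary="row-normalized", ...
    Title="CIFAR-10 Test Set");
```

The row summary shows, for each true class, the percentage of images that the network classifies correctly and incorrectly.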

Close the parallel pool if you do not intend to use it again.

delete(pool)

Parallel pool using the 'MyClusterInTheCloud' profile is shutting down.

Supporting Functions

Convolutional Block Function

The convolutionalBlock function creates numConvBlocks convolutional blocks, each containing a 2-D convolutional layer, a batch normalization layer, and a ReLU layer. Each 2-D convolutional layer has numFilters 3-by-3 filters.

function layers = convolutionalBlock(numFilters,numConvBlocks)
    layers = [
        convolution2dLayer(3,numFilters,Padding="same")
        batchNormalizationLayer
        reluLayer
    ];
    
    layers = repmat(layers,numConvBlocks,1);
end
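
As a quick usage check, the following sketch calls the function with small, arbitrary arguments and counts the resulting layers. It assumes that convolutionalBlock is available on the path or as a local function. In releases that require local functions at the end of a script, run these lines from the Command Window instead of placing them after the function definition.

```matlab
% Create three blocks with 16 filters in each convolution layer.
block = convolutionalBlock(16,3);

% Each block contributes three layers: convolution, batch
% normalization, and ReLU.
numLayers = numel(block);
fprintf("Number of layers: %d\n",numLayers);
```

Three blocks of three layers each produce a layer array with nine elements.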