Use parfor to Train Multiple Deep Learning Networks

This example shows how to use a parfor loop to perform a parameter sweep on a training option.

Deep learning training often takes hours or days, and searching for good training options can be difficult. With parallel computing, you can speed up and automate your search for good models. If you have access to a machine with multiple graphics processing units (GPUs), you can complete this example on a local copy of the data set with a local parallel pool. If you want to use more resources, you can scale up deep learning training to the cloud. This example shows how to use a parfor loop to perform a parameter sweep on the training option MiniBatchSize in a cluster in the cloud. You can modify the script to do a parameter sweep on any other training option. This example also shows how to obtain feedback from the workers during computation by using a DataQueue object. You can also send the script as a batch job to the cluster, so that you can continue working or close MATLAB and fetch the results later. For more information, see Send Deep Learning Batch Job to Cluster.

Requirements

Before you can run this example, you need to configure a cluster and upload your data to the cloud. In MATLAB, you can create clusters in the cloud directly from the MATLAB Desktop: on the Home tab, in the Parallel menu, select Create and Manage Clusters, and then, in the Cluster Profile Manager, click Create Cloud Cluster. Alternatively, you can use MathWorks Cloud Center to create and access compute clusters. For more information, see Getting Started with Cloud Center. For this example, make sure that your cluster is set as the default: on the MATLAB Home tab, select Parallel > Select a Default Cluster. Next, upload your data to an Amazon S3 bucket so that you can use it directly from MATLAB. For instructions, see Work with Deep Learning Data in AWS. This example uses a copy of the CIFAR-10 data set that is already stored in Amazon S3.

Load the Data Set from the Cloud

Load the training and test data sets from the cloud by using imageDatastore. Split the training data set into training and validation sets: splitEachLabel assigns 90% of the images of each class to the training set and the remaining 10% to the validation set. Keep the test data set to test the best network from the parameter sweep. View the class names of the training data. This example uses a copy of the CIFAR-10 data set stored in Amazon S3. To ensure that the workers have access to the datastore in the cloud, make sure that the environment variables for the AWS credentials are set correctly. For more information, see Work with Deep Learning Data in AWS.
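A minimal sketch of setting the AWS credentials on the client follows. AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION are the standard AWS variable names; the values shown are placeholders, not real credentials, and the region is an assumption. Because the workers also read from S3, you can pass the same variable names to the EnvironmentVariables name-value argument of parpool when you start the pool, as the comment in the code shows.

```matlab
% Placeholder values (assumptions): replace them with your own credentials and region.
setenv('AWS_ACCESS_KEY_ID', 'YOUR_ACCESS_KEY_ID');
setenv('AWS_SECRET_ACCESS_KEY', 'YOUR_SECRET_ACCESS_KEY');
setenv('AWS_DEFAULT_REGION', 'us-east-1');

% Names of the variables that the workers also need. You can copy them to the
% workers when you start the pool, for example:
%   parpool("MyClusterInTheCloud",4,EnvironmentVariables=awsVariables)
awsVariables = {'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_DEFAULT_REGION'};

% Check that each variable is set on the client.
isSet = cellfun(@(name) ~isempty(getenv(name)), awsVariables)
```

Keeping the names in one variable means the same list can be reused for every pool you start.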

imds = imageDatastore("s3://cifar10cloud/cifar10/train", ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames");

imdsTest = imageDatastore("s3://cifar10cloud/cifar10/test", ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames");

[imdsTrain,imdsValidation] = splitEachLabel(imds,0.9);
classNames = categories(imdsTrain.Labels)

classNames = 10×1 cell
    {'airplane'  }
    {'automobile'}
    {'bird'      }
    {'cat'       }
    {'deer'      }
    {'dog'       }
    {'frog'      }
    {'horse'     }
    {'ship'      }
    {'truck'     }
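As a rough sanity check on the split, the arithmetic below assumes the standard CIFAR-10 layout of 5000 training images and 1000 test images per class; it does not read the datastores. With a proportion of 0.9, each class contributes 4500 images to the training set and 500 to the validation set.

```matlab
% Assumed CIFAR-10 layout (not read from the datastores).
numClasses = 10;
trainImagesPerClass = 5000;
testImagesPerClass = 1000;
trainFraction = 0.9;  % proportion passed to splitEachLabel

numTrainPerClass = round(trainFraction*trainImagesPerClass);
numValPerClass = trainImagesPerClass - numTrainPerClass;

fprintf("Training:   %d images (%d per class)\n", numClasses*numTrainPerClass, numTrainPerClass);
fprintf("Validation: %d images (%d per class)\n", numClasses*numValPerClass, numValPerClass);
fprintf("Test:       %d images (%d per class)\n", numClasses*testImagesPerClass, testImagesPerClass);
```

On the real datastores, countEachLabel(imdsTrain) and countEachLabel(imdsValidation) report the per-class counts directly.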

To train the network with augmented image data, create an augmentedImageDatastore object. Use random translations of up to 4 pixels in each direction and random horizontal reflections. Data augmentation helps prevent the network from overfitting and memorizing the exact details of the training images.

imageSize = [32 32 3];
pixelRange = [-4 4];
imageAugmenter = imageDataAugmenter( ...
    RandXReflection=true, ...
    RandXTranslation=pixelRange, ...
    RandYTranslation=pixelRange);
augmentedImdsTrain = augmentedImageDatastore(imageSize,imdsTrain, ...
    DataAugmentation=imageAugmenter, ...
    OutputSizeMode="randcrop");
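The sketch below illustrates, on a toy 4-by-4 single-channel array, the kind of transformation that one augmentation step applies: a translation that fills the vacated pixels with zeros (the imageDataAugmenter default fill value), followed by a horizontal reflection. It is plain MATLAB, not the imageDataAugmenter implementation. The fixed offsets stand in for the random values that imageDataAugmenter draws from pixelRange, and the shift-direction convention used here is an assumption of this sketch.

```matlab
img = magic(4);  % toy 4-by-4 single-channel "image"
dx = 1;          % shift one column to the right (stands in for a RandXTranslation draw)
dy = -1;         % shift one row up (stands in for a RandYTranslation draw)

[h, w] = size(img);
shifted = zeros(h, w, 'like', img);   % vacated pixels stay 0
srcRows = max(1, 1-dy):min(h, h-dy);  % source rows that remain inside the image
srcCols = max(1, 1-dx):min(w, w-dx);  % source columns that remain inside the image
shifted(srcRows+dy, srcCols+dx) = img(srcRows, srcCols);

% Horizontal reflection. With RandXReflection=true, imageDataAugmenter applies it
% with probability 0.5; this sketch always applies it.
reflected = fliplr(shifted)
```

The zero-filled border shows why small translation ranges are preferred for 32-by-32 images: larger shifts discard more of the image content.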

Define Network Architecture

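Define a network architecture for the CIFAR-10 data set. To simplify the code, use the convolutionalBlock helper function, defined at the end of this example, which returns netDepth repetitions of a convolution layer, a batch normalization layer, and a ReLU layer. The max pooling layers halve the spatial dimensions, and the final average pooling layer reduces each 8-by-8 feature map to a single value before the fully connected layer.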

imageSize = [32 32 3];
netDepth = 2; % netDepth controls the depth of a convolutional block
netWidth = 16; % netWidth controls the number of filters in a convolutional block

layers = [
    imageInputLayer(imageSize)
    
    convolutionalBlock(netWidth,netDepth)
    maxPooling2dLayer(2,Stride=2)
    convolutionalBlock(2*netWidth,netDepth)
    maxPooling2dLayer(2,Stride=2)
    convolutionalBlock(4*netWidth,netDepth)
    averagePooling2dLayer(8)
    
    fullyConnectedLayer(10)
    softmaxLayer
    ];
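To see why averagePooling2dLayer(8) matches this architecture, you can trace the spatial size through the network. The arithmetic below assumes that the convolutions use Padding="same" (as in convolutionalBlock) and so keep the spatial size, and that a pooling layer without padding produces floor((inputSize - poolSize)/stride) + 1 outputs per dimension, with the default stride of 1 for averagePooling2dLayer.

```matlab
imageSize = [32 32 3];
netDepth = 2;
netWidth = 16;

% Output size of a pooling layer without padding.
poolOut = @(inSize, poolSize, stride) floor((inSize - poolSize)/stride) + 1;

s0 = imageSize(1);       % input: 32-by-32
s1 = poolOut(s0, 2, 2);  % after block 1 and maxPooling2dLayer(2,Stride=2)
s2 = poolOut(s1, 2, 2);  % after block 2 and maxPooling2dLayer(2,Stride=2)
s3 = poolOut(s2, 8, 1);  % after block 3 and averagePooling2dLayer(8)
spatialSizes = [s0 s1 s2 s3]

numFilters = netWidth*[1 2 4]  % filters in the three convolutional blocks
numConvLayers = 3*netDepth     % convolution layers in total
```

The fully connected layer therefore receives a 1-by-1-by-64 input and maps it to the 10 classes.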

Train Several Networks Simultaneously

Specify the mini-batch sizes for the parameter sweep. Preallocate variables for the trained networks and their validation accuracies.

miniBatchSizes = [64 128 256 512];
numMiniBatchSizes = numel(miniBatchSizes);
trainedNetworks = cell(numMiniBatchSizes,1);
accuracies = zeros(numMiniBatchSizes,1);
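Inside the loop, the initial learning rate scales linearly with the mini-batch size, relative to a rate of 0.1 at a mini-batch size of 256. Computing the values up front shows the learning rates that the sweep uses. With LearnRateDropFactor=0.1 and LearnRateDropPeriod=25, the piecewise schedule multiplies each rate by 0.1 once, for the epochs after epoch 25.

```matlab
miniBatchSizes = [64 128 256 512];                 % same values as above
initialLearnRates = 1e-1 * miniBatchSizes/256      % same formula as in the parfor loop

dropFactor = 0.1;                                  % LearnRateDropFactor
lateLearnRates = dropFactor*initialLearnRates      % rates after the drop at epoch 25
```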

Perform a parallel parameter sweep by training several networks inside a parfor loop, each with a different mini-batch size. The workers in the cluster train the networks simultaneously and send the trained networks and their validation accuracies back to the client when training is complete. To check that the training is working, set Verbose to true in the training options. Because the workers compute independently, the command-line output is not necessarily in the same order as the loop iterations.

parfor idx = 1:numMiniBatchSizes
    
    miniBatchSize = miniBatchSizes(idx);
    initialLearnRate = 1e-1 * miniBatchSize/256; % Scale the learning rate according to the mini-batch size.
    
    % Define the training options. Set the mini-batch size.
    options = trainingOptions("sgdm", ...
        MiniBatchSize=miniBatchSize, ... % Set the corresponding MiniBatchSize in the sweep.
        Verbose=false, ... % Do not send command line output.
        InitialLearnRate=initialLearnRate, ... % Set the scaled learning rate.
        Metrics="accuracy", ...
        L2Regularization=1e-10, ...
        MaxEpochs=30, ...
        Shuffle="every-epoch", ...
        ValidationData=imdsValidation, ...
        LearnRateSchedule="piecewise", ...
        LearnRateDropFactor=0.1, ...
        LearnRateDropPeriod=25);
    
    % Train the network in a worker in the cluster.
    net = trainnet(augmentedImdsTrain,layers,"crossentropy",options);
    
    % To obtain the accuracy of this network, use the trained network to
    % classify the validation images on the worker and compare the predicted labels to the
    % actual labels.
    scores = minibatchpredict(net,imdsValidation);
    Y = scores2label(scores,classNames);
    accuracies(idx) = sum(Y == imdsValidation.Labels)/numel(imdsValidation.Labels);
    
    % Send the trained network back to the client.
    trainedNetworks{idx} = net;
end

Starting parallel pool (parpool) using the 'MyClusterInTheCloud' profile ...
Connected to parallel pool with 4 workers (PreferredPoolNumWorkers).
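Before you send the loop to the cloud cluster, it can help to check the loop body locally. The sketch below uses the same sliced-variable pattern as the loop above: each iteration reads its own element of miniBatchSizes and writes only its own element of each output. A cheap placeholder computation replaces training, and the names learnRates and summaries are hypothetical. Specifying 0 as the maximum number of workers, as in parfor (idx = 1:n, 0), runs the iterations serially on the client without starting a pool.

```matlab
miniBatchSizes = [64 128 256 512];
numMiniBatchSizes = numel(miniBatchSizes);
learnRates = zeros(numMiniBatchSizes,1);   % sliced output variable
summaries = cell(numMiniBatchSizes,1);     % sliced output variable

parfor (idx = 1:numMiniBatchSizes, 0)      % 0 workers: run serially on the client
    miniBatchSize = miniBatchSizes(idx);   % sliced input variable
    lr = 1e-1 * miniBatchSize/256;         % placeholder for the training step
    learnRates(idx) = lr;
    summaries{idx} = sprintf('MiniBatchSize %d: learn rate %g', miniBatchSize, lr);
end

disp(summaries)
```

Once the loop body behaves as expected, remove the 0 to let parfor use the workers of the current pool.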

After parfor finishes, trainedNetworks contains the resulting networks trained by the workers. Display the trained networks and their accuracies.

trainedNetworks

trainedNetworks=4×1 cell array
    {1×1 dlnetwork}
    {1×1 dlnetwork}
    {1×1 dlnetwork}
    {1×1 dlnetwork}

accuracies

accuracies = 4×1

    0.8404
    0.8378
    0.8374
    0.8346

Select the best network in terms of accuracy. Test its performance against the test data set.

[~, I] = max(accuracies);
bestNetwork = trainedNetworks{I(1)};

scores = minibatchpredict(bestNetwork,imdsTest);
Y = scores2label(scores,classNames);
accuracy = sum(Y == imdsTest.Labels)/numel(imdsTest.Labels)

accuracy = 0.8374
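The accuracy computations above compare categorical labels element by element and divide the number of matches by the number of images, which is the same as the mean of the logical comparison. Also, when several entries tie for the maximum, max returns the index of the first one, so I is already a scalar for a vector input. The toy labels and accuracies below are made up for illustration.

```matlab
classes = ["bird" "cat" "dog"];
trueLabels = categorical(["cat"; "dog"; "cat"; "bird"], classes);
predictedLabels = categorical(["cat"; "cat"; "cat"; "bird"], classes);

acc1 = sum(predictedLabels == trueLabels)/numel(trueLabels)  % form used in this example
acc2 = mean(predictedLabels == trueLabels)                   % equivalent form

toyAccuracies = [0.81; 0.84; 0.84; 0.79];
[~, I] = max(toyAccuracies)  % tie between entries 2 and 3: max returns the first index
```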

Send Feedback Data During Training

Prepare and initialize plots that show the training progress of each worker. Use animatedline for a convenient way to show changing data.

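f = figure;
f.Visible = true;
lines = gobjects(numMiniBatchSizes,1); % Preallocate one animated line per mini-batch size.
for i = 1:numMiniBatchSizes
    subplot(2,2,i) % 2-by-2 grid: one panel for each of the four mini-batch sizes
    title("MiniBatchSize = " + miniBatchSizes(i));
    xlabel('Iteration');
    ylabel('Training accuracy');
    lines(i) = animatedline;
end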

Send the training progress data from the workers to the client by using a DataQueue object, and then plot the data. Use afterEach to update the plots each time the workers send training progress feedback. The parameter opts is a cell array with three values: the parfor loop index (which selects the plot to update), the training iteration, and the training accuracy.

D = parallel.pool.DataQueue;
afterEach(D, @(opts) updatePlot(lines, opts{:}));
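The afterEach callback relies on comma-separated list expansion: each message is a cell array {idx, iteration, accuracy}, and opts{:} expands it into separate arguments, so the call becomes updatePlot(lines, idx, iteration, accuracy). The sketch below shows the same mechanism with a hypothetical stand-in for updatePlot, describeUpdate, that returns a description instead of plotting; the message values are made up.

```matlab
% Hypothetical stand-in for updatePlot: describes its inputs instead of plotting.
describeUpdate = @(target, idx, iter, acc) ...
    sprintf('%s: plot %d, iteration %d, accuracy %g', target, idx, iter, acc);

opts = {3, 120, 87.5};  % message in the format that sendTrainingProgress sends
callback = @(opts) describeUpdate('lines', opts{:});  % same shape as the afterEach callback
msg = callback(opts)
```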

Perform a parallel parameter sweep by training several networks inside a parfor loop with different mini-batch sizes. Note the use of OutputFcn in the training options: at each iteration, the output function sends the training progress to the client.

parfor idx = 1:numMiniBatchSizes
    
    miniBatchSize = miniBatchSizes(idx);
    initialLearnRate = 1e-1 * miniBatchSize/256; % Scale the learning rate according to the miniBatchSize.
    
    % Define the training options. Set an output function to send data back
    % to the client each iteration.
    options = trainingOptions("sgdm", ...
        MiniBatchSize=miniBatchSize, ... % Set the corresponding MiniBatchSize in the sweep.
        Verbose=false, ... % Do not send command line output.
        InitialLearnRate=initialLearnRate, ... % Set the scaled learning rate.
        OutputFcn=@(state) sendTrainingProgress(D,idx,state), ... % Set an output function to send intermediate results to the client.
        Metrics="accuracy", ...
        L2Regularization=1e-10, ...
        MaxEpochs=30, ...
        Shuffle="every-epoch", ...
        ValidationData=imdsValidation, ...
        LearnRateSchedule="piecewise", ...
        LearnRateDropFactor=0.1, ...
        LearnRateDropPeriod=25);
    
    % Train the network in a worker in the cluster. The workers send
    % training progress information during training to the client.
    net = trainnet(augmentedImdsTrain,layers,"crossentropy",options);
    
    % To obtain the accuracy of this network, use the trained network to
    % classify the validation images on the worker and compare the predicted labels to the
    % actual labels.
    scores = minibatchpredict(net,imdsValidation);
    Y = scores2label(scores,classNames);
    accuracies(idx) = sum(Y == imdsValidation.Labels)/numel(imdsValidation.Labels);
    
    % Send the trained network back to the client.
    trainedNetworks{idx} = net;
end
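The sendTrainingProgress helper function, defined at the end of this example, forwards a message only for iteration events that include a training accuracy value, and it always returns false so that training is never stopped early. To check that filtering without a cluster, you can apply the same condition to hand-built stand-ins for the info structure. In this sketch the field values are made up, a cell array takes the place of the DataQueue, and strcmp replaces == so that the check also works for character vectors.

```matlab
% Hand-built stand-ins for the info structure passed to the output function.
infos = [
    struct('State', 'start',     'Iteration', 0, 'TrainingAccuracy', [])
    struct('State', 'iteration', 'Iteration', 1, 'TrainingAccuracy', [])
    struct('State', 'iteration', 'Iteration', 2, 'TrainingAccuracy', 12.5)
    struct('State', 'done',      'Iteration', 2, 'TrainingAccuracy', 12.5)];

loopIdx = 3;  % hypothetical parfor loop index
sent = {};    % stands in for the DataQueue
for k = 1:numel(infos)
    info = infos(k);
    % Same condition as in sendTrainingProgress.
    if strcmp(info.State, 'iteration') && ~isempty(info.TrainingAccuracy)
        sent{end+1} = {loopIdx, info.Iteration, info.TrainingAccuracy}; %#ok<AGROW>
    end
end
numSent = numel(sent)  % only the second iteration event carries an accuracy value
```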

After parfor finishes, trainedNetworks contains the resulting networks trained by the workers. Display the trained networks and their accuracies.

trainedNetworks

trainedNetworks=4×1 cell array
    {1×1 dlnetwork}
    {1×1 dlnetwork}
    {1×1 dlnetwork}
    {1×1 dlnetwork}

accuracies

accuracies = 4×1

    0.8388
    0.8288
    0.8326
    0.8226

Select the best network in terms of accuracy. Test its performance against the test data set.

[~, I] = max(accuracies);
bestNetwork = trainedNetworks{I(1)};

scores = minibatchpredict(bestNetwork,imdsTest);
Y = scores2label(scores,classNames);
accuracy = sum(Y == imdsTest.Labels)/numel(imdsTest.Labels)

accuracy = 0.8352

Helper Functions

Define a function to create a convolutional block in the network architecture.

function layers = convolutionalBlock(numFilters,numConvLayers)
layers = [
    convolution2dLayer(3,numFilters,Padding="same")
    batchNormalizationLayer
    reluLayer
    ];

layers = repmat(layers,numConvLayers,1);
end

Define a function to send the training progress to the client through DataQueue.

function stop = sendTrainingProgress(D,idx,info)
if info.State == "iteration" && ~isempty(info.TrainingAccuracy)
    send(D,{idx,info.Iteration,info.TrainingAccuracy});
end
stop = false;
end

Define a function that updates the plots when a worker sends an intermediate result.

function updatePlot(lines,idx,iter,acc)
addpoints(lines(idx),iter,acc);
drawnow limitrate nocallbacks
end
