Send Deep Learning Batch Job To Cluster

This example shows how to send deep learning training batch jobs to a cluster so that you can continue working or close MATLAB during training. Training deep neural networks often takes a long time. By training the networks as batch jobs, you can fetch the results from the cluster when they are ready. You can wait for the jobs to complete programmatically, or close MATLAB and use the Job Monitor later to obtain the results. This example submits the parallel parameter sweep from Use parfor to Train Multiple Deep Learning Networks as a batch job. After the job completes, you can fetch the trained networks and compare their accuracies.

Submit the Batch Job

Use the batch function to send a script as a batch job to the cluster. The cluster allocates a worker to execute the contents of your script. If the script contains parallel code that benefits from extra workers (for example, it uses built-in parallel support or a parfor loop), you must request those workers explicitly in the call to batch. batch uses one worker to run the script itself, and you can request additional workers with the Pool name-value pair argument. A pool of totalNumberOfWorkers-1 workers therefore occupies totalNumberOfWorkers workers in total.

totalNumberOfWorkers = 5;
job1 = batch('trainMultipleNetworks', ...
    'Pool',totalNumberOfWorkers-1);

To see the current status of your job in the cluster, open the Job Monitor by selecting Monitor Jobs under the Parallel menu.

You can submit additional jobs to the cluster. If the cluster is running other jobs, a new job stays in the queued state until the cluster becomes available.
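
The job status can also be read from code rather than from the Job Monitor. The following is a minimal sketch, assuming job1 is the job object returned by the batch call above; it reads the ID and State properties of the job.

```matlab
% Sketch: query the status of a batch job from code.
% Assumes job1 is the job object returned by the batch call above.
state = job1.State;   % for example 'queued', 'running', or 'finished'
fprintf('Job %d is %s\n', job1.ID, state);
```

A job that is waiting behind other jobs reports the queued state here, matching what the Job Monitor shows.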

Fetch Results Programmatically

After submitting your jobs, you can wait for a particular job to finish with the wait command.

wait(job1);
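
If blocking indefinitely is not desirable, wait also accepts a target state and a timeout in seconds, and returns false when the timeout elapses first. The following is a sketch under the assumption that job1 is the job submitted above; the 60-second timeout is an arbitrary illustrative value.

```matlab
% Sketch: wait for a batch job with a timeout instead of blocking indefinitely.
% Assumes job1 is the job object returned by the batch call above.
timeoutSeconds = 60;                                 % illustrative value
isFinished = wait(job1,'finished',timeoutSeconds);   % false if the timeout elapses first
if isFinished
    disp('Job reached the finished state.');
else
    fprintf('Job is still %s after %d seconds.\n', job1.State, timeoutSeconds);
end
```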

After the job finishes, use the load function to fetch the results.

load(job1,'accuracies');
accuracies
load(job1,'trainedNetworks');
trainedNetworks

accuracies =

    0.8312
    0.8276
    0.8288
    0.8258


trainedNetworks =

  4×1 cell array

    {1×1 SeriesNetwork}
    {1×1 SeriesNetwork}
    {1×1 SeriesNetwork}
    {1×1 SeriesNetwork}
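
To compare the accuracies, one option is to pick the network with the highest value. The following sketch uses the accuracy values copied from the output above so that it runs on its own; in a live session, use the accuracies and trainedNetworks variables loaded from job1 instead. The names bestAccuracy, bestIndex, and bestNetwork are illustrative.

```matlab
% Sketch: select the network with the highest accuracy.
% Values copied from the output shown above; in a live session, use the
% accuracies variable loaded from job1 instead.
accuracies = [0.8312; 0.8276; 0.8288; 0.8258];

[bestAccuracy, bestIndex] = max(accuracies);
fprintf('Network %d has the highest accuracy: %.4f\n', bestIndex, bestAccuracy);

% With trainedNetworks loaded from job1, the matching network would be:
% bestNetwork = trainedNetworks{bestIndex};
```

For the values shown, this selects the first network, with an accuracy of 0.8312.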

If you want to load all the variables in the batch job, call the load function with only the job and no variable names.

load(job1);
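
Loading every variable directly into the workspace can overwrite variables of the same name. As a sketch of an alternative, load can return the job variables as a structure instead; this assumes job1 is a finished batch job as in the steps above, and results is an illustrative variable name.

```matlab
% Sketch: collect the job variables in a structure instead of the workspace.
% Assumes job1 is a finished batch job, as in the steps above.
results = load(job1);        % struct with one field per variable in the job
disp(fieldnames(results))    % list the variables that were retrieved
```

Individual results are then available as fields, for example results.accuracies.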

If you decide to close MATLAB, you can still recover the job from the cluster and fetch the results. You can do this while the computation is still running in the cluster or after it has finished. Make a note of the ID of the job before you close MATLAB, and use the findJob function in the new session to retrieve the job.

c = parcluster('local');
job = findJob(c,'ID',1);
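
If you did not record the job ID, findJob can also search by other job properties. The following is a sketch that lists every finished job on the cluster, assuming the same 'local' profile used above; finishedJobs is an illustrative variable name.

```matlab
% Sketch: list the finished jobs on a cluster to locate a job by its ID.
% Assumes the 'local' cluster profile used above.
c = parcluster('local');
finishedJobs = findJob(c,'State','finished');   % may be empty
for k = 1:numel(finishedJobs)
    fprintf('Job %d (%s)\n', finishedJobs(k).ID, finishedJobs(k).State);
end
```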

Delete the job when you are done with it. The job will disappear from the Job Monitor.

delete(job1);

Use the Job Monitor to Fetch the Results

After submitting your batch jobs, all the computations happen in the cluster and you can safely close MATLAB. Use the Job Monitor in any other MATLAB session to check the status of your jobs. When a job is done, you can retrieve the results using the contextual menu that appears when you right-click the job. To load the job into the workspace, use "Show details". To load all variables in the job, use "Load variables". To delete the job when you are done, use "Delete".
