Resolve Scaling Issues in the Cloud

Issue

When MATLAB® Parallel Server™ is unable to create new workers in a cluster running in the cloud, scaling errors are generated.

When cluster users try to run a job on the cluster, the job fails with an error like the following:

Error using parallel.Job/submit (line 304)
An error occurred during execution of Task with ID 1.
Caused by:
    Job was cancelled because the cluster does not have enough workers to 
    meet the minimum of the job's NumWorkersRange property: <num_workers>.

Possible Solutions

Depending on whether your MATLAB Parallel Server cluster is running on Amazon® Web Services (AWS®) or Microsoft® Azure®, try one of these solutions.

Troubleshoot Scaling Issues for Clusters on AWS

To troubleshoot issues for a cluster on Amazon Web Services (AWS), see the AWS documentation on Troubleshoot issues in Amazon EC2 Auto Scaling.

For quota-related issues, see the AWS documentation on Quotas for Auto Scaling resources and groups.
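When working through the AWS troubleshooting guidance, it can help to list the scaling activities that failed and read their status messages. The following is a minimal Python sketch, not the documented MathWorks procedure. It assumes the boto3 library is installed, AWS credentials are configured, and you know the name of the Auto Scaling group behind your cluster. The function names `failed_scaling_activities` and `fetch_failures` are hypothetical.

```python
from typing import Any


def failed_scaling_activities(response: dict[str, Any]) -> list[dict[str, str]]:
    """Return the failed or cancelled activities from a
    DescribeScalingActivities response dictionary."""
    failures = []
    for activity in response.get("Activities", []):
        if activity.get("StatusCode") in ("Failed", "Cancelled"):
            failures.append(
                {
                    "start_time": str(activity.get("StartTime", "")),
                    "status": activity["StatusCode"],
                    "message": activity.get("StatusMessage", ""),
                }
            )
    return failures


def fetch_failures(group_name: str, region: str) -> list[dict[str, str]]:
    """Query AWS for scaling activities of one Auto Scaling group.

    Assumption: group_name is the Auto Scaling group that backs your
    cluster; look it up in your own deployment.
    """
    import boto3  # imported here so the helper above works without boto3

    client = boto3.client("autoscaling", region_name=region)
    response = client.describe_scaling_activities(AutoScalingGroupName=group_name)
    return failed_scaling_activities(response)
```

The status message of a failed activity typically states the reason, such as an exceeded quota or an unavailable image, which indicates which of the AWS documentation pages above applies.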

If your issue is related to a missing base image, see the following section.

AWS Base Image Not Available

If your MATLAB Parallel Server cluster uses the base Amazon Machine Image (AMI) provided by MathWorks®, your cluster cannot create new workers after MathWorks replaces that base image. MathWorks periodically updates the base image to include the latest security patches. This issue does not affect clusters that run for less than a month.

To run a job that requires more workers, you can deploy a new cluster, which uses the latest AMI. Alternatively, you can avoid this issue by copying the AMI for a specific MATLAB version to your own AWS account and then creating a cluster based on that copy.

Note

Saving an AMI to your account incurs costs. To save costs, delete the AMI and the snapshots when you no longer need them.

To copy the AMI for a specific MATLAB version to your AWS account, follow these steps.

1. In the Releases folder of the MATLAB Parallel Server on AWS repository on GitHub®, choose the release that you want to copy.
2. Navigate to the "Deploy Cluster in a Custom Region" section in the README file.
3. Click the AWS quick-create link in this section to open a CloudFormation template with prepopulated fields.
4. Set the AWS region in the AWS console to your desired region.
5. Deploy the template to copy the AMI. Copying takes 5 to 15 minutes.

After your AMI is ready, use the LaunchClusterWithCopiedAmi link in the outputs tab to deploy a cluster in your desired region. You can also share this link or the Custom AMI ID with others in your AWS account to allow them to deploy clusters using the same AMI.
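When you no longer need a copied AMI, the cleanup described in the note on costs can also be scripted. The following is a minimal Python sketch under stated assumptions: the boto3 library is installed, AWS credentials are configured, and you have the ID of the AMI in your own account. The function names `snapshot_ids` and `delete_ami` are hypothetical. If the AMI was created by a CloudFormation stack, check whether deleting that stack is the more appropriate way to remove it in your setup.

```python
from typing import Any


def snapshot_ids(image: dict[str, Any]) -> list[str]:
    """Collect the EBS snapshot IDs referenced by one image entry
    of a DescribeImages response."""
    ids = []
    for mapping in image.get("BlockDeviceMappings", []):
        snapshot_id = mapping.get("Ebs", {}).get("SnapshotId")
        if snapshot_id:
            ids.append(snapshot_id)
    return ids


def delete_ami(image_id: str, region: str) -> list[str]:
    """Deregister an AMI you own and delete its snapshots.

    Returns the IDs of the deleted snapshots. This permanently removes
    the image, so run it only for AMIs you no longer need.
    """
    import boto3  # imported here so snapshot_ids works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    images = ec2.describe_images(ImageIds=[image_id])["Images"]
    if not images:
        return []
    ids = snapshot_ids(images[0])
    # Deregister first: snapshots still used by a registered AMI
    # cannot be deleted.
    ec2.deregister_image(ImageId=image_id)
    for snapshot_id in ids:
        ec2.delete_snapshot(SnapshotId=snapshot_id)
    return ids
```

The snapshot IDs are read before the image is deregistered because the block device mappings are no longer available from the image afterward.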

Troubleshoot Scaling Issues for Clusters on Azure

To diagnose deployment errors for a cluster on Azure, retrieve the error code from the activity log and then troubleshoot it by following these steps.

1. Sign in to the Azure portal.
2. Navigate to Monitor under Azure Services.
3. On the Monitor page, select Activity Log in the left pane.
4. Select the subscription for your cluster.
5. Add a Resource Group filter to choose the resource group that contains your cluster, and a Resource filter to select your cluster itself.
6. Set the time range and event severity filters to narrow down the events.
7. After applying filters, click an event to view its error code and message for details about the issue.
8. For descriptions of common error codes and guidance on resolving issues, see the Azure documentation on Azure Virtual Machine Scale Sets Troubleshooting.

For more details about the activity log, see the Azure documentation on Activity Log in Azure Monitor.
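The filtering described in the portal steps can also be applied to activity log events exported as JSON, for example from the Azure CLI command `az monitor activity-log list --resource-group <name>`. The following is a minimal Python sketch, not an official procedure. The field names it reads (`resourceGroupName`, `level`, `eventTimestamp`, `properties.statusMessage`) are assumptions about the event schema, so verify them against your own export. The function name `failed_events` is hypothetical.

```python
import json
from typing import Any


def failed_events(events: list[dict[str, Any]], resource_group: str) -> list[dict[str, str]]:
    """Keep Error and Critical events for one resource group and
    extract an error code and message from each where possible."""
    results = []
    for event in events:
        group = event.get("resourceGroupName") or ""
        if group.lower() != resource_group.lower():
            continue
        if event.get("level") not in ("Error", "Critical"):
            continue
        raw = (event.get("properties") or {}).get("statusMessage") or ""
        code, message = "", raw
        try:
            # Assumption: the status message is often a JSON string of
            # the form {"error": {"code": ..., "message": ...}}.
            error = json.loads(raw).get("error", {})
            code = error.get("code", "")
            message = error.get("message", raw)
        except (ValueError, AttributeError):
            pass  # not JSON, or not the assumed shape: keep the raw text
        results.append(
            {"time": event.get("eventTimestamp", ""), "code": code, "message": message}
        )
    return results
```

To use it, load the exported file with `json.load` and pass the resulting list together with the name of the resource group that contains your cluster. You can then look up the extracted codes in the Azure troubleshooting documentation mentioned above.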