Parallel pool takes extremely long to start up on virtual machines
I have spent some time searching the forums about this question and have read many of the threads regarding the parallel pool runtime/startup issues. Where there were suggestions I tested them. Unfortunately we are still running into significant runtime issues.
The general description: starting up the parallel pool and executing a very simple parallel loop on our virtual machines takes an extremely long time. All virtual machines are accessed through Dell VDI thin clients and are running Windows.
In all cases I am using the Parallel Computing Toolbox on my named-user license and not attempting to pull from a license server. I used MATLAB R2021b for these tests.
I ran multiple tests, and in each case I ran the following script:
nWorker = ...
a = zeros(100000,1);
parfor ii = 1 : size(a,1)
    a(ii) = rand;
end
In some cases I set nWorker to total available cores. In other cases I limited it. Some benchmark results:
- Desktop Computer: Dell Precision 5820 | Xeon W-2135 | 6 (of 6) cores @ 3.7GHz | 32GB Memory | Win10. Runtime: 10 seconds
- Virtual Machine: Xeon Gold 6246 | 8 (of 8) cores @ 3.3GHz | 32GB Memory | Win10. Runtime: 65 seconds
- Virtual Machine: Xeon Platinum 8280 | 12 (of 28) cores @ 2.7GHz | 64GB Memory | Win10. Runtime: 175 seconds
- Virtual Machine: Xeon Platinum 8280 | 27 (of 28) cores @ 2.7GHz | 64GB Memory | Win10. Runtime: 614 seconds
As you can see, in the worst case the virtual machine (whose specs far outmatch, on paper, the desktop computer) runs 61.4x slower.
I followed the advice on this thread:
But that did not yield much improvement at all.
I am looking for advice on why this may be happening and what I might be able to do to fix it!
Thank you in advance,
Answers (1)
Alison Eele on 12 Aug 2022
Hi Chris, there are unfortunately a few possibilities. Your observation that the time scales with the number of workers on the machine is especially interesting, since that suggests to me some sort of bottleneck.
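To put numbers on that scaling, it may help to time pool startup separately from the loop itself. The following is only a minimal sketch: the worker counts in `workerCounts` are hypothetical values to adapt to the machine, and it assumes the default cluster profile is the local one.

```matlab
% Sketch: time pool startup and the parfor loop separately for several
% pool sizes. workerCounts is a hypothetical choice, not from the post.
workerCounts = [2 4 8];
startupTimes = zeros(size(workerCounts));
loopTimes    = zeros(size(workerCounts));

for k = 1:numel(workerCounts)
    delete(gcp('nocreate'));           % close any existing pool first

    tStart = tic;
    parpool(workerCounts(k));          % uses the default cluster profile
    startupTimes(k) = toc(tStart);

    a = zeros(100000,1);
    tLoop = tic;
    parfor ii = 1:numel(a)
        a(ii) = rand;
    end
    loopTimes(k) = toc(tLoop);

    fprintf('%2d workers: startup %.1f s, parfor %.1f s\n', ...
        workerCounts(k), startupTimes(k), loopTimes(k));
end

delete(gcp('nocreate'));               % clean up the last pool
```

If `startupTimes` grows with the worker count while `loopTimes` stays small, the delay is in launching the workers rather than in running the loop.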
If you watch the parpool starting on the virtual machine in something like Task Manager (Windows) or top (Linux), do you see the additional MATLAB processes for the workers start quickly and reach a memory footprint of about 800MB (Windows), or do the additional worker processes take a while to start up?
Slow-starting workers would suggest that the disk I/O needed to start a lot of MATLAB processes simultaneously might be the bottleneck. If you're seeing system resources spike close to 100%, the machine is having to work hard to start MATLAB.
If the processes are starting quickly do you see the parallel status indicator in your client MATLAB turn green and stay green for a long time before the pool is available to use?
If so, then one of the setup steps for the parpool seems to be taking a while. This can be affected by the number of entries on the user's MATLAB path, especially anything that might include a network directory. I wouldn't expect this to scale with the number of workers, so it seems unlikely in your case.
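A quick way to look at the path, as a rough sketch: count the entries and flag the ones that look like network locations. It only detects UNC paths (those starting with `\\`), which is an assumption about how network folders appear on the path; mapped drive letters would not be caught.

```matlab
% Sketch: count MATLAB path entries and list those that are UNC paths.
entries = strsplit(path, pathsep);
entries = entries(~cellfun(@isempty, entries));   % drop empty entries

isUNC = startsWith(entries, '\\');                % e.g. \\server\share\...

fprintf('%d path entries, %d look like network (UNC) folders\n', ...
    numel(entries), nnz(isUNC));
disp(entries(isUNC).');                           % show the UNC entries
```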
If the processes take a long time to appear in Task Manager or top, then that suggests to me the delay is in the parpool job creation, and I'd check whether the JobStorageLocation of the local profile is slow to write to.
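One way to check that location, as a minimal sketch: read `JobStorageLocation` from the default cluster profile and time writing a batch of small files there. The file count and size (`nFiles`, `payload`) are hypothetical and only meant as a crude stand-in for job creation, not a reproduction of what parpool actually writes. It also assumes the default profile is the local one, so that `JobStorageLocation` is a plain folder path.

```matlab
% Sketch: time small-file writes into the cluster's JobStorageLocation.
c   = parcluster;                    % default cluster profile
jsl = c.JobStorageLocation;          % a folder path for the local profile
fprintf('JobStorageLocation: %s\n', jsl);

nFiles  = 100;                       % hypothetical number of test files
payload = zeros(1, 4096, 'uint8');   % hypothetical 4 KB per file
names   = cell(1, nFiles);

t = tic;
for k = 1:nFiles
    names{k} = tempname(jsl);        % unique file name inside jsl
    fid = fopen(names{k}, 'w');
    if fid == -1
        error('Could not open %s for writing.', names{k});
    end
    fwrite(fid, payload);
    fclose(fid);
end
elapsed = toc(t);

cellfun(@delete, names);             % remove the test files again
fprintf('Wrote %d small files in %.2f s\n', nFiles, elapsed);
```

Running the same snippet on the desktop and on a virtual machine gives a direct comparison; a large gap would point at the storage behind that folder.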
Our support teams may be able to help you further in isolating the cause of the delay. https://www.mathworks.com/support/contact_us.html