Unable to submit task result (Matlab parallel server)

Maria, 2 Dec 2021
Answered: Raymond Norris on 2 Dec 2021
Hi,
I am running some tests on a cluster. I create a job and submit several tasks, but I get the following error:
Error: Cannot rerun task because there are no rerun attempts left (The task has no rerun attempts left.).
Original cancel message:
java.lang.Exception: Unable to submit task result - MATLAB will now exit and restart.
Where should I start looking? What does this error mean in practice? Is it a problem on the client side or on the cluster side?

Answers (1)

Raymond Norris, 2 Dec 2021
Hi Maria,
A few questions first:
  • Which platform is MATLAB Parallel Server running on, Linux or Windows?
  • Which scheduler are you using (MJS, PBS, etc.)?
  • What size pool are you running?
  • How many cores per node?
  • How much RAM per node?
If you're using a scheduler other than MJS, try the following. I'll show both the batch and parpool cases.
setenv('MDCE_DEBUG','true')
cluster = parcluster;
% If you're using batch
job = cluster.batch();   % supply your script or function and its arguments here
job.wait
cluster.getDebugLog(job)
% If you're using parpool
pctconfig('preservejobs',true);
pool = cluster.parpool();
cluster.getDebugLog(cluster.Jobs(end))
If you're using MJS:
mjs = parcluster;
mjs.ClusterLogLevel = 4;
% Call either batch or parpool
mjs.getClusterLogs()
The log file may show more detail about the failure. If I had to guess, I'd bet you're running out of memory.
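Alongside the logs, it can help to see which tasks failed and on which worker they ran. The following is only a minimal sketch, assuming `job` is the job object you created and submitted; it reads the standard task properties `ID`, `State`, `ErrorMessage` and `Worker`, and the exact contents of `Worker` may differ between schedulers.

```matlab
% Sketch only: 'job' is assumed to be the job you created and submitted.
tasks = job.Tasks;
for k = 1:numel(tasks)
    t = tasks(k);
    fprintf('Task %d: state = %s\n', t.ID, t.State);
    if ~isempty(t.ErrorMessage)
        % Message recorded for a task that failed or was cancelled
        fprintf('  error: %s\n', t.ErrorMessage);
    end
    if ~isempty(t.Worker)
        % Worker is only populated once the task has been assigned to one
        fprintf('  worker host: %s\n', t.Worker.Host);
    end
end
```

If the failures cluster on one host, or appear only when many tasks share a node, that would be consistent with the out-of-memory guess, though it does not confirm it.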

Release: R2021a
