How is labindex assigned to the workers inside a node/machine in MDCS?

raym, 25 May 2018
Edited: raym, 25 May 2018
We know that in MDCS we can create more than one worker on a node/machine, say 4 workers per node/machine. How is labindex assigned to these 4 workers? Is it always 1, 2, 3, 4 on each node, does it increase continuously node by node (such as 5-8, 9-12, ...), or is it effectively random, such as 1, 3, 9, 6 on one node/machine?

Accepted Answer

Edric Ellis, 25 May 2018
You don't specify which cluster type you're using with MDCS, but I'm going to assume MJS for now. (Not all of what follows will be scheduler-specific).
labindex within an spmd context is equal to the task index executing on the worker. So, if you have 2 nodes each running 4 workers, and you run a single communicating job of size 8 (i.e. parpool('myMjsCluster', 8)), then the task indices are 1:8, as are the corresponding values of labindex.
MJS will endeavour to schedule things such that consecutive tasks are co-located on a single node - i.e. it will attempt to put tasks 1:4 on the first node, and 5:8 on the second. (Most other scheduler types will end up doing something similar, but by a different means).
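As a rough illustration of that layout (a sketch only, assuming 2 nodes with 4 workers each and that the scheduler does manage to co-locate consecutive tasks, which is not guaranteed), the expected node and per-node position for each labindex would be:

```matlab
% Assumed cluster shape for this example; adjust to your own setup.
workersPerNode = 4;
numWorkers     = 8;

labs = 1:numWorkers;                                % labindex values 1..8

% Expected node number if tasks 1:4 land on node 1 and 5:8 on node 2.
expectedNode  = ceil(labs / workersPerNode);

% Expected position of each lab among the labs on its node (1..4).
expectedLocal = mod(labs - 1, workersPerNode) + 1;

% One column per lab: labindex, expected node, expected local index.
disp([labs; expectedNode; expectedLocal]);
```

The names workersPerNode, expectedNode and expectedLocal are made up for this example. Because the placement is best-effort, it is safer to derive the mapping from the actual hostnames than to rely on this arithmetic.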
Basically, what you need to do is come up with a mapping of labindex to hostname to work out which labs are located on which host, and then you can use that "local labindex" to pick which Java program to use. Here's one way.
spmd
    [s, hostname] = system('hostname');
    assert(s == 0, 'Failed to compute hostname');
    hostname = strtrim(hostname);
    % Get a list of all hostnames in the pool
    allHostnames = gcat({hostname}, 1);
    % Work out which labindex values are on this host
    allLabs = 1:numlabs;
    labsOnThisHost = allLabs(strcmp(hostname, allHostnames))
    % Work out this lab's position among the labs on this host
    myIndexOnThisHost = find(labindex == labsOnThisHost)
end
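To check the bookkeeping without a cluster, the same mapping can be computed for every lab at once from a plain list of hostnames. This is only a sketch: the host names below are invented sample data standing in for the result of gcat({hostname}, 1), and hostId and localIndex are names chosen for this example.

```matlab
% Invented sample data: one hostname per lab, in labindex order.
allHostnames = {'nodeA'; 'nodeA'; 'nodeB'; 'nodeA'; 'nodeB'};

% Give each distinct hostname a numeric id (third output of unique).
[~, ~, hostId] = unique(allHostnames);

% For each host, number its labs 1, 2, 3, ... in increasing labindex order.
localIndex = zeros(numel(allHostnames), 1);
for h = 1:max(hostId)
    labs = find(hostId == h);            % labindex values on host h
    localIndex(labs) = 1:numel(labs);    % their positions on that host
end

disp(localIndex');
```

Inside spmd, localIndex(labindex) would then play the same role as myIndexOnThisHost.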
1 Comment
raym, 25 May 2018
Yes, this is really awesome! This is really what I need. Thank you very much!


More Answers (1)

Walter Roberson, 25 May 2018
"The value of labindex spans from 1 to n, where n is the number of workers running the current job, defined by numlabs"
"This was done by pause a random seconds and then detect if there is ###.exe running in the tasklist of this node."
I would probably think in terms of having
if labindex == 1
    % Check in case the external software is somehow already running.
    % Otherwise:
    %   launch the external software
    %   do any waiting for the external software to be ready to go
end
labBarrier();   % every worker waits here until lab 1 has finished
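If the external software has to be started once per machine rather than once per job, the check on labindex == 1 can be replaced by a check for the first lab on each host. A minimal sketch, assuming the hostnames have already been gathered (for example with gcat({hostname}, 1) inside spmd); the sample host names and the variable names leaderLab and isLeader are hypothetical:

```matlab
% Invented sample data: one hostname per lab, in labindex order.
allHostnames = {'nodeA'; 'nodeA'; 'nodeB'; 'nodeA'; 'nodeB'};

% Lowest labindex on each distinct host; that lab acts as the launcher.
[hosts, leaderLab] = unique(allHostnames, 'first');

% Logical flag per lab: true if this lab should launch the software.
isLeader = ismember((1:numel(allHostnames))', leaderLab);

% Inside spmd the flag would be used roughly like this:
%   if isLeader(labindex)
%       ... check for / launch the external software on this host ...
%   end
%   labBarrier;
```

With this data the launchers are labs 1 and 3, one per host, regardless of whether the scheduler placed consecutive labs together.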
1 Comment
raym, 25 May 2018
Thanks, Roberson. Your code is really a better way to share the external software, but I am not sure whether every machine has a worker with labindex 1. In fact, that is the key point of this question.
