Memory spike when using previously declared variable in spmd block

Question

3 votes

I have a piece of code where a few (reasonably) large vectors are initialized inside a script before they are used in an spmd block. Almost all the elements of the vectors are used on each worker, so I really need the whole vector to be available on each worker. The code is similar in structure to the code below.

matlabpool open local 4
a = rand(1+2^27,1); % 1 GB, in my code this is not a random vector
spmd,
  b = a(1); % my code does some more involved work here,
            % but this still illustrates my problem.
end

After executing this code, the total memory usage is close to 6 GB, as expected. However, when the spmd block starts, the memory usage spikes to between 8 and 10 GB. I figure this has something to do with transmitting the variable 'a', but I fail to understand why the spike is so large.

After looking through the questions here, and reading the PCT documentation, I am still drawing a blank.

I have two concrete questions:

Can somebody explain what the cause of the spike in memory usage is?
Is there a way to distribute the variables without getting this memory spike, or at least reduce it?

I am aware of distributed arrays, but in the tests I have done, the communication overhead of using them for this is too large. However, I am naturally open to any suggestions that involve distributed arrays as well; I do not, after all, claim to be an expert in PCT.

In addition, if I change the assignment above to 'rand(2^28,1)' I get the following error:

Error using distcompserialize
Error during serialization
Error in spmdlang.RemoteSpmdExecutor/initiateComputation (line 82)
                fcns  = distcompMakeByteBufferHandle( ...
Error in spmdlang.spmd_feval_impl (line 14)
    blockExecutor.initiateComputation();
Error in spmd_feval (line 8)
        spmdlang.spmd_feval_impl( varargin{:} );

My hope is that an answer to my second question might rid me of this error message as well.

If it matters, I am using MATLAB R2011b and PCT version 5.2 on Linux.

Thank you for your time.

Anders.

Answer 1

Edric Ellis on 28 Mar 2012

4 votes

One more thing: if you need the same data on each worker, you could also do this:

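    c = Composite();                    % one entry per worker in the pool
    c{1} = getMyLargeData();            % store the large data on worker 1 only
    c(2:end) = cell(1, numel(c) - 1);   % empty placeholders for the other workers
    spmd
        c = labBroadcast( 1, c );       % broadcast worker 1's value to all workers
        % use c
    end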

Answer 2

Konrad Malkowski on 28 Mar 2012

3 votes

You can build the random vector directly on the workers:

spmd
  c = codistributed.rand(1+2^27, 1);
end

As for the spike in memory usage that you are seeing: without going too far into implementation details, it is caused by send and receive buffers on both the client MATLAB (the one you are interacting with) and the worker MATLABs (the MATLABPOOL workers). You will have one buffer on the client and one buffer per worker.
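As a rough illustration of how such buffers could account for the numbers in the question, here is a back-of-the-envelope sketch. It assumes (this is not stated in the thread) that each buffer is about as large as the serialized vector and that all of them exist at the same moment; the variable names are made up for the example.

```matlab
% Back-of-the-envelope estimate only; not a description of PCT internals.
numWorkers = 4;                          % from "matlabpool open local 4"
vecGiB = (1 + 2^27) * 8 / 2^30;          % size of 'a' in GiB (about 1)

% Steady state: the client keeps 'a' and every worker holds its own copy.
steadyGiB = (1 + numWorkers) * vecGiB;

% ASSUMPTION: each buffer is as large as the vector, and all of them
% (one on the client, one per worker) are live at the same time.
bufferGiB = (1 + numWorkers) * vecGiB;
peakUpperBoundGiB = steadyGiB + bufferGiB;

fprintf('steady: about %.0f GiB, worst-case peak: about %.0f GiB\n', ...
        steadyGiB, peakUpperBoundGiB);
```

Under these assumptions the vector copies alone account for about 5 GiB, and the worst-case peak is about 10 GiB, which is in the same range as the 8 to 10 GB spike reported in the question. The remaining difference from the observed 6 GB baseline would be the MATLAB processes themselves.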

The reason for the second error is that at the moment there is a serialization limit of 2GB for communications between client and workers.
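To see why the first vector goes through and the second does not, it may help to work out the payload sizes. The sketch below treats "2GB" as 2^31 bytes, which is an assumption; the exact cut-off is not given in this thread, and serialization adds some overhead on top of the raw data.

```matlab
% Raw payload sizes for the two vectors in the question (8 bytes per double).
bytesPerDouble = 8;
bytesSmall = (1 + 2^27) * bytesPerDouble;   % rand(1+2^27,1)
bytesLarge = 2^28 * bytesPerDouble;         % rand(2^28,1)

% ASSUMPTION: "2GB" is read here as 2^31 bytes.
assumedLimit = 2^31;

fprintf('small: %.3f GiB, under assumed limit: %d\n', ...
        bytesSmall / 2^30, bytesSmall < assumedLimit);
fprintf('large: %.3f GiB, under assumed limit: %d\n', ...
        bytesLarge / 2^30, bytesLarge < assumedLimit);
```

The first vector is just over 1 GiB and stays below the assumed limit, while the second is exactly 2^31 bytes of raw data before any overhead, so it cannot fit.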

1 comment

Anders Hoff on 28 Mar 2012

Thanks for your answer.

I realize that I wasn't clear on this in my question, but I am not actually initializing a random vector. The vector contains something completely different. Hence, building a codistributed random matrix like you suggest is sadly not an option.

Thanks for explaining the spikes. I had a feeling it was something to do with memory buffers.

Answer 3

Edric Ellis on 28 Mar 2012

3 votes

If you need to build the data on the client side, you can use the explicit Composite method. Something like this:

    c = Composite();                    % one entry per worker in the pool
    for ii = 1:numel(c)
        c{ii} = getMyLargeData(ii);     % transfer only the data worker ii needs
    end
    spmd
        % use 'c'
    end

This is the most memory-efficient way to do things, as it sends only the required data to each worker. Konrad's explanation tells you why you are seeing the memory spike when doing things the other way.
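For cases where each worker can get by with only part of the data, one possible shape for getMyLargeData(ii) is a contiguous block of the full vector. The sketch below is a hypothetical stand-in that runs on the client without a pool; getBlock, edges, and the small placeholder vector are names invented for the example and are not part of the original answer.

```matlab
% Hypothetical stand-in for getMyLargeData(ii): worker ii gets one
% contiguous block of the vector.
data = (1:10)';          % small placeholder for the real vector
numWorkers = 4;          % in practice: numel(c) for c = Composite()

% Block boundaries: elements edges(ii)+1 .. edges(ii+1) go to worker ii.
edges = round(linspace(0, numel(data), numWorkers + 1));
getBlock = @(ii) data(edges(ii) + 1 : edges(ii + 1));

totalSent = 0;
for ii = 1:numWorkers
    block = getBlock(ii);
    totalSent = totalSent + numel(block);
    fprintf('worker %d would receive %d elements\n', ii, numel(block));
end
```

With this partitioning every element is assigned to exactly one worker, so the total amount transferred equals the size of the original vector rather than one full copy per worker. If every worker truly needs the whole vector, as in the question, this kind of split does not apply.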

1 comment

Matti Kummu on 17 Jan 2017

Thank you; this helped a lot!

Answer 4

Thomas Lai on 7 Jun 2012

0 votes

Hi Konrad, is there any way to get around the serialization limit of 2 GB? 2 GB is much too small for the large datasets that I'm working with.

Answer 5

Henryk Modzelewski on 23 Mar 2013

0 votes

Is there a way to increase the serialization limit of 2 GB for communications between client and workers? 2 GB is ridiculously low for big data sizes.

1 comment

Edric Ellis on 28 Mar 2013

This restriction was removed in R2013a.
