Cheat for preallocation requirement

Hi, I almost always use preallocation to make code run faster, but recently I hit a problem where I couldn't use it. Even creating a big array and shrinking it at the end was not possible for my use case, so I needed a cheat. While trying some workarounds, I realized that preallocation is not nearly as important for cell arrays as it is for numeric arrays. Below is an example. I am wondering about the reason behind this: how does MATLAB handle memory management for cell arrays differently from numeric arrays?
tic
x = [];
for i = 1:1e5
    x(end+1, :) = 1:10;
end
toc
tic
c = {};
for i = 1:1e5
    c{end+1, 1} = 1:10;
end
x = zeros(numel(c), 10);
for i = 1:numel(c)
    x(i, :) = c{i};
end
toc
% Elapsed time is 10.064584 seconds.
% Elapsed time is 0.132362 seconds.

1 Comment

dpb on 10 Jun 2021
Cell arrays don't have to be contiguous as are native numeric data types.


Answers (1)

Jan on 10 Jun 2021


x = [];
for i = 1:1e5
    x(end+1, :) = 1:10;
end
Here you allocate (and copy) a total of sum(1:1e5) * 10 * 8 bytes over all iterations: 400 GB, because the growing array is reallocated in every pass.
c = {};
for i = 1:1e5
    c{end+1, 1} = 1:10;
end
This allocates only sum(1:1e5) * 8 bytes in total, because each iteration appends a single pointer. In addition, the elements of c all point to the same variable, which is shared:
format debug
c{1}
% Structure address = ebf72da0
% m = 1
% n = 10
% pr = f6ba8aa0
c{2}
% Structure address = ebf72da0
% m = 1
% n = 10
% pr = f6ba8aa0 <- the same address!
In the first code, the contents of the array are copied in each iteration, while in the cell method all cells use the same data pointer, so much less data has to be copied and less memory is occupied. On the other hand, each cell element has a header of about 100 bytes.
The JIT acceleration can sometimes cope with an omitted pre-allocation, which reduces the dramatic slowdown under some circumstances.
By the way, I cannot imagine a situation where pre-allocation is not possible. Could you post a minimal working example? At least allocating in steps of 1000 should be possible.
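If the final size is truly unknown, growing in chunks keeps the number of reallocations small. A minimal sketch of the step-of-1000 allocation suggested above (the chunk size and the 1:10 row data are illustrative assumptions):

```matlab
% Grow x in chunks of 1000 rows instead of one row at a time.
chunk = 1000;
x     = zeros(chunk, 10);     % initial allocation
used  = 0;                    % number of rows actually filled
for i = 1:1e5
    if used == size(x, 1)             % out of room: grow by one chunk
        x(end+1:end+chunk, :) = 0;    % a single reallocation per chunk
    end
    used = used + 1;
    x(used, :) = 1:10;
end
x = x(1:used, :);             % shrink to the rows actually used
```

This reallocates the array only numel-over-chunk times instead of once per iteration, which removes most of the quadratic copying cost.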

9 Comments

dpb on 10 Jun 2021 (edited 10 Jun 2021)
This isn't quite right -- the actual data allocation is also in the cell array; it just isn't stored contiguously, so the whole array doesn't have to be reallocated.
The memory allotted to c after the end is
>> whos c
  Name          Size             Bytes    Class    Attributes
  c        100000x1          18400000    cell
>> 1E5*8
ans =
800000
>> res=whos('c')
res =
struct with fields:
name: 'c'
size: [100000 1]
bytes: 18400000
class: 'cell'
global: 0
sparse: 0
complex: 0
nesting: [1×1 struct]
persistent: 0
>> res.bytes/res.size(1)
ans =
184
>>
or
Mtotal = numel(c) * (104 + 8*numel(v))
where v is the vector stored (1:10 here). If instead v were 1:20, the storage of both c and x would go up by 80*1E5 bytes.
>> (res.bytes-res.size(1)*104)==resX.bytes
ans =
logical
1
>>
where resX was the result from
resX=whos('x');
for which, of course, the answer is 8000000
In short, all the memory is allocated (although the JIT compiler might be smart-enough to do the lazy allocation, I don't know).
I didn't try to time the case where the stored vector is different each iteration.
I may be misinterpreting, but I think it is just that a cell array can put the little chunk of memory anywhere instead of having to reallocate the whole array as a contiguous block for the native type.
The 104 bytes of overhead for each 80-byte chunk is the price paid for the speed; as the size of the stored data goes up, the relative penalty goes down; it's >2:1 in this example.
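The per-element estimate can be checked directly with whos. A sketch assuming the 104-byte per-element header quoted above (the exact header size is undocumented and may differ between releases):

```matlab
% Fill a cell array with 1x10 double vectors (80 bytes of data each).
c = cell(1e5, 1);
for i = 1:1e5
    c{i} = 1:10;
end
res    = whos('c');
perEl  = res.bytes / numel(c)       % observed bytes per element
Mtotal = numel(c) * (104 + 8*10)    % estimate: header + data per element
% In the run shown above both agree: 184 bytes/element, 18400000 total.
```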
Jan on 10 Jun 2021
Thanks for this valuable comment. I hope the discussion keeps going. Your explanation of the penalty of the 104-byte overhead is a useful extension of my answer.
Did you consider that all elements of c point to the same data? This is shown by the format debug method in my answer. WHOS does not account for this.
In addition, the data block of the cell array contains only pointers to the arrays. Appending an element to the cell array then needs to copy 8 bytes * numel(c) per iteration, while the growing matrix copies 8 bytes * 10 * size(x, 1).
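Summed over all iterations, those two copy costs differ by exactly the row width. A back-of-the-envelope sketch using the loop sizes from the question:

```matlab
n = 1e5;                                 % iterations, as in the question
% Growing matrix: iteration k re-copies the existing k-1 rows of 10 doubles.
matrixBytes = sum((0:n-1) * 10 * 8);     % ~4e11 bytes, the 400 GB figure
% Growing cell: iteration k re-copies the existing k-1 pointers, 8 bytes each.
cellBytes   = sum((0:n-1) * 8);          % ~4e10 bytes
ratio = matrixBytes / cellBytes          % exactly 10: the row width
```

The copy-volume ratio alone predicts only a factor of 10, so the larger measured gap must come from reallocation and cache effects on top of the raw copying.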
Some experiments:
mem1 = memory;
c = cell(1, 1000);   % here pre-allocation does not matter
v = 1:1e5;
for k = 1:1000
    c{k} = v;
end
mem2 = memory;
used = mem2.MemUsedMATLAB - mem1.MemUsedMATLAB
Expected: 1000 * (104 + 1e5*8) + 104 + 1000 * 8 = 800 MB
Used: phew, this is less straightforward than I hoped. Sometimes it is even negative.
Other observations:
c = {};
for k = 1:1000        % |format debug| shows:
    c{k} = 1:10;      % all c{:} point to the same data
end
c = {};
for k = 1:1000
    c{k} = 1:1e5;     % but not here! c{:} are not shared data copies
end
c = {};
v = 1:1e5;
for k = 1:1000
    c{k} = v;         % now they are shared again
end
In addition, the last one is much faster.
dpb on 10 Jun 2021 (edited 11 Jun 2021)
As said, I didn't look at whether the JIT engine is smart enough to use lazy allocation by recognizing the constant vector; the point is moot to me -- one generally doesn't make 1E4 or 1E5 copies of the same data, so in general the storage has to be allocated somewhere. In the case of a native data type it has to be contiguous memory to make everything else work as expected; the cell array can put it wherever and, more importantly, doesn't have to reallocate the large block over and over, which also entails a memcopy.
With modern CPUs with cache and OS disk-swapping, one can't even count on the allocation being in (physical) memory all the time. Hence the difficulties you saw in that vein.
Walter Roberson on 11 Jun 2021
It was only a small number of releases ago that MATLAB changed so that small literal arrays have their data shared. In previous versions, if you did
for K = 1 : 5
    C{K} = 1 : 10;
end
then a different memory block would be created for each iteration of the loop. With current versions, if the literal is sufficiently small then the same memory pointer will be used.
If I recall correctly, the same thing can happen across different lines, but that the literal has to have been written exactly the same way. For example,
for K = 1 : 5
    C1{K} = 1 : 10;
end
for K = 1 : 5
    C2{K} = 1 : 10;
end
for K = 1 : 5
    C3{K} = 1:10;
end
All the C1 and all the C2 would share the same [1 2 3 4 5 6 7 8 9 10], but with the 1:10 having been written differently, it would not be recognized as being the same. Spacing counts. Including, IIRC, spacing before comments if there was no semi-colon terminator.
James Tursa posted about all of this, as he noticed that it affected in-place operations. The headers changed, and the existence of the background "master" copy of the literal was hidden from techniques that were previously fine for detecting whether it was safe to write to a block. He posted code showing that an in-place modification to C1{K} could affect a different place where the same literal had been assigned. It is as if you had done
A = 1;
B = 1;
A = inplace_assign(A, 2)   % hypothetical routine
then suddenly you would find that B was 2 now as well because it was sharing the hidden copy of the literal.
dpb on 11 Jun 2021
BTW, Jan, thanks for the format debug output -- I wasn't aware of that option being available. It's undocumented AFAICT.
Jan on 11 Jun 2021
@Walter Roberson and @dpb: Do you have an explanation for the timings measured in the original question? Of course the JIT is a magic factor in this discussion, because it is deliberately undocumented.
dpb's argument is important: storing the data as vectors in a cell does not allocate a contiguous block of data and does not duplicate the actual data. On the other hand, the 104-byte overhead per element is expensive. Finally, it matters that these headers are not moved in memory when the cell array grows iteratively, because the cell array contains only a list of pointers to its elements. At least this list must be stored contiguously in RAM, but that needs only 8 bytes per element.
After some experiments I still expect the cell to be 10 times faster than the numerical array, but the OP found:
% Elapsed time is 10.064584 seconds.
% Elapsed time is 0.132362 seconds.
dpb on 11 Jun 2021
"...n explanation for the timings measured in the original question?"
"They are what they are." <VBG>
It is notoriously hard to write meaningful timing routines, and without looking inside the actual data structures much more carefully than I have the time (or inclination) to do, I don't have any further real insight, no, sorry.
Have you tried to construct a test in which the data are not duplicated, but that doesn't spend as much or more time calculating the new data, to see what difference that makes in something closer to a real use case?
The only other thing I would ask is what results you get on your machine if you duplicate the OP's test. Do you get his 100X difference, too? It's possible, I think, that such tests could give wildly different results from machine to machine depending on all the possible differences in memory size, cache size, etc., etc., etc., ...
I didn't mess with the cell array, but I did a mini-comparison of the native data type by number of iterations over a few orders of magnitude, from 1E3 through 1E6. In comparison, the OP's machine is about 20X as fast as my cheapie (old enough that it's still on Win7), so I think timings could be all over the place.
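For a somewhat more repeatable comparison than a single tic/toc, timeit runs the measured function several times and reports a median. A sketch (the reduced iteration count is an assumption to keep the slow matrix case tolerable; local functions require saving this as a script in R2016b+ or as a function file):

```matlab
n = 1e4;                            % smaller than the question's 1e5
tMatrix = timeit(@() growMatrix(n))
tCell   = timeit(@() growCell(n))

function x = growMatrix(n)
    % grow a numeric array one row at a time, as in the question
    x = [];
    for i = 1:n
        x(end+1, :) = 1:10;
    end
end

function c = growCell(n)
    % grow a cell array one element at a time
    c = {};
    for i = 1:n
        c{end+1, 1} = 1:10;
    end
end
```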
James Tursa on 11 Jun 2021
"Do you have an explanation for the timings measured in the original question?"
Yes. As Jan pointed out, the first method copies and re-copies the data over and over and over again. The second method doesn't.
dpb on 11 Jun 2021
Well, yes... but the question I interpreted as being about insight into Jan's expectation of 10:1 vis-a-vis the OP's observed 100:1 ratio. Why on that, I've no clue, starting with the fact that I don't know why/how Jan derived his "expected" factor of 10.


Release: R2021a