Cheat for preallocation requirement

Hi, I almost always use preallocation to make code run faster, but recently I hit a problem where I couldn't use it. Even creating a big array and shrinking it at the end was not possible for my use case, so I needed a cheat. While trying some workarounds, I realized that preallocation is not nearly as important for cell arrays as it is for numeric arrays. Below is an example. I am wondering about the reason behind this: how does MATLAB handle memory management for cell arrays differently from numeric arrays?
tic
x = [];
for i = 1:1e5
    x(end+1, :) = 1:10;
end
toc
tic
c = {};
for i = 1:1e5
    c{end+1, 1} = 1:10;
end
x = zeros(numel(c), 10);
for i = 1:numel(c)
    x(i, :) = c{i};
end
toc
% Elapsed time is 10.064584 seconds.
% Elapsed time is 0.132362 seconds.

1 Comment

dpb on 10 Jun 2021
Cell arrays don't have to be contiguous as are native numeric data types.


Answers (1)

Jan on 10 Jun 2021


x = [];
for i = 1:1e5
    x(end+1, :) = 1:10;
end
Here you allocate (and copy) a total of sum(1:1e5) * 10 * 8 bytes over all iterations: 400 GB, because the growing array is reallocated in every pass.
c = {};
for i = 1:1e5
    c{end+1, 1} = 1:10;
end
This allocates only sum(1:1e5) * 8 bytes in total, because each iteration appends a single pointer. In addition, the elements of c all point to the same variable, which is shared:
format debug
c{1}
% Structure address = ebf72da0
% m = 1
% n = 10
% pr = f6ba8aa0
c{2}
% Structure address = ebf72da0
% m = 1
% n = 10
% pr = f6ba8aa0 <- the same address!
In the first code, the contents of the array are copied in each iteration, while in the cell method all cells use the same data pointer, so much less data has to be copied and less memory is occupied. On the other hand, each cell element has a header of about 100 bytes.
The JIT acceleration can sometimes cope with an omitted pre-allocation, which reduces the dramatic slowdown under some circumstances.
By the way, I cannot imagine a situation where pre-allocation is not possible. Could you post a minimal working example? At least allocating in steps of 1000 should be possible.
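If the final size is truly unknown, growing in chunks keeps the number of reallocations small. A minimal sketch of the step-of-1000 allocation suggested above (the chunk size and the 1:10 row data are illustrative assumptions):

```matlab
% Grow x in chunks of 1000 rows instead of one row at a time.
chunk = 1000;
x     = zeros(chunk, 10);     % initial allocation
used  = 0;                    % number of rows actually filled
for i = 1:1e5
    if used == size(x, 1)             % out of room: grow by one chunk
        x(end+1:end+chunk, :) = 0;    % a single reallocation per chunk
    end
    used = used + 1;
    x(used, :) = 1:10;
end
x = x(1:used, :);             % shrink to the rows actually used
```

This reallocates the array only numel-over-chunk times instead of once per iteration, which removes most of the quadratic copying cost.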

9 Comments

dpb on 10 Jun 2021 (edited 10 Jun 2021)
This isn't quite right -- the actual data allocation is also in the cell array; it just isn't stored contiguously, so the whole array doesn't have to be reallocated.
The memory allotted to c after the end is
>> whos c
  Name          Size             Bytes    Class    Attributes
  c        100000x1          18400000    cell
>> 1E5*8
ans =
800000
>> res=whos('c')
res =
struct with fields:
name: 'c'
size: [100000 1]
bytes: 18400000
class: 'cell'
global: 0
sparse: 0
complex: 0
nesting: [1×1 struct]
persistent: 0
>> res.bytes/res.size(1)
ans =
184
>>
or
Mtotal = numel(c) * (104 + 8*numel(v))
where v is the vector stored (1:10 here). If instead v were 1:20, the storage of both c and x would go up by 80*1E5 bytes.
>> (res.bytes-res.size(1)*104)==resX.bytes
ans =
logical
1
>>
where resX was the result from
resX=whos('x');
for which, of course, the answer is 8000000
In short, all the memory is allocated (although the JIT compiler might be smart-enough to do the lazy allocation, I don't know).
I didn't try to time the case where the stored vector is different each iteration.
I may be misinterpreting, but I think it is just that a cell array can put the little chunk of memory anywhere instead of having to reallocate the whole array as a contiguous block for the native type.
The 104 bytes of overhead for each 80-byte chunk is the price paid for the speed; as the size of the stored data goes up, the relative penalty goes down; it's >2:1 in this example.
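The per-element estimate can be checked directly with whos. A sketch assuming the 104-byte per-element header quoted above (the exact header size is undocumented and may differ between releases):

```matlab
% Fill a cell array with 1x10 double vectors (80 bytes of data each).
c = cell(1e5, 1);
for i = 1:1e5
    c{i} = 1:10;
end
res    = whos('c');
perEl  = res.bytes / numel(c)       % observed bytes per element
Mtotal = numel(c) * (104 + 8*10)    % estimate: header + data per element
% In the run shown above both agree: 184 bytes/element, 18400000 total.
```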
Jan on 10 Jun 2021
Thanks for this valuable comment. I hope the discussion keeps going. Your explanation of the penalty of the 104-byte overhead is a useful extension of my answer.
Did you consider that all elements of c point to the same data? This is shown by the format debug method in my answer. WHOS does not account for this.
In addition, the data block of the cell array contains only pointers to the arrays. Appending an element to the cell array then needs to copy 8 bytes * numel(c) per iteration, while the growing matrix copies 8 bytes * 10 * size(x, 1).
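Summed over all iterations, those two copy costs differ by exactly the row width. A back-of-the-envelope sketch using the loop sizes from the question:

```matlab
n = 1e5;                                 % iterations, as in the question
% Growing matrix: iteration k re-copies the existing k-1 rows of 10 doubles.
matrixBytes = sum((0:n-1) * 10 * 8);     % ~4e11 bytes, the 400 GB figure
% Growing cell: iteration k re-copies the existing k-1 pointers, 8 bytes each.
cellBytes   = sum((0:n-1) * 8);          % ~4e10 bytes
ratio = matrixBytes / cellBytes          % exactly 10: the row width
```

The copy-volume ratio alone predicts only a factor of 10, so the larger measured gap must come from reallocation and cache effects on top of the raw copying.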
Some experiments:
mem1 = memory;
c = cell(1, 1000);   % here pre-allocation does not matter
v = 1:1e5;
for k = 1:1000
    c{k} = v;
end
mem2 = memory;
used = mem2.MemUsedMATLAB - mem1.MemUsedMATLAB
Expected: 1000 * (104 + 1e5*8) + 104 + 1000 * 8 = 800 MB
Used: phew, this is less straightforward than I hoped. Sometimes it is even negative.
Other observations:
c = {};
for k = 1:1000        % |format debug| shows:
    c{k} = 1:10;      % all c{:} point to the same data
end
c = {};
for k = 1:1000
    c{k} = 1:1e5;     % but not here! c{:} are not shared data copies
end
c = {};
v = 1:1e5;
for k = 1:1000
    c{k} = v;         % now they are shared again
end
In addition, the last one is much faster.
dpb on 10 Jun 2021 (edited 11 Jun 2021)
As said, I didn't look at whether the JIT engine is smart enough to use lazy allocation by recognizing the constant vector; the point is moot to me -- one generally doesn't make 1E4 or 1E5 copies of the same data, so in general the storage has to be allocated somewhere. In the case of a native data type it has to be contiguous memory to make everything else work as expected; the cell array can put it wherever and, more importantly, doesn't have to reallocate the large block over and over, which also entails a memcopy.
With modern CPUs with cache and OS disk-swapping, one can't even count on the allocation being in (physical) memory all the time. Hence the difficulties you saw in that vein.
Walter Roberson on 11 Jun 2021
It was only a small number of releases ago that MATLAB changed so that small literal arrays have their data shared. In previous versions, if you did
for K = 1 : 5
    C{K} = 1 : 10;
end
then a different memory block would be created for each iteration of the loop. With current versions, if the literal is sufficiently small then the same memory pointer will be used.
If I recall correctly, the same thing can happen across different lines, but that the literal has to have been written exactly the same way. For example,
for K = 1 : 5
    C1{K} = 1 : 10;
end
for K = 1 : 5
    C2{K} = 1 : 10;
end
for K = 1 : 5
    C3{K} = 1:10;
end
All the C1 and all the C2 would share the same [1 2 3 4 5 6 7 8 9 10], but with the 1:10 having been written differently, it would not be recognized as being the same. Spacing counts. Including, IIRC, spacing before comments if there was no semi-colon terminator.
James Tursa posted about all of this, as he noticed that it affected in-place operations. The headers changed, and the existence of the background "master" copy of the literal was hidden from techniques that were previously fine for detecting whether it was safe to write to a block. He posted code showing that an in-place modification to C1{K} could affect a different place where the same literal had been assigned. It is as if you had done
A = 1;
B = 1;
A = inplace_assign(A, 2)   % hypothetical routine
then suddenly you would find that B was 2 now as well because it was sharing the hidden copy of the literal.
dpb on 11 Jun 2021
BTW, Jan, thanks for the format debug output -- I wasn't aware of that option being available. It's undocumented AFAICT.
Jan on 11 Jun 2021
@Walter Roberson and @dpb: Do you have an explanation for the timings measured in the original question? Of course the JIT is a magic factor in this discussion, because it is deliberately undocumented.
dpb's argument is important: storing the data as vectors in a cell does not allocate a contiguous block of data and does not duplicate the actual data. On the other hand, the 104-byte overhead per element is expensive. Finally, it matters that these headers are not moved in memory when the cell array grows iteratively, because the cell array contains only a list of pointers to its elements. At least this list must be stored contiguously in RAM, but that needs only 8 bytes per element.
After some experiments I still expect the cell to be 10 times faster than the numerical array, but the OP found:
% Elapsed time is 10.064584 seconds.
% Elapsed time is 0.132362 seconds.
dpb on 11 Jun 2021
"...n explanation for the timings measured in the original question?"
"They are what they are." <VBG>
It is notoriously hard to write meaningful timing routines, and without looking inside the actual data structures much more carefully than I have the time (or inclination) to do, I don't have any further real insight, no, sorry.
Have you tried to construct a test in which the data are not duplicated, but that doesn't spend as much or more time calculating the new data, to see what difference that makes in something closer to a real use case?
The only other thing I would ask is what results you get on your machine if you duplicate the OP's test. Do you get his 100X difference, too? It's possible, I think, that such tests could give wildly different results from machine to machine depending on all the possible differences in memory size, cache size, etc., etc., etc., ...
I didn't mess with the cell array, but I did a mini-comparison of the native data type by number of iterations over a few orders of magnitude, from 1E3 through 1E6. In comparison, the OP's machine is about 20X as fast as my cheapie (old enough that it's still on Win7), so I think timings could be all over the place.
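For a somewhat more repeatable comparison than a single tic/toc, timeit runs the measured function several times and reports a median. A sketch (the reduced iteration count is an assumption to keep the slow matrix case tolerable; local functions require saving this as a script in R2016b+ or as a function file):

```matlab
n = 1e4;                            % smaller than the question's 1e5
tMatrix = timeit(@() growMatrix(n))
tCell   = timeit(@() growCell(n))

function x = growMatrix(n)
    % grow a numeric array one row at a time, as in the question
    x = [];
    for i = 1:n
        x(end+1, :) = 1:10;
    end
end

function c = growCell(n)
    % grow a cell array one element at a time
    c = {};
    for i = 1:n
        c{end+1, 1} = 1:10;
    end
end
```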
James Tursa on 11 Jun 2021
"Do you have an explanation for the timings measured in the original question?"
Yes. As Jan pointed out, the first method copies and re-copies the data over and over and over again. The second method doesn't.
dpb on 11 Jun 2021
Well, yes... but the question I interpreted as being about insight into Jan's expectation of 10:1 vis-a-vis the OP's observed 100:1 ratio. Why on that, I've no clue, starting with the fact that I don't know why/how Jan derived his "expected" factor of 10.


Release: R2021a