Why is sort() taking longer to resolve on multiples of 512?

1 回表示 (過去 30 日間)
Magnus
Magnus 2014 年 9 月 8 日
編集済み: Magnus 2014 年 9 月 9 日
I am using the sort() funtion to sort the columns of massive matrixes. By chance, I discovered that the sort algorithm took a lot longer than expected to run when I used a matrix with 256^3 columns than when using 255^3. After some experimentation I was doumbfunded to discover that using 257^3 columns was also much faster than 256^3. By now, I was really confused, so I i did a little experiment:
for i = 1:10000
test = rand(i,100);
tstart = tic;
sort(test,2);
out(i) = toc(tstart);
end
plot(1:10000,out)
ylabel('Time elapsed/s')
xlabel('nbr of columns')
That gave me this little nice plot, which looks something like an emission spectrum:
The major peaks seem to be at multiples of 512, i.e. 512,1024,1536,2048 etc. Can anyone explain this behaviour? As the peaks are at familiar "digital"-numbers I am guessing this has something to do with the algorithm-implementation.

回答 (3 件)

Sean de Wolski
Sean de Wolski 2014 年 9 月 8 日
編集済み: per isakson 2014 年 9 月 8 日
I don't think sort has anything to do with the behavior you are seeing. Look at what happens if you preallocate out
out = zeros(1000,1);
for i = 1:1000
test = rand(i,100);
tstart = tic;
sort(test,2);
out(i) = toc(tstart);
end
plot(1:1000,out)
ylabel('Time elapsed/s')
xlabel('nbr of columns')
Now the out array is not growing on each iteration, you do not see spikes (just expected fluctuations from sorting random numbers).
More information here:
  3 件のコメント
Sean de Wolski
Sean de Wolski 2014 年 9 月 8 日
What do you have for a machine?
Magnus
Magnus 2014 年 9 月 8 日
i7-3960X 3.30GHz CPU and 64 GB RAM running Win 7pro 64-bit

サインインしてコメントする。


per isakson
per isakson 2014 年 9 月 8 日
編集済み: per isakson 2014 年 9 月 8 日
I have reproduced your result with R2013a,64bit,Win7,8GB on a five year old vanilla desktop (Processor Intel® Core™2 Quad CPU Q9400 @ 2.66GHz 2.66 GHz).
out(1,10000) = nan;
for ii = 1:10000
test = rand(ii,100);
tstart = tic;
sort(test,2);
out(ii) = toc(tstart);
end
figure
plot(1:10000,out)
ylabel('Time elapsed/s')
xlabel('nbr of columns')
ff = filtfilt( fir1( 20, 0.03), 1, out );
find( (out-ff) > 0.5*ff )/256
ans =
Columns 1 through 9
1.0000 2.0000 3.0000 3.6680 4.0000 5.0000 5.7695 6.0000 6.5078
Columns 10 through 18
7.0000 7.5000 8.0000 8.0898 9.0000 9.6680 10.0000 11.0000 12.0000
Columns 19 through 27
13.0000 14.0000 15.0000 16.0000 17.0000 18.0000 19.0000 20.0000 21.0000
Columns 28 through 36
22.0000 23.0000 24.0000 25.0000 26.0000 27.0000 28.0000 29.0000 30.0000
Columns 37 through 45
31.0000 32.0000 33.0000 34.0000 35.0000 36.0000 37.0000 38.0000 39.0000

José-Luis
José-Luis 2014 年 9 月 8 日
編集済み: José-Luis 2014 年 9 月 8 日
Maybe it has something to do with data padding. If you use single precision format, then the jumps come at multiples of 1024. (Shamelessly plagiarizing Per's code):
out(1,10000) = nan;
for ii = 1:10000
test = single(rand(ii,100)); %just changed this
tstart = tic;
sort(test,2);
out(ii) = toc(tstart);
end
figure
plot(1:10000,out)
ylabel('Time elapsed/s')
xlabel('nbr of columns')
ff = filtfilt( fir1( 20, 0.03), 1, out );
find( (out-ff) > 0.5*ff )/256
EDIT
On second thought, it might have nothing to do with Matlab but with your processor's cache size. Maybe 512 bytes happens to be a multiple of some cache-level size and when you fill it completely, it provokes many cache misses and therefore a jump in decreasing performance.
To find the culprit would be difficult without knowing how sort() is implemented, and the results might be different from processor to processor.
  2 件のコメント
Adam
Adam 2014 年 9 月 8 日
The same spike seems to occur with the median or even min or max functions too
Magnus
Magnus 2014 年 9 月 9 日
編集済み: Magnus 2014 年 9 月 9 日
@José Yes, that is what I am thinking too, but it seems a bit strange. I mean, basically the sort algorithm as used in my example sorts a number of independent vectors of data, stored as columns in a matrix. Sorting 257 such columns takes much less time than sorting 256 columns. Well, I agree, probably this is some low-level issue which is difficult for us mortals to pin down.
After analyzing the data in more detail it seems that the jump in processing time is occuring (to a greater or lesser degree) at every multiple of 64, with multiples of 512 and especially 1024 being extra prominent.

サインインしてコメントする。

カテゴリ

Help Center および File ExchangeLogical についてさらに検索

タグ

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by