speed up intersect, setdiff functions

Hi
I have some code, which when I examine using the Matlab Profiler shows that the majority of the time is spent on the intersect and setdiff functions, which are invoked many times during a looping process. I was wondering if there was a clever way to speed this up? For example is it possible to somehow vectorize the following code to avoid invoking the inbuilt set functions? Thanks for your help!
tic;
size=50000;
a=round(100*rand(size,1));
b=round(100*rand(size,1));
intersect(a,b);
setdiff(a,b);
toc;

1 件のコメント

Matt Fig
Matt Fig 2012 年 11 月 15 日
Your best bet might be to avoid performing these operations in a loop in the first place...

サインインしてコメントする。

 採用された回答

Jan
Jan 2012 年 11 月 15 日

1 投票

intersect and setdiff both sort the inputs. Therefore you can omit at least one sorting, when you copy the relevant parts of both function to a new file. Then ismembc will do its work once, as Sean has suggested already.

その他の回答 (2 件)

Jonathan Epperl
Jonathan Epperl 2012 年 11 月 15 日

7 投票

If your sets aren't much larger than your example, and are indeed made up of positive integers, then those two functions here can get you a drastic increase in speed. The idea isn't mine, afaik Kevin Murphy of the Bayes Net TB came up with it. Anyway, 2 functions which pretty much exactly replace setdiff and intersect (as long as you work with positive integers):
function C = MY_intersect(A,B)
if ~isempty(A)&&~isempty(B)
P = zeros(1, max(max(A),max(B)) ) ;
P(A) = 1;
C = B(logical(P(B)));
else
C = [];
end
and
function Z = MY_setdiff(X,Y)
if ~isempty(X)&&~isempty(Y)
check = false(1, max(max(X), max(Y)));
check(X) = true;
check(Y) = false;
Z = X(check(X));
else
Z = X;
end
Obviously, if you're confident your inputs will make sense you can get rid of the if stuff.
tic;
size=50000;
a=round(100*rand(size,1));
b=round(100*rand(size,1));
intersect(a,b);
setdiff(a,b);
toc;
% Elapsed time is 0.028182 seconds.
tic;
size=50000;
a=round(100*rand(size,1));
b=round(100*rand(size,1));
MY_intersect(1+a,1+b)-1;
MY_setdiff(1+a,1+b)-1;
toc;
% Elapsed time is 0.016935 seconds.
The computation times vary a lot though, but the 2nd block is always at least around 30% faster. You can see how it plays out in your code and let us know.

3 件のコメント

Jdeen
Jdeen 2014 年 5 月 21 日
編集済み: Jdeen 2014 年 5 月 21 日
Unfortunately, MY_setdiff does not quite behave the same as setdiff, also not for positive integers. Consider: setdiff([1 2 2 1], [1]) = 2 and MY_setdiff([1 2 2 1], [1]) = 2 2. Hence, you would have to include a Z = unique(Z) which reduces speed. However, MY_setdiff generally still seems to be faster for pos. integers
Jonathan Epperl
Jonathan Epperl 2014 年 8 月 18 日
That's a good point, I never realized that.
Chinmaya KA
Chinmaya KA 2020 年 9 月 9 日
Hi Jonathan, how can we intersect multiple arrays with positive integers using your code?

サインインしてコメントする。

Sean de Wolski
Sean de Wolski 2012 年 11 月 15 日

0 投票

Look at ismember and ismembc (two output form).
You will have to implement the and/or/xor logic yourself, but these should quickly give you the indices you need to do so.

1 件のコメント

Ethan Kyzivat
Ethan Kyzivat 2018 年 5 月 9 日
I found that ismember also calls unique(), so may not be much faster...

サインインしてコメントする。

カテゴリ

質問済み:

2012 年 11 月 15 日

コメント済み:

2020 年 9 月 9 日

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by