Very slow loop trying to find any intersection

Hi Everyone,
I am trying to figure out whether a pair of observations shares any partners that both have worked with. jaccard_dyadic is the dyadic table whose first two columns identify the observations (i.e. the pair that makes up the unique identifier). I want to fill column 'm' with the value 1 whenever both observations have worked with any of the same inventors. assignee_inventors is a matrix with the observations as rows and the inventors as columns, filled with a 1 whenever the observation of the corresponding row has worked with the inventor of the corresponding column. The loop structure below does exactly that, but it is very slow. Any help on how to speed it up would be much appreciated (I suspect there is a much simpler way of doing this).
for i = 1:(find(jaccard_dyadic(:,1)==0, 1, 'first')-1)
    for l=1:p(2)
        if any(assignee_inventors(jaccard_dyadic(i,1),l)==assignee_inventors(jaccard_dyadic(i,2),l) && assignee_inventors(jaccard_dyadic(i,2),l)==1)
            jaccard_dyadic(i,m)=1;
        end
    end
end
EDIT:
This is the whole code I am using, and I have added some sample data. Because the results are quite sparse, I hope the sample contains some instances of what I am looking for. I haven't uploaded the desired output, but essentially it is just the last column of the jaccard_dyadic matrix (currently filled with zeros) that should take the value 1 if there is any overlap as described above.
%% Any Same Inventors
load('jaccard_dyadic_test.mat')
B = readmatrix('Inventor_copy.csv');
assignees = B(:,1);
inventors = B(:,2);
assignee_inventors = zeros(max(assignees), max(inventors));
empty_dim = size(B);
%%
% mark every (assignee, inventor) pair that occurs in B
for i = 1:empty_dim(1)
    assignee_inventors(assignees(i), inventors(i)) = 1;
end
%%
% actual code for what I need
tic % start the timer that toc reports below
p = size(assignee_inventors);
m = find(all(jaccard_dyadic==0), 1, 'first'); % first all-zero column
for i = 1:(find(jaccard_dyadic(:,1)==0, 1, 'first')-1)
    for l = 1:p(2)
        if any(assignee_inventors(jaccard_dyadic(i,1),l)==assignee_inventors(jaccard_dyadic(i,2),l) && assignee_inventors(jaccard_dyadic(i,2),l)==1)
            jaccard_dyadic(i,m) = 1;
        end
    end
end
fprintf('After Inventors '); toc
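As a possible shortcut for the setup loop above, the indicator matrix could probably be built in one call. This is only a sketch under assumptions: the small matrix B below is made-up toy data standing in for the contents of Inventor_copy.csv, and it assumes both columns hold positive integer indices. sparse adds up duplicate (row, column) pairs, so the result is compared against 0 to get a 0/1 indicator.

```matlab
% Hypothetical (assignee, inventor) index pairs standing in for B.
B = [1 2; 1 3; 2 1; 3 3; 1 2];   % note the duplicate pair (1,2)
assignees = B(:, 1);
inventors = B(:, 2);

% sparse sums the values of duplicate index pairs, so "> 0" turns the
% counts into a logical 0/1 indicator matrix.
assignee_inventors = sparse(assignees, inventors, 1, ...
                            max(assignees), max(inventors)) > 0;

disp(full(assignee_inventors))
```

Keeping the matrix sparse may also save memory when most assignee/inventor combinations never occur, which seems to be the case here.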

3 comments

Stephen23 on 5 Jun 2019
@John Kirk: please upload some sample input and output data in a .mat file, by clicking the paperclip button.
Jan on 5 Jun 2019
@John: Did you pre-allocate the output? What is "m"?
John Kirk on 5 Jun 2019
Hi you two, thanks for getting back to me.
I have pre-allocated the output to jaccard_dyadic, trying to fill the 6th column ('m' is just a generalized version of filling the first column that is empty). I have also added some example data in the edited post above.

Answers (1)

Jan on 5 Jun 2019
Edited: Jan on 5 Jun 2019

1 vote

A simplified version to get the overview:
dy = jaccard_dyadic;
in = assignee_inventors;
n = find(dy(:,1) == 0, 1, 'first') - 1;
for i = 1:n
    for k = 1:p(2) % k is less confusing than l
        if any(in(dy(i,1), k) == in(dy(i,2), k) && in(dy(i,2), k) == 1)
            jaccard_dyadic(i, m) = 1;
        end
    end
end
What is m? What is the purpose of the any()? For a scalar input you can omit the any() and write:
if in(dy(i,1), k) == in(dy(i,2), k) && in(dy(i,2), k) == 1
Isn't this the same as:
if in(dy(i,1), k) == 1 && in(dy(i,2), k) == 1
Which values can in contain? If it is only 0 or 1:
if in(dy(i,1), k) && in(dy(i,2), k)
Then your loop might be equivalent to:
jaccard_dyadic = assignee_inventors(dy(:, 1), 1:p(2)) & ...
assignee_inventors(dy(:, 2), 1:p(2));
Here I am guessing that "m" is the inner loop counter. Maybe you need to add "==1" to both operands. Replace the "1:p(2)" by a simple ":" if this matches your needs.
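If "m" turns out to be one fixed column instead, the pairs-by-inventors result of the AND can be collapsed to a single flag per pair with any(..., 2). The following is a small sketch on made-up toy data (the matrices in and dy below are assumptions, not the real data) that checks the nested loop against the row-wise reduction:

```matlab
% Toy data (hypothetical): 4 observations x 5 inventors, entries 0 or 1.
in = logical([1 0 1 0 0;
              0 1 0 0 0;
              0 0 1 1 0;
              0 1 0 0 1]);
% Each row of dy is a pair of observation indices (hypothetical pairs).
dy = [1 2; 1 3; 2 4; 3 4];

% Loop version: flag a pair if some inventor column is 1 in both rows.
nPairs = size(dy, 1);
sharedLoop = false(nPairs, 1);
for i = 1:nPairs
    for k = 1:size(in, 2)
        if in(dy(i,1), k) && in(dy(i,2), k)
            sharedLoop(i) = true;
        end
    end
end

% Vectorized version: AND the two row selections, then reduce along
% dimension 2 so that each pair yields one logical value.
sharedVec = any(in(dy(:,1), :) & in(dy(:,2), :), 2);

disp(isequal(sharedLoop, sharedVec))
```

In this toy case only the pairs (1,3) and (2,4) share an inventor, and both versions flag exactly those two rows.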

3 comments

John Kirk on 5 Jun 2019
Thanks for the help @Jan! As might be obvious, I am somewhat new to Matlab and was googling my way through functions. It looks like I should be able to just use your suggested line (below) as the values can, in fact, only contain 0 or 1.
if in(dy(i,1), k) && in(dy(i,2), k)
I will try this out a little later and see if it works / is faster. Thanks already for the help.
John Kirk on 6 Jun 2019
So I was able to change the loop to the version below, amending it with your second-to-last line of code, and I still get essentially the same results. However, I am not sure how to implement your last suggestion, that my loop might just be equivalent to jaccard_dyadic = ... Whenever I try variations of it, I get an error that the index in position 1 is invalid.
Below is where I am right now. It is slightly faster than my original concoction, but any further tips for improvement would still be appreciated.
p = size(assignee_inventors);
m = find(all(jaccard_dyadic==0), 1, 'first');
for i = 1:500 %(find(jaccard_dyadic(:,1)==0, 1, 'first')-1)
    for k=1:p(2)
        if assignee_inventors(jaccard_dyadic(i,1), k) && assignee_inventors(jaccard_dyadic(i,2), k)
            jaccard_dyadic(i, m) = 1;
        end
    end
end
Jan on 6 Jun 2019
Edited: Jan on 6 Jun 2019
Now that you have explained that m is a constant, the inner loop can be omitted:
in = assignee_inventors; % Shorter names for nicer code
dy = jaccard_dyadic;
p = size(in);
m = find(all(dy==0), 1, 'first');
for i = 1:500
    % write into the original table, not into a new variable
    jaccard_dyadic(i, m) = any(in(dy(i,1), :) & in(dy(i,2), :), 2);
end
The outer loop can be vectorized as well:
jaccard_dyadic(:, m) = any(in(dy(:, 1), :) & in(dy(:, 2), :), 2);
I'd prefer to test the code before posting. Therefore it is better to post some input data, e.g. created by rand.
"I get an error that the index in position 1 is invalid."
Please post a copy of the complete error message, not a rephrased version. Which index is meant? Which code did you try exactly? Post it, because it might contain a typo. Maybe your jaccard_dyadic has more elements than assignee_inventors and some elements are zero. You can check this easily.
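If zero-padded rows are indeed the cause, one way around it is to restrict the vectorized line to the rows that hold valid indices. This is an untested sketch on made-up toy inputs (in and dy below are assumptions, and the name valid is introduced only for illustration):

```matlab
% Hypothetical toy inputs: in is observations x inventors (0/1),
% dy holds pair indices in columns 1:2 and is zero-padded at the bottom.
in = logical([1 0 1;
              0 1 0;
              0 0 1]);
dy = [1 2 0;
      1 3 0;
      2 3 0;
      0 0 0;
      0 0 0];
m = find(all(dy == 0), 1, 'first');   % first all-zero column, here 3

% Indexing with 0 raises an invalid-index error, so keep only the rows
% whose two indices are valid row numbers of in.
valid = all(dy(:, 1:2) >= 1 & dy(:, 1:2) <= size(in, 1), 2);

% Fill column m for the valid rows only; padded rows stay 0.
dy(valid, m) = any(in(dy(valid, 1), :) & in(dy(valid, 2), :), 2);
disp(dy)
```

Note that in(dy(valid,1), :) builds a temporary matrix with one row per pair, so for a very large table it may be necessary to process the pairs in chunks.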

Asked: 4 Jun 2019

Edited: Jan on 6 Jun 2019
