Permutations Cycles Counting ... speed up

2 ビュー (過去 30 日間)
Michal
Michal 2017 年 10 月 20 日
編集済み: Jan 2017 年 10 月 23 日
Hi, this is my code for permutation cycles counting. Any idea how to vectorize it to get some additional speedup for large set of permutations?
function count = PermCyclesCount(p)
[nR,nE] = size(p);
count = zeros(nR,1);
for j = 1:nR
i = 0;
count(j) = 0;
while i < nE
i = i + 1;
if p(j,i) > 0
flag = false; % flag is just to trick the computer into entering the while loop
first = i;
while (first ~= i) || ~flag
flag = true;
r = first;
first = p(j,first);
p(j,r) = 0; % The entries in the cycle are changed to zero to indicate that they have been `used'
end
count(j) = count(j) + 1;
end
end
end
end
Some testing data:
[~,perms] = sort(rand(1000000,60),2);
tic;c= PermCyclesCount(perms);toc
Elapsed time is 1.559758 seconds.

採用された回答

Jan
Jan 2017 年 10 月 20 日
編集済み: Jan 2017 年 10 月 20 日
% VERSION 1
[nR, nE] = size(p);
count = zeros(nR, 1);
for j = 1:nR
q = p(j, :); % <== Cut out a vector to operate on
i = 0;
while i < nE
i = i + 1;
if q(i) > 0
flag = false;
first = i;
while (first ~= i) || ~flag
flag = true;
r = first;
first = q(first);
q(r) = 0;
end
count(j) = count(j) + 1;
end
end
end
This tiny change let the code run in 1.41 instead of 2.37 sec on my computer.
And this in 1.25 sec without the flag:
% VERSION 2
[nR, nE] = size(p);
count = zeros(nR, 1);
for j = 1:nR
q = p(j, :);
for i = 1:nE
if q(i) > 0
f = q(i);
q(i) = 0;
while f ~= i
r = f;
f = q(f);
q(r) = 0;
end
count(j) = count(j) + 1;
end
end
end
  3 件のコメント
Jan
Jan 2017 年 10 月 20 日
編集済み: Jan 2017 年 10 月 20 日
This code is 20% faster, if the data a provided in columnwise order:
[~,perms] = sort(rand(60, 1000000), 1);
% VERSION 2.1
[nE, nE] = size(p); % <== Swapped
count = zeros(nR, 1);
for j = 1:nR
q = p(:, j); % <== Copy column
for i = 1:nE
if q(i) > 0
f = q(i);
q(i) = 0;
while f ~= i
r = f;
f = q(f);
q(r) = 0;
end
count(j) = count(j) + 1;
end
end
end
The C code is even 33% faster - see in the other answer.
Does the creation of perms matter also? Then:
[~,perms] = sort(rand(60, 1000000),1); % 4.88 sec
perms = Shuffle(repmat((1:60).', 1, 1000000), 1); % 2.95 sec
Michal
Michal 2017 年 10 月 23 日
One again Jan, tahnk you very much for your help. Creation is not a critical part, but anyway, the Shuffle function looks very useful for me, too.

サインインしてコメントする。

その他の回答 (3 件)

Jan
Jan 2017 年 10 月 20 日
編集済み: Jan 2017 年 10 月 23 日
A C-Mex has about the double speed of the VERSION 2:
// VERSION 3
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
size_t nR, nE, j, i;
double *c, *p, *pr;
int *q, f, r;
// Get input:
nR = mxGetM(prhs[0]);
nE = mxGetN(prhs[0]);
p = mxGetPr(prhs[0]);
// Create output:
plhs[0] = mxCreateDoubleMatrix((mwSize) nR, 1, mxREAL);
c = mxGetPr(plhs[0]);
q = (int *) mxMalloc((nE + 1) * sizeof(int));
for (j = 0; j < nR; j++) {
// Copy row p(j, :) to q:
pr = p++;
for (i = 1; i <= nE; i++) {
q[i] = (int) (*pr); // One based indexing for q
pr += nR;
}
for (i = 1; i <= nE; i++) {
if (q[i] != 0) {
f = q[i];
q[i] = 0;
while (f != i) {
r = f;
f = q[f];
q[r] = 0;
}
*c += 1.0;
}
}
c++;
}
mxFree(q);
return;
}
This is 33% faster, if you provide the input in columnwise order instead:
[~,perms] = sort(rand(60, 1000000),1);
// VERSION 3.1
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
// !!! FOR [nE x nR] !!!
size_t nR, nE, j, i;
double *c, *p, *pr;
int *q, f, r, ic;
// Get input:
nE = mxGetM(prhs[0]);
nR = mxGetN(prhs[0]);
p = mxGetPr(prhs[0]);
// Create output:
plhs[0] = mxCreateDoubleMatrix((mwSize) nR, 1, mxREAL);
c = mxGetPr(plhs[0]);
q = (int *) mxMalloc((nE + 1) * sizeof(int));
for (j = 0; j < nR; j++) {
// Copy row p(j, :) to q:
pr = p;
for (i = 1; i <= nE; i++) {
q[i] = (int) (*pr++); // One based indexing for q
}
ic = 0;
for (i = 1; i <= nE; i++) {
if (q[i] != 0) {
f = q[i];
q[i] = 0;
while (f != i) {
r = f;
f = q[f];
q[r] = 0;
}
ic++;
}
}
*c++ = (double) ic;
pr += nE;
}
mxFree(q);
return;
}
  1 件のコメント
Michal
Michal 2017 年 10 月 23 日
Jan, thanks a lot for your extraordinary help!!! The codes you provide here are perfect. Especially attached MEX files.

サインインしてコメントする。


Jos (10584)
Jos (10584) 2017 年 10 月 20 日
Not vectorised, but less checking, not sure if it is faster
nR = size(p,1) ;
count2 = zeros(nR,1) ;
tf = ~isnan(p) ;
for j = 1:nR
r1 = find(tf(j,:), 1 ,'first') ; % find the first cycle
while ~isempty(r1) % is there a cycle
count2(j) = count2(j) + 1 ;
while tf(j,r1)
tf(j,r1) = false ;
r2 = p(j,r1) ;
r1 = r2 ;
end
r1 = find(tf(j,:),1,'first') ; % find the next cycle
end
end
  9 件のコメント
Jan
Jan 2017 年 10 月 20 日
The profiler disables the JIT acceleration, which could re-order the code lines to improve speed. This cannot be allowed, if the time per line should be measured.
But the JIT accelerator is essential for the run time of such loops. Therefore the profiler has a limited use here. You can determine the commands which take the most time, but not the code blocks.
Michal
Michal 2017 年 10 月 20 日
@Jan Simon Yes, you are right. The profiler has a limited use, now!!! But frankly speaking, this limitation is really terrible.

サインインしてコメントする。


Jos (10584)
Jos (10584) 2017 年 10 月 20 日
編集済み: Jos (10584) 2017 年 10 月 20 日
This is upto 10 times faster than your code (and indeed about 100 times faster than my previous code):
[nR, nE] = size(p) ;
count3 = zeros(nR,1) ;
tf = true(nR,nE) ;
for Ri = 1:nR
for Ei = 1:nE,
if tf(Ri,Ei)
count3(Ri) = count3(Ri) + 1 ;
while tf(Ri,Ei)
tf(Ri,Ei) = false ;
r = p(Ri,Ei) ;
Ei = r ;
end
end
end
end
  2 件のコメント
Michal
Michal 2017 年 10 月 20 日
This code is definitely still a bit slower than mine :)
[~,perms] = sort(rand(1000000,60),2);
My code:
tic;c= PermCyclesCount(perms);toc
Elapsed time is 1.552508 seconds.
Your code:
tic;c_= PermCyclesCount_(perms);toc
Elapsed time is 1.664062 seconds.
Jan
Jan 2017 年 10 月 20 日
編集済み: Jan 2017 年 10 月 20 日
An improved version of Jos' code:
[nR, nE] = size(p) ;
count = zeros(nR,1) ;
tf = true(nE, nR) ; % Transposed to operate on column
for Ri = 1:nR
q = p(Ri, :); % Vector cut out
for Ei = 1:nE
k = Ei;
if tf(k, Ri)
count(Ri) = count(Ri) + 1 ;
while tf(k, Ri)
tf(k, Ri) = false ;
r = q(k) ;
k = r ;
end
end
end
end
To my surprise using a vector tf = true(nE, 1) is not faster.

サインインしてコメントする。

カテゴリ

Help Center および File ExchangeResizing and Reshaping Matrices についてさらに検索

製品

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by