# How can I further improve the performance of this function?

Emiliano Rosso, 1 June 2016
This function is my adaptation of nanmean and nanstd, taken from the NaN Suite by Jan Gläscher.
function [inputstemp4]=OPTIM3_END_GeneticPattern(inputstemp)
%#codegen
% NaN-ignoring mean and standard deviation along dimension 2.
global inputs;
[m,n]=size(inputs);
nset1=1:100;
nimp1=1:n;
STDinputs=zeros(100,n);
AVinputs=zeros(100,n);
inputstempcopy=inputstemp;                     % untouched copy, NaNs still present
dim = 2;
% ---------- mean, ignoring NaNs
nans = isnan(inputstemp);
inputstemp(nans) = 0;                          % NaNs contribute nothing to the sum
count = size(inputstemp,dim) - sum(nans,dim);  % number of valid samples
i = find(count==0);                            % positions with no valid sample
count(i) = ones(size(i));                      % avoid division by zero
AVinputs(nset1,nimp1) = sum(inputstemp,dim)./count;
AVinputs(i) = i + NaN;                         % no valid sample: result is NaN
avg= sum(inputstemp,dim)./count;
avg(i)= i + NaN;
% ---------- standard deviation, ignoring NaNs
inputstempcopy = inputstempcopy - repmat(avg,[1,33,1]);  % deviations from the mean
inputstempcopy(isnan(inputstempcopy)) = 0;     % drop deviations of missing samples
STDinputs(nset1,nimp1) = sqrt(sum(inputstempcopy.*inputstempcopy,dim)./max(count-1,1));
STDinputs(i) = i + NaN;
%----------
inputstemp4=AVinputs-min(STDinputs.*AVinputs,AVinputs);
end
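To make the arithmetic in the function above easier to follow, here is a minimal sketch of the same NaN-ignoring mean and standard deviation on a tiny made-up array. The data and the names `x`, `x0`, `noData`, `mu`, `d` and `sd` are assumptions for illustration only; they are not part of the original function.

```matlab
% Tiny made-up example: 2 rows, 3 columns; NaN marks a missing value.
x = [1 2 NaN; NaN NaN NaN];
dim = 2;

nans = isnan(x);
x0 = x;
x0(nans) = 0;                           % NaNs contribute nothing to the sum
count = size(x,dim) - sum(nans,dim);    % valid samples per row: [2; 0]
noData = (count == 0);                  % rows without any valid sample
count(noData) = 1;                      % avoid division by zero

mu = sum(x0,dim) ./ count;              % NaN-ignoring mean
mu(noData) = NaN;

d = x - repmat(mu,[1,size(x,dim)]);     % deviations from the row mean
d(isnan(d)) = 0;                        % missing samples add nothing
sd = sqrt(sum(d.*d,dim) ./ max(count-1,1));   % sample standard deviation
sd(noData) = NaN;
```

Here the first row has two valid samples (1 and 2), so its mean is 1.5 and its standard deviation is sqrt(0.5); the second row has no valid samples, so both results are NaN.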
Compiling this function to MEX worsens performance, and calling the original MATLAB functions (nanmean, nanstd) through coder.extrinsic takes even longer. I also compiled to MEX my own loop-based substitute for nanmean / nanstd:
% Flatten to (100*33)-by-n so that each set occupies 33 consecutive rows.
inputstemp=reshape(permute(inputstemp ,[2,1,3]),100.*33,n);
for nset=1:100
    for nimp=1:n
        % First pass: count the non-NaN samples of this set/column.
        countpart=0;
        for npart=1:33
            if isnan(inputstemp(npart+(33.*(nset-1)),nimp))==0
                countpart=countpart+1;
            end
        end
        if countpart~=0
            %avcount=coder.nullcopy(zeros(1,countpart));
            avcount=zeros(1,countpart);
            countpart=0;
            % Second pass: collect the non-NaN samples.
            for npart=1:33
                if isnan(inputstemp(npart+(33.*(nset-1)),nimp))==0
                    countpart=countpart+1;
                    avcount(1,countpart)=inputstemp(npart+(33.*(nset-1)),nimp);
                end
            end
        else
            avcount=0;
        end
        if countpart~=0
            [qq,ww]=size(avcount);
            mymean=0;
            mysum=0;
            % Mean of the valid samples.
            for rr=1:ww
                mymean=mymean+avcount(1,rr);
            end
            mymean=mymean./ww;
            % Sum of squared deviations from the mean.
            for rr=1:ww
                mysum=mysum+(avcount(1,rr)-mymean).^2;
            end
            AVinputs(nset,nimp)=mymean;
            STDinputs(nset,nimp)=sqrt(mysum./(ww-1));
        else
            % No valid sample: store 0 for both statistics.
            AVinputs(nset,nimp)=0;
            STDinputs(nset,nimp)=0;
        end
    end
end
inputstemp4=AVinputs-min(STDinputs.*AVinputs,AVinputs);
Compiled to MEX, this version is faster than the original code, but it still does not beat the NaN Suite code I adapted. I also tried single precision, but that did not work.
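For comparisons like the ones described here, a small timing harness helps keep the measurements consistent. The following is only a sketch under stated assumptions: the data is random, `n = 20` is an arbitrary choice (the real `n` comes from the global `inputs`), and both variants compute just the NaN-ignoring mean rather than the full function. All variable names are hypothetical.

```matlab
% Made-up data shaped 100 sets x 33 samples x n columns.
n = 20;                                  % assumed value, for illustration only
x = rand(100,33,n);
x(rand(size(x)) < 0.1) = NaN;            % mark roughly 10% of the samples as missing
nrep = 20;                               % repetitions to average out timer noise

% Variant A: vectorized NaN-ignoring mean along dimension 2.
tic
for k = 1:nrep
    nans = isnan(x);
    x0 = x;
    x0(nans) = 0;
    cntA = max(size(x,2) - sum(nans,2), 1);   % guard against division by zero
    avgA = sum(x0,2) ./ cntA;
end
tA = toc / nrep;

% Variant B: explicit loops over sets, columns and samples.
tic
for k = 1:nrep
    avgB = zeros(100,1,n);
    for s = 1:100
        for c = 1:n
            acc = 0;
            nvalid = 0;
            for p = 1:33
                v = x(s,p,c);
                if ~isnan(v)
                    acc = acc + v;
                    nvalid = nvalid + 1;
                end
            end
            avgB(s,1,c) = acc / max(nvalid,1);
        end
    end
end
tB = toc / nrep;

fprintf('vectorized: %.3g s per call, loops: %.3g s per call\n', tA, tB);
```

Both variants should agree to within rounding error, which is worth checking before comparing times. MATLAB's `timeit`, which takes a function handle, is an alternative to `tic`/`toc` that may give steadier numbers.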

