I forgot to say this. I am trying to use the Thompson Tau technique here. I know there are files out there that can do this, but they all rearrange the vectors in either descending or ascending order. I cannot do that here. Each row of P31 is from a single experiment, so I need all of the data in that row. Rearranging any of the vectors will throw off the result.
Repeat for-loop from the beginning after recalculating values
18 ビュー (過去 30 日間)
古いコメントを表示
Ok, I am having a hard time figuring this out. I have a 6 X 6 matrix of values, in this case, pressure measurements in psi (P31.mat attached). I'm trying to apply rejection criteria for the outliers in this set of pressures. Each column of P31 is from a separate pressure transducer (6 measurements from each). I calculate the mean, standard deviation, and the number of elements (not including NaN) in each column using the first loop. In the second loop I calculate the deviation of each measurement (delta_31) from the mean (of the column).
In the third loop I use if-statements to determine a value for tau, according to the number of elements in each column. Then I multiply that tau value by the standard deviation of that column and compare each deviation value (delta_31). If delta_31(i,j) is greater than the calculated value for tau*stddev, then replace that delta value with a NaN, as well as the corresponding (i,j) value in P31. I then recalculate the stdev, mean, and number of elements. The problem is I'm trying to repeat the loop again to determine if there is another outlier in that column and I can't seem to get it. I thought that by stating i = 1, the loop should begin again, but it doesn't. Can anyone help? I apologize if this is confusing.
for k = 1:size(P31,2)
x_31(k) = mean(P31(:,k),'omitnan');
SDP_31(k) = std(P31(:,k),'omitnan');
n_31(k) = numel(P31(:,k)) - sum(isnan(P31(:,k)));
nu_31(k) = n_31(k) - 1;
end
for j = 3:size(P31,2)
for i = 1:size(P31,1)
delta_31(i,j) = abs(P31(i,j) - x_31(j));
end
for j = 3:size(P31,2)
for i = 1:size(P31,1)
% Conditional if-statements to determine the values for tau
if n_31(j) == 3
tau_31 = 1.150;
elseif n_31(j) == 4
tau_31 = 1.393;
elseif n_31(j) == 5
tau_31 = 1.572;
elseif n_31(j) ==6
tau_31 = 1.656;
elseif n_31(j) == 7
tau_31 = 1.711;
end
% Calculate tau*S to compare with the delta's
tauS_31(j) = tau_31*SDP_31(j);
% if-statement to replace outliers with NaN and recalculate
% everything. This actually works, but only for eliminating one
% outlier from the column.
if delta_31(i,j) > tauS_31(j)
delta_31(i,j) = NaN;
P31(i,j) = NaN;
x_31(j) = mean(P31(:,j),'omitnan');
SDP_31(j) = std(P31(:,j),'omitnan');
n_31(j) = numel(P31(:,j)) - sum(isnan(P31(:,j)));
for k = 1:size(P31,1)
delta_31(k,j) = abs(P31(k,j) - x_31(j));
end
i = 1; % this is supposed to start the i-loop again, but doesn't
else
i = i + 1;
end
end
end
3 件のコメント
dpb
2018 年 9 月 6 日
"I thought that by stating i = 1, the loop should begin again..."
No, that is pretty-much a standard implementation in (most) all (modern) programming languages--the loop limit is stored before the loop begins and in Matlab any change to the loop index variable is overwritten by for for the next iteration from a local copy.
dpb
2018 年 9 月 6 日
You could use one of those routines simply by saving the original index order to rearrange the order back to original when done.
The Thompson tau is one-at-a-time so it's sufficient on each iteration to only test the largest absolute difference; the ordering is some algorithm's way to look at the two end points each iteration rather than find max(abs(difference)) each pass.
採用された回答
dpb
2018 年 9 月 6 日
編集済み: dpb
2018 年 9 月 6 日
Directly addressing the Q?, here's a basic function to do the removal--
function x=thompsontau(x)
% Return array x w/ outliers replaced w/ NaN from input x
% Each column in x is assumed to be variable, rows are observations
%
% dpbozarth -- 5Sep2018
[r,c]=size(x); % row, col dimensions
cols=1:c; % later lookup for columns if outliers exist
mn=mean(x,'omitnan'); % mean by column
sd=std(x,'omitnan'); % std deviation
n=sum(isfinite(x)); % finite observations
[mx,ix]=max(abs(x-mn)); % value, location (row) of max deviation
isOut=mx>fntau(n).*sd; % column locations are outliers if T
if any(isOut)
indxOut=sub2ind([r c],ix(isOut),cols(isOut)); % locations in x of outliers
x(indxOut)=nan; % mark as outlier
end
end
function tau = fntau(n)
% returns critical Thompson tau for n observations
tcrit=tinv(1-0.05/2,n-2); % critical t value
tau=tcrit./sqrt(n)./sqrt(n-2+tcrit.*tcrit).*(n-1);
end
This vectorizes your loop approach for the first case; your mission, should you choose to accept, is to now encapsulate this to do the subsequent tests until any(isOut) is false.
This could be done by calling the above in a loop or calling it recursively inside this function; will leave that as "exercise for the student" (which seems only appropriate for a t-test based algorithm :) )
NB: This operates on the full input array; if there's reason to not operate on some columns of an array, simply pass the subset. Or, as in this case, the first two columns will never have an outlier show up so it doesn't matter much, but for some datasets obviously that might not be so.
But, don't build such "magic constants" into functions that makes them unsuitable for other use as a general practice.
PS: HINT The recursive solution is by far the neater here and only takes inserting three additional code lines, all of which are consecutive, and two of which are Matlab branching keywords. :) (And, actually, the latter two aren't absolutely mandatory but I think it makes the code more legible to include them).
NB2: fntau uses tinv which requires Statistics TB. Lacking that, will have to revert to using a lookup table or finding another replacement for the t-value.
12 件のコメント
dpb
2018 年 9 月 8 日
Ah! My old eyes had missed that, indeed.
I still think just
if any(isOut)
...
x=thompsontau(x); % recursively check for any more
end
is much simpler.
その他の回答 (2 件)
dpb
2018 年 9 月 6 日
編集済み: dpb
2018 年 9 月 6 日
The first loop can be eliminated entirely--use the Matlab array syntax:
x_31=mean(P31,'omitnan');
SDP_31=std(P31,'omitnan');
n_31=sum(isfinite(P31));
nu_31=n_31-1;
I edited your code above to clean up the indenting...looks to me there's a missing end at some point; I didn't try to figure out where that might belong; the trivial assumption that may or may not be right is it's the last one for the big for...need to check on that.
Moving right along...
for j = 3:size(P31,2)
for i = 1:size(P31,1)
delta_31(i,j) = abs(P31(i,j) - x_31(j));
end
for j = 3:size(P31,2)
for i = 1:size(P31,1)
...
Oh. You've got nested loops on j...perhaps the missing end belongs here before the second j loop?
Like
for j = 3:size(P31,2)
for i = 1:size(P31,1)
delta_31(i,j) = abs(P31(i,j) - x_31(j));
end
end
for j = 3:size(P31,2)
for i = 1:size(P31,1)
...
maybe? If that's so then again use vector syntax--
delta_31(:,3:end)=abs(P31(:,3:end)-x_31);
Now you're down to the removal criteria, but what's the deal with only using the 3:end sensors instead of all six? I don't see what that's all about??? If you have outliers, seems like all sensors would be wanted to be checked.
I'm running out of time/energy here this evening but one last little detail to make code a little easier --
if n_31(j) == 3
tau_31 = 1.150;
elseif n_31(j) == 4
tau_31 = 1.393;
elseif n_31(j) == 5
tau_31 = 1.572;
elseif n_31(j) ==6
tau_31 = 1.656;
elseif n_31(j) == 7
tau_31 = 1.711;
end
% put this at very beginning of your script/function...
tau=[1.150, 1.393, 1.572, 1.656, 1.711]; % lookup table values
Then to use, simply write
tau_31(j)=tau(n_31(j)-2);
I'm sure most if not all of the remainder can also be vectorized with, probably, one loop on the outside to do the multiple comparisons.
Take a look at the way the preceding loops were reduced and give the rest a shot; I'll try to look in again in the morning.
1 件のコメント
dpb
2018 年 9 月 6 日
For tau, go back to definition...
tcrit=@(n) tinv(1-0.05/2,n-2); % t distribution t(alpha/2,n-2)
fntau=@(n) tcrit(n)./sqrt(n)./sqrt(n-2+tcrit(n).*tcrit(n)).*(n-1);
Then when you need tau, just write
tau=fntau(n);
This has been vectorized so can pass a vector of N
Steven Lord
2018 年 9 月 7 日
I'm not familiar with that particular method. Is it close enough to one of the methods supported by the isoutlier function that you can use that function?
If it is not but is used often in a particular application area to detect outliers in data associated with that area, consider filing an enhancement request through Technical Support (using the Contact Us link in the upper-right corner of this page) describing that application area, giving more information and/or a reference for that algorithm, and asking for it to be added to isoutlier.
3 件のコメント
dpb
2018 年 9 月 11 日
I never ran across that text; the similar subject-matter I used was Bevington, Data Reduction and Error Analysis for the Physical Sciences, but it didn't touch on the subject. I see there is a revised edition now, but it seems to be virtually identical in content with just some references to there now being such a thing as a personal computer and available software.
参考
カテゴリ
Help Center および File Exchange で Interpolation of 2-D Selections in 3-D Grids についてさらに検索
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!