Repeat for-loop from the beginning after recalculating values

Question

0 投票

P31.mat

Ok, I am having a hard time figuring this out. I have a 6 X 6 matrix of values, in this case, pressure measurements in psi (P31.mat attached). I'm trying to apply rejection criteria for the outliers in this set of pressures. Each column of P31 is from a separate pressure transducer (6 measurements from each). I calculate the mean, standard deviation, and the number of elements (not including NaN) in each column using the first loop. In the second loop I calculate the deviation of each measurement (delta_31) from the mean (of the column).

In the third loop I use if-statements to determine a value for tau, according to the number of elements in each column. Then I multiply that tau value by the standard deviation of that column and compare each deviation value (delta_31). If delta_31(i,j) is greater than the calculated value for tau*stddev, then replace that delta value with a NaN, as well as the corresponding (i,j) value in P31. I then recalculate the stdev, mean, and number of elements. The problem is I'm trying to repeat the loop again to determine if there is another outlier in that column and I can't seem to get it. I thought that by stating i = 1, the loop should begin again, but it doesn't. Can anyone help? I apologize if this is confusing.

for k = 1:size(P31,2)
  x_31(k) = mean(P31(:,k),'omitnan');    
  SDP_31(k) = std(P31(:,k),'omitnan');
  n_31(k) = numel(P31(:,k)) - sum(isnan(P31(:,k)));
  nu_31(k) = n_31(k) - 1;
end
for j = 3:size(P31,2)
  for i = 1:size(P31,1)
    delta_31(i,j) = abs(P31(i,j) - x_31(j));
  end
  for j = 3:size(P31,2)
    for i = 1:size(P31,1)
      % Conditional if-statements to determine the values for tau
      if n_31(j) == 3
        tau_31 = 1.150;
      elseif n_31(j) == 4
        tau_31 = 1.393;
      elseif n_31(j) == 5
        tau_31 = 1.572;
      elseif n_31(j) ==6
        tau_31 = 1.656;
      elseif n_31(j) == 7
        tau_31 = 1.711;
      end
      % Calculate tau*S to compare with the delta's
      tauS_31(j) = tau_31*SDP_31(j);
      % if-statement to replace outliers with NaN and recalculate
      % everything. This actually works, but only for eliminating one
      % outlier from the column. 
      if delta_31(i,j) > tauS_31(j)
        delta_31(i,j) = NaN;
        P31(i,j) = NaN;
        x_31(j) = mean(P31(:,j),'omitnan');
        SDP_31(j) = std(P31(:,j),'omitnan');
        n_31(j) = numel(P31(:,j)) - sum(isnan(P31(:,j)));
        for k = 1:size(P31,1)
          delta_31(k,j) = abs(P31(k,j) - x_31(j));
        end
        i = 1;  % this is supposed to start the i-loop again, but doesn't
      else
        i = i + 1;
      end
    end
  end

3 件のコメント
1 件の古いコメントを表示 1 件の古いコメントを非表示

dpb 2018 年 9 月 6 日

"I thought that by stating i = 1, the loop should begin again..."

No, that is pretty-much a standard implementation in (most) all (modern) programming languages--the loop limit is stored before the loop begins and in Matlab any change to the loop index variable is overwritten by for for the next iteration from a local copy.

dpb 2018 年 9 月 6 日

You could use one of those routines simply by saving the original index order to rearrange the order back to original when done.

The Thompson tau is one-at-a-time so it's sufficient on each iteration to only test the largest absolute difference; the ordering is some algorithm's way to look at the two end points each iteration rather than find max(abs(difference)) each pass.

サインインしてコメントする。

サインインしてこの質問に回答する。

Follow Question

Answer 1

dpb 2018 年 9 月 6 日

編集済み: dpb 2018 年 9 月 6 日

MATLAB Online で開く

0 投票

Directly addressing the Q?, here's a basic function to do the removal--

function x=thompsontau(x)
% Return array x w/ outliers replaced w/ NaN from input x
% Each column in x is assumed to be variable, rows are observations
%
% dpbozarth -- 5Sep2018
    [r,c]=size(x);            % row, col dimensions
    cols=1:c;                 % later lookup for columns if outliers exist
    mn=mean(x,'omitnan');     % mean by column
    sd=std(x,'omitnan');      % std deviation
    n=sum(isfinite(x));       % finite observations
    [mx,ix]=max(abs(x-mn));   % value, location (row) of max deviation
    isOut=mx>fntau(n).*sd;    % column locations are outliers if T
    if any(isOut)
      indxOut=sub2ind([r c],ix(isOut),cols(isOut));  % locations in x of outliers
      x(indxOut)=nan;         % mark as outlier
    end
  end
    function tau = fntau(n)
    % returns critical Thompson tau for n observations
      tcrit=tinv(1-0.05/2,n-2);    % critical t value
      tau=tcrit./sqrt(n)./sqrt(n-2+tcrit.*tcrit).*(n-1);
    end

This vectorizes your loop approach for the first case; your mission, should you choose to accept, is to now encapsulate this to do the subsequent tests until any(isOut) is false.

This could be done by calling the above in a loop or calling it recursively inside this function; will leave that as "exercise for the student" (which seems only appropriate for a t-test based algorithm :) )

NB: This operates on the full input array; if there's reason to not operate on some columns of an array, simply pass the subset. Or, as in this case, the first two columns will never have an outlier show up so it doesn't matter much, but for some datasets obviously that might not be so.

But, don't build such "magic constants" into functions that makes them unsuitable for other use as a general practice.

PS: HINT The recursive solution is by far the neater here and only takes inserting three additional code lines, all of which are consecutive, and two of which are Matlab branching keywords. :) (And, actually, the latter two aren't absolutely mandatory but I think it makes the code more legible to include them).

NB2: fntau uses tinv which requires Statistics TB. Lacking that, will have to revert to using a lookup table or finding another replacement for the t-value.

12 件のコメント
10 件の古いコメントを表示 10 件の古いコメントを非表示

dpb 2018 年 9 月 7 日

MATLAB Online で開く

That's cheating! :) I'll attach the final version; I also incorporated an optional parameter to allow one to change the alpha parameter for the critical t values...

But, the critical implementation is

...
if any(isOut)
  indxOut=sub2ind([r c],ix(isOut),cols(isOut));  % locations in x of outliers
  x(indxOut)=nan;         % mark as outlier
  x=thompsontau(x);       % recursively check for any more
end

is the minimal change. While not strictly needed, I thought incorporating the else clause made it a little more human-legible as to intent...

...
if any(isOut)
  indxOut=sub2ind([r c],ix(isOut),cols(isOut));  % locations in x of outliers
  x(indxOut)=nan;         % mark as outlier
  x=thompsontau(x);       % recursively check for any more
else
  return
end
function x=thompsontau(x,varargin)
% Return array x w/ outliers replaced w/ NaN from input x
% Each column in x is assumed to be variable, rows are observations
% Default for the critical t-values is alpha=0.05 two-sided.  Supply the
% total alpha as an optional second parameter if desired to expand/loosen
% tolerance
%
%  x = thompsontau(x,0.10)
%     Specify a 10% two-sided interval
% dpbozarth -- 5Sep2018
%           -- 7Sep2018 Incorporated optional alpha level
    defaultalpha=0.05;
    fnValidAlpha=@(x) isnumeric(x) && isscalar(x) && iswithin(x,0,1);
    p=inputParser;
    addRequired(p,'x')
    addOptional(p,'alpha',defaultalpha,fnValidAlpha)
    parse(p,x,varargin{:})
    alpha=p.Results.alpha;
    [r,c]=size(x);            % row, col dimensions
    cols=1:c;                 % later lookup for columns if outliers exist
    mn=mean(x,'omitnan');     % mean by column
    sd=std(x,'omitnan');      % std deviation
    n=sum(isfinite(x));       % finite observations
    [mx,ix]=max(abs(x-mn));   % value, location (row) of max deviation
    isOut=mx>fntau(n,alpha).*sd; % column locations are outliers if T
    if any(isOut)
      indxOut=sub2ind([r c],ix(isOut),cols(isOut));  % locations in x of outliers
      x(indxOut)=nan;         % mark as outlier
      x=thompsontau(x,alpha);       % check for more
    %else
    %  return
    end
  end
function tau = fntau(n,alpha)
% returns critical Thompson tau for n observations
    tcrit=tinv(1-alpha/2,n-2);    % critical t value
    tau=tcrit./sqrt(n)./sqrt(n-2+tcrit.*tcrit).*(n-1);
  end

PATRICK WAYNE 2018 年 9 月 7 日

Seriously, it works for all of the data sets. I've gone through each one step by step using the debugging feature to make sure it is doing what it is supposed to. All of them come out exactly like they should. Some run through 7 times, some only 3, etc. Remember that it resets the value of i if it detects an outlier. Also, when I run for multiple data sets, i also resets back to 1 before the end of the j-th loop. It is most definitely working.

dpb 2018 年 9 月 8 日

MATLAB Online で開く

Ah! My old eyes had missed that, indeed.

I still think just

if any(isOut)
  ...
  x=thompsontau(x);   % recursively check for any more
end

is much simpler.

サインインしてコメントする。

Answer 2

dpb 2018 年 9 月 6 日

編集済み: dpb 2018 年 9 月 6 日

MATLAB Online で開く

0 投票

The first loop can be eliminated entirely--use the Matlab array syntax:

x_31=mean(P31,'omitnan');    
SDP_31=std(P31,'omitnan');
n_31=sum(isfinite(P31));
nu_31=n_31-1;

I edited your code above to clean up the indenting...looks to me there's a missing end at some point; I didn't try to figure out where that might belong; the trivial assumption that may or may not be right is it's the last one for the big for...need to check on that.

Moving right along...

for j = 3:size(P31,2)
  for i = 1:size(P31,1)
    delta_31(i,j) = abs(P31(i,j) - x_31(j));
  end
  for j = 3:size(P31,2)
    for i = 1:size(P31,1)
      ...

Oh. You've got nested loops on j...perhaps the missing end belongs here before the second j loop?

Like

for j = 3:size(P31,2)
  for i = 1:size(P31,1)
    delta_31(i,j) = abs(P31(i,j) - x_31(j));
  end
end
for j = 3:size(P31,2)
  for i = 1:size(P31,1)
    ...

maybe? If that's so then again use vector syntax--

delta_31(:,3:end)=abs(P31(:,3:end)-x_31);

Now you're down to the removal criteria, but what's the deal with only using the 3:end sensors instead of all six? I don't see what that's all about??? If you have outliers, seems like all sensors would be wanted to be checked.

I'm running out of time/energy here this evening but one last little detail to make code a little easier --

      if n_31(j) == 3
        tau_31 = 1.150;
      elseif n_31(j) == 4
        tau_31 = 1.393;
      elseif n_31(j) == 5
        tau_31 = 1.572;
      elseif n_31(j) ==6
        tau_31 = 1.656;
      elseif n_31(j) == 7
        tau_31 = 1.711;
      end
% put this at very beginning of your script/function...
tau=[1.150, 1.393, 1.572, 1.656, 1.711];   % lookup table values

Then to use, simply write

tau_31(j)=tau(n_31(j)-2);

I'm sure most if not all of the remainder can also be vectorized with, probably, one loop on the outside to do the multiple comparisons.

Take a look at the way the preceding loops were reduced and give the rest a shot; I'll try to look in again in the morning.

1 件のコメント
-1 件の古いコメントを表示 -1 件の古いコメントを非表示

dpb 2018 年 9 月 6 日

MATLAB Online で開く

For tau, go back to definition...

tcrit=@(n) tinv(1-0.05/2,n-2);       % t distribution t(alpha/2,n-2)
fntau=@(n) tcrit(n)./sqrt(n)./sqrt(n-2+tcrit(n).*tcrit(n)).*(n-1);

Then when you need tau, just write

tau=fntau(n);

This has been vectorized so can pass a vector of N

サインインしてコメントする。

Answer 3

Steven Lord 2018 年 9 月 7 日

0 投票

I'm not familiar with that particular method. Is it close enough to one of the methods supported by the isoutlier function that you can use that function?

If it is not but is used often in a particular application area to detect outliers in data associated with that area, consider filing an enhancement request through Technical Support (using the Contact Us link in the upper-right corner of this page) describing that application area, giving more information and/or a reference for that algorithm, and asking for it to be added to isoutlier.

3 件のコメント
1 件の古いコメントを表示 1 件の古いコメントを非表示

dpb 2018 年 9 月 8 日

編集済み: dpb 2018 年 9 月 11 日

This goes back into my "wayback" machine, Steven. I had to go dig through the "old black notebook of stuff" I collected over working career before there was such a thing as online way to go find things and we saved paper copies of papers! :)

Thompson, W.R., 1935, On a criterion for the rejection of observations and the distribution of the ratio of the deviation to the sample standard deviation, Ann. Math. Stat., 6:214-219.

Now I find the reference dissertation I found through Dr. Clark, utmost gentleman and mentor, who was staff Senior Mathematician at B&W NPGD back in the late 60s/early 70s is available via google search!!!

Stastical Outlier Methods, Tetreault Dissertation CoverPage and

DissertationDownloadLink

This is all quite dated by now, of course, and I've made no attempt to keep up with modern literature, but the basis for the Thompson test.

And, I forget now just what the conclusions were altho I do know and should have pointed out that repetitive application blindly is fraught with difficulty despite the cuteness of being able to write the algorithm. One issue Grubbs pointed out is that the probability of an arbitrary point being an outlier isn't the same as the probability that the largest observation (say) is an outlier with the respect to the remaining observations, the former of which is the Thompson test, not the latter.

ADDENDUM

Ain't science wunnerful!!! :)

Thompson(1935) full text pdf document of the original paper. I do not seem to have a record of the communications after.

The subsequent modification as used above and that is fairly widely spread on various university sites I'm not positive about the origin, my quick algebra on (4) p36 didn't quite duplicate the above although I won't claim I didn't blunder.

ADDENDUM 2

I didn't have a copy but I did find a scribbled note where I jotted down the Grubbs paper reference and managed to now find it online, too... Grubbs(1950).

While I think in general some additional tools might be incorporated in ML, it's not clear to me what is currently considered "state of the art" but I doubt Thompson is the last word on the subject (altho I wouldn't say it shouldn't be an available option).

I generally when trying to catch up on something haven't looked at in quite some time see what NIST currently says; I see they have the MAD (interesting historical use of that acronym exists :) ), as well as Grubbs (one outlier) and and extended Grubbs for multiple as well as another ESD for unknown number with "just" an expected upper limit.

PATRICK WAYNE 2018 年 9 月 10 日

編集済み: PATRICK WAYNE 2018 年 9 月 10 日

I used a textbook from years ago to reference this method: "Introduction to Engineering Experimentation", by Wheeler and Ganji (3rd Ed.), chapter 6. The other Statistics textbooks I have do not cover it at all.

dpb 2018 年 9 月 11 日

I never ran across that text; the similar subject-matter I used was Bevington, Data Reduction and Error Analysis for the Physical Sciences, but it didn't touch on the subject. I see there is a revised edition now, but it seems to be virtually identical in content with just some references to there now being such a thing as a personal computer and available software.

サインインしてコメントする。

Repeat for-loop from the beginning after recalculating values

3 件のコメント
1 件の古いコメントを表示 1 件の古いコメントを非表示

採用された回答

12 件のコメント
10 件の古いコメントを表示 10 件の古いコメントを非表示

その他の回答 (2 件)

1 件のコメント
-1 件の古いコメントを表示 -1 件の古いコメントを非表示

3 件のコメント
1 件の古いコメントを表示 1 件の古いコメントを非表示

カテゴリ

製品

リリース

タグ

Community Treasure Hunt

Repeat for-loop from the beginning after recalculating values

3 件のコメント 1 件の古いコメントを表示 1 件の古いコメントを非表示

採用された回答

12 件のコメント 10 件の古いコメントを表示 10 件の古いコメントを非表示

その他の回答 (2 件)

1 件のコメント -1 件の古いコメントを表示 -1 件の古いコメントを非表示

3 件のコメント 1 件の古いコメントを表示 1 件の古いコメントを非表示

カテゴリ

製品

リリース

タグ

参考

Community Treasure Hunt

3 件のコメント
1 件の古いコメントを表示 1 件の古いコメントを非表示

12 件のコメント
10 件の古いコメントを表示 10 件の古いコメントを非表示

1 件のコメント
-1 件の古いコメントを表示 -1 件の古いコメントを非表示

3 件のコメント
1 件の古いコメントを表示 1 件の古いコメントを非表示