Deleting outliers by code

Hi again,
I have a measurements matrix as follows:
105.993000000000 1.64178960306505e+17
106.007000000000 3.10346010252124e+16
106.046000000000 2.22784317607289e+17
106.051000000000 1.48978160280980e+17
106.061000000000 2.79186942297259e+17
106.076000000000 2.02039468852741e+17
106.080000000000 5.02562504223962e+17
the first column is the x value, and the second its the y measurement.
I want to delete the rows in which the average of the neighboring y values are much bigger (or smaller) then the local y (in this example, i want to delete the second row).
How can i do it ?
Thank you !!!

回答 (3 件)

Image Analyst
Image Analyst 2014 年 8 月 18 日

0 投票

Try this, by Brett from the Mathworks:
If you want something less sophisticated, try a modified median filter where you identify outliers, for example by thresholding the signal you get from subtracting the median signal from the original signal and taking the absolute value, and then replace only those elements above the threshold with the median value.

7 件のコメント

Ron
Ron 2014 年 8 月 18 日
Thank you, but If i'm using the modified media filter, i need the average to by dynamic according to the local point ..
Image Analyst
Image Analyst 2014 年 8 月 18 日
m=[...
105.993000000000 1.64178960306505e+17
106.007000000000 3.10346010252124e+16
106.046000000000 2.22784317607289e+17
106.051000000000 1.48978160280980e+17
106.061000000000 2.79186942297259e+17
106.076000000000 2.02039468852741e+17
106.080000000000 5.02562504223962e+17]
x = m(:, 1);
y = m(:, 2);
averaged_y = conv(y, [1,1,1]/3, 'same')
% First and last points should be the same
averaged_y = [y(1); averaged_y; y(end)] plot(y, 'bo-');
hold on;
plot(averaged_y, 'rs-');
legend('blue=original', 'red = average');
So, tell us, which of the original points in blue below are considered the outliers? The mean like you suggested is in red.
Ron
Ron 2014 年 8 月 18 日
Thank you !
the tolerance is 10 times the local mean, so in this case the seconds point should be removed
Image Analyst
Image Analyst 2014 年 8 月 18 日
編集済み: Image Analyst 2014 年 8 月 18 日
Ron you say "but...i need the average to by dynamic according to the local point" - well, exactly what do you think your algorithm of taking the average of the neighbors does? The same thing. conv() and medfilt1() both look at neighbors. conv() takes the average, in case you didn't know. Use whichever you want. Please answer my questions about how you're defining outliers in the above plotted data.
Image Analyst
Image Analyst 2014 年 8 月 18 日
The average at location 2 is 1.3 and the original data is 0.3 so it it not 10 times and so should NOT be removed. Please clarify.
Ron
Ron 2014 年 8 月 18 日
This is the full matrix data (on log scale):
i marked the kind of dots which i want to remove, the algorithm that i had in my mind was
if (y(i-1)+y(i-1))/2 is 10 times bigger/smaller then y(i), delete y(i)
Image Analyst
Image Analyst 2014 年 8 月 18 日
編集済み: Image Analyst 2014 年 8 月 18 日
Try this:
outliers = y > (10 * averaged_y) | y < 0.1 * averaged_y
% Remove outliers
y(outliers) = []
averaged_y comes from conv(). By the way, I don't think this (your algorithm) is a very robust algorithm (just think about it and you'll realize why), but might be okay for your specific set of data.

サインインしてコメントする。

Star Strider
Star Strider 2014 年 8 月 18 日

0 投票

It depends on how you define ‘much bigger (or smaller)’, and the number of neighboring elements you want to average over.
To delete the second row is easy enough (calling your matrix ‘X’ here):
X(2,:) = [];

2 件のコメント

Ron
Ron 2014 年 8 月 18 日
This is just a sample from a bigger matrix, i need it to be systematically removed.
Much bigger/smaller means by power of 10 and only the closest neighboring points (y-1 and y+1)
Star Strider
Star Strider 2014 年 8 月 18 日
編集済み: Star Strider 2014 年 8 月 18 日
I implemented a linear interpolation between (y-1) and (y+1), excluding (y), instead of an average of (y-1) and (y+1). Did you mean to include (y)?
The problem is that all of your points violate the ‘power-of-ten’ exclusion criterion.
My contribution:
X = [105.993000000000 1.64178960306505e+17
106.007000000000 3.10346010252124e+16
106.046000000000 2.22784317607289e+17
106.051000000000 1.48978160280980e+17
106.061000000000 2.79186942297259e+17
106.076000000000 2.02039468852741e+17
106.080000000000 5.02562504223962e+17];
for k1 = 2:size(X,1)-1
B(:,k1) = [[1 1]' [X(k1-1,1) X(k1+1,1)]']\[[X(k1-1,2) X(k1+1,2)]'];
E(k1) = [1 X(k1,1)] * B(:,k1); % Expected From Interpolation
D(k1) = E(k1) - X(k1,2); % Difference
end
Ep = E(2:end);
Xp = X(2:end-1,1);
figure(1)
plot(X(:,1), X(:,2), '-xb') % Plot Data
hold on
plot(Xp, Ep, '-+r') % Plot Interpolated Values
hold off

サインインしてコメントする。

Guillaume
Guillaume 2014 年 8 月 18 日
編集済み: Guillaume 2014 年 8 月 18 日

0 投票

averages = (m(1:end-2, 2) + m(3:end, 2)) / 2; %averages of row, row+2
m([1; abs(averages - m(2:end-1, 2)) > tolerance; end], :) = [];
Should do it

5 件のコメント

Ron
Ron 2014 年 8 月 18 日
it's good, but not quite working .. i guess that it should be
averages = (m(1:end-2, 2) + m(3:end,2)) / 2;
but the second line of your code still not working
Guillaume
Guillaume 2014 年 8 月 18 日
編集済み: Guillaume 2014 年 8 月 18 日
Yes, sorry about the typo.
What do you mean by the second line not working?
However, by your definition, the 6th point is more an outlier than the 2nd point. You can see that with
abs(averages - m(2:end-1, 2))
which show you the difference of point 2-6 from the average of their neighbours.
Ron
Ron 2014 年 8 月 18 日
i'm getting:
Error using horzcat
Dimensions of matrices being concatenated are
not consistent.
Guillaume
Guillaume 2014 年 8 月 18 日
Sorry, should have been
m([1; abs(averages - m(2:end-1, 2)) > tolerance; end], :) = [];
I've edited my answer to correct all the typos.
Joseph Cheng
Joseph Cheng 2014 年 8 月 18 日
Another method would be to use the conv.
x=randi(100,1,20);
Nav= [1 0 1]/2;
test = conv(x,Nav,'valid');
then perform the subtraction from the input (here x) from index t2o to the end-1.
difference =x(2:end-1)-test;
then threshold appropriately.

サインインしてコメントする。

カテゴリ

ヘルプ センター および File ExchangeOperators and Elementary Operations についてさらに検索

タグ

質問済み:

Ron
2014 年 8 月 18 日

編集済み:

2014 年 8 月 18 日

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by