Outlier removal from a matrix
I removed the outliers from my dataset with the rmoutliers(A,'mean') command. It should remove any row containing a value more than 3 standard deviations from the mean of its column. But when I plot a histogram of each column, some values are still as far as 6 standard deviations away. What do you suggest? Here is my code:
A = rmoutliers(table_data,'mean');
Zscores = zscore(A); % A is a 50000-by-12 matrix
figure
histogram(Zscores(:,2))
In the histogram, some values still lie as far as 6 standard deviations from the mean.
1 Comment
John D'Errico
11 Oct 2022
help rmoutliers
I had to go to the doc to check your claim that rmoutliers with the 'mean' option specifically uses a cutoff of 3 standard deviations from the mean, and then removes the entire row containing that outlier. This is true. But rmoutliers is not a perfect tool, and any such tool can have problems if you dare to push its limits.
x = [ones(1,5),1 + eps,10]
xhat = rmoutliers(x)
xhat == 1
So rmoutliers removed the 10, but it removed 1+eps as well. Called with no method, rmoutliers defaults to 'median', which flags anything more than 3 scaled MAD from the median. Since five of the seven elements are exactly 1, the MAD here is exactly zero, so 1+eps is ALSO past the cutoff and counts as a clear outlier. The point is, if you try hard enough, you can always cause any such adaptive tool to exhibit strange behavior.
But if you want to know what happened, then you need to provide your data. Otherwise, anything is just a wild guess.
Attach it to a comment (not as an answer), in a .mat file.
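In the meantime, one possible explanation (still a guess without the data) can be sketched on synthetic data. rmoutliers makes a single pass, using the mean and standard deviation of the original data, while zscore recomputes both from the cleaned data. If a few extreme values inflated the original standard deviation, points that were inside 3 sigma before removal can sit far outside it afterwards. Everything below is an assumption for illustration: the vector x, the planted values, and the repeated-removal loop are not taken from the original post.

```matlab
% Synthetic data (an assumption, not the poster's data): mostly standard
% normal values plus a few extreme ones that inflate the standard deviation.
rng(0);                                   % fixed seed for reproducibility
x = [randn(10000,1); 20; 25; 1000; 2000];

% Single pass: the 3-sigma cutoff uses the mean and std of the ORIGINAL data.
x1 = rmoutliers(x,'mean');

% zscore recomputes the mean and std from the CLEANED data, so the scale changes.
z1 = zscore(x1);
fprintf('std before: %.2f, std after: %.2f\n', std(x), std(x1));
fprintf('max |z| after one pass: %.2f\n', max(abs(z1)));

% One possible remedy: repeat the removal until nothing more is flagged.
xk = x1;
while any(isoutlier(xk,'mean'))
    xk = rmoutliers(xk,'mean');
end
fprintf('max |z| after repeated passes: %.2f\n', max(abs(zscore(xk))));
```

For a matrix like yours the same logic applies column by column, with a row dropped whenever any of its columns is flagged. Note that repeated trimming keeps shaving the tails of perfectly good data, so a robust rule such as the 'median' option may be a better choice than looping.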