Finding outliers in a dataset

Salma fathi on 2 Aug 2022
Answered: Cris LaPierre on 2 Aug 2022
Hello, the image shows plots of my dataset. I am trying to remove the outliers from the dataset so that I can later use it to train a machine learning model.
However, the method I used flags many important data points as outliers. Is there another approach I could follow to get rid of the outliers?
The top plot shows the whole dataset, and the bottom plot shows the data after removing the outliers with the following lines:
nonOutliers=rmoutliers(Matrix3, 'mean');
figure(3);tiledlayout(2,1);nexttile;
scatter(Matrix3(:,1),Matrix3(:,2),1);
nexttile;
scatter(nonOutliers(:,1),nonOutliers(:,2),1)
ylim([0 10*10^12])
1 Comment
Monica Roberts on 2 Aug 2022
One thing to consider is, what do you consider outliers when you look at the graph? Right now, MATLAB doesn't seem to be considering the X-values when calculating outliers. You may want to consider splitting your data into chunks and passing it into rmoutliers. I'd start at where the data shoots up and group every ~200 values of x, pass those chunks into rmoutliers, and see what happens.
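A minimal sketch of the chunking idea, assuming column 1 of Matrix3 holds x and column 2 holds y. The data below is a synthetic stand-in, and chunkSize is an illustrative value rather than one tuned for the real dataset. It uses isoutlier, which accepts the same detection methods as rmoutliers, so that only the y-values inside each chunk are tested:

```matlab
% Synthetic stand-in for Matrix3 (assumption: column 1 = x, column 2 = y)
rng(0);
x = (1:1000)';
y = x.^2 + 50*randn(1000,1);
y([100 500 900]) = y([100 500 900]) + 1e6;   % planted outliers
Matrix3 = [x y];

chunkSize = 200;                       % rows per chunk (illustrative)
sorted = sortrows(Matrix3, 1);         % order the rows by x
keep = true(size(sorted,1), 1);
for startRow = 1:chunkSize:size(sorted,1)
    rows = startRow:min(startRow + chunkSize - 1, size(sorted,1));
    % Flag outliers in y relative to this chunk only
    keep(rows) = ~isoutlier(sorted(rows,2), 'mean');
end
nonOutliers = sorted(keep, :);
```

Because each chunk has its own mean and standard deviation, a point is only compared with its neighbours in x, not with the whole dataset.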
There are also other parameters you can pass into rmoutliers. For instance, maybe "mean" isn't the best method of detecting outliers for this dataset. Have you tried the others? The 'movmean' or 'movmedian' methods, for instance, might do the chunking I've described.
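As a rough sketch of the moving-window option: the moving methods need a window length as an extra argument, and the 'SamplePoints' option lets that window be measured in units of x. The data is again synthetic, the window value is a guess to be tuned, and the x-values are assumed to be unique once sorted:

```matlab
% Synthetic stand-in for Matrix3 (assumption: column 1 = x, column 2 = y)
rng(1);
x = (1:1000)';
y = x.^2 + 50*randn(1000,1);
y([100 500 900]) = y([100 500 900]) + 1e6;   % planted outliers
Matrix3 = [x y];

window = 200;                          % window length in units of x (illustrative)
sorted = sortrows(Matrix3, 1);         % sample points must be sorted
% Flag y-values far from the local median within the moving window
isOut = isoutlier(sorted(:,2), 'movmedian', window, ...
                  'SamplePoints', sorted(:,1));
nonOutliers = sorted(~isOut, :);
```

If the real x-values contain duplicates, drop 'SamplePoints' and give the window as a number of rows instead.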


Answers (1)

Cris LaPierre on 2 Aug 2022
If you process your data in a live script, consider interactively exploring different ways to detect and remove outliers using the Clean Outlier Data live task. See here:
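Outside a live script, a similar comparison can be approximated in code. The sketch below is not what the live task generates; it only counts how many points each global isoutlier method flags on synthetic stand-in data, which gives a quick feel for how aggressive each method is:

```matlab
% Synthetic stand-in for the y column of the dataset
rng(2);
x = (1:1000)';
y = x.^2 + 50*randn(1000,1);
y([100 500 900]) = y([100 500 900]) + 1e6;   % planted outliers

methodNames = {'median', 'mean', 'quartiles', 'grubbs', 'gesd'};
counts = zeros(size(methodNames));
for k = 1:numel(methodNames)
    % Number of points flagged by this detection method
    counts(k) = nnz(isoutlier(y, methodNames{k}));
end
disp(table(methodNames', counts', 'VariableNames', {'Method', 'NumFlagged'}));
```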

Release

R2022a
