Removing instantaneous jumps (outliers) from a time series data set

Question

luke, 22 March 2024

Direct link to this question: https://jp.mathworks.com/matlabcentral/answers/2097681-removing-instantaneous-jumps-outliers-from-a-time-series-data-set

Commented: Star Strider, 26 March 2024

Hi All,

I have an array of time-series data containing instantaneous jumps that need removing. The issue is that the time series represents cliff failures, so I need code that removes the jumps that are outliers. However, some of the jumps represent real cliff failures and are not anomalies. The actual matrix size is 60x262. I have included a snippet of the time series showing a 'jump'. Since the code needs to check along each row (the rows correspond to separate transects), I assume this is best done with a 'for' loop, but I am not entirely sure how to do this. In the example below, I would need to remove the 10.9 and replace it with the mean of the values on either side; however, there could be cases where 2-3 consecutive outliers all need replacing. Any help would be greatly appreciated.

2.41595489732326	2.41031483654907	2.40801209924756	2.40993913128027	2.41582625997938
2.87743028179327	2.88058793933828	2.88454870497978	2.87492517755818	2.87925902569740
6.40769380488418	6.54033729047571	10.9192698256242	6.42335491352382	6.53564543320352
13.9915744275613	13.9347860460628	13.9070741204200	13.9481973397569	13.9297364372147
6.65790304043271	6.68776078855576	6.69467488849872	6.65418705323087	6.68234121374043


Answer 1

Star Strider, 22 March 2024

Direct link to this answer: https://jp.mathworks.com/matlabcentral/answers/2097681-removing-instantaneous-jumps-outliers-from-a-time-series-data-set#answer_1429576

There are several functions to detect and remove outliers, depending on how you want to define them and deal with them.

Here is an example using the isoutlier function with your posted data —

A = [2.41595489732326 2.41031483654907 2.40801209924756 2.40993913128027 2.41582625997938
     2.87743028179327 2.88058793933828 2.88454870497978 2.87492517755818 2.87925902569740
     6.40769380488418 6.54033729047571 10.9192698256242 6.42335491352382 6.53564543320352
     13.9915744275613 13.9347860460628 13.9070741204200 13.9481973397569 13.9297364372147
     6.65790304043271 6.68776078855576 6.69467488849872 6.65418705323087 6.68234121374043];
x = 1:size(A,2);
Lm = isoutlier(A, 'median', 2)   % Logical Matrix ('median' method applied along each row, dim = 2)

Lm = 5x5 logical array
   0   0   0   0   0
   0   0   0   0   0
   0   0   1   0   0
   0   0   0   0   0
   0   0   0   0   0

showOutliers = A.*Lm   % Return Detected Outliers

showOutliers = 5x5
         0         0         0         0         0
         0         0         0         0         0
         0         0   10.9193         0         0
         0         0         0         0         0
         0         0         0         0         0

[~, c] = find(Lm);   % Column (day) index of each detected outlier, in the same order as A(Lm)
figure
plot(x, A.', 'DisplayName','Data')   % Transpose so that each row (transect) is plotted as one line
hold on
plot(c, A(Lm), 'sg', 'DisplayName','Outliers')
hold off
grid
xlabel('x')
ylabel('y')
legend('Location','best')
axis('padded')

The function documentation has links to the other outlier functions within and at the end of the page.
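The question also asks for the outlier to be replaced by the mean of the values on either side. A possible follow-on (a sketch, not part of the original answer) is filloutliers with the 'linear' fill method and the same 'median' detection along each row (dim = 2). For a single isolated outlier, linear interpolation between its two neighbours equals their mean, and a run of 2-3 consecutive outliers is interpolated between the nearest non-outliers on either side.

```matlab
% Posted 5x5 snippet: rows are transects, columns are days
A = [2.41595489732326 2.41031483654907 2.40801209924756 2.40993913128027 2.41582625997938
     2.87743028179327 2.88058793933828 2.88454870497978 2.87492517755818 2.87925902569740
     6.40769380488418 6.54033729047571 10.9192698256242 6.42335491352382 6.53564543320352
     13.9915744275613 13.9347860460628 13.9070741204200 13.9481973397569 13.9297364372147
     6.65790304043271 6.68776078855576 6.69467488849872 6.65418705323087 6.68234121374043];

% Detect with the median/MAD method along each row (dim = 2), as above, and
% fill each outlier by linear interpolation between the nearest non-outliers
[B, TF] = filloutliers(A, 'linear', 'median', 2);

B(3,3)              % interpolated value
mean(A(3,[2 4]))    % mean of the two neighbours, for comparison
```

Non-outlier values are left exactly as they were; only the flagged positions change.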

11 comments

luke, 24 March 2024

The issue I have when I try this is that the data I am trying to remove are outliers that occur over 1 or 2 consecutive days. When I run the code using isoutlier, it removes the outliers I need removed, but it also removes what would be considered actual cliff failures in my data. As the rows are associated with a transect of a cliff and the columns are the cliff position over 262 days, this method removes some of the failures that I need to keep in the data. I have included an example of an actual failure in the data below. Thank you for the help so far!

4585011982720	10.4637763353790	NaN	NaN	10.4640269851509	10.4584327262222	NaN	10.4644014312166	10.4603796196603
02524689654469	4.02266082065308	NaN	NaN	4.02078013877149	2.20518592114340	NaN	2.20109167771470	2.20532909366356
6288233367343	10.6292283104308	NaN	NaN	10.6566239457597	10.6601557759837	NaN	10.6395872853522	10.6510605896744

Above you can see that there has been a failure in the second row, where it jumps from 4.02 to 2.205. I am wondering whether a for loop could do this better; however, I am still very new to MATLAB and wouldn't know how to write one. Any help with this would be greatly appreciated.

Star Strider, 24 March 2024

I do not completely understand your requirements.

There are several options for detecting outliers described in the isoutlier documentation, including the window length over which they are detected, the threshold value, and different detection methods. If you want to loop through each row individually (since each of your time series is stored in a row rather than a column), that is certainly an option worth considering.
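A loop over the rows works, but isoutlier also accepts a dimension argument, so dim = 2 treats each row as a separate series without a loop. A minimal sketch comparing the two, assuming a synthetic 60x262 matrix in place of the real data (the rng seed, noise level and spike position are made up; the 'movmedian' settings are the ones used later in this thread):

```matlab
% Synthetic stand-in for the 60x262 matrix (hypothetical data):
% rows are transects, columns are days
rng(0)
A = 5 + 0.05*randn(60, 262);
A(7, 40) = 10;                    % one artificial spike

% Option 1: loop over the rows explicitly
TFloop = false(size(A));
for r = 1:size(A, 1)
    TFloop(r,:) = isoutlier(A(r,:), 'movmedian', 50, 'ThresholdFactor', 6);
end

% Option 2: no loop; dim = 2 makes each row an independent series
TFdim = isoutlier(A, 'movmedian', 50, 2, 'ThresholdFactor', 6);

isequal(TFloop, TFdim)            % both approaches flag the same points
```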

You will have to experiment with those options to get the result you want. (Note that isoutlier detects them. There are other functions that interpolate them or remove them, using the same essential logic.)
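To illustrate the difference between detecting, removing and replacing, here is a small sketch on a made-up 10-sample series with one spike, using each function's default ('median') detection:

```matlab
% Toy series (hypothetical values) with one spike at day 5
x = [7.06 7.08 7.07 7.05 12.65 7.08 7.06 7.07 7.08 7.06];

TF = isoutlier(x);                  % detect only: logical mask, same size as x
xr = rmoutliers(x);                 % remove: the result is one sample shorter
xf = filloutliers(x, 'previous');   % replace: keeps all 10 samples

find(TF)          % 5
numel(xr)         % 9
xf(5)             % 7.05, the previous non-outlier value
```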

Data considered to be outliers that are at the ends of the data might be more difficult to detect.

There are also other functions that may be appropriate, such as the Signal Processing Toolbox hampel function, and some Statistics and Machine Learning Toolbox functions. Search the documentation using the ‘outlier’ keyword to find them.
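The hampel function compares each sample with the median of a short window around it (k samples on each side), so a spike lasting fewer than k samples is replaced while a permanent step is not. A sketch on made-up data (requires the Signal Processing Toolbox; k and nsigma are example values, not recommendations):

```matlab
% Requires the Signal Processing Toolbox.
% Hypothetical series: level 4.02 with a one-day spike, then a permanent drop to 2.205
x = [4.02*ones(1,10), 2.205*ones(1,10)];
x(5) = 10;

k = 3;        % half-window: 3 samples on each side (7-sample window)
nsigma = 3;   % threshold in MAD-based standard deviations
[y, j] = hampel(x, k, nsigma);   % y = filtered series, j = logical mask of replaced samples

find(j)       % only the spike is flagged
y(5)          % replaced by the local median
```

For a matrix, hampel treats each column as a channel, so a 60x262 matrix with transects in rows would need to be transposed first.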


luke, 25 March 2024

Sorry, I have not explained this well. Essentially, my data represent cliff recession at a specific height on the cliff for a specific transect. For example, the data examples above show the recession for all transects at the 8 m height above the base of the cliff. Each row is therefore associated with a single transect, where the recession has been recorded for 262 days (the number of columns). The problem is that I am trying to identify cliff failures in the data, which appear as a quick jump. Because the cliff has failed, its position is permanently changed, as in the second example above, where the cliff moves from 4.02 to 2.205. However, the data were recorded using LiDAR and have picked up anomalies where, for some reason, the data spike instantaneously and then return to the previous values; these are the values I need to remove. When I try to use the isoutlier function, it removes most of the anomalies but also some of the cliff failures, which I need to identify. I know this is rather long, and I very much appreciate your help.
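The distinction described here (a spike returns to the earlier level, a failure does not) can be written as a direct test. A minimal sketch for one-sample spikes only, on made-up values; the threshold tol is an assumption that would need tuning against the real noise level:

```matlab
% Hypothetical series: a one-day spike at sample 3 and a real failure (step) at sample 6
x = [4.02 4.03 10.5 4.02 4.01 2.21 2.20 2.21];
tol = 1;   % assumed jump size that counts as "large"

jumpIn  = abs(x(2:end-1) - x(1:end-2)) > tol;   % large change from the previous sample
jumpOut = abs(x(3:end)   - x(2:end-1)) > tol;   % large change to the next sample
returns = abs(x(3:end)   - x(1:end-2)) <= tol;  % neighbours agree: the level is restored

isSpike = [false, jumpIn & jumpOut & returns, false];
find(isSpike)    % 3: the spike; the step at sample 6 is not flagged
```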

luke, 25 March 2024 (edited 25 March 2024)

My bad; this is only the second time I have used this to ask a question, so I am still learning. Below I have extracted a specific row from the matrix that shows the issue. You can see the spikes in the graph, which represent outliers, but you can also see where the cliff fails at 184 days. I essentially need code that goes through each row of a 60x262 array separately and replaces the outliers with the previous value without removing or affecting any actual cliff failures.
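A loop-based sketch of the rule described above: a run of at most maxLen days that jumps away from the last good value and then comes back is replaced with that previous value, while a jump that does not come back (a failure) is kept. This is not a MATLAB built-in and not from the thread; jumpTol and maxLen are assumed values, the small matrix is made up, and spikes at the very end of a row (where no return is seen) are left untouched.

```matlab
% Hypothetical data. Row 1: 2-day spike, then a failure. Row 2: 1-day spike, then a failure.
A = [7.06 7.08 12.65 12.16 7.07 7.05 7.09 5.09 5.12 5.11
     4.02 4.03 4.02 10.9 4.01 4.02 2.21 2.20 2.21 2.20];
jumpTol = 1;   % assumed: larger than the noise, smaller than the spikes
maxLen  = 3;   % assumed: longest run (in days) to treat as an anomaly

B = A;
for r = 1:size(B, 1)                  % each row is one transect
    ref = B(r, 1);                    % last accepted value
    c = 2;
    while c <= size(B, 2)
        x = B(r, c);
        if isnan(x)                   % skip missing days, keep the reference
            c = c + 1;
        elseif isnan(ref) || abs(x - ref) <= jumpTol
            ref = x;                  % ordinary sample: accept it
            c = c + 1;
        else
            % Large jump: does the series come back within maxLen samples?
            last = min(c + maxLen, size(B, 2));
            back = find(abs(B(r, c+1:last) - ref) <= jumpTol, 1);
            if isempty(back)
                ref = x;              % no return: treat as a real failure
                c = c + 1;
            else
                B(r, c:c+back-1) = ref;   % spike (and any NaN inside it): previous value
                c = c + back;             % continue at the first sample after the spike
            end
        end
    end
end
B
```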

transect_25 = [7.06082672488186 7.01219231199077 7.10044052550893 7.05856413501238 7.08276782278192 7.08171780444893 7.13944195883233 7.08889967530182 7.07324463573404 7.09231298760920 12.6461994747653 12.1635560774415 7.08492965069543 7.07295006499794 7.09627915640382 7.10107349192131 7.12994953727733 7.12661420266218 7.12661420266218 7.12661420266218 7.05617262563685 7.06059679567114 7.04566565246833 7.06218732905983 7.07054630072494 7.00820198208732 7.06471919436917 7.05963427988496 12.8966296328669 7.05546228727488 7.09974448583935 7.10602063639012 7.05422148838586 7.05970082662912 7.05209565143113 7.05209565143113 7.07534664812658 7.08779177630935 7.07391889704875 7.06983996842995 7.07079316173566 7.05897258943634 7.07346519072165 7.06164618807580 7.06164618807580 7.06701486925527 7.07458563065911 7.05823180926617 7.06207550372241 7.07298964567741 7.06751837700166 7.07190469078925 7.06527550446671 7.07516421409874 7.07516421409874 7.08275265949953 7.07740272730190 7.10250565006205 7.09902367124141 7.05524550102303 7.08336393786298 7.08792387873578 7.05091596720386 7.06260433371295 7.07498428694577 7.07498428694577 7.07498428694577 7.09185707564830 7.09512553734544 7.09512553734544 7.09543693814034 7.10085264644561 7.11534031133025 7.09524796823034 7.09023929946413 7.09411512920930 7.09677732163462 7.08112576776079 7.07066243749597 7.07890039827473 7.06907688688210 12.1576453103725 7.07988259677917 7.07658030017830 7.08738904173987 7.09920213720245 7.09326516028715 7.08479436328462 7.09835801513986 7.11346331710328 7.09171580684489 7.08200702436583 7.06871212320463 7.07090457242737 7.08660494846796 7.07925327276423 7.10140558028156 7.08431366433294 7.07866484063952 7.07422067888069 7.07769123780812 7.10242606540593 7.10827105733077 7.09271098722756 7.07677221451946 7.09140428695505 7.09327066992483 7.10239059469005 7.06912540914180 7.07319947240628 7.07121594039309 7.07240166594115 7.06324149656784 7.06796154952571 7.08567407705872 7.07530683414821 ...
7.08577467638192 7.06683350459576 7.05742739058044 7.05998800101449 7.06955241168589 7.08185617275095 7.07438708524394 7.07434881688500 7.06586666096850 7.05194859085328 7.06932370235013 7.08178560002057 7.07866912494456 7.06490524002649 7.07625744030493 7.06112438612824 7.07080327856758 7.07973364113233 7.07876968653146 7.07514466900249 7.07145515747756 7.06200575768278 7.08152248514656 7.06720205374961 7.06167213659629 7.06372677385038 7.08701029607814 7.06884126329982 7.08056846440078 7.08747263934533 7.08296354531281 7.08296354531281 7.07253152092066 7.07539148119221 7.08598544502445 7.06555919214601 7.06040755593851 7.06319839735486 7.03774525079650 7.07247737616614 7.06342519854799 7.07562860705477 7.06997006286174 7.08615817161598 7.08615817161598 7.06650503401891 7.06840483319523 7.06508403879103 7.07105810449807 7.08039063981352 7.07853322202069 7.08450031161347 7.06322103388653 10.2393855050364 7.08749678020257 7.03623613269079 7.06759874149395 7.07097740711192 7.08840717015496 7.03693478452261 6.99751955829468 7.07953607064244 7.07953607064244 7.07497520519099 6.80146328565236 6.77084948860887 6.77570010932872 6.77570010932872 5.09012126412471 5.11895573892730 5.12392532976123 5.11797826000603 5.11583657173135 5.10833638690966 5.11725687899704 5.10081567220365 5.10937699744930 5.11580973513697 5.11824781491970 5.13850698164360 5.11067453400414 5.11067453400414 5.11067453400414 5.09472444045912 5.09536803826449 5.15408720405293 5.11547395201749 5.13746221416810 5.09760413560132 5.10802205918507 5.11111781541114 5.10598042093422 5.11660076193577 5.11016181233962 5.11016181233962 5.17242193355346 5.15969351738496 5.15969351738496 5.11740065482349 5.09474904624700 5.10753244984819 5.07545155222689 5.12262743630131 5.11427946721230 5.09415188603182 5.12890972602040 5.10010865129911 5.10035172860319 5.10814279873208 5.12830170679973 5.12830170679973 5.12830170679973 5.09889621938938 5.11282529201497 5.11645627190278 5.11645627190278 5.10451257185323 ...
5.11614058004224 5.12346416324344 5.10511006434787 5.10231546121311 5.13996072842562 5.13996072842562 5.13996072842562 5.20242432709806 5.20242432709806 5.09391639366360 5.09391639366360 5.09641760226267 5.11621239065016 5.10788591758176 5.10788591758176 5.10681860460715 5.10588285761392 5.10588285761392 5.10588285761392 5.07745503220912 5.12294020813828 5.14482828732414 5.14482828732414 5.14482828732414 5.14482828732414 5.14482828732414 5.10891224892158 5.13229323194841 5.11680718391220];

days = 1:numel(transect_25);   % 1:262
plot(days, transect_25)
xlabel 'Time (Days)'
ylabel 'Recession of cliff'

Star Strider, 25 March 2024 (edited 25 March 2024)

I do two things here: first, I use isoutlier to detect the outliers, and second, I use rmoutliers to remove them. (I use both here simply to demonstrate the process. The rmoutliers function also detects them, and its second output can be used to show them instead of an additional isoutlier call. The same arguments can be used for both functions, since they apply the same detection logic and differ only in what they return.)
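A quick check of the statement that rmoutliers detects outliers the same way as isoutlier when given the same arguments, on a made-up 10-sample series (the masks are compared as column vectors so the check does not depend on the orientation of the second output):

```matlab
% Toy series (hypothetical values) with one spike at sample 5
x = [7.06 7.08 7.07 7.05 12.65 7.08 7.06 7.07 7.08 7.06];
opts = {'movmedian', 50, 'ThresholdFactor', 6};   % same arguments for both calls

TF1 = isoutlier(x, opts{:});          % detection only
[xr, TF2] = rmoutliers(x, opts{:});   % removal plus the removed-sample mask

isequal(TF1(:), TF2(:))               % the masks agree
isequal(xr, x(~TF1))                  % rmoutliers output = x without the outliers
```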

Try this —

transect_25 = [7.06082672488186 7.01219231199077 7.10044052550893 7.05856413501238 7.08276782278192 7.08171780444893 7.13944195883233 7.08889967530182 7.07324463573404 7.09231298760920 12.6461994747653 12.1635560774415 7.08492965069543 7.07295006499794 7.09627915640382 7.10107349192131 7.12994953727733 7.12661420266218 7.12661420266218 7.12661420266218 7.05617262563685 7.06059679567114 7.04566565246833 7.06218732905983 7.07054630072494 7.00820198208732 7.06471919436917 7.05963427988496 12.8966296328669 7.05546228727488 7.09974448583935 7.10602063639012 7.05422148838586 7.05970082662912 7.05209565143113 7.05209565143113 7.07534664812658 7.08779177630935 7.07391889704875 7.06983996842995 7.07079316173566 7.05897258943634 7.07346519072165 7.06164618807580 7.06164618807580 7.06701486925527 7.07458563065911 7.05823180926617 7.06207550372241 7.07298964567741 7.06751837700166 7.07190469078925 7.06527550446671 7.07516421409874 7.07516421409874 7.08275265949953 7.07740272730190 7.10250565006205 7.09902367124141 7.05524550102303 7.08336393786298 7.08792387873578 7.05091596720386 7.06260433371295 7.07498428694577 7.07498428694577 7.07498428694577 7.09185707564830 7.09512553734544 7.09512553734544 7.09543693814034 7.10085264644561 7.11534031133025 7.09524796823034 7.09023929946413 7.09411512920930 7.09677732163462 7.08112576776079 7.07066243749597 7.07890039827473 7.06907688688210 12.1576453103725 7.07988259677917 7.07658030017830 7.08738904173987 7.09920213720245 7.09326516028715 7.08479436328462 7.09835801513986 7.11346331710328 7.09171580684489 7.08200702436583 7.06871212320463 7.07090457242737 7.08660494846796 7.07925327276423 7.10140558028156 7.08431366433294 7.07866484063952 7.07422067888069 7.07769123780812 7.10242606540593 7.10827105733077 7.09271098722756 7.07677221451946 7.09140428695505 7.09327066992483 7.10239059469005 7.06912540914180 7.07319947240628 7.07121594039309 7.07240166594115 7.06324149656784 7.06796154952571 7.08567407705872 7.07530683414821 ...
7.08577467638192 7.06683350459576 7.05742739058044 7.05998800101449 7.06955241168589 7.08185617275095 7.07438708524394 7.07434881688500 7.06586666096850 7.05194859085328 7.06932370235013 7.08178560002057 7.07866912494456 7.06490524002649 7.07625744030493 7.06112438612824 7.07080327856758 7.07973364113233 7.07876968653146 7.07514466900249 7.07145515747756 7.06200575768278 7.08152248514656 7.06720205374961 7.06167213659629 7.06372677385038 7.08701029607814 7.06884126329982 7.08056846440078 7.08747263934533 7.08296354531281 7.08296354531281 7.07253152092066 7.07539148119221 7.08598544502445 7.06555919214601 7.06040755593851 7.06319839735486 7.03774525079650 7.07247737616614 7.06342519854799 7.07562860705477 7.06997006286174 7.08615817161598 7.08615817161598 7.06650503401891 7.06840483319523 7.06508403879103 7.07105810449807 7.08039063981352 7.07853322202069 7.08450031161347 7.06322103388653 10.2393855050364 7.08749678020257 7.03623613269079 7.06759874149395 7.07097740711192 7.08840717015496 7.03693478452261 6.99751955829468 7.07953607064244 7.07953607064244 7.07497520519099 6.80146328565236 6.77084948860887 6.77570010932872 6.77570010932872 5.09012126412471 5.11895573892730 5.12392532976123 5.11797826000603 5.11583657173135 5.10833638690966 5.11725687899704 5.10081567220365 5.10937699744930 5.11580973513697 5.11824781491970 5.13850698164360 5.11067453400414 5.11067453400414 5.11067453400414 5.09472444045912 5.09536803826449 5.15408720405293 5.11547395201749 5.13746221416810 5.09760413560132 5.10802205918507 5.11111781541114 5.10598042093422 5.11660076193577 5.11016181233962 5.11016181233962 5.17242193355346 5.15969351738496 5.15969351738496 5.11740065482349 5.09474904624700 5.10753244984819 5.07545155222689 5.12262743630131 5.11427946721230 5.09415188603182 5.12890972602040 5.10010865129911 5.10035172860319 5.10814279873208 5.12830170679973 5.12830170679973 5.12830170679973 5.09889621938938 5.11282529201497 5.11645627190278 5.11645627190278 5.10451257185323 ...
5.11614058004224 5.12346416324344 5.10511006434787 5.10231546121311 5.13996072842562 5.13996072842562 5.13996072842562 5.20242432709806 5.20242432709806 5.09391639366360 5.09391639366360 5.09641760226267 5.11621239065016 5.10788591758176 5.10788591758176 5.10681860460715 5.10588285761392 5.10588285761392 5.10588285761392 5.07745503220912 5.12294020813828 5.14482828732414 5.14482828732414 5.14482828732414 5.14482828732414 5.14482828732414 5.10891224892158 5.13229323194841 5.11680718391220];

days = 1:numel(transect_25);   % 1:262

Lv = isoutlier(transect_25, 'movmedian', 50, 'ThresholdFactor', 6);
figure
plot(days, transect_25, 'DisplayName','Original Data')
hold on
plot(days(Lv), transect_25(Lv), 'sr', 'DisplayName','Detected Outliers')
hold off
xlabel 'Time (Days)'
ylabel 'Recession of cliff'
legend('Location','best')

[~,CD] = rmoutliers(transect_25, 'movmedian', 50, 'ThresholdFactor', 6); % 'CD' Is A Logical Vector Detecting The Outliers
figure
plot(days, transect_25, 'DisplayName','Original')
hold on
plot(days(~CD), transect_25(~CD), 'DisplayName', 'Outliers Removed')
hold off
xlabel 'Time (Days)'
ylabel 'Recession of cliff'
legend('Location','best')

figure
plot(days(~CD), transect_25(~CD))
xlabel 'Time (Days)'
ylabel 'Recession of cliff'
title('Transect 25 With Outliers Removed')
axis('padded')

You may be able to use the arguments I use here on all your data; however, they will likely need some ‘tweaks’ depending on the data set. To understand the various arguments and how to use them, see the documentation I linked to.
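One way to do the ‘tweaking’ is to count how many points each ThresholdFactor flags across all rows before settling on a value. A sketch on a synthetic stand-in for the 60x262 matrix (the data, the spike and the drop are made up); since the moving median and scale do not depend on the factor, raising it can only reduce the count:

```matlab
% Hypothetical stand-in for the full data set (rows = transects, columns = days)
rng(1)
A = 5 + 0.05*randn(60, 262);
A(7, 40:41) = 10;                       % an artificial 2-day spike
A(12, 150:end) = A(12, 150:end) - 2;    % an artificial permanent drop (failure)

% Count how many points each setting flags, to help choose the arguments
factors = [3 4 6 8 10];
nFlagged = zeros(size(factors));
for k = 1:numel(factors)
    TF = isoutlier(A, 'movmedian', 50, 2, 'ThresholdFactor', factors(k));
    nFlagged(k) = nnz(TF);
end
table(factors(:), nFlagged(:), 'VariableNames', {'ThresholdFactor', 'NumFlagged'})
```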

EDIT — Corrected typographical errors.


luke, 26 March 2024

I have gone through and used what you suggested, and it works great. One last thing: how do I return the row with the replaced values? When I use the code below, it returns first the row with the outliers removed and then the logical array. I need to obtain the row with the outliers replaced. Sorry, I am sure this is easy, but I can't get the correct result. A provides the row with the outliers removed and CD provides the logical row.

[A,CD] = rmoutliers(transect_above, 'movmedian', 50, 'ThresholdFactor', 6);
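For the question as asked (the full row with outliers replaced rather than removed), one option not used elsewhere in this thread is filloutliers, which takes the same detection arguments as rmoutliers plus a fill method; 'previous' matches the earlier request to use the previous value. A sketch on a made-up 10-sample row standing in for transect_above, plus a hypothetical 2-row matrix standing in for the 60x262 data:

```matlab
% Toy row (hypothetical values) with a spike at sample 5; in practice use transect_above
transect_above = [7.06 7.08 7.07 7.05 12.65 7.08 7.06 7.07 7.08 7.06];

% Same detection arguments as the rmoutliers call, but each outlier is
% replaced by the previous non-outlier value instead of being deleted
[A, CD] = filloutliers(transect_above, 'previous', 'movmedian', 50, 'ThresholdFactor', 6);
numel(A) == numel(transect_above)   % the row keeps its full length

% Whole matrix (rows = transects): pass dim = 2 instead of looping
M = [transect_above; transect_above - 2];   % hypothetical stand-in for the 60x262 matrix
Mfilled = filloutliers(M, 'previous', 'movmedian', 50, 2, 'ThresholdFactor', 6);
```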

Star Strider, 26 March 2024

Thank you! I was hoping it would work on your other data sets, but I couldn’t be certain.

Use the second output, similar to what I used in the plot —

figure
plot(days(~CD), transect_25(~CD))
xlabel 'Time (Days)'
ylabel 'Recession of cliff'
title('Transect 25 With Outliers Removed')
axis('padded')

The ‘CD’ result is a logical vector that works like any other subscript, and has the positions of the outliers as true, so use the negated version (~CD, the ~ is the logical ‘not’ operator) to return the corrected vectors without the outliers.
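A small illustration of that indexing on made-up values, plus one way to get a full-length row from the same mask: mark the outliers as NaN and fill them with fillmissing (note that this also fills any NaNs already present in the data):

```matlab
% Toy row (hypothetical values) and an outlier mask like the 'CD' output
transect = [7.06 7.08 7.07 7.05 12.65 7.08 7.06];
CD = logical([0 0 0 0 1 0 0]);

kept = transect(~CD)          % outliers removed: 6 values
days = 1:numel(transect);
keptDays = days(~CD)          % matching days

% Full-length alternative: replace each outlier with the previous value
replaced = transect;
replaced(CD) = NaN;
replaced = fillmissing(replaced, 'previous')
```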

If your data have each transect in its own table (called ‘transect_25’ here), the addressing would be:

days = transect_25.days(~CD);
transect = transect_25.transect(~CD);

or equivalently:

transect_25_corrected = transect_25(~CD,:);
days = transect_25_corrected.days;
transect = transect_25_corrected.transect;

The row indexing in the second approach also works if ‘transect_25’ is an (Nx2) array instead of a table:

transect_25_corrected = transect_25(~CD,:);

If ‘transect_25’ is instead a (2xN) array, the order of the subscripts is reversed:

transect_25_corrected = transect_25(:,~CD);

That should work.


Accepted answer: Star Strider (no other answers were posted).