How to remove outliers and smooth the complex signals?
Hi there,
I am working on a complex data set: a 300-by-1000 matrix in which each element is a complex number and each column is treated as a single data stream.
I'd like to remove outliers and smooth the signal before any further investigation. The hampel and rmoutliers functions only work on real data. Any suggestions?
Does it make sense to apply these filters to the real and imaginary parts of a signal, say x, separately, and treat the new real(x)+j*imag(x) as the filtered data?
Thanks in advance!
Accepted Answer
Star Strider
27 Aug 2021
‘Does it make sense to apply these filters to the real and imaginary parts of a signal, say x, separately, and treat the new real(x)+j*imag(x) as the filtered data?’
The easiest way to determine that is to do that experiment and see what the result is.
Z = complex(randn(12,1), randn(12,1))
Query = [isoutlier(real(Z)) isoutlier(imag(Z))]
Zro = rmoutliers([real(Z) imag(Z)])
So a row is flagged if either the real or the imaginary part of ‘Z’ (here) is an outlier. The entire row will be removed, as expected. The result can then be reconstituted using the complex function, as I did originally to create it here.
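To make the reconstitution step concrete, here is a minimal sketch for a single column vector. The planted outlier and the names `Zclean`, `keep`, and `Zclean2` are illustrative assumptions, not part of the original answer. The second form builds the row mask explicitly with `isoutlier`, which may be handy when the same rows have to be dropped from another variable as well.

```matlab
rng(0);                                    % fixed seed so the run is repeatable
Z = complex(randn(12,1), randn(12,1));     % example complex column
Z(5) = complex(25, 0);                     % plant an obvious outlier in the real part

% Remove every row in which the real OR the imaginary part is an outlier,
% then rebuild a complex vector from the two remaining columns.
Zro    = rmoutliers([real(Z) imag(Z)]);
Zclean = complex(Zro(:,1), Zro(:,2));

% Equivalent form with an explicit logical mask of the rows that are kept.
keep    = ~any(isoutlier([real(Z) imag(Z)]), 2);
Zclean2 = Z(keep);
```

Both forms use the default (median-based) detection, so they should return the same samples.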
21 Comments
John D'Errico
27 Aug 2021
I'm not sure removing an outlier does any good, because then the array will no longer be rectangular, and MATLAB will not allow that.
Susan
27 Aug 2021
@Star Strider Understood. All I want to do is preprocess noisy data by filtering out the outliers and removing noise with a moving average filter. However, because the data are complex-valued, I am not sure what the best way would be.
Star Strider
27 Aug 2021
If the entire row with the outlier is removed, there should be no problem with the integrity of the matrix.
You will need to determine if filling the outliers — rather than removing them — is the best way to go, since filling them by interpolating (using whatever method you choose), creates new data. Simply removing them — and the rows that contain them — avoids that problem.
Everything depends on what you want to do.
Since each column is a separate data stream, one way of avoiding the problem of matrix integrity is to use the mat2cell function so that each column is stored in a separate cell:
Z(:,1) = complex(randn(12,1), randn(12,1));
Z(:,2) = complex(randn(12,1), randn(12,1));
Z(:,3) = complex(randn(12,1), randn(12,1))
Z =
-1.2813 - 0.1998i 0.5323 - 1.8208i -0.5851 - 0.2996i
0.1563 + 0.9701i 0.1833 - 1.2399i -2.2381 - 0.4084i
-0.6694 + 1.1518i 0.9065 + 0.5875i -0.3811 + 0.4133i
-1.1481 + 2.6963i 0.4605 + 2.4708i 0.3207 + 0.3442i
-1.8806 + 1.8707i -0.4762 + 0.4955i -0.4521 - 1.3997i
0.3389 - 0.6344i 0.0169 - 1.7479i -0.5922 - 0.7799i
-0.2126 + 0.5346i -2.5071 + 0.5240i -0.4455 + 1.9598i
1.2624 - 0.5898i -0.6646 - 0.0799i 0.2233 + 0.4899i
-1.1608 + 1.0328i -1.8290 - 0.6129i 0.8086 + 1.4046i
0.0161 + 0.5505i -1.0206 + 0.6658i 2.2073 - 0.2543i
0.0471 + 0.3463i -1.0308 + 1.0622i 0.2046 - 1.0288i
0.2834 + 1.0997i 1.0445 - 0.7435i -0.0711 - 0.3890i
Zc = mat2cell(Z, size(Z,1), ones(1,size(Z,2)))
Zc = 1×3 cell array
{12×1 double} {12×1 double} {12×1 double}
Query = isoutlier([real([Zc{:}]) imag([Zc{:}])])
Query = 12×6 logical array
0 0 0 0 0 0
0 0 1 0 0 0
0 0 0 0 0 0
0 0 0 1 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 1 0 0 0
Zc = cellfun(@(x)rmoutliers([real(x) imag(x)]), Zc, 'Unif',0)
Zc = 1×3 cell array
{11×2 double} {12×2 double} {10×2 double}
And it works here as it did originally, for both the real and imaginary components of the matrix.
One caveat is that each element (column) of ‘Zc’ must be processed separately, since they are now no longer equal in length. If they have associated time vectors, that will need to be considered and adjusted using the isoutlier function for each column so that appropriate times are also deleted.
Again:
Everything depends on what you want to do.
Susan
27 Aug 2021
Edited: Susan
27 Aug 2021
@Star Strider As a follow-up question: by looking at a plot of the real/imag or abs/phase of a data stream, can we tell whether or not it carries any useful information? I'm asking to see whether we can simply ignore some data that are not very useful. For example, plotting the abs and phase of my data streams gives figures that show there is a pattern in the data streams, but I don't know how to interpret the figure I got for the phase. How can I figure out/extract the pattern in these data?
Susan
27 Aug 2021
Edited: Susan
3 Sep 2021
@Star Strider. Sorry for pinging you again. I still don't have an answer to this question. Could you please let me know what you think?
By the way, since a logical 1 in the output of isoutlier() indicates the location of an outlier, shouldn't Zc be
Zc = 1×3 cell array
{12×2 double} {9×2 double} {12×2 double}
Star Strider
27 Aug 2021
Those are your data, so you will need to decide what is useful and what is not.
I would plot the magnitude of the data (using the abs function) and the phase of the data (using the angle function) and see what makes the most sense.
There are several ways to filter the data, depending on whether the noise is band-limited (in which situation a frequency-selective filter would be most efficient and effective) or broadband noise (in which situation the Savitzky-Golay filter would be most effective).
Since I have no idea what you are doing, I cannot determine what data are ‘good’ and what data are not.
One way I usually determine this is to take the mean of the data and the standard error of the mean, and use that and the 95% confidence limits (assuming a known distribution) to decide on whether to use specific data. It could very well be that all the data are ‘good’, and simply represent time-associated differences in the system you are measuring. That information in itself could be valuable, since it would provide insight into the time-varying nature of the system, as well.
Again, I have no idea what you are doing, so some of this is simply speculation on my part.
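As a rough illustration of the broadband-noise case, the sketch below smooths one complex data stream by filtering its real and imaginary parts separately and recombining them with `complex`. The test signal, the noise level, and the window length `win` are assumptions chosen only for the demonstration. `smoothdata` is used here because it offers both a moving-average and a Savitzky-Golay method in base MATLAB. Both filters are linear, so filtering the two parts separately should be equivalent to filtering the complex signal as a whole.

```matlab
rng(1);
n = 300;
t = (0:n-1).';
clean = exp(1j*2*pi*0.01*t);                       % slowly rotating phasor (illustrative signal)
Z = clean + 0.2*complex(randn(n,1), randn(n,1));   % add broadband complex noise

win = 11;                                          % window length in samples (assumed; tune for your data)

% Moving-average smoothing down each column (dimension 1)
Zma = complex(smoothdata(real(Z), 1, 'movmean', win), ...
              smoothdata(imag(Z), 1, 'movmean', win));

% Savitzky-Golay smoothing (default polynomial degree)
Zsg = complex(smoothdata(real(Z), 1, 'sgolay', win), ...
              smoothdata(imag(Z), 1, 'sgolay', win));
```

Because `smoothdata` works along dimension 1 here, the same calls should also apply to a full matrix whose columns are separate data streams.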
Susan
27 Aug 2021
Edited: Susan
27 Aug 2021
Thank you so much again for your informative response. I truly appreciate it.
To make sure I understood you correctly, you take the mean of the data using the "mean" command and the standard error of the mean by
stderror= std( data ) / sqrt( length( data ))
and then the 95% confidence limits by doing something like
x = randi(50, 1, 100); % Create Data
SEM = std(x)/sqrt(length(x)); % Standard Error
ts = tinv([0.025 0.975],length(x)-1); % T-Score
CI = mean(x) + ts*SEM;
and then check whether the data fall within the 95% confidence limits by testing whether they are >=CI(1) and <=CI(2)? If so, you consider them good data? Is my understanding correct? Again, thank you so much in advance!
Star Strider
28 Aug 2021
As always, my pleasure!
‘Is my understanding correct?’
Yes, as far as I can determine, since it is mine as well.
Susan
3 Sep 2021
Edited: Susan
3 Sep 2021
Moreover, when I apply
Zc = cellfun(@(x)rmoutliers([real(x) imag(x)]), Zc, 'Unif',0)
I get the following error
Undefined function 'real' for input arguments of type 'cell'.
Here Zc is a 12*190 cell where each cell contains 1*336 cells i.e,
Zc--> 12*190 cell
Zc{1,1}--> 1*336 cell
Zc{1,1}{1,1} --> a complex matrix with the size of 600-by-1
I'd like to apply the function "@(x)rmoutliers([real(x) imag(x)])" to this 600-by-1 matrix. Could you please tell me how I can solve this issue?
Moreover, I have time vectors associated with these data streams, i.e., one time vector for each data stream. The time data set "t" has exactly the same format as the data stream set, i.e., a 12*190 cell of 1*336 cells, each of which contains a 600-by-1 matrix. As you mentioned earlier, "If they have associated time vectors, that will need to be considered and adjusted using the isoutlier function for each column so that appropriate times are also deleted." Could you please tell me how I can do that?
Your help is truly appreciated.
Star Strider
3 Sep 2021
I have no idea what your ‘Zc’ is, since it has never been posted.
As opposed to the one I created, your ‘Zc’ may itself contain other cells within cells.
Z(:,1) = complex(randn(12,1), randn(12,1));
Z(:,2) = complex(randn(12,1), randn(12,1));
Z(:,3) = complex(randn(12,1), randn(12,1))
Z =
-0.5853 + 2.4639i 0.6306 - 0.3304i 0.6598 - 0.9735i
0.6066 + 0.0160i 2.5984 + 0.5309i -1.6019 + 0.7852i
-0.3412 + 0.7698i -0.4684 - 1.1739i -0.9331 + 1.3424i
-1.3994 - 0.2494i 0.6664 - 0.4860i -0.4166 + 0.7058i
2.7730 - 0.8464i -0.2850 - 0.9701i -0.7736 - 0.4878i
0.7362 - 0.0014i -0.0358 - 1.2007i 0.9366 + 1.3649i
-0.2278 + 1.3435i -0.6741 - 0.3080i -0.5566 - 0.3074i
1.0900 - 0.2222i -1.3691 - 0.0885i -0.4428 + 2.1068i
0.2093 + 0.2958i 0.3628 - 1.1968i -0.9282 - 0.0533i
-0.6590 + 0.4372i -1.7537 - 0.9631i 1.3973 - 0.4798i
-1.3556 - 0.5976i -0.8509 - 1.8627i -0.5903 + 0.0886i
-0.2406 + 0.2537i -0.7712 - 0.1531i -0.9282 - 1.7822i
Zc = {mat2cell(Z, size(Z,1), ones(1,size(Z,2)))} % Such As This
Zc = 1×1 cell array
{1×3 cell}
% Query = isoutlier([real([Zc{:}]) imag([Zc{:}])])
Zc = cellfun(@(x)rmoutliers([real(x) imag(x)]), Zc{:}, 'Unif',0) % <— Try This
Zc = 1×3 cell array
{10×2 double} {11×2 double} {11×2 double}
Attempting to emulate that with this version of ‘Zc’ initially threw a similar error to the one you posted (there are apparently version differences in the error messages), and the change I made in the cellfun call to deal with that, works here. See if it works with your ‘Zc’.
Susan
3 Sep 2021
Edited: Susan
3 Sep 2021
Thank you so much for your quick replies. You are right, my ‘Zc’ itself contains other cells within cells. When I try
Zc = cellfun(@(x)rmoutliers([real(x) imag(x)]), Zc{:}, 'Unif',0)
where my Zc is a 12*190 cell whose elements are themselves 1*336 cells, I get
Error using @(x)rmoutliers([real(x),imag(x)])
Too many input arguments.
I try to solve this issue using the following line
Zc_withouthOutliers = cellfun(@(x)cellfun(@(x)rmoutliers([real(x) imag(x)]), x, 'UniformOutput', false), Zc, 'Unif',0);
However, the data streams turn out to have the same lengths as before, which would mean there are no outliers. But I know that is not the case.
Any idea?
Star Strider
3 Sep 2021
Unfortunately, no, because I am still guessing what your ‘Zc’ is.
Perhaps:
Z(:,1) = complex(randn(12,1), randn(12,1));
Z(:,2) = complex(randn(12,1), randn(12,1));
Z(:,3) = complex(randn(12,1), randn(12,1))
Z =
0.2265 - 1.6841i 0.6258 + 1.2717i -0.5703 + 0.6599i
1.3231 - 0.0957i 0.3498 + 0.5286i -1.5460 - 1.3338i
0.0539 - 1.6916i 0.4286 - 0.7961i -2.0087 + 1.0217i
0.4872 - 0.1514i -0.3148 - 0.2921i -0.0742 - 1.0811i
-2.0700 - 0.4709i -2.7228 + 0.4142i 1.0667 + 0.3233i
-0.4754 + 0.7001i -1.2523 - 1.7435i -0.4367 - 0.5155i
0.6550 - 0.8458i 0.1090 - 1.1173i 0.7603 - 2.1107i
0.1041 - 0.7291i -0.2754 + 0.4496i -0.1638 + 1.2572i
0.0799 - 1.0766i -1.0184 - 1.9236i -0.6640 + 0.1175i
-3.4014 + 0.5304i 0.9365 - 0.6210i -1.3426 + 0.8949i
2.6220 - 0.4183i 0.4284 - 1.8616i 0.4942 - 0.1083i
-0.4524 - 0.6606i 0.2951 + 0.0146i -2.0342 - 1.4751i
Zc = {mat2cell(Z, size(Z,1), ones(1,size(Z,2)))} % Such As This
Zc = 1×1 cell array
{1×3 cell}
% Query = isoutlier([real([Zc{:}]) imag([Zc{:}])])
Zc = cellfun(@(x)rmoutliers([real(x) imag(x)]), [Zc{:}], 'Unif',0) % <— Try This
Zc = 1×3 cell array
{10×2 double} {11×2 double} {12×2 double}
Note — The argument is [Zc{:}] now. See if adding the square brackets helps.
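For a two-level layout like the one described in this thread (an outer cell whose elements are themselves row cells of complex vectors), the effect of the square brackets can be sketched as follows. The sizes, the helper `makeStream`, and the planted outlier are assumptions made up for the demonstration. The nested `cellfun` variant is shown only as an alternative that keeps the original cell shape.

```matlab
rng(2);
% Build a small stand-in for the nested layout: a 2-by-3 outer cell whose
% entries are 1-by-4 cells of 20-by-1 complex vectors (sizes are illustrative).
makeStream = @() complex(randn(20,1), randn(20,1));
Zc = cell(2,3);
for k = 1:numel(Zc)
    Zc{k} = arrayfun(@(~) makeStream(), 1:4, 'UniformOutput', false);
end
Zc{1,1}{2}(7) = complex(30, -30);          % plant an outlier in one stream

% [Zc{:}] concatenates the inner 1-by-4 cells into one flat 1-by-24 cell,
% so cellfun now sees numeric vectors instead of cells.
flat = [Zc{:}];
cleanFlat = cellfun(@(x) rmoutliers([real(x) imag(x)]), flat, 'UniformOutput', false);

% Alternative that keeps the nested 2-by-3 / 1-by-4 shape.
cleanNested = cellfun(@(c) cellfun(@(x) rmoutliers([real(x) imag(x)]), c, ...
                      'UniformOutput', false), Zc, 'UniformOutput', false);
```

Each output element is a two-column real matrix, `[real imag]`, with the outlier rows removed. Comparing `size(cleanFlat{k},1)` with the original stream length shows how many rows were dropped.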
Susan
3 Sep 2021
Edited: Susan
3 Sep 2021
Thanks, it works now! I really appreciate your help! And sorry that I haven't posted my dataset since it's a huge dataset. But your guess about the structure of my data set is correct.
I promise this will be my last question on this. Could you please tell me how I can find the associated time vector for each of these streams? The time data set has exactly the same format as the data stream set (Zc), i.e., a 12*190 cell in which each cell contains a 1*336 cell, and each of these 1*336 cells contains a 600-by-1 matrix. As you mentioned earlier, "If they have associated time vectors, that will need to be considered and adjusted using the isoutlier function for each column so that appropriate times are also deleted." Could you please tell me how I can do that?
Many many thanks in advance
Star Strider
3 Sep 2021
Thank you!
Don’t worry about asking questions. Whatever they are, I will do my best to provide useful answers.
‘Could you please tell me how I can do that?’
Probably the easiest way is to do something like this —
Z(:,1) = complex(randn(12,1), randn(12,1));
Z(:,2) = complex(randn(12,1), randn(12,1));
Z(:,3) = complex(randn(12,1), randn(12,1))
Z =
-0.0037 - 0.0718i -0.5906 - 0.4982i -0.2926 - 1.4403i
-0.0491 + 0.3881i -1.1934 - 0.6266i 1.6255 - 0.6844i
-0.5010 + 0.8418i -0.3764 - 0.5512i -0.6346 + 0.5429i
-0.9482 - 2.0047i 0.7089 + 0.3901i -0.6989 - 1.0210i
1.5427 + 1.0627i -1.4086 - 0.3650i 1.3917 - 1.6212i
0.4960 + 1.3108i 1.1715 + 1.4088i -1.3572 + 0.0194i
-0.4169 + 0.7410i 0.1213 + 0.0061i -0.2981 - 0.3166i
0.1113 + 0.7298i -0.0477 - 0.6290i 0.4736 + 0.2349i
-0.9848 - 0.3802i 0.5130 - 1.9157i -0.3988 + 0.4671i
-0.0238 + 1.2133i 2.9185 - 0.1249i -0.5662 + 0.7225i
-0.0214 - 0.2302i -0.1938 - 0.2917i -1.0284 - 0.4458i
-0.0793 - 0.4082i 2.1012 + 0.4646i 0.2128 - 0.6308i
Zc = {mat2cell(Z, size(Z,1), ones(1,size(Z,2)))}
Zc = 1×1 cell array
{1×3 cell}
Outlierc = (cellfun(@(x)isoutlier([real(x) imag(x)]), [Zc{:}], 'Unif',0)) % Find Outliers
Outlierc = 1×3 cell array
{12×2 logical} {12×2 logical} {12×2 logical}
Lv = ~any([Outlierc{:}],2) % Create Logical Column Vector
Lv = 12×1 logical array
1
1
1
1
0
0
1
1
0
1
Zc = cellfun(@(x)rmoutliers([real(x) imag(x)]), [Zc{:}], 'Unif',0)
Zc = 1×3 cell array
{11×2 double} {10×2 double} {12×2 double}
Then use ‘Lv’ as necessary with either the entire array or selected columns (computing it for each column in that instance) to address the time vector as well as the data. It will return true or 1 for the ‘good’ data (as written here), so using that to reference the entire array (or individual columns) as well as the ‘Time’ vector will return corresponding good data for all of them.
So, if ‘Time’ is a column vector of times,
t = Time(Lv);
will be a vector of times that correspond to rows without outliers.
Experiment with it to be certain it does what you want it to do.
Susan
3 Sep 2021
Thank you SO MUCH! You have no idea how much I appreciate your help.
I understood the logic. However, I've got a question. Suppose the outlier in Zc{1,1}{1,1} happens on the 5th row, and the outliers in Zc{1,1}{1,2} happen on the 6th and 9th rows.
Then if I apply t=Time(Lv), it will result in
size(t{1,1}{1,1}) = 9-by-1 % while it should be 11-by-1
size(t{1,1}{1,2}) = 9-by-1 % While it should be 10-by-1
size(t{1,1}{1,3}) = 9-by-1 % While it should be 12-by-1
Am I right? What I am looking for is the first "t" vector associated with the first Zc complex signal (first column of Z). So they have to have the same length, right?
Star Strider
3 Sep 2021
As always, my pleasure!
You will have to decide how you want to approach this.
If you want to evaluate by columns, then create a different ‘Lv’ for each column, including the associated time vector.
If you want to consider the entire matrix at once, then create ‘Lv’ for the matrix and go from there, including the time vector for all the rows, so that in the end you are left with only the rows of the matrix of variables, and the corresponding times, in which none of the columns has an outlier.
You will have to decide how you want to analyse your data. Statistical considerations may be important here (depending on what you are doing), so consulting with a statistician would likely help you decide.
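The column-by-column option can be sketched as below: one logical mask per stream, applied both to that stream and to its own time vector, so each pair keeps matching lengths. The cell arrays `Zc` and `tc`, the integer sample times, and the planted outlier are assumptions for illustration only. The real data in this thread are nested one level deeper, so the same idea would need to be applied inside each inner cell.

```matlab
rng(3);
nStreams = 3;
Zc = cell(1, nStreams);                     % data streams
tc = cell(1, nStreams);                     % one time vector per stream
for k = 1:nStreams
    Zc{k} = complex(randn(12,1), randn(12,1));
    tc{k} = (0:11).';                       % sample times in arbitrary units (illustrative)
end
Zc{2}(6) = complex(0, 40);                  % plant an outlier in stream 2

% One logical "keep" mask per stream: true where neither part is an outlier.
Lvc = cellfun(@(x) ~any(isoutlier([real(x) imag(x)]), 2), Zc, 'UniformOutput', false);

% Apply each mask to its own stream and to the matching time vector.
Zclean = cellfun(@(x, m) x(m), Zc, Lvc, 'UniformOutput', false);
tclean = cellfun(@(t, m) t(m), tc, Lvc, 'UniformOutput', false);
```

With this approach the streams generally end up with different lengths, but `Zclean{k}` and `tclean{k}` always stay aligned.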
More Answers (1)
John D'Errico
27 Aug 2021
Edited: John D'Errico
27 Aug 2021
Is it valid to work with the real and imaginary parts separately? Possibly, though you know the data better than we do. What causes an outlier? If there is a problem with the real component of a number, why would it not have impacted the imaginary part too?
I would assume you can simply work with the real and imaginary parts separately. But you cannot just REMOVE an outlier. You need to correct it. So you might decide to apply the tool filloutliers to each column of the array, separately to the real and imaginary parts, treating them as simply independent signals. That may not be totally valid of course. But can you do it? Of course.
You would use a loop over the columns of your matrix. Something like:
for ind = 1:ncols
R = filloutliers(real(M(:,ind)),'gesd');
I = filloutliers(imag(M(:,ind)),'gesd');
M(:,ind) = complex(R,I);
end
You would need to play around to find what works best on your data of course.
1 Comment
Susan
27 Aug 2021
Edited: Susan
27 Aug 2021
Thank you so much for your reply. It made me think more about the problem and the data set I am working on. When I applied your code to my data, I got the following error
Error using filloutliers>parseinput (line 236)
Expected input number 2, Fill, to match one of these values:
'center', 'clip', 'previous', 'next', 'nearest', 'linear', 'spline', 'pchip', 'makima'
The input, 'gesd', did not match any of the valid values.
Error in filloutliers (line 118)
parseinput(a, fill, varargin);
Any idea? Why did you select 'gesd' here?
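On the error above: in the documented calling form `filloutliers(A, fillmethod, findmethod)`, the second argument selects how outliers are filled and the third selects how they are detected. `'gesd'` is a detection method, so passing it second triggers the message shown. A possible correction of the loop is sketched below. The fill method `'linear'` is only an assumed choice (any value from the list in the error message could be substituted), and the matrix `M` is a random stand-in for the real data.

```matlab
rng(4);
M = complex(randn(50,4), randn(50,4));     % stand-in for the data matrix
M(10,2) = complex(40, 0);                  % plant an outlier in column 2

ncols = size(M, 2);
for ind = 1:ncols
    % Second argument = how to fill ('linear' is an assumed choice),
    % third argument  = how to detect ('gesd').
    R = filloutliers(real(M(:,ind)), 'linear', 'gesd');
    I = filloutliers(imag(M(:,ind)), 'linear', 'gesd');
    M(:,ind) = complex(R, I);
end
```

Unlike removal, this keeps `M` rectangular, at the cost of replacing the flagged samples with interpolated values.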