How to insert missing data in?
I carried out an experiment and automatically got readings.
However, the apparatus was supposed to take a reading every 0.1 seconds, and sometimes it was offline: it kept counting but didn't print those readings. For example:
It would go 98.5, 98.6, 99.0. I'm looking for a way to put blank rows in where there is missing data, so that it reads 98.5, 98.6, 98.7, 98.8, 98.9, 99.0, and to give values of NaN for these. Doing it individually isn't an option, as there is a vast amount of data (864,000 readings for a day).
Answers (2)
Adam
14 Apr 2015
Edited: Adam
14 Apr 2015
times = [98.5 98.6 99.0];
readings = [123 345 567];
expectedTimes = 98.5:0.1:99.0;
newReadings = NaN( size( expectedTimes ) );
newReadings( ismember( expectedTimes, times ) ) = readings;
is one method. Obviously extendible to larger data read in rather than my small hard-coded example.
26 Comments
Jaffatron
14 Apr 2015
When I try this for my data sets it's giving an error message of:
"In an assignment A(I) = B, the number of elements in B and I must be the same"
When I put in
newReadings( ismember( expectedTimes, times ) ) = readings1;
Adam
14 Apr 2015
Edited: Adam
14 Apr 2015
Do you have the same number of readings as you do times? That seemed to be a sensible assumption for this method. There are also other assumptions embedded in it based on the information you gave - e.g. every value in the 'times' vector should exist in the 'expectedTimes' vector. Being doubles this may be what is causing a problem due to the inexact nature of floating point comparisons. It worked in my example, but if your data is a little different that part may not work.
If you want to break it down for debugging then put:
idx = ismember( expectedTimes, times );
and compare
numel( readings1 )
and
nnz( idx )
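The check above can be tried on a toy case first. This is a minimal sketch with made-up values (integer times, so floating-point rounding plays no part); `readings1` follows the name used in this thread, and 2.5 is a deliberately off-grid time.

```matlab
% Toy data (assumed values): one time, 2.5, is not on the expected grid
times         = [1 2 2.5 6];
readings1     = [10 20 30 40];
expectedTimes = 1:6;

idx = ismember( expectedTimes, times );   % logical mask over expectedTimes

nReadings = numel( readings1 );   % number of readings to place
nSlots    = nnz( idx );           % number of grid slots that matched a time

% The masked assignment newReadings( idx ) = readings1 only works
% when these two counts agree; here they do not (4 versus 3).
countsAgree = ( nReadings == nSlots );
```

When the counts differ, at least one entry of `times` has no exact match in `expectedTimes`.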
Jaffatron
14 Apr 2015
Maybe I should have given some example data, I have say:
A=
98.5 50.5
98.6 50.4
99.0 51.5
And I want:
B=
98.5 50.5
98.6 50.4
98.7 NaN
98.8 NaN
98.9 NaN
99.0 51.5
Hope this helps
Guillaume
14 Apr 2015
One slight issue with using pure equality comparison (which ismember uses) is that the 99.9 in expectedTimes may not actually be equal to the 99.9 in times, depending on how the two are generated.
This is due to the finite precision of floating-point numbers, and is not specific to MATLAB. For example, note that:
0.1 + 0.1 + 0.1 == 0.3
returns 0 (false). To make sure that both time arrays are indeed considered equal, I would round them to 0.1, thus:
newReadings(ismember(round(expectedTimes, -1), round(times, -1))) = readings; %in R2014b or later
newReadings(ismember(round(expectedTimes*10), round(times*10))) = readings; %in earlier versions
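To see the effect in isolation, here is a small sketch comparing exact equality with two more forgiving comparisons. The tolerance `tol` is an arbitrary value chosen for illustration, not something taken from the thread.

```matlab
a = 0.1 + 0.1 + 0.1;   % accumulates a tiny rounding error
b = 0.3;

exactEqual  = ( a == b );                          % false: the doubles differ
tol         = 1e-9;                                % assumed tolerance
closeEnough = abs( a - b ) < tol;                  % true: equal within tol
ticksEqual  = ( round( a*10 ) == round( b*10 ) );  % true: both are 3 tenths
```

Comparing integer tenths, as in the second line above, avoids having to pick a tolerance when the data are known to sit on a 0.1 grid.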
Adam
14 Apr 2015
Edited: Adam
14 Apr 2015
That works for me, assuming I pull out column 1 of A into what I called 'times' and column 2 of A into what I called 'readings'
times = A(:,1);
readings = A(:,2);
Obviously in your code you don't have to create these new variables. You can just plug in A(:,1) and A(:,2) directly.
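For the two-column layout shown in the question, the same result can also be sketched with integer tick indices instead of ismember. This is a swapped-in variant, not the method posted above; `step`, `ticks`, `allTicks` and `row` are illustrative names, and the sketch assumes every time is a multiple of 0.1.

```matlab
A = [98.5 50.5
     98.6 50.4
     99.0 51.5];
step = 0.1;                               % assumed sampling interval

ticks    = round( A(:,1) / step );        % integer tick number of each sample
allTicks = ( min(ticks):max(ticks) ).';   % every tick that should exist

% First column: full time axis; second column: NaN placeholders
B = [ allTicks * step, NaN( numel(allTicks), 1 ) ];

% Drop each measured value into the row that belongs to its tick
row = ticks - min(ticks) + 1;
B( row, 2 ) = A(:,2);
```

The first column of B holds 98.5 to 99.0 in steps of 0.1 (to within floating-point precision), and the rows for 98.7, 98.8 and 98.9 keep NaN in the second column.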
Adam
14 Apr 2015
Edited: Adam
14 Apr 2015
What is the result then of doing
idx = ismember( expectedTimes, times );
and compare
numel( readings1 )
and
nnz( idx )
?
If it is the inexactness of the equality measure then Guillaume's solution should work. I didn't actually realise round took an argument like that so I thought it would require more work to solve that problem, hence me not doing so until it was confirmed if it is the problem. Since it is just a one line addition though it will make it more robust at very little extra cost.
Adam
15 Apr 2015
newReadings and expectedTimes should be different from times and readings, but times and readings must be the same length as each other and the result of that ismember call must be a logical array with a number of true values equal to the length of times and readings.
If it isn't that means one or more of the times is not found in your expectedTimes array which is likely where you would need Guillaume's fix, though I think you need:
newReadings(ismember(round(expectedTimes, 1), round(times, 1))) = readings;
if you use the R2014b version unless I am much mistaken (note the 1 instead of -1).
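A quick way to check the sign convention of the second argument is shown below (the two-argument form of round needs R2014b or later; 98.56 is just a sample value).

```matlab
t = 98.56;

rPos = round( t, 1 );        % positive N: N digits right of the decimal point, 98.6
rNeg = round( t, -1 );       % negative N: rounds to the nearest multiple of 10, 100
rOld = round( t*10 ) / 10;   % equivalent of round(t, 1) for earlier releases
```

So rounding times to the nearest 0.1 needs a digit count of 1; with -1 every time near 98 to 99 collapses to 100.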
Adam
15 Apr 2015
Even with the rounding included? And have you checked what lengths are being reported for the two things I mentioned above? Note that it is the number of ones (i.e. nnz of the ismember result) that should match the number of readings and times you have. The length of the ismember result itself will match that of your expected times.
Adam
15 Apr 2015
And are you sure your actual times are falling at 0.1s intervals?
The above suggests that there are ~300,000 values in your times array that were not found in your expectedTimes array even with rounding.
If you can locate some of those as an example it should help to understand what is happening here. Try taking a much smaller subset of your times (and set up your expectedTimes to match) and check those times which are not being matched to your expectedTimes array.
Guillaume
15 Apr 2015
setdiff(round(times, 1), round(expectedTimes, 1))
will show which of the times values are not present in expectedTimes.
The problem is not with Adam's solution but with your assumption that all the times are present in expectedTimes.
So maybe Adam's solution should have started with this line:
assert(isempty(setdiff(round(times, 1), round(expectedTimes, 1))))
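As a small illustration of that diagnostic, the sketch below uses made-up data in which one time, 99.2, lies outside the expected range. It compares integer tenths via round(t*10) so that it also runs on releases before R2014b; `missingTicks` and `missingTimes` are illustrative names.

```matlab
times         = [98.5 98.6 99.0 99.2];   % assumed sample: 99.2 is out of range
expectedTimes = 98.5:0.1:99.0;

% Tenths present in times but absent from expectedTimes
missingTicks = setdiff( round( times*10 ), round( expectedTimes*10 ) );
missingTimes = missingTicks / 10;        % back to seconds, here 99.2

allPresent = isempty( missingTicks );    % false for this sample
```

An assert on `allPresent` would stop the script at this point, before the masked assignment fails with the less informative size-mismatch error.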
Jaffatron
15 Apr 2015
Ok I've spent a bit of time at this. My data was slightly skew for some reason. I'm now getting:
>> numel( readings1 )
ans =
853292
>> nnz( idx )
ans =
853291
So it still won't work properly due to one number being missing I'm assuming?
Kelly Kearney
15 Apr 2015
Try setxor instead ( setdiff only returns values in A but not B, not vice versa):
[c,ia,ib] = setxor(round(times, 1), round(expectedTimes, 1));
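A toy example of what the three outputs contain, using integer tenths so the comparison is exact. The values are made up, and `onlyInTimes` and `onlyInExpected` are illustrative names.

```matlab
timeTicks     = [985 986 990 992];   % assumed: measured times, in tenths of a second
expectedTicks = 985:990;             % assumed: expected grid, in tenths of a second

[c, ia, ib] = setxor( timeTicks, expectedTicks );

% c  : values that appear in exactly one of the two inputs (sorted)
% ia : indices into timeTicks of values missing from expectedTicks
% ib : indices into expectedTicks of values missing from timeTicks
onlyInTimes    = timeTicks( ia );        % 992
onlyInExpected = expectedTicks( ib );    % 987 988 989
```

Here `onlyInExpected` lists the gaps that should receive NaN, while `onlyInTimes` flags times that do not fit the grid at all.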
Jaffatron
15 Apr 2015
Edited: Jaffatron
15 Apr 2015
Ok, I'm assuming using:
[c,ia,ib] = setxor(round(times, 1), round(expectedTimes, 1));
c gives the times that appear in only one of the two arrays, ia gives the indices of the values that occur in times but not in expectedTimes, and ib gives the indices of the values that occur in expectedTimes but not in times.
How do I then incorporate this to solve my problem? I now know which values I'm missing. How can I leave space for the missing values and put in a NaN for the measurement at each one?
Kelly Kearney
15 Apr 2015
I suggested setxor based on the comments others had given you, but you should be able to get the same data if you break the ismember call into a separate step:
% Your data (1)
t1 = [98.5 98.6 98.65 99.0];
x1 = [1 2 3 4];
% Full set of times (2)
t2 = 98.5:0.1:99.0;
x2 = nan(size(t2));
% Match up the ones that fit
[tf, loc] = ismember(t1, t2);
x2(loc(tf)) = x1(tf);
% What times are in set 1 but not set 2?
tmiss = t1(~tf);
This snippet will plug the values into the appropriate rows. Then you'll need to look at the times left over in tmiss to figure out why you have times that don't fit the expected spacing. It might be roundoff error (which you can deal with using round, as suggested above). Or you might have some errant times that you need to deal with in a more hands-on manner.
Jaffatron
17 Apr 2015
Edited: Jaffatron
17 Apr 2015
I got this working up to a certain data point, and then it just stops working. Any idea why that could be the case?
I've attached a screen shot of it. As you can see it works up until 43200 and then it returns a zero for all. I've noticed it reads 43200 as 43200.00000 and not 43200 as it does for 43199. Any ideas what is happening here? And a possible solution?
Michael Haderlein
17 Apr 2015
It's always a problem when comparing floating point values with each other. I think it should work when you change everything with a factor of 10. I mean, multiply Data1 with 10, set r to 1, t2 to 1:863999, divide Data1_Freq by 10 and things should be fine.
Adam
17 Apr 2015
ismember does act a little suspiciously sometimes; I'm not sure what its underlying algorithm is.
I had to remove its usage from something I was doing with custom classes recently. For ages it was working fine to check if a given object was a member of an array of objects, but then in some circumstances it began to return 0, claiming the object was not a member. An isequal(...) call on the object and the member of the array that I knew matched it returned true, however, so I ended up just changing to a different implementation.
Amin Rajabi
17 Dec 2020
Thank you very much, I was looking for a quick way for adding missing records to a vector (a vector that is shorter than the original expected one). It works perfectly.
Jaffatron
17 Apr 2015
Got it working fully, I used the round function on my t2 array as well as t1 and it seems to have done the trick.
Thanks to everyone that posted on this over the past few days.
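For reference, a minimal sketch of the combination described as working in the last post: the two-output ismember snippet from the comments, with rounding applied to both time vectors. round(t*10) is used here so that it also runs on releases before R2014b; the sample values, including the off-grid time 99.2, are made up, and `k1` and `k2` are illustrative names.

```matlab
% Measured data (assumed values)
t1 = [98.5 98.6 99.0 99.2];
x1 = [1 2 3 4];

% Full set of expected times, pre-filled with NaN
t2 = 98.5:0.1:99.0;
x2 = nan( size( t2 ) );

% Compare integer tenths rather than raw doubles
k1 = round( t1*10 );
k2 = round( t2*10 );

[tf, loc] = ismember( k1, k2 );   % loc: position in t2 of each matched t1
x2( loc(tf) ) = x1( tf );         % place matched readings, gaps stay NaN
tmiss = t1( ~tf );                % times that do not fit the grid (99.2)
```

With this sample, x2 ends up as 1, 2, NaN, NaN, NaN, 3 and tmiss holds the single unmatched time.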