Ignore missing data in a table group

28 ビュー (過去 30 日間)
Marcus Glover
Marcus Glover 2021 年 10 月 27 日
編集済み: Marcus Glover 2021 年 10 月 27 日
I have a table with some missing data. I am using findgroups and splitapply to do some calculations on columns of the table, but when a group has a missing value the calculation returns missing.
I would like to ignore the missing value in these calculations but without removing the enitre row- other colums have valid data.
>> T = readtable('messy.csv','TreatAsEmpty',{'.','NA'})
T =
21×5 table
A B C D E
________ ____ __________ ____ ____
{'afe1'} 3 {'yes' } 3 3
{'egh3'} NaN {'no' } 7 7
{'wth4'} 3 {'yes' } 3 3
{'atn2'} 23 {'no' } 23 23
{'arg1'} 5 {'yes' } 5 5
{'jre3'} 34.6 {'yes' } 34.6 34.6
{'wen9'} 234 {'yes' } 234 234
{'ple2'} 2 {'no' } 2 2
{'dbo8'} 5 {'no' } 5 5
{'oii4'} 5 {'yes' } 5 5
{'wnk3'} 245 {'yes' } 245 245
{'abk6'} 563 {0×0 char} 563 563
{'pnj5'} 463 {'no' } 463 463
{'wnn3'} 6 {'no' } 6 6
{'oks9'} 23 {'yes' } 23 23
{'wba3'} NaN {'yes' } NaN 14
{'pkn4'} 2 {'no' } 2 2
{'adw3'} 22 {'no' } 22 22
{'poj2'} -99 {'yes' } -99 -99
{'bas8'} 23 {'no' } 23 23
{'gry5'} NaN {'yes' } NaN 21
>> [G,gen]=findgroups(T(:,[3])); %find groups based on column C
>> gen
gen =
2×1 table
C
_______
{'no' }
{'yes'}
%% find mean of columns E and D based on Groups in column C ('no' or 'yes')
>> MeanE=splitapply(@mean,T(:,5),G)
MeanE =
61.4444444444444
44.4181818181818
% works
>> MeanD=splitapply(@mean,T(:,4),G)
MeanD =
61.4444444444444
NaN
% Does not work for 'yes' group due to NaN values in column 4
>> MeanD=splitapply(@mean,rmmissing(T(:,4)),G);
%fails because rmmissing(T(:,4)) and G are differnet sizes
I would like to be able to ignore the NaN values in column D when calcualting the mean. I can't seem to make it work with ismissing or rmmissing, and feel like this should be simpler than splitting them manually.
  2 件のコメント
Johan
Johan 2021 年 10 月 27 日
I'm not used to working with table but maybe instead of using @mean you can define a function and use this in your splitapply call.
mymean = @(x) mean(x,'omitnan');
splitapply(mymean,T(:,4),G)
Marcus Glover
Marcus Glover 2021 年 10 月 27 日
Thank you, omitnan works for me.

サインインしてコメントする。

採用された回答

Ive J
Ive J 2021 年 10 月 27 日
As Johan also suggested you may use omitnant flag. Also, consider using groupsummary and groupfilter :
m = groupsummary(T, 'C', @(x)mean(x, 'omitnan'), {'D', 'E'})
C GroupCount fun1_D fun1_E
__________ __________ ______ ______
{0×0 char} 1 563 563
{'no' } 9 61.444 61.444
{'yes' } 11 50.4 44.418
  1 件のコメント
Marcus Glover
Marcus Glover 2021 年 10 月 27 日
編集済み: Marcus Glover 2021 年 10 月 27 日
Thanks! My only reluctance to use groupsummary is that it seems (to me anyway...) that I lose the group indexing so I have to use findgroups anyway to work with the individual group members- ie if I wanted to make a scatterplot of B vs E for only 'no' group members or something.

サインインしてコメントする。

その他の回答 (0 件)

カテゴリ

Help Center および File ExchangeData Type Conversion についてさらに検索

製品


リリース

R2020b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by