Ignore missing data in a table group

Question

0 投票

I have a table with some missing data. I am using findgroups and splitapply to do some calculations on columns of the table, but when a group has a missing value the calculation returns missing.

I would like to ignore the missing value in these calculations but without removing the enitre row- other colums have valid data.

>> T = readtable('messy.csv','TreatAsEmpty',{'.','NA'})
T =
  21×5 table
       A         B          C          D       E  
    ________    ____    __________    ____    ____
    {'afe1'}       3    {'yes'   }       3       3
    {'egh3'}     NaN    {'no'    }       7       7
    {'wth4'}       3    {'yes'   }       3       3
    {'atn2'}      23    {'no'    }      23      23
    {'arg1'}       5    {'yes'   }       5       5
    {'jre3'}    34.6    {'yes'   }    34.6    34.6
    {'wen9'}     234    {'yes'   }     234     234
    {'ple2'}       2    {'no'    }       2       2
    {'dbo8'}       5    {'no'    }       5       5
    {'oii4'}       5    {'yes'   }       5       5
    {'wnk3'}     245    {'yes'   }     245     245
    {'abk6'}     563    {0×0 char}     563     563
    {'pnj5'}     463    {'no'    }     463     463
    {'wnn3'}       6    {'no'    }       6       6
    {'oks9'}      23    {'yes'   }      23      23
    {'wba3'}     NaN    {'yes'   }     NaN      14
    {'pkn4'}       2    {'no'    }       2       2
    {'adw3'}      22    {'no'    }      22      22
    {'poj2'}     -99    {'yes'   }     -99     -99
    {'bas8'}      23    {'no'    }      23      23
    {'gry5'}     NaN    {'yes'   }     NaN      21
    
    >> [G,gen]=findgroups(T(:,[3])); %find groups based on column C
    
    >> gen
gen =
  2×1 table
       C   
    _______
    {'no' }
    {'yes'}
    
    %% find mean of columns E and D based on Groups in column C ('no' or 'yes')
    >> MeanE=splitapply(@mean,T(:,5),G)
MeanE =
          61.4444444444444
          44.4181818181818
    % works 
          
    >> MeanD=splitapply(@mean,T(:,4),G) 
MeanD =
          61.4444444444444
                       NaN
    % Does not work for 'yes' group due to NaN values in column 4
    
    >> MeanD=splitapply(@mean,rmmissing(T(:,4)),G);
    %fails because rmmissing(T(:,4)) and G are differnet sizes
    

I would like to be able to ignore the NaN values in column D when calcualting the mean. I can't seem to make it work with ismissing or rmmissing, and feel like this should be simpler than splitting them manually.

2 件のコメント
なしを表示なしを非表示

Johan 2021 年 10 月 27 日

MATLAB Online で開く

I'm not used to working with table but maybe instead of using @mean you can define a function and use this in your splitapply call.

mymean = @(x) mean(x,'omitnan');
splitapply(mymean,T(:,4),G) 

Marcus Glover 2021 年 10 月 27 日

Thank you, omitnan works for me.

サインインしてコメントする。

サインインしてこの質問に回答する。

Follow Question

Answer 1

Ive J 2021 年 10 月 27 日

MATLAB Online で開く

0 投票

As Johan also suggested you may use omitnant flag. Also, consider using groupsummary and groupfilter :

m = groupsummary(T, 'C', @(x)mean(x, 'omitnan'), {'D', 'E'})
        C         GroupCount    fun1_D    fun1_E
    __________    __________    ______    ______
    {0×0 char}         1           563       563
    {'no'    }         9        61.444    61.444
    {'yes'   }        11          50.4    44.418

1 件のコメント
-1 件の古いコメントを表示 -1 件の古いコメントを非表示

Marcus Glover 2021 年 10 月 27 日

編集済み: Marcus Glover 2021 年 10 月 27 日

Thanks! My only reluctance to use groupsummary is that it seems (to me anyway...) that I lose the group indexing so I have to use findgroups anyway to work with the individual group members- ie if I wanted to make a scatterplot of B vs E for only 'no' group members or something.

サインインしてコメントする。

Ignore missing data in a table group

2 件のコメント
なしを表示なしを非表示

採用された回答

1 件のコメント
-1 件の古いコメントを表示 -1 件の古いコメントを非表示

その他の回答 (0 件)

カテゴリ

製品

リリース

タグ

Community Treasure Hunt

Ignore missing data in a table group

2 件のコメント なしを表示 なしを非表示

採用された回答

1 件のコメント -1 件の古いコメントを表示 -1 件の古いコメントを非表示

その他の回答 (0 件)

カテゴリ

製品

リリース

タグ

参考

Community Treasure Hunt

2 件のコメント
なしを表示なしを非表示

1 件のコメント
-1 件の古いコメントを表示 -1 件の古いコメントを非表示