How to fill in NaNs or <undefined> in data with the mode of each column
I have converted a table with a mix of categorical and double columns so that every column is of type double, by converting each category in the categorical columns to a double.
I have a table of 40k rows and 40 columns. I want to fill in the NaNs by replacing each NaN with the mode of its column.
I found a clear looping method in R via this link, but couldn't find a simple loop in MATLAB to do it. inpaint_nans seems to be more focused on interpolation of the data.
knnimpute() also fails because I can have swathes of up to 1000 rows which are all NaNs (so I need 1200+ neighbours), as well as 40+ columns, so the algorithm has to loop through 40! times, which is very slow.
Any ideas?
Answers (1)
jgg
22 Dec 2015
Edited: jgg
22 Dec 2015
Select the NaNs with a logical mask and set them to the corresponding column modes:
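A = [1 2 NaN 4 5; 1 2 3 NaN 5; 1 NaN NaN NaN 5];
m = mode(A,1);                % column-wise mode; mode ignores NaNs
m = repmat(m,size(A,1),1);    % one copy of the mode row per row of A
A_f = A;
A_f(isnan(A)) = m(isnan(A));  % overwrite only the NaN entries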
Looping is not necessary if you use vectorized operations.
Note: if your matrix is very large, the repmat step can be replaced with a for loop over the columns in order to use less memory, but 40k by 40 is not that large, so it should be fine.
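For reference, here is a rough sketch of what that column-by-column loop might look like. It reuses the example matrix from above; the names `col` and `isMissing` are only illustrative, and the guard that skips columns with no observed values is an assumption about what you would want in that case.

```matlab
% Example matrix from the answer above; NaN marks a missing entry.
A = [1 2 NaN 4 5; 1 2 3 NaN 5; 1 NaN NaN NaN 5];

A_f = A;                              % work on a copy so A stays untouched
for c = 1:size(A, 2)
    col = A(:, c);
    isMissing = isnan(col);           % logical mask of the NaNs in this column
    % Assumption: a column that is entirely NaN is left as it is.
    if any(isMissing) && ~all(isMissing)
        % Mode of the observed (non-NaN) values only
        A_f(isMissing, c) = mode(col(~isMissing));
    end
end
```

This only ever holds one column and one mask in memory at a time, instead of a full-size copy of the modes.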
2 Comments
jgg
22 Dec 2015
If you liked this answer, please accept it so other people can see it resolved your problem!