How to fill in NaNs or <undefined> in data with the mode of each column

I have converted a table that mixed categorical and double columns into one where every column is of type double, by mapping each category in the categorical columns to a double value.
The table has 40k rows and 40 columns. I want to replace each NaN with the mode of its column.
I found a clear looping method in R via this link, but couldn't find a simple loop in MATLAB to do it. inpaint_nans seems to be more focused on interpolation of the data.
knnimpute() also fails because I can have swathes of up to 1000 rows which are all NaNs (so I need 1200+ neighbours), as well as 40+ columns, so the algorithm has to loop through 40! times, which is very slow.
Any ideas?

Answers (1)

jgg on 22 Dec 2015
Edited: jgg on 22 Dec 2015


Select the NaNs with a logical mask and overwrite them with the mode of their column:
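A = [1 2 NaN 4 5; 1 2 3 NaN 5; 1 NaN NaN NaN 5];
m = mode(A,1);               % column-wise mode; mode ignores NaN values
m = repmat(m,size(A,1), 1);  % repeat the row of modes once per row of A, so m matches size(A)
A_f = A;
A_f(isnan(A)) = m(isnan(A)); % overwrite each NaN with the mode of its column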
Looping is not necessary if you use vectorized operations.
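If your release has fillmissing (an assumption here: the function is not mentioned in this thread and was introduced in R2016b, after this answer was written), the same fill can probably be sketched without building a repeated matrix at all. With the 'constant' method, a vector fill value is applied column by column:

```matlab
% Example matrix from the answer
A = [1 2 NaN 4 5; 1 2 3 NaN 5; 1 NaN NaN NaN 5];

% Row vector of column modes; mode ignores NaN values
m = mode(A,1);

% Assumes R2016b or later: m(j) is used to fill the NaNs in column j
A_f = fillmissing(A,'constant',m);
```

For the example matrix, every row of A_f becomes 1 2 3 4 5.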
Note: if your matrix is very large, the repmat step can be replaced with a for loop over the columns in order to use less memory, but 40k by 40 is not that large, so it should be fine.
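A minimal sketch of the column-by-column variant mentioned in the note, using the same example matrix. The variable names c, col, and isMissing are illustrative, and leaving all-NaN columns untouched is an assumption about the desired behaviour:

```matlab
% Example matrix from the answer
A = [1 2 NaN 4 5; 1 2 3 NaN 5; 1 NaN NaN NaN 5];
A_f = A;

for c = 1:size(A,2)
    col = A(:,c);
    isMissing = isnan(col);
    if all(isMissing)
        continue              % no observed values, so there is no mode to fill with
    end
    col(isMissing) = mode(col);   % mode ignores NaN values
    A_f(:,c) = col;
end
```

Only one column is copied at a time, so the extra memory is one column rather than a second matrix the size of A.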

2 Comments

Dhruv Ghulati on 22 Dec 2015
Thanks so much! I changed the repmat line to
m = repmat(m,size(A,1), 1);
so that the row of modes is repeated once per row of A rather than once per column, but otherwise it worked!
jgg on 22 Dec 2015
If you liked this answer, please accept it so other people can see it resolved your problem!


Asked: 21 Dec 2015
Commented: jgg on 22 Dec 2015
