Increasing efficiency of one-hot encoding
I have a dataset with 50 variables and one output. There are 17 categories for this dataset. I want to do feature selection on this dataset to determine which variables are significant. I am using the fsrnca function with one-hot encoding: I build a matrix of size (no. of observations)-by-17, filled with 1s and 0s to represent the categories, and concatenate this matrix to X, so X' = [X_categories X] and y remains as it is. Is there a faster way of doing this than the standard one-hot encoding approach? The run time is very slow because of the very high dimensionality. Hope this makes sense. Thanks!
3 Comments
Mohammad Sami
16 Jan 2020
Which step is taking very long?
darova
16 Jan 2020
And where is the code?
Athul Prakash
28 Jan 2020
Kindly provide your code so that others can investigate which step is slowing you down.
Answers (1)
Walter Roberson
28 Jan 2020
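NumberOfObservations = numel(TheCategorical);
% Category codes 1..K as a row vector; double avoids the 255 cap of uint8
catnum = double(TheCategorical(:).');
numcat = max(catnum);
OH = zeros(NumberOfObservations, numcat);
% Set one entry per row: row i, column catnum(i)
OH(sub2ind(size(OH), 1:NumberOfObservations, catnum)) = 1;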
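Or, using sparse storage:
NumberOfObservations = numel(TheCategorical);
catnum = double(TheCategorical(:).');
% Only the nonzero entries are stored
OH = sparse(1:NumberOfObservations, catnum, 1);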
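To connect the snippets above with the setup described in the question, here is a minimal sketch of the full path from a categorical variable to fsrnca. The data and the names X, g, y, Xp, mdl and w are made up for illustration, since the original poster's code was never shared. The sketch assumes the Statistics and Machine Learning Toolbox is installed and uses the full (non-sparse) one-hot matrix.

```matlab
% Hypothetical data: 200 observations, 50 numeric predictors, 17 categories.
rng(0);                                  % reproducible example
n = 200;
X = randn(n, 50);                        % numeric predictors (made up)
g = categorical(randi(17, n, 1));        % category of each observation (made up)
y = X(:, 3) + 0.1*randn(n, 1);           % response (made up)

% One-hot block built with a single vectorized indexing step.
codes = double(g(:));                    % category codes 1..K
K = numel(categories(g));                % counts categories even if some are unused
OH = zeros(n, K);
OH(sub2ind([n K], (1:n).', codes)) = 1;  % row i gets a 1 in column codes(i)

% Concatenate as in the question: X' = [X_categories X].
Xp = [OH X];

% Neighborhood component analysis for regression.
mdl = fsrnca(Xp, y);
w = mdl.FeatureWeights;                  % one weight per column of Xp
```

Building OH this way is a single indexing operation, so if the run is still slow it is worth timing the fsrnca call on its own (for example with tic and toc) before optimizing the encoding any further.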