Increasing efficiency of one-hot encoding

I have a dataset with 50 variables and an output. There are 17 categories for this dataset. I want to do feature selection on this dataset to determine which variables are significant. I am using the fsrnca function with one-hot encoding: I build a matrix of size (no. of observations) x 17 containing 1s and 0s to represent the categories and concatenate this matrix to X, so X' = [X_categories X] and y remains as it is. Is there a faster way of doing this than the standard one-hot encoding approach? Run time is very slow because of the high dimensionality. Hope this makes sense. Thanks!
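
A minimal sketch of the setup described above, on synthetic data, might look like the following. The variable names (`cats`, `Xcat`, `Xprime`) and the random data are assumptions made for illustration and are not taken from the original code; `dummyvar` and `fsrnca` both require Statistics and Machine Learning Toolbox.

```matlab
% Synthetic stand-in for the data described above (all names are illustrative).
rng(0);                                      % reproducible random numbers
n = 200;                                     % number of observations
X = randn(n, 50);                            % 50 numeric predictors
cats = categorical(randi(17, n, 1), 1:17);   % one categorical variable, 17 levels
y = X(:, 1) + 0.1*randn(n, 1);               % response driven by the first predictor

% One-hot encode the categorical variable: one column per category.
Xcat = dummyvar(cats);                       % n-by-17 matrix of 0s and 1s

% Concatenate as X' = [X_categories X]; y is unchanged.
Xprime = [Xcat X];                           % n-by-67

% Neighborhood component analysis feature selection for regression.
mdl = fsrnca(Xprime, y);
weights = mdl.FeatureWeights;                % one weight per column of Xprime
```

Timing each step separately (for example with `tic`/`toc`) would show whether the encoding or the `fsrnca` fit dominates the run time.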

3 Comments

Mohammad Sami on 16 Jan 2020
Which step is taking very long?
darova on 16 Jan 2020
And where is the code?
Athul Prakash on 28 Jan 2020
Kindly provide your code so that others can investigate which step is slowing you down.

Answers (1)

Walter Roberson on 28 Jan 2020

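% Category codes as a row vector (double avoids uint8 saturation above 255 categories)
catnum = double(TheCategorical(:).');
NumberOfObservations = numel(catnum);
numcat = max(catnum);
OH = zeros(NumberOfObservations, numcat);
% Set one element per row using linear indices
OH(sub2ind(size(OH), 1:NumberOfObservations, catnum)) = 1;
Or, to store only the nonzero entries as a sparse matrix:
catnum = double(TheCategorical(:).');
OH = sparse(1:numel(catnum), catnum, 1);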
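
To check that the dense and sparse constructions agree, something like the following can be run on a toy categorical array. The variable `c` and its values are made up for illustration. Whether a downstream function accepts sparse input is not verified here, so converting with `full` may be needed before passing the matrix on.

```matlab
% Toy categorical array with 3 categories (values are illustrative only).
c = categorical({'b'; 'a'; 'c'; 'a'; 'b'});
n = numel(c);
k = numel(categories(c));            % number of categories, here 3
codes = double(c(:).');              % category codes 1..k as a row vector

% Dense version: set one element per row via linear indexing.
OHd = zeros(n, k);
OHd(sub2ind([n k], 1:n, codes)) = 1;

% Sparse version: same matrix, storing only the n nonzero entries.
OHs = sparse(1:n, codes, 1, n, k);

% Both encodings should match and have exactly one 1 per row.
assert(isequal(OHd, full(OHs)));
assert(all(sum(OHd, 2) == 1));
disp(OHd);
```

Passing the size explicitly to `zeros` and `sparse` keeps one column per category even when the last category does not occur in the data.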

Asked: 14 Jan 2020

Answered: 28 Jan 2020
