How to speed up code for converting an array of strings to array of numbers

Question

Ranjan Sonalkar, April 6, 2018

https://jp.mathworks.com/matlabcentral/answers/393148-how-to-speed-up-code-for-converting-an-array-of-strings-to-array-of-numbers

Edited: Ranjan Sonalkar, April 6, 2018

I have a cell array containing a large vector (over 1.5 million entries) of strings. Many entries are repeated, so there are only about 15,000 unique entries in this vector. I need to convert the array of strings to an array of numbers, with a one-to-one correspondence between each unique string and its number. Each string contains 32 characters. I am using the following code, which takes over half an hour to run. I suspect that the for-loop is the culprit. I would really appreciate it if anyone could suggest a faster way of accomplishing the same thing, since I have to process many datasets.

Thanks for your help.

    data_raw = textscan(fid,'%s %f %f %f %f %s %s %s %s %s','Delimiter',',');
    %data_raw = textscan(fid,'%f %f %f %f %f %s %s %s %s %s %s','Delimiter',',');
    % following code converts alpha TTIDs to numeric TTIDs
    unique_TTIDs_alpha = string(unique(data_raw{1}));
    n_points           = length(data_raw{1,1});
    n_points_unique    = length(unique_TTIDs_alpha);
    TTIDs_numeric      = zeros(n_points,1);
    tic
    for ii = 1: n_points_unique
        idx = find(data_raw{1} == unique_TTIDs_alpha(ii));
        TTIDs_numeric(idx,1) = ii;
    end

Answer 1

Star Strider, April 6, 2018

https://jp.mathworks.com/matlabcentral/answers/393148-how-to-speed-up-code-for-converting-an-array-of-strings-to-array-of-numbers#answer_313814

I have no idea what your final goal is.

Note that the unique function has as many as three outputs. The first contains the unique elements, the second contains the indices of the first occurrence of each unique element, and the third contains, for each element of the input, the index of its match in the first output, so that replicated elements all get the same number. See the documentation on the unique function for a full discussion.
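As a minimal sketch of the three outputs described above, the snippet below runs unique on a small cell array. The sample strings and the variable names (ids, uniqueIds, firstIdx, numericIds) are made up for illustration and stand in for data_raw{1} from the question.

```matlab
% Hypothetical sample standing in for data_raw{1} (cell array of character vectors).
ids = {'b7f3'; 'a1c9'; 'b7f3'; 'c2d4'; 'a1c9'};

% uniqueIds  : sorted unique strings
% firstIdx   : position in ids of one occurrence of each unique string
% numericIds : for every element of ids, its position in uniqueIds
[uniqueIds, firstIdx, numericIds] = unique(ids);

% Repeated strings share the same number, and the mapping can be inverted.
disp(numericIds.');
disp(isequal(uniqueIds(numericIds), ids));
```

The third output has one entry per input element, so it can serve directly as the numeric ID vector, and uniqueIds acts as the lookup table back to the original strings.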

Answer 2

Brendan Hamm, April 6, 2018

https://jp.mathworks.com/matlabcentral/answers/393148-how-to-speed-up-code-for-converting-an-array-of-strings-to-array-of-numbers#answer_313813

I am not sure how much of a speedup to expect in the conversion, but I would strongly consider using categorical variables for this. This can be done directly in textscan using the %C format identifier. In most cases I think this will solve your problem.
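A rough sketch of reading the ID column as categorical with %C is shown below. It assumes a two-column comma-separated layout and parses a character vector instead of a file identifier so that it is self-contained; the sample text and the names txt, data, and ttid are hypothetical, and the real format string would need all ten columns from the question.

```matlab
% Hypothetical two-column sample; the real file has more columns.
txt = sprintf('b7f3,1.5\na1c9,2.0\nb7f3,3.25\n');

% %C reads the first field into a categorical array, %f reads a double.
data = textscan(txt, '%C %f', 'Delimiter', ',');

ttid = data{1};                 % categorical column vector
disp(class(ttid));
disp(numel(categories(ttid)));  % number of distinct IDs
```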

If the purpose of the conversion is to have a variable which is easier to query (i.e. numeric comparison), the categorical will likely do what you are looking for. Take a look at the doc and the methods for this class:

    doc categorical
    methods('categorical')

If you actually need a numeric value, it is worth noting that a categorical stores the data in a numeric format behind the scenes (uint*, where * depends on the number of categories), so it can easily be converted to numeric using one of the conversion functions (double, single, uint16). With 15,000 unique categories, a uint8 would be too small to represent this, but a uint16 would be enough.
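The conversion mentioned above might look like the following sketch. The sample strings and variable names are assumptions made for illustration; only categorical, categories, double, and uint16 are actual MATLAB functions.

```matlab
% Hypothetical sample standing in for the column of 32-character IDs.
ids = {'b7f3'; 'a1c9'; 'b7f3'; 'c2d4'; 'a1c9'};

c     = categorical(ids);   % categories are the sorted unique strings
cats  = categories(c);      % cell array of category names
codes = double(c);          % category index of each element
small = uint16(c);          % same indices in a compact integer type

% Categoricals can also be queried directly, without converting.
isB7 = (c == 'b7f3');

disp(codes.');
disp(nnz(isB7));
```

Keeping cats alongside codes preserves the one-to-one mapping, since cats(codes) reproduces the original strings.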

1 comment

Ranjan Sonalkar, April 6, 2018

Edited: Ranjan Sonalkar, April 6, 2018

This would work. It will speed things up, since it eliminates the need for the for-loop I was using. Moreover, using a categorical variable saves a tremendous amount of memory in storing the huge array, since I don't need all the scripts anyway.

Thanks

Answer 3

Ranjan Sonalkar, April 6, 2018

https://jp.mathworks.com/matlabcentral/answers/393148-how-to-speed-up-code-for-converting-an-array-of-strings-to-array-of-numbers#answer_313823

You have answered my question even though you had "no idea what my final goal is". Thanks for pointing out the multiple outputs of unique. The third output is exactly what I need, and it completely eliminates the need for the for-loop that was taking so long.

Thanks.
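For readers who want to see the replacement concretely, here is a sketch comparing the loop from the question with a single call to unique. The five sample strings are hypothetical and stand in for data_raw{1}; the names idsStr and TTIDs_loop are introduced only for this comparison.

```matlab
% Hypothetical stand-in for data_raw{1}.
ids    = {'b7f3'; 'a1c9'; 'b7f3'; 'c2d4'; 'a1c9'};
idsStr = string(ids);

% Loop version, following the code in the question.
unique_TTIDs_alpha = string(unique(ids));
TTIDs_loop = zeros(numel(ids), 1);
for ii = 1:numel(unique_TTIDs_alpha)
    TTIDs_loop(idsStr == unique_TTIDs_alpha(ii)) = ii;
end

% Single-call version: the third output of unique is the numeric ID vector.
[~, ~, TTIDs_numeric] = unique(ids);

disp(isequal(TTIDs_loop, TTIDs_numeric));
```

The loop compares all entries against each unique string in turn, so its cost grows with the product of the two counts, whereas unique produces the same numbering in one pass over the data.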

1 comment

Star Strider, April 6, 2018

As always, my pleasure.
