Convert cell array of strings to Unicode quickly

I have an array of approximately 10M strings, and I'm interested in converting each string to its Unicode values. Is there a quick, one-line way to convert the whole string array into numeric values? Ideally, I'd love a solution like this:
numeric_matrix = double(string_array);
But of course double (and unicode2native) does not support cells. So my current solution is to loop through the string array:
for ii = 1:length(string_array)
    numeric_matrix(ii,:) = double(string_array{ii});
end
Unfortunately this for-loop solution is very inefficient. It can take upwards of 10 minutes for very large numbers of strings. I tried googling this but didn't see anything better. Is there a simpler, faster way to do this, ideally in one line?

Accepted Answer

Walter Roberson on 2 Feb 2016

0 votes

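Try
numeric_array = cellfun(@uint16, string_array, 'UniformOutput', false);
The 'UniformOutput', false option is needed because each string converts to a vector rather than a scalar, so the result is a cell array of uint16 vectors. Try it on a smaller subset first, as I do not know how the timing would compare. It should have the advantage of not needing to change the internal representation.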
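If every string has the same length (which the loop in the question already assumes when it assigns to numeric_matrix(ii,:)), one possible alternative is to let char build a character matrix first and then convert it in a single call. The following is only a minimal sketch; the three-element string_array is hypothetical sample data standing in for the real array.

```matlab
% Hypothetical sample data; the real array would hold ~10M strings.
string_array = {'abc'; 'def'; 'xyz'};

% char() stacks the cell contents into an N-by-M char matrix.
% Shorter strings, if any, are padded with trailing blanks (code 32).
char_matrix = char(string_array);

% One vectorized conversion replaces the per-string loop.
numeric_matrix = double(char_matrix);

disp(numeric_matrix)
```

Note that a double matrix needs 8 bytes per character, so for very large inputs uint16(char_matrix) may be the more memory-friendly choice.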

3 Comments

Greg on 2 Feb 2016
Thanks a lot. I should have thought of cellfun!
However, it's strange: using the for loop is twice as fast as cellfun. This might be because I'm doing an extra operation on the Unicode values (multiplying by a vector and summing), but I don't see why that would penalize cellfun and not the for loop.
Guillaume on 2 Feb 2016
As far as I understand, MATLAB's native encoding is not Unicode but whatever your system locale specifies, so converting the string to double (or uint16) may not give Unicode values unless your locale is also Unicode. You would have to call native2unicode on the strings to be sure.
Most likely your cellfun is slower than a loop because you're using an anonymous function to perform your extra operation. Anonymous function calls have significant overhead in MATLAB.
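To check the overhead claim on a given machine, the variants can be timed side by side. The sketch below is an assumption about what the extra operation looks like (a weighted sum of the character codes); the random test strings and the weights vector are hypothetical. Timings vary with MATLAB version and hardware, so no particular result is implied.

```matlab
% Hypothetical test data: 1e4 random 8-character lowercase strings.
n = 1e4;
string_array = cellstr(char(randi([97 122], n, 8)));
weights = 1:8;   % assumed weight vector for the "multiply and sum" step

% Variant 1: cellfun with an anonymous function.
tic;
v1 = cellfun(@(s) sum(double(s) .* weights), string_array);
t_cellfun = toc;

% Variant 2: for loop with a preallocated output.
tic;
v2 = zeros(n, 1);
for ii = 1:n
    v2(ii) = sum(double(string_array{ii}) .* weights);
end
t_loop = toc;

% Variant 3: fully vectorized (requires equal-length strings).
tic;
v3 = double(char(string_array)) * weights(:);
t_vec = toc;

fprintf('cellfun: %.4f s, loop: %.4f s, vectorized: %.4f s\n', ...
    t_cellfun, t_loop, t_vec);
```

All three variants compute the same values, so whichever is fastest locally can be swapped in without changing the result.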
Greg on 2 Feb 2016
Thanks. I'm not interested in the Unicode values per se. I just wanted a way to turn a string into a (hopefully) unique numeric value. But that's good to know about Unicode.
And thanks for mentioning the anonymous function. That's probably what's happening!
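If the goal is only a distinct number per distinct string, one option that avoids character codes altogether is the third output of unique, which assigns the same integer to identical strings. This is a sketch of an alternative technique, not what was used in the thread, and the sample data is hypothetical. A weighted sum of character codes, by contrast, can map two different strings to the same value.

```matlab
% Hypothetical sample data with a repeated entry.
string_array = {'foo'; 'bar'; 'foo'; 'baz'};

% unique_strings holds the sorted distinct strings;
% id(k) is the index of string_array{k} within unique_strings.
[unique_strings, ~, id] = unique(string_array);

disp(id')   % identical strings share the same id
```

The original strings can be recovered with unique_strings(id).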


Asked: 2 Feb 2016

Commented: 2 Feb 2016
