Can I somehow improve performance of str2double?

MichaU709 on 22 Jan 2021
Commented: Florian Berzsenyi on 3 Jan 2023
Hello, is there a way to improve performance by using a method other than str2double when converting huge arrays of strings? My whole function needs almost 15 seconds to convert these strings, which significantly affects the performance of my script.
Elapsed time is 7.239212 seconds.
Elapsed time is 6.984212 seconds.
2 Comments
Steven Lord on 22 Jan 2021
What do you mean by "huge" in "huge array of strings"? Are we talking on the order of 1,000 strings; 1,000,000 strings; 1,000,000,000 strings; etc.?
If it takes MATLAB 15 seconds to convert 15 thousand strings, that's quite a different situation than MATLAB taking 15 seconds to convert 15 million strings.
MichaU709 on 22 Jan 2021
Edited: MichaU709 on 22 Jan 2021
Sorry for not specifying. I am trying to convert about 1,000,000 strings.

Accepted Answer

Stephen23 on 22 Jan 2021
Edited: Stephen23 on 22 Jan 2021
The fastest conversion uses low-level commands, e.g. sprintf and sscanf. Instead of this:
C = {'1.2','3.4','5.6'};
V = str2double(C)
V = 1×3
1.2000 3.4000 5.6000
do this (add reshape if required, or specify the optional size argument):
V = sscanf(sprintf(' %s',C{:}),'%f',[1,Inf])
V = 1×3
1.2000 3.4000 5.6000
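To see how the two approaches compare on your own machine, a timing sketch along the following lines may help. The data here is synthetic, and the size N and the variable names are assumptions for illustration, not taken from the question. The last part shows one caveat of the sprintf/sscanf approach: an empty or non-numeric entry changes the number of values returned, so it can be worth checking the element count.

```matlab
% Timing sketch on synthetic data (N and all variable names are assumptions).
N = 1e5;
x = rand(N, 1);
C = arrayfun(@(v) sprintf('%.6f', v), x, 'UniformOutput', false);  % N-by-1 cellstr

tic; V1 = str2double(C); t1 = toc;                                  % N-by-1 double
tic; V2 = sscanf(sprintf(' %s', C{:}), '%f', [1, Inf]); t2 = toc;   % 1-by-N double
fprintf('str2double: %.3f s, sprintf/sscanf: %.3f s\n', t1, t2);

% Both results should agree; check the count and the largest difference.
sameCount = (numel(V2) == numel(C));
maxDiff   = max(abs(V1(:) - V2(:)));

% Caveat: an empty entry produces no value, so later values shift position,
% and sscanf stops scanning at the first token it cannot parse.
Vempty = sscanf(sprintf(' %s', '1.2', '', '5.6'), '%f', [1, Inf]);     % 2 values
Vtext  = sscanf(sprintf(' %s', '1.2', 'abc', '5.6'), '%f', [1, Inf]);  % 1 value
```

If the input may contain empty or non-numeric entries, comparing numel of the result against numel(C) is a cheap way to detect that the positions no longer line up.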
1 Comment
MichaU709 on 22 Jan 2021
Thank you. That's a perfect solution.

More Answers (1)

Yair Altman on 9 Feb 2021
I use the following function to convert a cell-array of strings to a cell-array of numeric values (where applicable) - it is ~2-5x faster than str2double or sscanf, depending on the specific inputs (YMMV). It is fastest for positive integers up to 10^15-1, still fast for doubles, and about the same speed as sscanf for all other inputs. Note that non-numeric strings are returned unchanged in the output cell-array.
function results = strs2number(cellStrs)
    numVals = numel(cellStrs);
    results = cellStrs; % pre-allocate
    powersOf10 = [10000000000000 1000000000000 100000000000 10000000000 ...
                  1000000000 100000000 10000000 1000000 100000 10000 1000 100 10 1];
    Nmax = 14; %=length(powersOf10);
    %results(cellfun('isempty',results)) = {[]}; % it's faster to loop below
    for idx = 1 : numVals
        value = cellStrs{idx};
        N = length(value);
        if N==0
            results{idx} = []; % '' => []
            continue
        elseif value(1) > '9' % skip the most obvious non-numeric strings first
            continue
        elseif N > Nmax % e.g. '20210209 08:35:40'
            continue
        end
        %results{idx} = str2number(value); % inlined below for performance
        isDigit = value>='0' & value<='9';
        if all(isDigit) %simple positive integer
            powers = powersOf10(Nmax-N+1:Nmax); %faster than powersOf10(end-N+1:end) or 10.^(N-1:-1:0)
            results{idx} = sum((value-'0') .* powers);
        else
            isDot = value=='.';
            if all(isDigit | isDot) % && N <= Nmax %simple positive FP number
                dotIdx = find(isDot);
                if numel(dotIdx) > 1, continue, end % ignore IP addresses: '12.34.56.78'
                N = N - 1;
                shift = dotIdx - N - 1;
                factor = 10^shift;
                n1 = Nmax-N; n2 = n1+dotIdx;
                powers = [powersOf10(n1+1:n2-1), 0, powersOf10(n2:Nmax)];
                numericVal = sum((value-'0') .* powers) * factor;
                results{idx} = numericVal;
            elseif any(value==' ') || any(value=='/') || any(value>'9') || sum(value=='-') > 1 || sum(isDot) > 1 %||value(1)>'9' % non-numeric string (we assume that FP values like '1.23e-5' will never happen)
                %results{idx} = value; % unnecessary: already pre-allocated this way
            else % negative number etc.
                [numericVal,count,errMsg,nextIndex] = sscanf(value,'%f',1); %*much* faster than str2double!
                if count == 1 && isempty(errMsg) && nextIndex > N
                    results{idx} = numericVal;
                else
                    %results{idx} = value; % unnecessary: results was initialized to cellStrs
                end
            end
        end
    end
end
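The core trick in the function above is to turn the characters into digit values and weight them with a lookup table of powers of ten. The following stand-alone sketch illustrates just that idea on two hard-coded inputs; it is not a replacement for strs2number, and the variable names (digitVals, nFrac, scale) are made up for illustration. It also rescales the decimal case by division rather than by multiplying with 10^shift, which is a deliberate deviation from the function above.

```matlab
% Sketch of the digit-weighting idea only (names are illustrative assumptions).
powersOf10 = [10000 1000 100 10 1];              % small lookup table

% Integer case: every character is a digit.
value   = '12345';
N       = numel(value);
weights = powersOf10(end-N+1:end);               % [10000 1000 100 10 1]
intVal  = sum((value - '0') .* weights);         % 12345

% Decimal case: weight the digits as if the dot were absent, then rescale.
value     = '12.34';
isDot     = (value == '.');
digitVals = value(~isDot) - '0';                 % [1 2 3 4]
nFrac     = numel(value) - find(isDot);          % 2 digits after the dot
weights   = powersOf10(end-numel(digitVals)+1:end);   % [1000 100 10 1]
scale     = powersOf10(end-nFrac);               % 100
fpVal     = sum(digitVals .* weights) / scale;   % 1234/100

% Rescaling by multiplication with a negative power of ten can differ from
% division in the last bit, because 0.1 is not exactly representable:
mulDiffers = (3 * 0.1 ~= 0.3);                   % true in IEEE double precision
divMatches = (3 / 10 == 0.3);                    % true
```

If bit-for-bit agreement with str2double matters for your data, the last two lines suggest comparing the results on a sample before switching; for many applications a last-bit difference is irrelevant.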
3 Comments
Clodoaldo de Souza Faria Júnior on 21 Sep 2021
I have the same question as you.
Florian Berzsenyi on 3 Jan 2023
I assume that str2double is simply the more robust method, at the (high) cost of processing time. str2double checks the type and parses separators (.), or at least has a larger overhead to account for converting scientific, decimal, or plain notation to double or integer numbers. double is a simple casting function. It may work if your string array always contains only digits 0-9, but it may fail if you try to convert uncleaned data that contains text such as "nans".
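As a small illustration of the difference mentioned in this comment: what double does depends on whether the input is a char vector or a string array. The sketch below uses made-up sample values and assumes a release that supports double-quoted string literals (R2017a or later).

```matlab
c = '1.5';                         % char vector
s = "1.5";                         % string scalar

codes = double(c);                 % [49 46 53]: character codes, not 1.5
num   = double(s);                 % 1.5: string arrays are parsed as numbers
bad   = double(["2.5", "abc"]);    % [2.5 NaN]: unparseable text becomes NaN
sci   = str2double('1e3');         % 1000: str2double accepts exponent notation
```

So double can be an option when the data is already a string array, but applied to char vectors it returns character codes, which is easy to miss.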

Release

R2020a
