Processing a HUGE number of timestamps

Matlab2010 on 13 Nov 2014
Answered: Jan on 16 Nov 2014
I have a cell array whose elements are time stamps in the format "Mon Apr 01 20:00:00 BST 2013", and I have a very large number of these vectors. At the moment, I loop through each value in the vector and apply the function below. This loop takes up 99% of my processing time.
How can I remove the loop?
thanks
function myTimeOut = st_timestampConvert(myTime)
year = strtrim(myTime(end-4:end));
month = strtrim(myTime(5:8));
day = strtrim(myTime(9:10));
time = strtrim(myTime(11:19));
timezone = strtrim(myTime(20:23));
myTimeOut = convert_to_UTC(myTimeOut, timezone); %time zone conversion
myTimeOut = datenum([day '-' month '-' year ' ' time], 'dd-mmm-yyyy HH:MM:SS');
end

Accepted Answer

Guillaume on 14 Nov 2014
Without R2014b datetime, you can use regexprep to rearrange the bits of the string you want before calling datenum. It's many orders of magnitude faster than a loop, cellfun, or strsplit.
s2 = regexprep(s, '(\w+) (\w+) (\d+) ([0-9:]+) (\w+) (\d+)', '$3-$2-$6 $4');
tout = datenum(s2, 'dd-mmm-yyyy HH:MM:SS');
On my machine, processing 100k dates with the above two lines takes 2.1 seconds, most of it spent in the datenum call. The regexprep line takes only 0.3 seconds.
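To make the rearrangement concrete, here is a small sketch that runs the same two lines on the sample timestamp from the question. The input cell is made-up illustration data; the pattern and format strings are the ones from the answer.

```matlab
% Illustrative input in the format from the question.
s = {'Mon Apr 01 20:00:00 BST 2013'; 'Tue Apr 02 14:33:32 BST 2013'};

% Tokens: $1 weekday, $2 month, $3 day, $4 time, $5 time zone, $6 year.
% The replacement keeps only day, month, year and time, in datenum order.
s2 = regexprep(s, '(\w+) (\w+) (\d+) ([0-9:]+) (\w+) (\d+)', '$3-$2-$6 $4');
disp(s2{1})   % 01-Apr-2013 20:00:00

% One vectorized datenum call over the whole cell array.
tout = datenum(s2, 'dd-mmm-yyyy HH:MM:SS');
```

Passing the format string explicitly matters here: without it, datenum has to guess the format, which is slower.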
There remains the problem of the time zone adjustment (which I believe should have come after the conversion to datenum in your example). Your convert_to_UTC is not part of MATLAB; hopefully it can operate on cell arrays as well. To extract the time zone:
tzones = regexp(s, '\w+(?= \d+$)', 'match', 'once');
tout = convert_to_UTC(tout, tzones); %Will this work?
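The thread does not show convert_to_UTC, so the following is only a sketch of what a vectorized adjustment could look like if that function cannot take cell arrays. The lookup table (tzNames, tzOffsetHours) is a hypothetical stand-in that covers just the listed abbreviations; abbreviations such as BST are ambiguous in general, so a real table would need to be chosen with care.

```matlab
% Illustrative input in the format from the question.
s = {'Mon Apr 01 20:00:00 BST 2013'; 'Tue Jan 01 09:30:00 GMT 2013'};

% Local time as serial date numbers, as in the answer above.
s2 = regexprep(s, '(\w+) (\w+) (\d+) ([0-9:]+) (\w+) (\d+)', '$3-$2-$6 $4');
tlocal = datenum(s2, 'dd-mmm-yyyy HH:MM:SS');

% Time zone abbreviation: the word just before the trailing year.
tzones = regexp(s, '\w+(?= \d+$)', 'match', 'once');

% Hypothetical lookup table: abbreviation -> offset from UTC in hours.
tzNames       = {'GMT'; 'UTC'; 'BST'};
tzOffsetHours = [0; 0; 1];

[found, idx] = ismember(tzones, tzNames);
assert(all(found), 'Unknown time zone abbreviation in input.');

% datenum counts in days, so an offset in hours is divided by 24.
tutc = tlocal - tzOffsetHours(idx(:)) / 24;
```

If every element of an array shares one time zone, as the question's author describes, the ismember step reduces to a single scalar subtraction per array.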

More Answers (2)

Peter Perkins on 13 Nov 2014
none, I don't know if you have access to R2014b. If you do, consider using the new datetime data type. On a not so fast PC, parsing 100000 strings like yours, with time zones, takes a bit over 2s. 'BST' presents a potential issue, because it might mean any number of things. In the UK, it means "British Summer Time", and the following parses the strings using that locale. Hope this helps.
% construct 100k strings
d = datetime(2013,4,1,20,0,0,'TimeZone','Europe/London') + days(randn(100000,1));
s = cellstr(d,'eee MMM dd HH:mm:ss z yyyy','en_UK');
% one of the resulting strings, for example:
%   Tue Apr 02 14:33:32 BST 2013
% parse those strings
tic
d1 = datetime(s,'Format','eee MMM dd HH:mm:ss z yyyy','TimeZone', ...
'Europe/London','Locale','en_UK');
toc
1 Comment
Matlab2010 on 14 Nov 2014
Peter.
I am using Windows 7 and R2014a.
I have N cell arrays, each with T elements. N = 0.5M, T = 0.25M.
I have a PC with 244GB RAM and 32 Cores. I use parfor for such jobs.
BST does mean British summer time. Though I deal with all time zone conversions myself. You can assume each N is in the same time zone, thus, the time zone conversion can be done as a single vectorized operation.
The issue is how to extract from this cell and get into datenum without using a for loop.
thank you
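One way the setup described in this comment might be combined with a vectorized conversion is to keep parfor over the N arrays and replace only the inner per-element loop. This is a rough sketch: allStamps is a hypothetical container for the N cell arrays, filled here with a few made-up values, and the time zone step is left out because it is handled separately.

```matlab
% Hypothetical container: each element is one cell array of timestamps.
allStamps = { ...
    {'Mon Apr 01 20:00:00 BST 2013'; 'Tue Apr 02 14:33:32 BST 2013'}, ...
    {'Tue Jan 01 09:30:00 GMT 2013'}};

pattern = '(\w+) (\w+) (\d+) ([0-9:]+) (\w+) (\d+)';
fmt     = 'dd-mmm-yyyy HH:MM:SS';

out = cell(size(allStamps));
parfor n = 1:numel(allStamps)
    % No inner loop: regexprep and datenum both accept a whole cell array.
    s2 = regexprep(allStamps{n}, pattern, '$3-$2-$6 $4');
    out{n} = datenum(s2, fmt);
end
```

Without a parallel pool the parfor loop simply runs serially, so the sketch can be tried on any machine.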


Jan on 16 Nov 2014
I prefer Guillaume's version, but it is not "magnitudes" faster than a loop approach:
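function DOut = ConvertCellDate(DIn)
% Parse fixed-width stamps like 'Mon Apr 01 20:00:00 BST 2013' by character position.
DOut = zeros(size(DIn));
for k = 1:numel(DIn)
    Dx = double(DIn{k} - '0'); % For faster conversion of numbers
    % The position of the month name in the lookup string gives the month number
    month = (strfind('JanFebMarAprMayJunJulAugSepOctNovDec', DIn{k}(5:7)) + 2) / 3;
    year = Dx(25) * 1000 + Dx(26) * 100 + Dx(27) * 10 + Dx(28);
    % datenummx is the undocumented helper that datenum calls internally
    DOut(k) = datenummx(year, month, Dx(9) * 10 + Dx(10), ...
        Dx(12) * 10 + Dx(13), Dx(15) * 10 + Dx(16), ...
        Dx(18) * 10 + Dx(19));
end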
Phew, this looks cruel and is not smart to debug. But it takes 2.3 sec on my Matlab 2011b/Win7/32 system, while Guillaume's method takes 1.7 sec.
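The same digit arithmetic can also be written without the loop by first turning the cell array into a char matrix. The sketch below is a variation on the idea above, not something timed in this thread, and it assumes every string has exactly the fixed 28-character layout from the question. The short variable names (yr, mo, ...) are chosen only to avoid shadowing functions of the same name.

```matlab
% Illustrative input in the format from the question.
s = {'Mon Apr 01 20:00:00 BST 2013'; 'Tue Apr 02 14:33:32 BST 2013'};

C = char(s);            % N-by-28 char matrix (requires equal-length strings)
D = double(C) - '0';    % digit characters become their numeric values

% Combine digit columns into numbers with a matrix product.
yr = D(:, 25:28) * [1000; 100; 10; 1];
dy = D(:,  9:10) * [10; 1];
hr = D(:, 12:13) * [10; 1];
mi = D(:, 15:16) * [10; 1];
sc = D(:, 18:19) * [10; 1];

% Month number = position of the three-letter name in the list.
monthNames = {'Jan','Feb','Mar','Apr','May','Jun', ...
              'Jul','Aug','Sep','Oct','Nov','Dec'};
[~, mo] = ismember(cellstr(C(:, 5:7)), monthNames);

% The numeric form of datenum needs no string parsing at all.
tout = datenum(yr, mo, dy, hr, mi, sc);
```

The char matrix holds two bytes per character for every timestamp, so for very large arrays it may be worth processing the data in chunks.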
