Efficiently processing unusual date/time format

Question

Michael 2018 年 8 月 1 日

0
リンク

この質問への直接リンク

https://jp.mathworks.com/matlabcentral/answers/413074-efficiently-processing-unusual-date-time-format

コメント済み: Michael 2018 年 8 月 3 日

I have a number of Excel files with many channels of data as well as an "Absolute Time" column in each. I can read the data in with no issue, but I am processing the "raw" data from the [~,~,raw] = xlsread(...) function because the time is in the unusual DDD:HH:MM:SS.sssssssss format (where DDD is a three digit number for the day of the year). As far as I can tell the datenum and similar functions can't work with this format. So I'm processing it manually by breaking the string into pieces at the colon character, converting the pieces to numbers and multiplying them out. However, processing this column of data stacks up to about half of the total time for processing my files (using the profiler), so I'd like to figure out a more efficient way of doing it.

Here is an excerpt of column A of the spreadsheet:

    ...
  AbsoluteTime
  DDD:HH:MM:SS.sssssssss
  Absolute Time
  131:17:31:20.000000000
  131:17:31:20.050000000
  131:17:31:20.100000000
  131:17:31:20.150000000
  131:17:31:20.200000000
  131:17:31:20.250000000
  131:17:31:20.300000000
    ...

There are an arbitrary number of comment lines at the top of the file (hence the initial ...). After passing over the comment lines, I arrive at setting an index called varNameRowIdx to the line which has the "DDD:HH:MM:SS.sssssssss" string. I actually don't care whether I end up with absolute times or times relative to the first time value, but I do need to be able to properly handle the case where it rolls over to the next day within the data. Right now, I'm grabbing the first time value ( raw{(varNameRowIdx+2),1}) in the first sheet ( jj of 1) of the first file ( ii of 1) and am then offsetting all other time values by that t0 to create relative time.

        % define first time value of first sheet of first file as t = 0
        if 1==ii && 1==jj
            t0 = str2double(strsplit(raw{(varNameRowIdx+2),1},':'));
            t0 = t0(1)*86400+t0(2)*3600+t0(3)*60+t0(4);
        end
        % specially process time column
        idx = 1; tvec = zeros(size(raw,1)-(varNameRowIdx+1),1);
        for kk=(varNameRowIdx+2):size(raw,1)
            tt = str2double(strsplit(raw{kk,1},':'));
            tt = tt(1)*86400+tt(2)*3600+tt(3)*60+tt(4);
            tvec(idx) = tt-t0;
            idx = idx+1;
        end

The tt = str2double(strsplit(raw{kk,1},':')); line is the one that's really expensive, with the strsplit and str2double functions taking 21 and 23% of the total time, respectively. I'm sure there's a better way to process the time stamp data than one element at a time, but I'm not sure what it is. Any suggestions?

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

サインインしてコメントする。

サインインしてこの質問に回答する。

Answer 1

dpb 2018 年 8 月 1 日

1
リンク

この回答への直接リンク

https://jp.mathworks.com/matlabcentral/answers/413074-efficiently-processing-unusual-date-time-format#answer_331192

編集済み: dpb 2018 年 8 月 1 日

MATLAB Online で開く

Once you have the cell array that holds the date strings, then datetime will convert from day of year...

>> datetime(r,'InputFormat','DDD:HH:mm:ss.SSSSSSSSS')
ans = 
7×1 datetime array
 11-May-2018 17:31:20
 11-May-2018 17:31:20
 11-May-2018 17:31:20
 11-May-2018 17:31:20
 11-May-2018 17:31:20
 11-May-2018 17:31:20
 11-May-2018 17:31:20
>>

ADDENDUM

For the particular case, specifying an output format as well may be useful

>> datetime(r,'InputFormat','DDD:HH:mm:ss.SSSSSSSSS','Format','dd-MMM-uuuu HH:mm:ss.SSS')
ans = 
  7×1 datetime array
   11-May-2018 17:31:20.000
   11-May-2018 17:31:20.050
   11-May-2018 17:31:20.100
   11-May-2018 17:31:20.150
   11-May-2018 17:31:20.200
   11-May-2018 17:31:20.250
   11-May-2018 17:31:20.300
>>

or, since this looks like a sampling dataset, as you're doing now convert to duration--

>> d=t-t(1);
>> d.Format='mm:ss.SSS'
d = 
7×1 duration array
 00:00.000
 00:00.050
 00:00.100
 00:00.150
 00:00.200
 00:00.250
 00:00.300
>>

You then might find

doc timetable
doc timeseries

of interest depending on what else is to be done once have the data.

3 件のコメント
1 件の古いコメントを表示1 件の古いコメントを非表示

dpb 2018 年 8 月 1 日

"the help for datenum" also suggests better option is to use datetime... VBG

Michael 2018 年 8 月 1 日

編集済み: Michael 2018 年 8 月 1 日

MATLAB Online で開く

Thanks for the addendum. In my case based on your initial answer, I did the following in my code, which was quite efficient:

% define first time value of first sheet of first file as t = 0
if 1==ii && 1==jj
    t0 = datetime(raw((varNameRowIdx+2),1),'InputFormat','DDD:HH:mm:ss.SSSSSSSSS');
end
% specially process time column
tvec = seconds(datetime(raw((varNameRowIdx+2):end,1),'InputFormat','DDD:HH:mm:ss.SSSSSSSSS')-t0);

I probably could have gone a different route, but this worked well since as you correctly guessed I was aiming for timeseries in the end.

サインインしてコメントする。

Answer 2

Peter Perkins 2018 年 8 月 3 日

1
リンク

この回答への直接リンク

https://jp.mathworks.com/matlabcentral/answers/413074-efficiently-processing-unusual-date-time-format#answer_331522

MATLAB Online で開く

Michael, your data example says, "Absolute Time", so datetime may indeed be the right thing to create. But beginning in R2018a, you can convert certain kinds of "duration text" directly to durations:

>> t = duration('131:17:31:20.000000000','Format','dd:hh:mm:ss.SSSSSSSSS')
t = 
  duration
   131:17:31:20.000000000
>> t.Format = 'd'
t = 
  duration
   131.73 days
>> t.Format = 's'
t = 
  duration
   1.1381e+07 sec

1 件のコメント
-1 件の古いコメントを表示-1 件の古いコメントを非表示

Michael 2018 年 8 月 3 日

The data was originally recorded with absolute time, but for my plotting purposes I don't really care if the time axis starts at zero or has an offset of millions of seconds. Yours looks like a very good approach, possibly even better than what I did, but since I've fixed my code to "good enough" (roughly a 90% reduction in computation time to do that part of the code) based on dpb's answer, I'll leave that one marked as the "Accepted Answer". Thanks!

サインインしてコメントする。

Efficiently processing unusual date/time format

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

採用された回答

3 件のコメント
1 件の古いコメントを表示1 件の古いコメントを非表示

その他の回答 (1 件)

1 件のコメント
-1 件の古いコメントを表示-1 件の古いコメントを非表示

参考

カテゴリ

タグ

製品

リリース

Community Treasure Hunt

Efficiently processing unusual date/time format

0 件のコメント -2 件の古いコメントを表示-2 件の古いコメントを非表示

採用された回答

3 件のコメント 1 件の古いコメントを表示1 件の古いコメントを非表示

その他の回答 (1 件)

1 件のコメント -1 件の古いコメントを表示-1 件の古いコメントを非表示

参考

カテゴリ

タグ

製品

リリース

Community Treasure Hunt

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

3 件のコメント
1 件の古いコメントを表示1 件の古いコメントを非表示

1 件のコメント
-1 件の古いコメントを表示-1 件の古いコメントを非表示