Processing array where the elements are sometimes min/sec and sometimes hour/min/sec

Question

0 votes

I have some race data of the form

raceTime = {'28:44','54:08','1:02:34','1:58:33'};

Because some times are less than an hour, and some are more, the inputs are in both hh:mm:ss and mm:ss format. It will never be the case that ##:## represents hh:mm.

I'm trying to get the duration (in, say, minutes) of these race times. Thoughts on the most elegant way to process this? (I can think of a few inelegant ways.)

1 comment

dpb on 15 Oct 2018

Yeah, the inflexibility of some of the input forms is maddening at times like these, agreed...


Answer 1

dpb on 16 Oct 2018

Edited: dpb on 16 Oct 2018

1 vote

I won't claim it's elegant and certainly not "most" so, but...

function et=raceDuration(tstring)
% return durations for [hh:]mm:ss input cell string array
hms=cellfun(@(s) split(s,":"),tstring,'uni',0);  % get pieces (not all complete)
for i=1:length(hms)
  try
    et(i)=duration(str2double(hms{i}).');
  catch
    et(i)=duration([0 str2double(hms{i}).']);
  end
end

works for your example data.

Every other way of fixing up the missing hours that has come to me so far seemed more painful than the loop; unfortunately there is no way to put a try...catch...end construct inside a cellfun anonymous function to deal with the missing field.
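For what it's worth, cellfun has an 'ErrorHandler' option that can stand in for the try...catch in this situation. The following is only a sketch of that idea, not the answer's own method; the name padHours is made up for illustration, and the result is collected in a cell and concatenated afterwards to stay on the safe side.

```matlab
raceTime = {'28:44','54:08','1:02:34','1:58:33'};
% Parse each string into a numeric row of 2 ([m s]) or 3 ([h m s]) values
hms = cellfun(@(s) sscanf(s,'%d:').', raceTime, 'UniformOutput', false);
% Fallback, called by cellfun only when the main function errors:
% first argument is the error struct, second is the offending input
padHours = @(err, x) duration([0 x]);
c  = cellfun(@(x) duration(x), hms, ...
             'ErrorHandler', padHours, 'UniformOutput', false);
et = vertcat(c{:});   % 4x1 duration array
```

As with the loop, this relies on duration erroring for a two-element numeric input.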

ALTERNATE

(With attribution to Stephen for (again) reminding me that sscanf handles unusual cases more gracefully than I always expect...)

hms=cellfun(@(s) sscanf(s,"%d:"),raceTime,'uni',0)';   % 4x1 cell of column vectors
te=duration(cell2mat(cellfun(@(x) [zeros(1,3-length(x)) x.'],hms,'uni',0)));  % left-pad each row to [h m s]
>> te
te = 
4×1 duration array
 00:28:44
 00:54:08
 01:02:34
 01:58:33
>>

And, the above "trick" cleans up the original function quite a bit, too...

function et=raceDuration(tstring)
  % return durations for [hh:]mm:ss input cell string array
  hms=cellfun(@(s) sscanf(s,"%d:"),tstring,'uni',0)';  % get pieces (not all complete)
  N=numel(hms);
  et(N,1)=duration();                                  % preallocate
  for i=1:N
    try
      et(i)=duration(hms{i}.');        % three fields: hh:mm:ss
    catch
      et(i)=duration([0 hms{i}.']);    % two fields: prepend zero hours
    end
  end

ADDENDUM

And, of course, you can change the Format property...

>> te.Format='m'
te = 
  4×1 duration array
   28.733 min
   54.133 min
   62.567 min
   118.55 min
>> te.Format='s'
te = 
  4×1 duration array
   1724 sec
   3248 sec
   3754 sec
   7113 sec
>>

depending on how you want the result to look...
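If a plain numeric array is wanted rather than a reformatted duration, the minutes and seconds functions convert a duration to double. A small sketch, building the durations from numeric [h m s] rows so that it does not depend on text parsing; the names m and s are just illustrative:

```matlab
te = duration([0 28 44; 0 54 8; 1 2 34; 1 58 33]);  % rows are [h m s]
m = minutes(te);   % numeric column vector, in minutes
s = seconds(te);   % numeric column vector, in seconds
```

Changing Format only affects how te is displayed, whereas m and s are ordinary doubles that can go straight into further arithmetic.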


Answer 2

Stephen23 on 16 Oct 2018

Edited: Stephen23 on 16 Oct 2018

1 vote

As long as the last unit is always the same then you could use this:

>> C = {'28:44','54:08','1:02:34','1:58:33'};
>> V = [60,1,1/60]; % [H,M,S]
>> F = @(s)V(end-nnz(s==':'):end)*sscanf(s,'%d:');
>> M = cellfun(F,C)
M =
    28.733    54.133    62.567   118.550

It will be reasonably efficient as it does not change/duplicate the input data, and uses efficient sscanf and matrix multiplication. For maximum speed replace cellfun with a preallocated loop.
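The preallocated-loop variant mentioned above might look something like this. It is a sketch using the same C and V as in the answer; the loop variables k, s and n are illustrative names.

```matlab
C = {'28:44','54:08','1:02:34','1:58:33'};
V = [60,1,1/60];          % minutes per unit, ordered [H,M,S]
M = zeros(size(C));       % preallocate the numeric output
for k = 1:numel(C)
    s = C{k};
    n = nnz(s==':');      % 1 colon -> mm:ss, 2 colons -> hh:mm:ss
    M(k) = V(end-n:end) * sscanf(s,'%d:');   % row of weights times column of fields
end
```

The number of colons selects how many trailing weights of V to use, so a two-field string is multiplied by [1, 1/60] and a three-field string by the full vector.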


Answer 3

the cyclist on 16 Oct 2018

1 vote

Here is the idea I had, after posting this. The core idea is to create a "template" of zeros for the largest format needed (e.g. 00:00:00), and then superimpose the available digits on the end.

% Original data
raceTime = {'28:44','58:39','1:02:34','1:58:33'};
% Create a template of all zeros, for which the times will be superimposed
templateFormat = '00:00:00';
template = cell(size(raceTime));
template(:) = {templateFormat};
% Need to know the number of digits to superimpose
digitsInRaceTime = cellfun(@(x) size(x,2),raceTime,'UniformOutput',false);
% For each element, superimpose the right digits 
durationCell = cellfun(@(x,y,z)([x(1:(numel(templateFormat)-z)) y]),template,raceTime,digitsInRaceTime,'UniformOutput',false);
% Get duration
durationInMinutes = minutes(duration(durationCell))

4 comments (2 older comments not shown)

dpb on 17 Oct 2018

I hadn't seen this until after I added the alternate solution triggered by Peter's, but I commented with the very same idea: it seems as though that would be a relatively easy option to include in the function design, and it seems to me a reasonable, if not highly important, enhancement.

Stephen23 on 17 Oct 2018

Edited: Stephen23 on 17 Oct 2018

I also tried this kind of thing (I used regexprep), but the fact that it requires making copies of the data is not particularly "elegant" in my view: why make extra variables when it can be done efficiently using the existing data?


Answer 4

Peter Perkins on 17 Oct 2018

1 vote

cyclist, if you know that the text is a mixture of those two formats, can't you convert using one format, and then go back and convert the things that failed, using the second format? Maybe someone else already suggested that.

>> raceTime = {'28:44','54:08','1:02:34','1:58:33'};
>> t = duration(raceTime,'InputFormat','mm:ss')
t = 
  1×4 duration array
   00:28:44   00:54:08        NaN        NaN
>> i = isnan(t)
i =
  1×4 logical array
   0   0   1   1
>> t(i) = duration(raceTime(i),'Format','hh:mm:ss')
t = 
  1×4 duration array
   00:28:44   00:54:08   01:02:34   01:58:33

Or just tack on a leading hours field where needed?

>> raceTime(~i) = strcat('0:',raceTime(~i))
raceTime =
  1×4 cell array
    {'0:28:44'}    {'0:54:08'}    {'1:02:34'}    {'1:58:33'}
>> t = duration(raceTime,'Format','hh:mm:ss')
t = 
  1×4 duration array
   00:28:44   00:54:08   01:02:34   01:58:33
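The prepending step does not actually need a first, failed conversion to find the short entries; counting colons gives the same mask. A sketch of that variant, with the variable name short chosen purely for illustration:

```matlab
raceTime = {'28:44','54:08','1:02:34','1:58:33'};
% true where the entry has a single colon, i.e. is in mm:ss form
short = cellfun(@(s) nnz(s==':')==1, raceTime);
raceTime(short) = strcat('0:', raceTime(short));
% raceTime is now {'0:28:44','0:54:08','1:02:34','1:58:33'}
```

After this every entry has three fields and can be passed to duration in a single call, as in the answer.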

1 comment

dpb on 17 Oct 2018

"can't you convert using one format, and then go back and convert the things that failed,"

That was my first approach, although I put it in a try...catch block. The logical addressing is good; if it could be folded into an anonymous function somehow... I'll have to mull that over.


Answer 5

dpb on 17 Oct 2018

Edited: dpb on 17 Oct 2018

1 vote

OK, thanks to Peter for triggering the idea on how to add the missing hour substring dynamically! :)

>> et=cellfun(@(s) duration(sscanf([repmat('00:',sum(s==':')==1) s],'%d:').','Format','hh:mm:ss'),raceTime)
et =
1×4 duration array
  00:28:44    00:54:08    01:02:34    01:58:33
>>

I am using R2017b, where duration is still limited to the three numeric inputs; it doesn't accept the time-string form. I'm not sure which release added that enhancement, but if you have it you can drop the call to sscanf and parse the augmented string directly:

cellfun(@(s) duration([repmat('00:',sum(s==':')==1) s],'Format','hh:mm:ss'),raceTime,'uni',0)

Seems like it wouldn't be too much of a stretch to let leading missing field(s) be implied zeros automagically...

5 comments (3 older comments not shown)

the cyclist on 17 Oct 2018

Edited: the cyclist on 17 Oct 2018

I'm not going to do exhaustive time-testing. My real data is only about 1,000 elements, so time is not that big a deal. But I ran each of the following one thousand times. The results were roughly:

Peter's two-step duration code: 1 second
my own "template" code: 6 seconds
this one-liner: 4 seconds
Stephen's sscanf code: 7 seconds

dpb on 17 Oct 2018

That's a good catch to pull duration out of the cellfun, cyclist.

Interesting, the relatively poor showing of sscanf; as is so often the case, what we think is a bottleneck may turn out not to be, or vice versa. Given the rank of Peter's, I'd guess the try...catch loop would fare pretty well too, although I haven't tried any timings; I was mostly just playing "golf" to see if I could get it down to a one-liner, as entertainment! :)


Answer 6

the cyclist on 17 Oct 2018

Edited: the cyclist on 17 Oct 2018

0 votes

Oh, dear. I just found a terrible, wonderful obfuscated solution:

digitArray = char(raceTime)-'0';
idx = digitArray(:,end)==-16;
digitArray(idx,:) = [zeros(sum(idx),2), digitArray(idx,1:end-2)];
durationInMinutes = digitArray(:,1)*60 + digitArray(:,3)*10 + digitArray(:,4) + digitArray(:,6)/6 + digitArray(:,7)/60;

The same thousand iterations of this take only 0.07 seconds.
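To see why the indexing works, it may help to look at the intermediate array for just two entries. char pads the shorter string with trailing spaces, and subtracting '0' maps the digits to 0-9, ':' to 10 and a space to -16. A small worked sketch:

```matlab
raceTime = {'28:44','1:02:34'};
digitArray = char(raceTime) - '0';
% digitArray is
%    2   8  10   4   4 -16 -16
%    1  10   0   2  10   3   4
idx = digitArray(:,end) == -16;      % rows that were padded, i.e. mm:ss
digitArray(idx,:) = [zeros(sum(idx),2), digitArray(idx,1:end-2)];
% first row is now  0  0  2  8  10  4  4, aligned like h:mm:ss
```

After the shift, columns 1, 3, 4, 6 and 7 hold the hour, minute and second digits in every row, which is what the weighted sum picks out. Note that the fixed column positions assume the hours field never has more than one digit.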
