Processing array where the elements are sometimes min/sec and sometimes hour/min/sec

the cyclist on 15 Oct 2018
Edited: the cyclist on 17 Oct 2018
I have some race data of the form
raceTime = {'28:44','54:08','1:02:34','1:58:33'};
Because some times are less than an hour, and some are more, the inputs are in both hh:mm:ss and mm:ss format. It will never be the case that ##:## represents hh:mm.
I'm trying to get the duration (in, say, minutes) of these race times. Thoughts on the most elegant way to process this? (I can think of a few inelegant ways.)
  1 Comment
dpb on 15 Oct 2018
Yeah, the inflexibility of some of the input forms is maddening at times like these, agreed...


Answers (6)

dpb on 16 Oct 2018
Edited: dpb on 16 Oct 2018
I won't claim it's elegant and certainly not "most" so, but...
function et=raceDuration(tstring)
% return durations for [hh:]mm:ss input cell string array
hms=cellfun(@(s) split(s,":"),tstring,'uni',0); % get pieces (not all complete)
for i=1:length(hms)
    try
        et(i)=duration(str2double(hms{i}).');
    catch
        et(i)=duration([0 str2double(hms{i}).']);
    end
end
works for your example data.
Every other way of filling in the missing hours that came to me (so far, at least) seemed more painful than the loop; unfortunately, there is no way to put a try...catch...end construct in a cellfun anonymous function to deal with the missing field.
ALTERNATE
(With attribution to Stephen for (again) reminding me that sscanf will handle unusual cases more gracefully than I always think it will...)
hms=cellfun(@(s) sscanf(s,"%d:"),raceTime,'uni',0); % 1x4 cell of column vectors
te=duration(cell2mat(cellfun(@(x) [zeros(1,3-length(x)) x.'],hms,'uni',0).')); % pad each to [h m s], stack as 4x3
>> te
te =
4×1 duration array
00:28:44
00:54:08
01:02:34
01:58:33
>>
And, the above "trick" cleans up the original function quite a bit, too...
function et=raceDuration(tstring)
% return durations for [hh:]mm:ss input cell string array
hms=cellfun(@(s) sscanf(s,"%d:"),tstring,'uni',0); % get pieces (not all complete)
N=numel(hms);
et(N,1)=duration(0,0,0); % preallocate
for i=1:N
    try
        et(i)=duration(hms{i}.');
    catch
        et(i)=duration([0 hms{i}.']);
    end
end
ADDENDUM
And, of course, you can change the Format property...
>> te.Format='m'
te =
4×1 duration array
28.733 min
54.133 min
62.567 min
118.55 min
>> te.Format='s'
te =
4×1 duration array
1724 sec
3248 sec
3754 sec
7113 sec
>>
depending on how you want the result to look...

Stephen23 on 16 Oct 2018
Edited: Stephen23 on 16 Oct 2018
As long as the last unit is always the same then you could use this:
>> C = {'28:44','54:08','1:02:34','1:58:33'};
>> V = [60,1,1/60]; % [H,M,S]
>> F = @(s)V(end-nnz(s==':'):end)*sscanf(s,'%d:');
>> M = cellfun(F,C)
M =
28.733 54.133 62.567 118.550
It will be reasonably efficient as it does not change/duplicate the input data, and uses efficient sscanf and matrix multiplication. For maximum speed replace cellfun with a preallocated loop.
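As a rough sketch of the preallocated loop mentioned above (my own reading of the suggestion, not code posted in the thread), the same weights can be applied field by field. The loop variable k and the temporary p are names introduced here for illustration.

```matlab
C = {'28:44','54:08','1:02:34','1:58:33'};
V = [60,1,1/60];                       % weights for [H,M,S], result in minutes
M = zeros(size(C));                    % preallocate the output
for k = 1:numel(C)
    p = sscanf(C{k},'%d:');            % column vector with 2 or 3 fields
    M(k) = V(end-numel(p)+1:end)*p;    % apply only the trailing weights
end
```

Counting the parsed fields with numel(p) plays the same role as counting colons with nnz(s==':') in the anonymous function.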

the cyclist on 16 Oct 2018
Here is the idea I had, after posting this. The core idea is to create a "template" of zeros for the largest format needed (e.g. 00:00:00), and then superimpose the available digits on the end.
% Original data
raceTime = {'28:44','58:39','1:02:34','1:58:33'};
% Create a template of all zeros, for which the times will be superimposed
templateFormat = '00:00:00';
template = cell(size(raceTime));
template(:) = {templateFormat};
% Need to know the number of digits to superimpose
digitsInRaceTime = cellfun(@(x) size(x,2),raceTime,'UniformOutput',false);
% For each element, superimpose the right digits
durationCell = cellfun(@(x,y,z)([x(1:(numel(templateFormat)-z)) y]),template,raceTime,digitsInRaceTime,'UniformOutput',false);
% Get duration
durationInMinutes = minutes(duration(durationCell))
  4 Comments
dpb on 17 Oct 2018
I hadn't seen this until after I added the alternate solution triggered by Peter's, but I commented on the very same idea there: it seems as though implied leading zeros would have been a relatively easy option to include in the function design, and it seems to me a reasonable, if not highly important, enhancement.
Stephen23 on 17 Oct 2018
Edited: Stephen23 on 17 Oct 2018
I also tried this kind of thing (I used regexprep), but the fact that it requires making copies of the data is not particularly "elegant" in my view: why make extra variables when it can be done efficiently using the existing data?
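For reference, a regexprep version of the padding idea might look like the sketch below. This is a guess at the general shape, not the code Stephen actually tried, and it assumes a release in which duration accepts text input (dpb notes elsewhere in the thread that R2017b does not). The variable name padded is introduced here for illustration.

```matlab
raceTime = {'28:44','54:08','1:02:34','1:58:33'};
% Prepend '0:' only to entries made of exactly two colon-separated fields
padded = regexprep(raceTime,'^(\d+:\d+)$','0:$1');
% Assumption: this release of duration parses text input
durationInMinutes = minutes(duration(padded,'InputFormat','hh:mm:ss'));
```

As noted in the comment above, this does create a padded copy of the input, which is the cost being weighed against the sscanf approach.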



Peter Perkins on 17 Oct 2018
cyclist, if you know that the text is a mixture of those two formats, can't you convert using one format, and then go back and convert the things that failed, using the second format? Maybe someone else already suggested that.
>> raceTime = {'28:44','54:08','1:02:34','1:58:33'};
>> t = duration(raceTime,'InputFormat','mm:ss')
t =
1×4 duration array
00:28:44 00:54:08 NaN NaN
>> i = isnan(t)
i =
1×4 logical array
0 0 1 1
>> t(i) = duration(raceTime(i),'Format','hh:mm:ss')
t =
1×4 duration array
00:28:44 00:54:08 01:02:34 01:58:33
Or just tack on a leading hours field where needed?
>> raceTime(i) = strcat('0:',raceTime(i))
raceTime =
1×4 cell array
{'0:28:44'} {'0:54:08'} {'1:02:34'} {'1:58:33'}
>> t = duration(raceTime,'Format','hh:mm:ss')
t =
1×4 duration array
00:28:44 00:54:08 01:02:34 01:58:33
  1 Comment
dpb on 17 Oct 2018
" can't you convert using one format, and then go back and convert the things that failed,"
That was my first approach, although I put it in a try...catch block. The logical addressing is good... if I could fold it into an anonymous function somehow; I'll have to mull that over.



dpb on 17 Oct 2018
Edited: dpb on 17 Oct 2018
OK, thanks to Peter for triggering the idea on how to add the missing hour substring dynamically! :)
>> et=cellfun(@(s) duration(sscanf([repmat('00:',sum(s==':')==1) s],'%d:').','Format','hh:mm:ss'),raceTime)
et =
1×4 duration array
00:28:44 00:54:08 01:02:34 01:58:33
>>
I am using R2017b, so duration is still limited to the three numeric inputs; it doesn't accept the time string form. I'm not sure which release added that enhancement, but if you have it, you can remove the call to sscanf and parse the augmented string directly:
cellfun(@(s) duration([repmat('00:',sum(s==':')==1) s],'Format','hh:mm:ss'),raceTime,'uni',0)
Seems like it wouldn't be too much of a stretch to let leading missing field(s) be implied zeros automagically...
  5 Comments
the cyclist on 17 Oct 2018
Edited: the cyclist on 17 Oct 2018
I'm not going to do exhaustive time-testing. My real data is only about 1,000 elements, so time is not that big a deal. But I ran each of the following one thousand times, with roughly these results:
  • Peter's two-step duration code: 1 second
  • my own "template" code: 6 seconds
  • this one-liner: 4 seconds
  • Stephen's sscanf code: 7 seconds
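A comparison like this can be set up with tic and toc. The harness below is only a hypothetical sketch, not the code the cyclist actually ran; it times Stephen's sscanf approach on an assumed 1,000-element input built by repeating the example data, and nReps, elapsed, and the repmat input are names and choices introduced here.

```matlab
% Hypothetical timing harness; input size and repetition count are assumptions
raceTime = repmat({'28:44','54:08','1:02:34','1:58:33'},1,250); % about 1,000 elements
V = [60,1,1/60];                                  % [H,M,S] weights, in minutes
F = @(s) V(end-nnz(s==':'):end)*sscanf(s,'%d:');  % Stephen's sscanf approach
nReps = 1000;
tic
for r = 1:nReps
    M = cellfun(F,raceTime);
end
elapsed = toc;
fprintf('sscanf approach: %.2f s for %d runs\n',elapsed,nReps);
```

Swapping the body of the loop for each of the other approaches gives figures that can be compared on the same machine; absolute times will differ by release and hardware.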
dpb on 17 Oct 2018
That's a good catch to pull duration out of the cellfun, cyclist.
Interesting, the relatively poor showing of sscanf; as is so often the case, what we think is a bottleneck may turn out not to be, or vice versa. Going by the rank of Peter's, I'd guess the try...catch loop would fare pretty well too, although I haven't tried any timings; I was mostly just playing "golf" to see if I could get it down to a one-liner as entertainment! :)



the cyclist on 17 Oct 2018
Edited: the cyclist on 17 Oct 2018
Oh, dear. I just found a terrible, wonderful obfuscated solution:
digitArray = char(raceTime)-'0';
idx = digitArray(:,end)==-16;
digitArray(idx,:) = [zeros(sum(idx),2), digitArray(idx,1:end-2)];
durationInMinutes = digitArray(:,1)*60 + digitArray(:,3)*10 + digitArray(:,4) + digitArray(:,6)/6 + digitArray(:,7)/60;
The same thousand iterations of this takes only 0.07 seconds.
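For anyone puzzling over the -16: char pads the shorter entries with trailing spaces, and a space (code 32) minus '0' (code 48) is -16, so the test on the last column flags the mm:ss rows. The small check below illustrates this on two entries; the variable name padded is introduced here. Note that the fixed column positions appear to assume single-digit hours, so an entry such as '10:02:34' would need different indexing.

```matlab
raceTime = {'28:44','1:02:34'};
padded = char(raceTime);          % 2x7 char matrix; the short row gets two trailing spaces
digitArray = padded - '0';        % digits become 0..9, ':' becomes 10, ' ' becomes -16
idx = digitArray(:,end) == -16;   % true for rows that were mm:ss
disp(idx.')                       % displays 1 0
```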
