Find continuous file name jump

Question

Tsuwei Tan, 9 May 2018

Direct link to this question

https://jp.mathworks.com/matlabcentral/answers/399829-find-continuous-file-name-jump

Commented: Tsuwei Tan, 9 May 2018

I have a dozen thousand files with names like the following:

SHARK_225054651_41_0547_r001
SHARK_225054651_41_0548_r005
SHARK_225054651_41_0548_r009
...
SHARK_225054651_41_0619_r121
SHARK_225054651_41_0620_r125
...
SHARK_225062101_41_0621_r001
SHARK_225062101_41_0621_r005
SHARK_225062101_41_0622_r009
...
SHARK_225062101_41_0653_r121
SHARK_225062101_41_0654_r125

Each file name ends in r%%%, where the three digits %%% run 001, 005, ... up to 121, 125: thirty-two values in total, in steps of four, all sharing the same SHARK_%%%%%%%%%_%%_%%%%_ name before r%%%. Then another group of files starts over with the same r%%% sequence.

However, the actual files have "jumps": for instance, r005 may be missing between r001 and r009.

Is there a way to read the file names and use some loop logic to pick out the missing ones?


Answer 1

Stephen23, 9 May 2018

Direct link to this answer

https://jp.mathworks.com/matlabcentral/answers/399829-find-continuous-file-name-jump#answer_319370

Edited: Stephen23, 9 May 2018

C = { % fake data:
'SHARK_225054651_41_0548_r001'
'SHARK_225054651_41_0548_r005'
'SHARK_225054651_41_0548_r009'
'SHARK_225054651_41_0548_r021'
'SHARK_225054651_41_0548_r025'
'SHARK_225062101_41_0621_r005'
'SHARK_225062101_41_0621_r009'
'SHARK_225062101_41_0621_r013'
'SHARK_225062101_41_0621_r025'
};
T = regexp(C,'^(\w+)r(\d{3})$','tokens','once'); % split
A = cellfun(@(c)c(1),T); % 1st token
B = cellfun(@(c)c(2),T); % 2nd token
[U,~,X] = unique(A(:));  % 1st token unique only
V = str2double(B(:));    % 2nd token -> numeric values
G = 1:4:25; % the required values (change this to 1:4:125).
C = accumarray(X,V,[],@(v){setdiff(G,v)});

The outputs of interest to you are:

U the unique groups, e.g. 'SHARK_225054651_41_0548_' and 'SHARK_225062101_41_0621_' in my example fake data.
C the missing rXXX values for each group (note that this overwrites the input cell array C).
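
The fake data stands in for the real file names. As a minimal sketch of how the names could be collected from disk, assuming the files sit in one folder and share an extension (the '.dat' extension and the scratch folder P are hypothetical, used only so the example runs on its own):

```matlab
% Hypothetical setup: a scratch folder with a few empty files.
% The '.dat' extension is an assumption, not taken from the question.
P = tempname;
mkdir(P);
N = {'SHARK_225054651_41_0548_r001.dat'
     'SHARK_225054651_41_0548_r009.dat'
     'SHARK_225062101_41_0621_r005.dat'};
for k = 1:numel(N)
    fclose(fopen(fullfile(P,N{k}),'w')); % create an empty file
end

% List the matching files, then strip the folder and extension.
S = dir(fullfile(P,'SHARK_*.dat'));
[~,C] = cellfun(@fileparts,{S.name},'UniformOutput',false);
C = C(:); % column cell array, same layout as the fake data

rmdir(P,'s'); % remove the scratch folder again
```

With real data, P would be the folder that holds the files, and the resulting C can be passed to the regexp call in place of the fake data.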

These outputs are shown here:

>> U{1}
ans = SHARK_225054651_41_0548_
>> C{1}
ans =
   13   17
>> U{2}
ans = SHARK_225062101_41_0621_
>> C{2}
ans =
    1   17   21
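
If an explicit loop is easier to follow than accumarray, the same grouping can be sketched as below. This is an alternative formulation of the steps above, not a different method; the names C0, M, and idx are introduced here only for illustration.

```matlab
C0 = { % same fake data as above
'SHARK_225054651_41_0548_r001'
'SHARK_225054651_41_0548_r005'
'SHARK_225054651_41_0548_r009'
'SHARK_225054651_41_0548_r021'
'SHARK_225054651_41_0548_r025'
'SHARK_225062101_41_0621_r005'
'SHARK_225062101_41_0621_r009'
'SHARK_225062101_41_0621_r013'
'SHARK_225062101_41_0621_r025'
};
T = regexp(C0,'^(\w+)r(\d{3})$','tokens','once'); % split prefix / number
A = cellfun(@(c)c(1),T);                 % prefix of each name
V = str2double(cellfun(@(c)c(2),T));     % rXXX number of each name
G = 1:4:25;                              % required values (1:4:125 for real data)
U = unique(A);                           % one entry per group
M = cell(size(U));                       % missing values per group
for k = 1:numel(U)
    idx = strcmp(A,U{k});                % names belonging to group k
    M{k} = setdiff(G,V(idx));            % required values that never occur
end
```

For the fake data, M{1} holds 13 and 17, and M{2} holds 1, 17 and 21, matching the accumarray result.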

You could easily loop over these, or display them in the Command Window or your GUI, etc:

>> Z = [U,cellfun(@num2str,C,'uni',0)]';
>> for k = 1:numel(U), fprintf('%s:  %s\n',U{k},sprintf('%3d, ',C{k})); end
SHARK_225054651_41_0548_:   13,  17,
SHARK_225062101_41_0621_:    1,  17,  21,
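
If the full names of the missing files are wanted rather than just the numbers, they can be rebuilt from the two outputs. A small sketch, with U and C typed in by hand to match the results shown above (the variable names F and missingNames are made up for this example):

```matlab
% Results from the answer above, entered by hand so this runs on its own.
U = {'SHARK_225054651_41_0548_';'SHARK_225062101_41_0621_'};
C = {[13 17];[1 17 21]};

F = cell(size(U));
for k = 1:numel(U)
    % Prefix + 'r' + zero-padded three-digit number for each missing value.
    F{k} = arrayfun(@(n)sprintf('%sr%03d',U{k},n),C{k},'UniformOutput',false);
end
missingNames = [F{:}]'; % one column cell array of all missing names
```

Here missingNames has five entries, from 'SHARK_225054651_41_0548_r013' through 'SHARK_225062101_41_0621_r021'.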