Searching through files for missing data

drb17135 on 4 Feb 2021
Commented: Adam Danz on 11 Feb 2021
Hi,
I have a set of 8000 files in the format of
YYYYMMDDHHMMSS
year/month/day/hour/minute/second
The file names should increase by 5 minutes each time, and I need to write a function that checks that the files are named in a logical way and identifies any files that are missing.
The functions strcmp and join have been recommended to me.
Does anyone know how to do this?

Accepted Answer

KSSV on 4 Feb 2021
You may follow something like this:
files = dir('*.txt') ; % give your extension
% Create a datetime vector for the files present with the names mentioned
[~,N,~] = cellfun(@fileparts,{files.name},'UniformOutput',false) ;
t = datetime(N,'InputFormat','yyyyMMddHHmmss') ; % datetimes of the files present ('ss' = seconds)
%% Create 5 mins possible datetime vector
file1 = files(1).name ;
[~, name1, ~] = fileparts(file1) ;
t0 = datetime(name1,'InputFormat','yyyyMMddHHmmss') ; % first timestamp
file2 = files(end).name ;
[~, name2, ~] = fileparts(file2) ;
t1 = datetime(name2,'InputFormat','yyyyMMddHHmmss') ; % last timestamp
% Make datetime array
tExpected = t0:minutes(5):t1 ; % every expected timestamp, used for comparison
% Get the indices which are present
idx = ismember(tExpected,t) ;
% Dates which do not exist
tExpected(~idx)
  7 Comments
drb17135 on 5 Feb 2021
Error using datetime (line 635)
Unable to convert 'T_PAAH72_C_EIDB_20200901000000' to datetime using the format 'yyyyMMddHHmmSS'.
Error in searchfiles (line 16)
t1 = datetime(name2,'InputFormat','yyyyMMddHHmmSS') ;
I think this may be failing because the whole name is not in datetime format. Would you suggest I try parsing the name to leave me with just the 20200901000000 part?
Adam Danz on 5 Feb 2021
That's because this solution is missing step 2 from my modified list of 4 clear steps, as I also mentioned in a comment under this answer.
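
For names like the one in the error message, a minimal sketch of isolating the timestamp before calling datetime might look like the following. The file names are made up for illustration, and the pattern assumes each name contains exactly one run of 14 digits.

```matlab
% Hypothetical file names following the pattern shown in the error message.
names = {'T_PAAH72_C_EIDB_20200901000000.hdf', ...
         'T_PAAH72_C_EIDB_20200901000500.hdf', ...
         'T_PAAH72_C_EIDB_20200901001500.hdf'};

% Pull out the run of 14 digits from each name.
% With a cell array input and 'once', regexp returns one char vector per name.
stamps = regexp(names, '\d{14}', 'match', 'once');

% Lower-case 'ss' means whole seconds; upper-case 'S' means fractional seconds.
t = datetime(stamps, 'InputFormat', 'yyyyMMddHHmmss');
```

The resulting t has one datetime per file name and can be compared against the expected 5-minute sequence.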


More Answers (1)

Adam Danz on 4 Feb 2021
>Does anyone know how to do this?
Lots of people know how to do this and we're here to help but few people will devote a portion of their day to do it for you.
Let's start by figuring out where you're stuck. There are just a few basic steps in your process and you can find lots of information in this forum, on the web, and in the documentation for each step.
  1. Get a list of files. See dir()
  2. Read in the file. There are lots of ways to read files depending on the filetype and content (review).
  3. Are your time stamps in datetime format? If not convert them to datetime.
  4. If all you want to do is check whether a file is missing, you just need to store 3 data points for each file in 2 separate variables inside your loop: the first and last datetime values go in an nx2 matrix for n files, and the filenames go in an nx1 string array.
  5. Once all files are read and the 3 data points are stored for each file, you can sort the datetime values in case the files are read out of order and then compare the first datetime of file n with the last datetime from file n-1. If that difference is more than 5 minutes, you know you're missing a file and you can use the filename array to help identify which file is missing.
If you get stuck on any step leave a comment below and show us where you're at with the code and what the problem is.
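
Steps 4 and 5 above can be sketched roughly as follows. The file names and time spans are invented for illustration, and the sketch assumes each file's first and last timestamps have already been read into an nx2 datetime matrix.

```matlab
% Hypothetical data: one row per file, [first last] timestamp in that file,
% deliberately stored out of order.
t0    = datetime(2020,9,1,0,0,0);
names = ["c.dat"; "a.dat"; "d.dat"];
span  = [t0 + minutes(60), t0 + minutes(85);
         t0,               t0 + minutes(25);
         t0 + minutes(90), t0 + minutes(115)];

% Sort by first timestamp in case the files were read out of order.
[~, order] = sort(span(:,1));
span  = span(order,:);
names = names(order);

% Compare the first timestamp of file n with the last timestamp of file n-1.
gap   = span(2:end,1) - span(1:end-1,2);
isGap = gap > minutes(5);

% Files on either side of each gap help identify what is missing.
before = names([isGap; false]);
after  = names([false; isGap]);
```

Here the only gap longer than 5 minutes falls between a.dat and c.dat, so a file covering that interval is missing.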
  9 Comments
drb17135 on 11 Feb 2021
I added this at the end. How do I make it add 5 minutes per each file?
t2 = datetime('01-Sep-2020 00:05:00','Format','yyyyMMddHHmmSS') + cumsum(minutes(5))
dt = between(t,t2)
Adam Danz on 11 Feb 2021
Looks like you're making progress.
1. In this line, you could add the file extension if it's the same for all files. That should list all of the files you need, assuming they are all in the same folder.
files = dir('C:\Users\drb17135\Documents\August_Radar\*.*')
files = dir('C:\Users\drb17135\Documents\August_Radar\*.hdf') % change to this
2. This is where things go wrong. "tmp" should be the "files" variable above. You don't need this line. Replace tmp with "files".
tmp=dir;
3. Instead of these 3 lines,
files = dir('C:\Users\drb17135\Documents\August_Radar\*.*')
file = myfile(5:5); %isolates the file in terms of 'yyyyMMddHHmmSS'
datetime(file,'InputFormat','yyyyMMddHHmmSS'); %gives name in format of '10-Sep-2020 04:00:00' - for example
Use these two, based on example: 'T_PAAH72_C_EIDB_19991231134501.hdf' (YYYYMMDDHHmmSS)
% >>files(idx).name
% ans =
% 'T_PAAH72_C_EIDB_19991231134501.hdf'
[~, timestamp] = regexp(files(idx).name, '([0-9]+)\.hdf','match','once','tokens'); % escape the dot so it only matches a literal '.'
% Which returns
% timestamp =
% {'19991231134501'}
timestampDT = datetime(timestamp{1},'InputFormat','yyyyMMddHHmmss')
% Which returns
% timestampDT =
% datetime
% 31-Dec-1999 13:45:01
4. Instead of assuming you have 8000 files, use the actual number of files identified to define the loop.
for idx = [1:8000] % Not this
for idx = 1:numel(files) % use this, where "files" is defined in my step #1 above.
5. Lastly, you need to store the datetime stamps in the loop so the loop should be structured like this (using variable names above).
timestampDT = NaT(numel(files),1); % preallocate the loop variable (NaT is case-sensitive)
for idx = 1:numel(files)
% < PUT YOUR OTHER STUFF HERE >
% store all datetimes from the file names
timestampDT(idx) = datetime(timestamp{1},'InputFormat','yyyyMMddHHmmss');
end
Then you can take the difference between consecutive timestamps:
dt = diff(timestampDT)
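
As a rough sketch of how dt could then be used, the following flags every step longer than 5 minutes and lists the timestamps that fall inside each gap. The timestamps are hypothetical stand-ins for the timestampDT built in the loop, and the variable names step, gapStart and missing are my own.

```matlab
% Hypothetical timestamps standing in for the timestampDT built in the loop.
t0 = datetime(2020,9,1,0,0,0);
timestampDT = t0 + minutes([0; 5; 10; 25; 30]);   % 00:15 and 00:20 are absent

% Sort first in case dir() did not return the files in time order.
timestampDT = sort(timestampDT);

step = minutes(5);
dt = diff(timestampDT);        % duration between consecutive files
gapStart = find(dt > step);    % index of the file just before each gap

% Enumerate the timestamps that should have been inside each gap.
missing = NaT(0,1);
for k = gapStart'
    inGap = (timestampDT(k) + step : step : timestampDT(k+1) - step)';
    missing = [missing; inGap];
end
```

With no gaps, gapStart is empty, the loop does not run, and missing stays empty.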

Release

R2020b
