Create an array from file names to find if any files are missing

Cameron on 7 Oct 2022
Commented: Cameron on 10 Oct 2022
I work with large data sets, and it can be difficult to see whether any data files are missing from a folder. The file names are long, but each contains a data file number (e.g. dxcvc_2020-05-05_15-34-17_xx_0001 dsffdsfs.csv).
The 0001 is the data file number, which increases by 1 with every file. I want to scan a folder and build an array of all the data file numbers, check that every file from 0001 to 00xx is in that folder, and report the numbers that are missing, if any.
Basically, I want to create two arrays, one of them user generated, and have the script output the differences between the two.
  1 Comment
Mario Malic on 7 Oct 2022
Hey,
Use the dir function to get a structure array describing all the files in the current folder (or in a folder you specify). Use sscanf on the name field to extract the integer you are looking for, and save the results in an array.
From there, you can finish with ismember: find the maximum number, generate a vector of increasing numbers up to it, and use ismember to find which numbers are missing.
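A minimal sketch of those steps, for illustration only. The names list is hypothetical and stands in for {files.name} from dir, and the code assumes the data file number comes right after the last underscore in each name:

```matlab
% Hypothetical file names standing in for {files.name} returned by dir(...)
names = {'dxcvc_2020-05-05_15-34-17_xx_0001 dsffdsfs.csv', ...
         'dxcvc_2020-05-05_15-34-17_xx_0002 dsffdsfs.csv', ...
         'dxcvc_2020-05-05_15-34-17_xx_0004 dsffdsfs.csv'};

found = zeros(1, numel(names));
for k = 1:numel(names)
    name = names{k};
    % Assumption: the number follows the last underscore in the name
    lastUnderscore = find(name == '_', 1, 'last');
    % Read a single integer from the text after that underscore
    found(k) = sscanf(name(lastUnderscore+1:end), '%d', 1);
end

expected = 1:max(found);                                % numbers that should exist
missingNums = expected(~ismember(expected, found))      % numbers with no file
```

With these three names, the only number between 1 and the maximum that has no file is 3.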

Answers (2)

Mario Malic on 7 Oct 2022
Hey,
I adjusted Davide's code (thanks) to make it a little more robust.
clc; clear;
files = dir('*.*'); % structure with info on everything in the current directory
files = files(~[files.isdir]); % drops '.' and '..' (and any subfolders)
filenames = {files.name}'; % extracts file names
% split each name at underscores and dots
filesSplitCell = cellfun(@(x) strsplit(x, {'_', '.'}), filenames, 'UniformOutput', false);
filesSplitCellNew = [filesSplitCell{:}]; % one flat row of tokens
% assumes every name splits into exactly 6 tokens, with the number as the 5th
foundNumbers = str2double(filesSplitCellNew(5:6:end));
maxNum = max(foundNumbers);
compareVec = 1:maxNum; % every number expected from 1 up to the largest found
missingfiles = ~ismember(compareVec, foundNumbers);
compareVec(missingfiles) % displays the missing numbers
  1 Comment
Cameron on 10 Oct 2022
This worked for me after changing a few variables and adding a start number instead of assuming the numbering starts at 1. Thank you!
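For illustration, one way the start-number variant could look. This is a sketch, not the code Cameron actually used; foundNumbers, startNum and endNum are hypothetical values, and setdiff is swapped in for the ismember step:

```matlab
% Hypothetical inputs: numbers already extracted from the file names,
% plus the first and last numbers the user expects to see.
foundNumbers = [12 13 15 16 19];
startNum = 12;
endNum = 20;

expectedNumbers = startNum:endNum;                          % user-generated array
missingNumbers = setdiff(expectedNumbers, foundNumbers)     % expected but not found
unexpectedNumbers = setdiff(foundNumbers, expectedNumbers)  % found but not expected
```

setdiff returns the values of its first input that do not appear in the second, so it gives the difference between the two arrays directly.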


Davide Masiello on 7 Oct 2022
Edited: Davide Masiello on 7 Oct 2022
Take this example, where there should be 4 files in total but file #3 is missing:
n = 4; % number of expected files
files = dir('*.*'); % structure with all the info of files in directory
filenames = {files.name}'; % extracts file names
filenames = filenames(3:end) % removes '.' and '..' (assumes they are the first two entries)
filenames = 3×1 cell array
    {'dxcvc_2020-05-05_15-34-17_xx_0001.csv'}
    {'dxcvc_2020-05-05_15-34-17_xx_0002.csv'}
    {'dxcvc_2020-05-05_15-34-17_xx_0004.csv'}
filenumbers = cellfun(@(x)str2num(x(30:33)),filenames) % characters 30 to 33 hold the number
filenumbers = 3×1
     1
     2
     4
missingfile = find(~any(filenumbers==1:n,1)) % numbers in 1:n that match no file
missingfile = 3
Note: this works only if the number sits at the same position in every file name, i.e. characters 30 to 33.
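If the position varies, a regular expression is one possible way around that limitation. The sketch below is not part of the original answer: the names list is hypothetical, and the pattern assumes the number is the run of digits right after the last underscore in the name.

```matlab
% Hypothetical names of different lengths, plus one file without a number
names = {'a_xx_0001.csv', ...
         'longerprefix_2020-05-05_xx_0002.csv', ...
         'dxcvc_2020-05-05_15-34-17_xx_0004 notes.csv', ...
         'readme.txt'};

% Capture the digits that follow the last underscore in each name
tok = regexp(names, '_(\d+)[^_]*$', 'tokens', 'once');
tok = tok(~cellfun(@isempty, tok));                  % skip names with no match
filenumbers = cellfun(@(t) str2double(t{1}), tok);   % convert captured text to numbers

n = 4;                                               % number of expected files
missingfile = setdiff(1:n, filenumbers)              % numbers in 1:n with no file
```

Here the names have different lengths and one has no number at all, yet the missing number 3 is still found.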
  1 Comment
Cameron on 10 Oct 2022
This one works, but seems to assume that the files start at zero
