Most efficient method to search through file names?

S G on 5 Jun 2019
Commented: S G on 11 Jun 2019
I have a large number of files that all have file name formats that are of the form
SSSTTTTMMYY
Where the 'encoding' of the file name breaks down into something like this:
SSS - Three-letter code referencing a location (I know the locations and have a MATLAB table that relates these codes to location names)
TTTT - Four-letter code for the 'type' of data that we have captured (also values we already have)
MMYY - The month and year in which the data was taken.
So for example, we may have something like:
LDNACPD0618
where LDN = London, ACPD = Average captured pollution data, 0618 = June, 2018.
So here is the actual question:
I want to build a function that can search through these file names based on:
  • Choice of site location, e.g. all data from site LDN
  • A range of dates, e.g. all data between 0118 and 0318
  • Choice of 'type' of data, e.g. all data that is ACPD
  • A combination of the above, e.g. all data from LDN between 0118 and 0318
What is the most efficient way to do this other than making three separate functions to check each section of the file name? Would something like a regular expression work?
Many thanks for your help and advice in advance!
  1 Comment
Stephen23 on 5 Jun 2019
Edited: Stephen23 on 5 Jun 2019
"Would something like a regular expression work?"
Matching the SSS and TTTT parts would not be too difficult, but matching a range of dates really requires converting to date (e.g. date number or datetime) and then doing a logical comparison.
Start by splitting the names up (e.g. using regexp or indexing) and then:
  1. compare SSS using strcmp
  2. compare TTTT using strcmp
  3. convert MMYY to datetime and compare using logical comparisons.
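The three steps above can be sketched as follows. This is only an illustration under stated assumptions: the sample names are made up, the pattern assumes upper-case letters for SSS and TTTT, and the variable names (`site`, `kind`, `when`) are hypothetical. It assumes every name matches the pattern.

```matlab
% Hypothetical sample names following the SSSTTTTMMYY pattern
names = {'LDNACPD0618'; 'LDNACPD0218'; 'PARACPD0118'};

% Split each name into its three parts using named tokens
expr = '^(?<site>[A-Z]{3})(?<type>[A-Z]{4})(?<mmyy>\d{4})$';
tok  = regexp(names, expr, 'names');   % one struct per name
tok  = vertcat(tok{:});                % struct array (assumes every name matched)

site = {tok.site}';
kind = {tok.type}';
when = datetime({tok.mmyy}', 'InputFormat', 'MMyy');   % first day of each month

% Steps 1 and 2: compare the codes with strcmp. Step 3: compare the dates logically.
keep = strcmp(site, 'LDN') & strcmp(kind, 'ACPD') & ...
       when >= datetime(2018, 1, 1) & when <= datetime(2018, 3, 31);
selected = names(keep);
```

With these sample names, only the February 2018 London file passes all three tests.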

Accepted Answer

Guillaume on 10 Jun 2019
I would create a function that parses the file names and splits them into variables of a table (or even a timetable). Then it's trivial to use MATLAB comparison functions to get the relevant rows of the table:
function filedetails = getdetails(filelist)
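%filelist: a structure array such as returned by dir
[~, basenames] = cellfun(@fileparts, {filelist.name}, 'UniformOutput', false); %drop any extension such as .txt
assert(all(cellfun(@numel, basenames) == 11), 'Length of file names not consistent with pattern')
filenames = vertcat(basenames{:}); %convert the list of file names into a char array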
%temporary variables to create the table. Using string to make comparison easier
Location = string(filenames(:, 1:3));
Type = string(filenames(:, 4:7));
Date = datetime(filenames(:, 8:11), 'InputFormat', 'MMyy');
Filename = string({filelist.name}');
%create the table
filedetails = table(Location, Type, Date, Filename);
end
Now it's easy to filter the table according to whichever pattern you want:
filelist = dir('C:\somewhere\*.txt'); %whichever way you obtain your file list
filedetails = getdetails(filelist);
%Get all filenames matching Location XYZ
selectedfiles = filedetails.Filename(filedetails.Location == 'XYZ');
%Get all filenames matching Location XYZ or UVW
selectedfiles = filedetails.Filename(ismember(filedetails.Location, {'XYZ', 'UVW'}));
%or
selectedfiles = filedetails.Filename(filedetails.Location == 'XYZ' | filedetails.Location == 'UVW');
%Get all filenames matching Type ABCD and location XYZ
selectedfiles = filedetails.Filename(filedetails.Type == 'ABCD' & filedetails.Location == 'XYZ');
%Get all filenames between January and March 2018
selectedfiles = filedetails.Filename(isbetween(filedetails.Date, datetime(2018, 1, 1), datetime(2018, 3, 31)));
Note the use of isbetween to get files between a date range.
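Since the question asks for a single search that accepts any combination of criteria, one possible way to build on the table is to accumulate a logical mask and skip criteria that are left empty. The sketch below is an assumption-laden illustration, not part of the answer above: the sample table is made up, and `wantLocation`, `wantType` and `wantRange` are hypothetical names.

```matlab
% Hypothetical table in the same layout as the one returned by getdetails
Location = ["LDN"; "LDN"; "PAR"];
Type     = ["ACPD"; "ACPD"; "ACPD"];
Date     = datetime(2018, [6; 2; 1], 1);
Filename = ["LDNACPD0618"; "LDNACPD0218"; "PARACPD0118"];
filedetails = table(Location, Type, Date, Filename);

% Criteria for this query; an empty value means "do not filter on this column"
wantLocation = "LDN";
wantType     = strings(0);
wantRange    = [datetime(2018, 1, 1), datetime(2018, 3, 31)];

% Start with every row selected, then narrow down one criterion at a time
mask = true(height(filedetails), 1);
if ~isempty(wantLocation)
    mask = mask & ismember(filedetails.Location, wantLocation);
end
if ~isempty(wantType)
    mask = mask & ismember(filedetails.Type, wantType);
end
if ~isempty(wantRange)
    mask = mask & isbetween(filedetails.Date, wantRange(1), wantRange(2));
end
selectedfiles = filedetails.Filename(mask);
```

The same body could be wrapped in a function that takes the table and the three criteria as inputs, which avoids writing a separate function per part of the file name.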
  1 Comment
S G on 11 Jun 2019
I'm in favour of this 'isbetween' idea. I would prefer to treat the MMYY section of the file name as a datetime data type. Thanks!

More Answers (1)

Jan on 5 Jun 2019
Edited: Jan on 5 Jun 2019
Folder = 'D:\Your\Folder';
FileList = dir(fullfile(Folder, '*.*'));
FileList = FileList(~[FileList.isdir]);  % drop folders such as '.' and '..'
NameList = {FileList.name};
% NameList = {'SSSTTTT0617', 'SSSTTTT0631', 'WWWQQQQ0724'}
Data.Location = cellfun(@(s) s(1:3), NameList, 'UniformOutput', 0);
Data.Type = cellfun(@(s) s(4:7), NameList, 'UniformOutput', 0);
Data.Date = cellfun(@(s) sscanf(s(8:11), '%d'), NameList, 'UniformOutput', 1);
% Data which have the Location = 'SSS':
Match = FindData(Data, 'Location', 'SSS')
% Data which have the Location = 'SSS' and the date 0631:
Match = FindData(Data, 'Location', 'SSS', 'Date', 631)
% Data which have the Type 'TTTT' a date between 0631 and 0801:
Match = FindData(Data, 'Type', 'TTTT', 'DateRange', [631, 801])
% ... etc
function Match = FindData(Data, varargin)
% Start with every file selected; Data is a scalar struct, so size the mask by a field
Match = true(size(Data.Location));
for k = 1:2:numel(varargin)
    switch lower(varargin{k})
        case 'location'
            Match = Match & strcmp(Data.Location, varargin{k+1});
        case 'type'
            Match = Match & strcmp(Data.Type, varargin{k+1});
        case 'date'
            Match = Match & (Data.Date == varargin{k+1});
        case 'daterange'
            % Note: MMYY compared as a number only orders correctly within one year
            Match = Match & (Data.Date >= varargin{k+1}(1) & ...
                             Data.Date <= varargin{k+1}(2));
        otherwise
            error('Unknown job: %s', varargin{k})
    end
end
% Maybe:
% Match = find(Match);
end
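One caveat with the 'DateRange' comparison: a number read from MMYY puts the month first, so it does not sort chronologically across years (1217 for December 2017 is larger than 118 for January 2018). A possible workaround, sketched here with made-up values and hypothetical variable names, is to reorder the digits into YYMM before comparing. Converting to datetime, as suggested in the comment on the question, is the more general fix.

```matlab
% Hypothetical MMYY values as sscanf would read them:
% December 2017, January 2018, June 2018
mmyy = [1217; 118; 618];

mm   = floor(mmyy / 100);   % month part
yy   = mod(mmyy, 100);      % two-digit year part
yymm = yy * 100 + mm;       % year first, so the numbers sort chronologically within one century

% Select December 2017 through January 2018
inRange = yymm >= 1712 & yymm <= 1801;
```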
  3 Comments
Jan on 10 Jun 2019
Data is a struct with three fields, each containing an array that holds one part of the file names. If you provide some real test data, a better-matching answer is possible. I guessed that the file names can be obtained by dir in a specific folder. That was my best guess based on this explanation:
"I have a large number of files that all have file name formats that are of the form SSSTTTTMMYY"
S G on 10 Jun 2019
So the file names are stored in a string array of size n-by-1. Would I pass this array to the anonymous functions that fill Data in your example?
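For an n-by-1 string array, one option is to skip cellfun and split the names directly with extractBetween, which works element-wise on string arrays. This is a minimal sketch with made-up names, not a confirmed reply from the thread. The field names mirror the Data struct used above.

```matlab
% Hypothetical n-by-1 string array of file names
names = ["LDNACPD0618"; "LDNACPD0218"; "PARACPD0118"];

Data.Location = extractBetween(names, 1, 3);            % characters 1 to 3
Data.Type     = extractBetween(names, 4, 7);            % characters 4 to 7
Data.Date     = double(extractBetween(names, 8, 11));   % "0618" becomes 618

% strcmp also accepts string arrays, so the same comparisons still apply
isLondon = strcmp(Data.Location, 'LDN');
```

Alternatively, `cellstr(names)` converts the string array to a cell array of character vectors, which can be passed to the cellfun calls unchanged.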

Release: R2019a
