- compare SSS using strcmp
- compare TTTT using strcmp
- convert MMYY to datetime and compare using logical comparisons.
Most efficient method to search through file names?
I have a large number of files that all have file name formats that are of the form
SSSTTTTMMYY
Where the 'encoding' of the file name breaks down into something like this:
SSS - Three-letter code referencing a location (I know these locations and have a MATLAB table that relates the codes to location names)
TTTT - That represents the 'type' of data that we have captured (also values we already have)
MMYY - Is simply the month and year that data was taken.
So for example, we may have something like:
LDNACPD0618
where LDN = London, ACPD = Average captured pollution data, 0618 = June, 2018.
So here is the actual question:
I want to build a function that can search through these file names based on:
- Site location, e.g. all data from site LDN
- A range of dates, e.g. all data between 0118 and 0318
- 'Type' of data, e.g. all data that is ACPD
- Or a combination of the above, e.g. all data from LDN between 0118 and 0318
What is the most efficient way to do this other than making three separate functions to check each section of the file name? Would something like a regular expression work?
Many thanks for your help and advice in advance!
1 Comment
Stephen23
5 Jun 2019
Edited: Stephen23
5 Jun 2019
"Would something like a regular expression work?"
Matching the SSS and TTTT parts would not be too difficult, but matching a range of dates really requires converting to date (e.g. date number or datetime) and then doing a logical comparison.
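Start by splitting the names up (e.g. using regexp or indexing), then compare SSS and TTTT using strcmp, and convert MMYY to datetime and compare it using logical comparisons.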
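The approach described in this comment can be sketched roughly as follows. This is only a minimal illustration: the sample names are made up, and the pattern assumes that the site and type codes consist of uppercase letters only. A name that does not match the pattern would yield an empty struct and break the alignment, so real code should check for that first.

```matlab
% Hypothetical sample names following the SSSTTTTMMYY pattern
names = {'LDNACPD0618', 'LDNACPD0218', 'PARACPD0318'};

% Split each name into its three parts using named tokens
% (assumption: codes are uppercase letters, dates are four digits)
expr = '^(?<site>[A-Z]{3})(?<type>[A-Z]{4})(?<mmyy>\d{4})$';
tok = regexp(names, expr, 'names', 'once');
tok = [tok{:}];                 % cell array of scalar structs -> struct array

site  = {tok.site};
type  = {tok.type};
dates = datetime({tok.mmyy}, 'InputFormat', 'MMyy');  % day defaults to the 1st

% strcmp handles the codes, logical comparisons handle the date range
isMatch = strcmp(site, 'LDN') & strcmp(type, 'ACPD') & ...
          dates >= datetime(2018, 1, 1) & dates <= datetime(2018, 3, 31);

selected = names(isMatch);
```

With these sample names, only the LDN file from February 2018 passes all three tests.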
Accepted Answer
Guillaume
10 Jun 2019
I would create a function that parses the filenames and splits them into variables of a table (or even a timetable). Then it's trivial to use MATLAB comparison functions to get the relevant rows of the table:
function filedetails = getdetails(filelist)
%filelist: a structure array such as returned by dir
%strip any extension (e.g. '.txt') so that only the SSSTTTTMMYY part remains
[~, basenames] = cellfun(@fileparts, {filelist.name}', 'UniformOutput', false);
assert(all(cellfun(@numel, basenames) == 11), 'Length of file names not consistent with pattern')
filenames = vertcat(basenames{:}); %convert the list of file names into a char array
%temporary variables to create the table. Using string to make comparison easier
Location = string(filenames(:, 1:3));
Type = string(filenames(:, 4:7));
Date = datetime(filenames(:, 8:11), 'InputFormat', 'MMyy');
Filename = string({filelist.name}');
%create the table
filedetails = table(Location, Type, Date, Filename);
end
Now it's easy to filter the table according to whichever pattern you want:
filelist = dir('C:\somewhere\*.txt'); %whichever way you obtain your file list
filedetails = getdetails(filelist);
%Get all filenames matching Location XYZ
selectedfiles = filedetails.Filename(filedetails.Location == 'XYZ');
%Get all filenames matching Location XYZ or UVW
selectedfiles = filedetails.Filename(ismember(filedetails.Location, {'XYZ', 'UVW'}));
%or
selectedfiles = filedetails.Filename(filedetails.Location == 'XYZ' | filedetails.Location == 'UVW');
%Get all filenames matching Type ABCD and location XYZ
selectedfiles = filedetails.Filename(filedetails.Type == 'ABCD' & filedetails.Location == 'XYZ');
%Get all filenames between January and March 2018
selectedfiles = filedetails.Filename(isbetween(filedetails.Date, datetime(2018, 1, 1), datetime(2018, 3, 31)));
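Since the question asks for one function rather than three, the filters above could be wrapped in a single helper. The following is only a sketch: selectfiles and its 'Location', 'Type', 'From' and 'To' options are hypothetical names that do not appear in this answer, and the small table is made-up data with the same variables that getdetails returns. Local functions in a script need R2016b or later.

```matlab
% Hypothetical table with the same variables as the getdetails output
Location = ["LDN"; "LDN"; "PAR"];
Type     = ["ACPD"; "ACPD"; "ACPD"];
Date     = datetime(2018, [6; 2; 3], 1);
Filename = ["LDNACPD0618"; "LDNACPD0218"; "PARACPD0318"];
filedetails = table(Location, Type, Date, Filename);

% All LDN files between January and March 2018
selected = selectfiles(filedetails, 'Location', "LDN", ...
    'From', datetime(2018, 1, 1), 'To', datetime(2018, 3, 31));

function names = selectfiles(filedetails, varargin)
% selectfiles (hypothetical helper): filter by any combination of criteria.
% A criterion that is left out is simply not applied.
p = inputParser;
addParameter(p, 'Location', strings(0));
addParameter(p, 'Type', strings(0));
addParameter(p, 'From', NaT);
addParameter(p, 'To', NaT);
parse(p, varargin{:});
opt = p.Results;

keep = true(height(filedetails), 1);
if ~isempty(opt.Location)
    keep = keep & ismember(filedetails.Location, opt.Location);
end
if ~isempty(opt.Type)
    keep = keep & ismember(filedetails.Type, opt.Type);
end
if ~isnat(opt.From)
    keep = keep & filedetails.Date >= opt.From;
end
if ~isnat(opt.To)
    keep = keep & filedetails.Date <= opt.To;
end
names = filedetails.Filename(keep);
end
```

Using NaT and an empty string array as defaults keeps every criterion optional, so the same call pattern covers a single filter or any combination of them.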
More Answers (1)
Jan
5 Jun 2019
Edited: Jan
5 Jun 2019
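Folder = 'D:\Your\Folder';
FileList = dir(fullfile(Folder, '*.*'));
FileList = FileList(~[FileList.isdir]); % drop folders such as '.' and '..'
NameList = {FileList.name};
% NameList = {'SSSTTTT0617', 'SSSTTTT0631', 'WWWQQQQ0724'}
Data.Location = cellfun(@(s) s(1:3), NameList, 'UniformOutput', 0);
Data.Type = cellfun(@(s) s(4:7), NameList, 'UniformOutput', 0);
% Note: MMYY read as a plain number only sorts correctly within a single year
Data.Date = cellfun(@(s) sscanf(s(8:11), '%d'), NameList, 'UniformOutput', 1);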
% Data which have the Location = 'SSS':
Match = FindData(Data, 'Location', 'SSS')
% Data which have the Location = 'SSS' and the date 0631:
Match = FindData(Data, 'Location', 'SSS', 'Date', 631)
% Data which have the Type 'TTTT' and a date between 0631 and 0801:
Match = FindData(Data, 'Type', 'TTTT', 'DateRange', [631, 801])
% ... etc
function Match = FindData(Data, varargin)
Match = true(size(Data.Location)); % one flag per file (Data itself is a scalar struct)
for k = 1:2:numel(varargin)
switch lower(varargin{k})
case 'location'
Match = Match & strcmp(Data.Location, varargin{k+1});
case 'type'
Match = Match & strcmp(Data.Type, varargin{k+1});
case 'date'
Match = Match & (Data.Date == varargin{k+1});
case 'daterange'
Match = Match & (Data.Date >= varargin{k+1}(1) & ...
Data.Date <= varargin{k+1}(2));
otherwise
error('Unknown job: %s', varargin{k})
end
end
% Maybe:
% Match = find(Match);
end
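One caveat with the 'daterange' case: MMYY read as a plain number puts the month first, so a range that spans a year boundary does not compare correctly. A possible workaround, sketched here with made-up values and assuming all years fall in the same century, is to reorder the digits to YYMM before converting:

```matlab
% Hypothetical MMYY strings: Dec 2017, Jan 2018, Jun 2018
mmyy = {'1217', '0118', '0618'};

% Raw numbers sort by month first: 1217 > 118, although Dec 2017 is earlier
raw = cellfun(@(s) sscanf(s, '%d'), mmyy);

% Reorder the characters to YYMM so the year is the most significant part
key = cellfun(@(s) sscanf(s([3 4 1 2]), '%d'), mmyy);

% Range test from Jan 2018 (1801) to Mar 2018 (1803)
inRange = key >= 1801 & key <= 1803;
```

The range limits passed to FindData would then have to be given in the same YYMM form.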
3 Comments
Jan
10 Jun 2019
Data is a struct with three fields, which contain the arrays of the different parts of the file names. If you provide some real test data, a better-matching answer is possible. I guessed that the file names can be obtained by dir in a specific folder. This was my best guess based on this explanation:
"I have a large number of files that all have file name formats that are of the form SSSTTTTMMYY"