Fastest way to search files by pattern name
I have a main folder with a lot of subfolders (thousands). I want to load files from only specific subfolders, which can be identified by a specific pattern in the subfolder name. Each of those subfolders contains tens of sub-subfolders, of which I again need only specific ones, also identified by a pattern in the name. To extract the needed files, I have implemented two approaches using the dir function: 1) a single call with the whole path, using wildcards for the subfolders and sub-subfolders; 2) first finding all matching subfolders, then searching for the sub-subfolders in a for loop over those subfolders. It turns out that the latter is much faster. Could you explain why?
%first way
files = dir(fullfile(main_folder,'*_data/*_file_to_load/file1.mat'));
%second way
subfolders = dir(fullfile(main_folder,'*_data/'));
files = cell(1,numel(subfolders));
for i = 1:numel(subfolders)
files{i} = dir(fullfile(subfolders(i).folder,subfolders(i).name,'*_file_to_load/file1.mat'));
end
6 Comments
Image Analyst
16 Apr 2023
@Anton Baranikov did you overlook the Answer below in the official Answer section of the page? Did you only see the comments up here at the top where people are not giving answers but are asking for clarification of the question? If you saw my Answer below, then explain why it doesn't work, or let me know that it did work.
Accepted Answer
dpb
17 Apr 2023
Edited: dpb
17 Apr 2023
As for the original question, it comes down to how the underlying OS processes the dir command. When you ask for a directory listing through a chain of subdirectories from a higher level, those entries aren't necessarily stored on disk in the order in which they appear, so the dir command has to traverse the whole directory structure from the top until it gets all the way to the bottom. It also doesn't know where the matches may stop, so it has to visit everything possibly reachable from the topmost location.
In the second case, you're giving it a starting point underneath each specific folder, and the chain from there to the bottom is undoubtedly only one level deep. It's just not doing nearly as much work in the second case as it must do in the first.
The fastest way will be to limit each search to as shallow a depth as your a priori knowledge of the structure allows. Several shallow searches will virtually always beat one deep one.
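A minimal sketch of the level-by-level search described above, assuming the folder layout from the question (`*_data` subfolders that contain `*_file_to_load` folders holding `file1.mat`). The throwaway tree built under `tempname` and the folder names in it are purely illustrative, and the `folder` field of the `dir` output assumes R2016b or later.

```matlab
% Build a small throwaway tree so the sketch runs on its own (illustrative names).
main_folder = tempname;
for name = {'a_data', 'b_data', 'c_other'}
    target = fullfile(main_folder, name{1}, 'x_file_to_load');
    mkdir(target);
    fclose(fopen(fullfile(target, 'file1.mat'), 'w'));  % empty placeholder file
end

% Level 1: list only the matching subfolders, dropping any non-folder matches.
subfolders = dir(fullfile(main_folder, '*_data'));
subfolders = subfolders([subfolders.isdir]);

% Level 2: one shallow search per matching subfolder.
found = cell(numel(subfolders), 1);
for k = 1:numel(subfolders)
    base = fullfile(subfolders(k).folder, subfolders(k).name);
    found{k} = dir(fullfile(base, '*_file_to_load', 'file1.mat'));
end

% Combine the per-folder results into one struct array and build full paths.
files = vertcat(found{:});
paths = fullfile({files.folder}, {files.name});
```

Collecting the results with `vertcat` gives a single struct array, so the rest of the code can treat it like the output of the one-line `dir` call. Wrapping each variant in `tic`/`toc` on the real folder tree is the simplest way to check how large the difference is on a given file system.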
2 Comments
dpb
17 Apr 2023
You'll trade some coding complexity and thinking about the actual data structure for better performance this way. The one-time investment may well pay off in the long run if it's a case that will occur often, particularly if you can also automate the generation of the order structure programmatically.
More Answers (1)
Image Analyst
16 Apr 2023
Use contains to see if the pattern is in the folder or file name. Process the ones you want, and skip the ones you don't want by calling continue:
if contains(thisSubFolderName, 'patternIDoNotWant')
continue % Skip to bottom of for loop
end
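For context, here is a small sketch of how that test might sit inside a complete loop, written the other way round so that names without the wanted pattern are skipped. The list `folderNames` is a hypothetical stand-in for the names returned by `dir`, and `'_data'` is just the pattern from the question.

```matlab
% Hypothetical folder names standing in for the names returned by dir.
folderNames = {'run1_data', 'run2_backup', 'run3_data', 'notes'};

kept = {};
for k = 1:numel(folderNames)
    thisSubFolderName = folderNames{k};
    if ~contains(thisSubFolderName, '_data')
        continue  % not a folder we want; go on to the next name
    end
    % Process the folder here; this sketch only records its name.
    kept{end+1} = thisSubFolderName; %#ok<AGROW>
end
```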
4 Comments
dpb
17 Apr 2023
Edited: dpb
17 Apr 2023
Actually, contains (and friends) work the same way...
if contains(thisSubFolderName, 'patternIWant1') || contains(thisSubFolderName, 'patternIWant2') || contains(thisSubFolderName, 'patternIWant3')
could be written as
if contains(thisSubFolderName, {'patternIWant1','patternIWant2','patternIWant3'})
You have to be careful with contains, however, to make sure it is the comparison you want, because it matches any substring within the searched string.
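A short illustration of that caveat, using made-up folder names: `contains` matches the pattern anywhere in the name, while `endsWith` only accepts names that finish with it, which may be closer to what a `*_data` wildcard means.

```matlab
% Made-up names: the second one has '_data' in the middle, not at the end.
names   = {'run1_data', 'run1_data_old', 'metadata_run'};
pattern = '_data';

anywhere = contains(names, pattern);  % pattern may occur anywhere in the name
atEnd    = endsWith(names, pattern);  % name must finish with the pattern

% Names that contains accepts but endsWith rejects.
onlySubstring = names(anywhere & ~atEnd);
```

Here `onlySubstring` holds `'run1_data_old'`, the kind of unintended match that a plain substring test lets through.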