
Use terminal to speed up file removal

Pete on 17 Oct 2017
Answered: Stephen23 on 17 Oct 2017
Hi all, I've got a large number of CSVs, one generated each time a system changes state. Each CSV starts as a single row (a [1x3] array), and any data is added as new rows. I've written a simple loop that checks for "empty" CSVs (those containing only the single row) and removes them. However, this takes many (>10) minutes to complete, and I want to try the same in the terminal. Code as shown:
CSV_Filenames_STRUCT = dir(sprintf('%s/*.csv',ResultDirectory));
CSV_Filenames_CELL = {CSV_Filenames_STRUCT.name};
StartingNumberOfFiles = size(CSV_Filenames_CELL,2);
for NthFile = 1:StartingNumberOfFiles
    NumberOfPeaks = size(textread(sprintf('%s/%s',ResultDirectory,CSV_Filenames_CELL{1,NthFile}),'%s'),1) - 1; % Number of rows less one for the 'x,y,value'
    if ~NumberOfPeaks % Essentially empty
        delete(sprintf('%s/%s',ResultDirectory,CSV_Filenames_CELL{1,NthFile}));
    end
end
I haven't used the terminal much, and I'm wondering whether it would be faster for the above when there are many files to process, and how to code the single-line check. So far, I've got something like:
for f in *.csv;
do
    L=`wc -l "$f" | awk '{print $1}'`
    if test $L -eq 1
    then
        mv $f ./MT;
    fi
done
which isn't quite working (there are spaces in the filenames, as shown below), but I'm out of my depth here, so I'm calling for help on how to use the "system"/"unix" options through MATLAB. I'm running OS X and Kubuntu Linux. I should also mention that the filenames have spaces in them, like: "Filter 0000001 Fwd,Alignment Black Screen - Ref_01 Input_19 (2017-10-17 @ 13.30.20.103).csv"
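The loop above most likely trips over the spaces because `$f` is unquoted in the `mv` call, so the shell splits each filename into several words. Below is a minimal sketch of a quoted version. The function name `move_header_only` is hypothetical, and two details are assumptions rather than part of the original post: the `MT` folder is created if it is missing (otherwise `mv` would rename the file to `MT`), and the test is `-le 1` so that a single row without a trailing newline also counts as "empty" (zero-byte files would be moved too).

```shell
# move_header_only DIR
# Move every CSV in DIR that has at most one line into DIR/MT.
# (Hypothetical helper name; adjust the target folder as needed.)
move_header_only() {
    dir=$1
    mkdir -p "$dir/MT"                      # make sure the target folder exists
    for f in "$dir"/*.csv
    do
        [ -f "$f" ] || continue             # skip if the glob matched nothing
        # Reading from stdin makes wc print only the count;
        # $(( ... )) strips the padding that some wc versions add.
        n=$(( $(wc -l < "$f") ))
        if [ "$n" -le 1 ]
        then
            mv -- "$f" "$dir/MT/"           # quotes keep names with spaces intact
        fi
    done
}
```

Calling `move_header_only .` from inside the results directory mirrors the original loop. If the function is saved in a script, MATLAB's `system` function can run that script; whether this is actually faster than doing the work in MATLAB is something to time on real data.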
3 Comments
Pete on 17 Oct 2017
Just started a set with 2,000,000 files, but I only expect about 10% of these (200k) to have genuine results, so the rest are just 'empty' CSVs (one row of title data). Looking at the profiler, I think the MATLAB functions called from textread are possibly taking the time. I've removed the sprintf calls and replaced them with string concatenation, i.e. [PathPart1 '/' PathPart2] etc. That sped things up a bit, but processing still takes a long time. Any other suggestions?
Jan on 17 Oct 2017
You mean "shell", not "terminal".


Answers (2)

Jan on 17 Oct 2017
I'm not sure I understand your question correctly: you want to delete all files that contain only one row, correct?
fullfile is a cleaner way to build file names than sprintf().
CSV_Filenames_STRUCT = dir(fullfile(ResultDirectory, '*.csv'));
CSV_Filenames_CELL = {CSV_Filenames_STRUCT.name};
StartingNumberOfFiles = numel(CSV_Filenames_CELL);
for NthFile = 1:StartingNumberOfFiles
    File = fullfile(ResultDirectory, CSV_Filenames_CELL{NthFile});
    fid = fopen(File, 'r');
    if fid == -1, error('Cannot open file: %s', File); end
    line1 = fgetl(fid);
    line2 = fgetl(fid);
    fclose(fid);
    if ~ischar(line2)
        delete(File);
    end
end
Is this faster? It reads at most two lines from each file.

Stephen23 on 17 Oct 2017
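Remove the textread and replace it with something like this (File is a placeholder for the full path of one CSV):
fid = fopen(File,'rt');
fgetl(fid);                 % read (and discard) the first row
isHeaderOnly = feof(fid);   % true if nothing follows the first row
fclose(fid);                % close the file before deleting it
if isHeaderOnly
    delete(File)
end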
"I've removed sprintf's and replaced with concatenation strings "
I would recommend using fullfile: it actually makes the intention clearer.
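Since the question also asked about the shell, the same "read the first row, then see whether anything follows" idea from the answers can be sketched there too. This is only an illustration under stated assumptions: the function names `has_data_row` and `remove_header_only` are hypothetical, a blank second line counts as data (as it would with the fgetl check), and unreadable files are skipped rather than deleted. It uses the shell's built-in `read`, so no `wc` process is started per file, but whether that is faster than the MATLAB loop would need timing.

```shell
# has_data_row FILE
# Succeeds if FILE contains anything after its first line.
has_data_row() {
    {
        # No complete first line: treat the file as header-only.
        IFS= read -r header || return 1
        # A second line counts even when it lacks a trailing newline.
        IFS= read -r row || [ -n "$row" ]
    } < "$1"
}

# remove_header_only DIR
# Delete every readable CSV in DIR that has no second line.
remove_header_only() {
    for f in "$1"/*.csv
    do
        [ -f "$f" ] && [ -r "$f" ] || continue   # skip missing or unreadable files
        has_data_row "$f" || rm -- "$f"
    done
}
```

Running `remove_header_only .` in the results directory deletes the header-only files in place; trying it first on a copy of a small subset is a sensible precaution.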
