MATLAB Answers

Remove rows/text at the bottom of a csv file

Damith on 13 Nov 2015
Commented: Tunechi on 8 May 2021
Hi,
I have over 2000 csv files, and I can read them and store the contents in a cell array. However, every file has some text written at the bottom, after the data rows (the text is the same in all files). How can I delete this text from all of the files?
Please see the images below. My MATLAB code is shown below.
clear all
cd ('C:\Users\Desktop\')
myFolder = 'C:\Users\Desktop\Q_gte_10';
if ~isdir(myFolder)
    errorMessage = sprintf('Error: The following folder does not exist:\n%s', myFolder);
    uiwait(warndlg(errorMessage));
    return;
end
filePattern = fullfile(myFolder, '*.csv');
csvFiles = dir(filePattern);
for k = 1:length(csvFiles)
    fid(k) = fopen(fullfile(myFolder,csvFiles(k).name));
    out{k} = textscan(fid(k),'%s %s %s %s %*[^\n]','delimiter',',','headerlines',1);
    fclose(fid(k));
end

Accepted Answer

dpb on 13 Nov 2015
Use
find(strcmp(out{1},'DISCLAIMER'))
to find the row in the cell array where the DISCLAIMER line is located, and then delete that row and all rows following it. Note that you'll have to address the CONTENT of the cell array (the cell string holding the first column) rather than the outer cell itself.
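A minimal sketch of that idea follows. It assumes each `out{k}` is the 1-by-4 cell returned by the `textscan` call in the question, and that the marker appears in the first column as exactly `'DISCLAIMER'`; if the real line carries extra text, a partial match such as `strncmp` would be needed instead. The sample data is made up so the snippet runs on its own.

```matlab
% Hypothetical stand-in for one textscan result: a 1-by-4 cell whose
% elements are column cell arrays of strings (one per CSV column).
cols = {{'a1'; 'a2'; 'DISCLAIMER'; 'text'}, ...
        {'b1'; 'b2'; ''; ''}, ...
        {'c1'; 'c2'; ''; ''}, ...
        {'d1'; 'd2'; ''; ''}};
out = {cols};

for k = 1:numel(out)
    % out{k}{1} is the first column of file k as a cell array of strings
    idx = find(strcmp(out{k}{1}, 'DISCLAIMER'), 1, 'first');
    if ~isempty(idx)
        % Keep only the rows above the marker in every column.
        % Columns may differ in length, so clamp the last index.
        out{k} = cellfun(@(c) c(1:min(idx-1, numel(c))), out{k}, ...
                         'UniformOutput', false);
    end
end
```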
  9 Comments
dpb on 17 Nov 2015
PS: It looks like you should retain at least the month as well; in some cases there are multiple data points in a given year, and these are aliased if only the year is kept.


More Answers (2)

Image Analyst on 17 Nov 2015
I'd simply use fgetl(), strfind() and fprintf(), something like
fid = fopen('foo.csv', 'r');
fOutput = fopen('outFoo.csv', 'w'); % Open the output file for writing
tline = fgetl(fid);
while ischar(tline)
    disp(tline)
    if ~isempty(strfind(tline, 'DISCLAIMER'))
        break; % Stop copying at the disclaimer text
    end
    fprintf(fOutput, '%s\n', tline);
    tline = fgetl(fid); % Read the next line only after writing the current one
end
fclose(fid);
fclose(fOutput);
% If you want back in the same file
delete('foo.csv'); % Delete old/input file
movefile('outFoo.csv', 'foo.csv'); % Rename file.
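Since the question involves a whole folder of files, the same line-by-line copy could be wrapped in a loop over the `dir` listing. This is only a sketch: the temporary folder, the sample file, and the `.tmp` suffix are assumptions made so the snippet runs by itself, and `myFolder` would be replaced by the real data folder.

```matlab
% Hypothetical folder with one sample file, created so the sketch runs alone.
myFolder = tempname;
mkdir(myFolder);
sample = fullfile(myFolder, 'sample.csv');
fid = fopen(sample, 'w');
fprintf(fid, 'h1,h2\n1,2\n3,4\nDISCLAIMER\nsome legal text\n');
fclose(fid);

csvFiles = dir(fullfile(myFolder, '*.csv'));
for k = 1:numel(csvFiles)
    inName  = fullfile(myFolder, csvFiles(k).name);
    tmpName = [inName '.tmp'];          % temporary output next to the input
    fin  = fopen(inName, 'r');
    fout = fopen(tmpName, 'w');
    tline = fgetl(fin);
    % Copy lines until end of file or until the marker line is reached
    while ischar(tline) && isempty(strfind(tline, 'DISCLAIMER'))
        fprintf(fout, '%s\n', tline);
        tline = fgetl(fin);
    end
    fclose(fin);
    fclose(fout);
    movefile(tmpName, inName, 'f');     % replace the original file
end
```

Writing to a temporary file first means the original is replaced only after the copy has finished.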
  1 Comment
Tunechi on 8 May 2021
Thanks for this @Image Analyst



dpb on 17 Nov 2015
Edited: dpb on 18 Nov 2015
OK, given the depth of the conversation under the original answer, and now that I have access to a real data file, I'm moving my last comment up and turning it into "that's my answer and I'm stickin' to it!" :)
You can apply the following to either the cleaned-up versions you attached or to the originals--
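% d is assumed to be the dir() listing of the input files, with the
% current folder set to the folder that contains them
fmt='%*s %d %4d-%*2d-%*2d %d %*[^\n]';
for i=1:length(d)
  fid=fopen(d(i).name);
  c=cell2mat(textscan(fid,fmt,'headerlines',68,'collectoutput',1,'delimiter','\t'));
  c(all(c==0,2),:)=[];               % drop the all-zero row left by the failed last read
  [p,name]=fileparts(d(i).name);     % p rather than path, which shadows a built-in function
  csvwrite([fullfile(p,name) '.csv'],c)
  fid=fclose(fid);
end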
This will leave you with one csv file per input, with the same root name as the original, containing just the above pieces of data.
textscan will stop automagically at the first non-matching line after the data of interest, which cleans up the input much more easily than your current gyrations.
The 'collectoutput' argument returns the values in a single cell array and forces an empty entry into all columns; otherwise the first two cell arrays would end up with a zero while the last would not, owing to the behavior on the (expected) error when it hits the trailing text.
cell2mat turns the result into an "ordinary" numeric array instead of a cell array, so indexing is simpler, and since there's no need for mixed types here it's much easier (and faster and less memory intensive to boot). The last fixup then simply removes that line of all zeros; checking that the whole row is 0 makes sure that a zero flow data value (unlikely, yes, but...) doesn't cause any actual data to be removed.
NB: The tab delimiter is mandatory to account for the missing/empty fields in some files; otherwise, by default, the read will fail with one of the characters being read where a numeric value is expected. If you choose, you could use the 'EmptyValue' option to return NaN instead of zero, which makes it obvious where this is occurring.
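A small sketch of the 'EmptyValue' idea, using a made-up tab-delimited string rather than one of the real files. One assumption to flag: the fields are read here with `%f` rather than `%d`, because an integer class cannot hold NaN and an empty `%d` field would still come back as 0.

```matlab
% Hypothetical sample: the second field is missing on the second line.
txt = sprintf('1\t10\n2\t\n3\t30\n');

% Read both columns as doubles so that NaN can mark the empty field.
c = textscan(txt, '%f %f', 'Delimiter', '\t', 'EmptyValue', NaN);

% Rows where the second field was empty in the input
missingRows = find(isnan(c{2}));
```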
NB 2: I ran the above on the full directory to make sure nothing unexpected occurred. It looks OK, other than the fact that there are sometimes multiple readings in a given year, so it would appear you should keep the month as well to avoid aliasing.
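One way to keep the month, sketched here rather than taken from the posted code, is to read the month field instead of skipping it, i.e. `%2d` in place of the first `%*2d`. The two sample lines are invented for illustration, and the real files are assumed to have the column layout implied by the format above.

```matlab
% Same format as above, except that the month is read (%2d) rather than skipped.
fmt = '%*s %d %4d-%2d-%*2d %d %*[^\n]';

% Hypothetical tab-delimited lines: text, id, yyyy-mm-dd, value, trailing text
txt = sprintf('X\t7\t2001-05-17\t1200\tA\nX\t7\t2001-09-02\t800\tA\n');

c = cell2mat(textscan(txt, fmt, 'CollectOutput', 1, 'Delimiter', '\t'));
yearCol  = c(:, 2);   % year of each reading
monthCol = c(:, 3);   % month, which separates readings within the same year
```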
  15 Comments
Damith on 2 Dec 2015
So sorry for being late to comment on this; I was really sick. I had a look at this and it seems to be working fine. I checked the outputs and they look identical to the original files. Thanks so much again.

