Combine three columns with comma for a lot CSV files

9 ビュー (過去 30 日間)
Ehsan Shooshtari
Ehsan Shooshtari 2020 年 7 月 23 日
コメント済み: Ehsan Shooshtari 2020 年 8 月 2 日
Thank you Friends
I am working to write a script which it needs to open the folder of my CSV files, more than 1000 files, combine three spacific columns and prepare an xls file as result. Every CSV, 16* 15000 (raw, Col) with same headers after combine and it require to comes up with an xls file 1000 raws and the combine column.(1000*1)(a raw for every CSV).
(I prepared three simple CSV, folder of Examps, which I attach here to exam the script on them and then I need to deal with all 1000 CSV.
Up to now after I import the CSV in Matlab but I couldn't ask to find specific columns for combine.)
clc
close all
fileDir = 'D:\Examps';
outfile = 'D:\XYZ_1.1000.csv';
addpath(fileDir);
fileNames = dir(fileDir);
fileNames = {fileNames.name};
fileNames = fileNames(cellfun(...
@(f)contains(f,'.csv'),fileNames));
%x = xlsread(filenames);
for f = 1:numel(fileNames)
[~, ~, raw] = xlsread(fullfile(fileDir, fileNames{f}))
xlswrite(outfile, raw, fileNames{f})
end.
*******************The rest is my effort to handle it without success*********************
((((((((((((((((((((((((((fid = fopen('.csv');
g = tableread(fid,'%s ' , 'delimiter','\n');
fclose(fid)
c = g{1,1}(10:16);
c = str2num(cell2mat(c));
c = c(:,3); %reads third column
%combine = cat(:,'D','E','F')
%c = f{1,1}(10:16)
%c = c(:,4) %reads third column
% X = f(:,1)
% out = strcat(X,E,F)
%[~,~, col] = xlsread (combine)
%result = fileNames(:,4);
end
%fileNamesTemp = fileNames(:,[4:6])
%for f = 1:numel(fileNames)
%[~, ~, raw] = xlsread( fullfile(fileDir, fileNames{f}));))))))))))))))))))))

採用された回答

dpb
dpb 2020 年 7 月 23 日
編集済み: dpb 2020 年 7 月 24 日
fileDir = 'D:\Examps';
d=dir(fullfile(fileDir,'*.csv');
for i = 1:numel(d)
data=readmatrix(fullfile(fileDir,d(i).name),'NumHeaderLines',1);
data=data(:,3);
end
OK, the above will read all your .csv files in the directory and leave you with the thrid column for each in turn.
I couldn't decipher the desired output form -- one long column or an array of however many columns there are files...?
That controls how to do the write or catenation after reading...can finish up when get clarification of the desired output.
  18 件のコメント
Ehsan Shooshtari
Ehsan Shooshtari 2020 年 7 月 29 日
Hello Dear dpb
Thanks for your time
I solved the order of XYZ, every x atech to the y and z of the point, just I have two problems.
  1. The output just store in one column, not a seperated column for every csv.
  2. The result for first csv is completly correct but for second csv, start with showing first csv and following by second, and for third , show the whole result.
clc
close all
fileDir = 'D:\Examps'; % the target data directory
outfile = 'D:\XYZ1000.csv';
fid=fopen(outfile,'w');
d=dir(fullfile(fileDir,'*.csv')); % dir() struct of *.csv files in directory
N=numel(d); % total number files found
XYZ=cell(1,N); % allocate cell array of Nx1 elements
for i = 1:N % iterate over them...
data=readmatrix(fullfile(fileDir,d(i).name),'NumHeaderLines',1);
XYZ(i)={data(:,1:3).'}; % select the three columns only, put in XYZ
fmt=[repmat('%f,',1,numel(XYZ)-1) '%f\n'];
a=XYZ; % save the output to a variable
fprintf(fid,'%f\n',a{:});
end
fid=fclose(fid);
dpb
dpb 2020 年 7 月 29 日
Of course, on both...
  1. you converted from an array to a cell array so the repeat count in the format string is that array size, not the size of the data in the cell array total, and
  2. you're saving the elements of the cell array each pass through the loop so every loop has one more cell containing somthing -- but you write the entire cell array every time by using a{:} (and there's no need for the temporary variable, anyway)
It's still not clearly defined just how you want things written; again, you built a cell array to keep data together that belongs together; if you then try to write this out on a record basis you're going to destroy all that. The example you gave had, I think I recall, some four sets of 3 coordinates--if you string that together you'll have 12 elements on a record. But if you string all 1000 or so files together, then that would end up as being some 12,000 element-long record that makes no sense.
Why not build the XYZ cell array and then just SAVE() it and LOAD() it again by whatever subsequently needs it?
CSV files are very inefficient and bulky and don't carry full precision unless explicitly write like 15 decimal digts which really adds to the bulk.
As noted from the git-go, the needs here are just not at all clear as to what it is that is the needed end result but one has to guess it's not an optimal way to go about achieving whatever that end objective is.

サインインしてコメントする。

その他の回答 (1 件)

Ehsan Shooshtari
Ehsan Shooshtari 2020 年 7 月 31 日
Hello
Dear MR. dpb
Thank you for response
I was figured out, you was right about XYZ order and I came back to previous arrange. However, I meet two more problems.
First, The below program repeat result for example after write the XYZ again in the same row repeat the numer without change.
Second, I ran this script with actual data finally comes up with a bulky exell 6.5 gig and my excell can not open it.
..............................................
clc
close all
fileDir = 'D:\Examps'; % the target data directory
outfile = 'D:\XYZ_12000.csv';
fid=fopen(outfile,'w');
d=dir(fullfile(fileDir,'*.csv')); % dir() struct of *.csv files in directory
N=numel(d); % total number files found
for i = 1:N % iterate over them...
XYZ=readmatrix(fullfile(fileDir,d(i).name),'NumHeaderLines',1);
data=XYZ(:,1:3) % select the three columns only
fmt=[repmat('%f,',1,numel(XYZ)-1) '%f\n'];
fprintf(fid,fmt,XYZ);
end
fid=fclose(fid);
Best
  2 件のコメント
dpb
dpb 2020 年 7 月 31 日
"with actual data finally comes up with a bulky exell 6.5 gig"
That's what I told you from the beginning trying to do what you're doing in using csv text files would create very large files...
I suggested two possible workarounds--
  1. Process each file as it is read instead of trying to merge them all at once. Most algorithms are sequential so is at least moderately likely can do so if think through what is that is the end result needed;
  2. If, indeed, it is mandatory to have all data at one time, use .mat files to save the intermediate and load into what code needs the result.
If must for some unfortunate reason use Excel you'll simply have to cut the size down to what it can handle one way or the other...
Ehsan Shooshtari
Ehsan Shooshtari 2020 年 8 月 2 日
Appreciate

サインインしてコメントする。

カテゴリ

Help Center および File ExchangeLow-Level File I/O についてさらに検索

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by