Read specific rows from a large .csv

Lorenzo on 6 Jul 2016
Commented: Steven Hunsinger on 14 Sep 2022
Hi,
I am looking for a fast way to handle a big .csv file (35 MB). The good part is that I only need a certain part of the file. Basically, I would like to read only the rows that start with a certain name.
Unfortunately the file is composed like this:
Varname_1 timestring(t=0) valueX valueY
Varname_2 timestring(t=0) valueX valueY
...
Varname_n timestring(t=0) valueX valueY
Varname_1 timestring(t=1) valueX valueY
Varname_2 timestring(t=1) valueX valueY
...
Varname_n timestring(t=1) valueX valueY
...
... and so on
My idea would be to read the .csv file line by line, check whether Varname equals e.g. Varname_1, and write the matching lines to a cell array (or 4 vectors) like this:
Varname_1 timestring(t=0) valueX valueY
Varname_1 timestring(t=1) valueX valueY
Varname_1 timestring(t=2) valueX valueY
...
Any idea for smart code? Thank you! (Additional notes: varname = string, time = string, value = number with a comma as the decimal separator.)
------------------------------------ EDIT: example data
The output would be, e.g.:
var2 10:10:10 16,1010138923
var2 10:10:20 89,1560542863
var2 10:10:30 69,557621819
var2 10:10:40 9,9246195517
3 Comments
Lorenzo on 6 Jul 2016
Sorry! I mean the decimal delimiter is not a point. It's a comma. Example: 12,34 instead of 12.34
dpb on 6 Jul 2016
That, I think, you'll have to fix up outside MATLAB; I don't think it knows how to handle it. If the file is comma-separated as well, that's a problem for certain.


Accepted Answer

Image Analyst on 6 Jul 2016
Use readtable() and then search column 1 for the name pattern you want. Attach a small example with wanted and unwanted names if you can't figure it out.
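A minimal sketch of this suggestion, under a few assumptions that are not in the answer itself: the file is semicolon-delimited with three fields per row (name;time;value), the value uses a comma as the decimal separator, and the sample file and the variable names Name, Time, Value are made up for illustration. All columns are read as text and only the matching rows are converted to numbers afterwards.

```matlab
% Hypothetical sample file: name;time;value with a comma decimal separator.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'var1;10:10:10;1,5\nvar2;10:10:10;16,1010138923\n');
fprintf(fid, 'var1;10:10:20;2,5\nvar2;10:10:20;89,1560542863\n');
fclose(fid);

% Read every column as text so the decimal commas survive the import.
T = readtable(fname, 'Delimiter', ';', 'Format', '%s%s%s', ...
    'ReadVariableNames', false);
T.Properties.VariableNames = {'Name', 'Time', 'Value'};

% Keep only the rows whose first column equals the wanted name.
T2 = T(strcmp(T.Name, 'var2'), :);

% Convert the comma-decimal text to numbers after filtering.
T2.Value = str2double(strrep(T2.Value, ',', '.'));

delete(fname);   % remove the temporary sample file
```

Filtering before converting means str2double only touches the rows that are kept, which is likely cheaper than converting the whole column first.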

More Answers (2)

dpb on 6 Jul 2016
Edited: dpb on 6 Jul 2016
Untested, but check that the pattern matching format string doesn't solve the problem directly...
vName='Varname_1'; % the variable name you're looking for
fmt=[vName '%s %f %f']; % match vName, string, two numerics
fid=fopen('yourbigfile.csv','r');
data=textscan(fid,fmt,'delimiter',',');
fid=fclose(fid);
As said I'm not positive, but I think there's at least a reasonable chance the pattern-matching will do what you're looking for. Worth a shot methinks...
Well, doggonit, magic doesn't happen, joy didn't ensue... :(
But, the original idea isn't difficult...
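fid=fopen('yourbigfile.csv','r');
i=0; % counter for matching lines
while ~feof(fid)
  l=fgetl(fid);
  if ~isempty(strfind(l,vName)) % line contains the wanted name
    i=i+1;
    data{i}=textscan(l,fmt);
  end
end
fid=fclose(fid);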
worked for a sample file albeit I used space-delimited and '.' as the decimal indicator; I think that'll still be a problem.
I thought
fid=fopen('yourbigfile.csv','r');
i=0; % counter for lines read
while ~feof(fid)
  l=fgetl(fid);
  try
    i=i+1;
    data{i}=textscan(l,fmt);
  catch
  end
end
fid=fclose(fid);
would work around the issue but it didn't; textscan simply gave up and quit reading anything once it failed; it doesn't throw an error, it just throws up its hands silently. :(
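For what it's worth, the line-by-line idea can be sketched without textscan at all. This is only a sketch under assumptions: read_matching_rows is a hypothetical helper name, the delimiter is passed in by the caller, and the wanted name is assumed to be the first field of the line. Matching on the name plus the delimiter avoids picking up Varname_10 when looking for Varname_1.

```matlab
function rows = read_matching_rows(fname, vName, delim)
% READ_MATCHING_ROWS  Hypothetical helper: keep lines whose first field is vName.
% Returns an N-by-1 cell array; each cell holds the split fields of one line.
rows = cell(0, 1);
fid = fopen(fname, 'r');
if fid < 0
    error('Could not open %s', fname);
end
cleaner = onCleanup(@() fclose(fid));   % close the file on exit or error
prefix = [vName delim];                 % name followed by the delimiter
while true
    l = fgetl(fid);
    if ~ischar(l), break; end           % fgetl returns -1 at end of file
    if strncmp(l, prefix, numel(prefix))
        % Keep empty fields instead of collapsing repeated delimiters.
        rows{end+1, 1} = strsplit(l, delim, 'CollapseDelimiters', false); %#ok<AGROW>
    end
end
end
```

A call such as rows = read_matching_rows('yourbigfile.csv', 'Varname_1', ';') returns the fields as text; a comma-decimal value in the third field could then be converted with str2double(strrep(rows{k}{3}, ',', '.')).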
3 Comments
dpb on 6 Jul 2016
I used textscan not csvread, IA???
He's also got comma as the decimal indicator and says he's got a .csv file in which case it's indeterminable--which comma is a delimiter and which is a decimal point?
Image Analyst on 6 Jul 2016
Oh, sorry - I didn't notice.



Lorenzo on 6 Jul 2016
Got it. readtable() works lightning fast. This is my approach:
1) overwrite , with . as the decimal delimiter (not strictly necessary, but I need the values as numbers for postprocessing)
2) readtable
comma2point_overwrite('bigdata.csv')
T = readtable('bigdata.csv', 'Delimiter', ';');
T2 = T(strcmp('Durchflussmessung-H2-163bar_real', T{:,1}),:)
clearvars T;
where comma2point_overwrite() is:
function comma2point_overwrite( filespec )
% replaces all occurrences of comma (",") with point (".") in a text file.
% Note that the file is overwritten, which is the price for high speed.
file = memmapfile( filespec, 'writable', true );
comma = uint8(',');
point = uint8('.');
file.Data( transpose( file.Data==comma) ) = point;
end
Thanks for your help!!
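If overwriting the original file is not acceptable, the same comma-to-point replacement can be done in memory instead. The following is a sketch under assumptions, not the approach used above: read_decimal_comma is a hypothetical helper, the file is assumed to hold exactly three semicolon-separated fields per line (name;time;value), and the whole file is assumed to fit in memory.

```matlab
function [names, times, values] = read_decimal_comma(fname, wanted)
% READ_DECIMAL_COMMA  Hypothetical helper, not from the thread.
% Assumes three ';'-separated fields per line: name;time;value, where
% value uses ',' as the decimal separator.
txt = fileread(fname);           % whole file as one char vector
txt = strrep(txt, ',', '.');     % replace in memory; the file stays unchanged
C = textscan(txt, '%s %s %f', 'Delimiter', ';');
keep = strcmp(C{1}, wanted);     % logical mask of rows with the wanted name
names  = C{1}(keep);
times  = C{2}(keep);
values = C{3}(keep);
end
```

This trades the speed of memmapfile for leaving the source data untouched; for a file of a few tens of megabytes that may be an acceptable cost.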
1 Comment
Steven Hunsinger on 14 Sep 2022
Not so lightning fast if you get your company network involved. A 67.5 MB file took 10 minutes to reach a breakpoint set after readtable. This might be OK if I needed all that data loaded into RAM, but it seems excessive for reading the first line or so. Is there a better way?
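One possible direction, sketched here with a hypothetical helper read_first_lines, is to open the file and pull only the leading lines with fgetl, so the rest of the file is never parsed. Whether this actually helps over a slow network share has not been tested here.

```matlab
function lines = read_first_lines(fname, n)
% READ_FIRST_LINES  Hypothetical helper: return up to n leading lines as a cell array.
lines = cell(0, 1);
fid = fopen(fname, 'r');
if fid < 0
    error('Could not open %s', fname);
end
cleaner = onCleanup(@() fclose(fid));  % close the file on exit or error
while numel(lines) < n
    l = fgetl(fid);
    if ~ischar(l), break; end          % fgetl returns -1 at end of file
    lines{end+1, 1} = l; %#ok<AGROW>
end
end
```

For example, read_first_lines('bigdata.csv', 1) returns a 1-by-1 cell holding just the first line as text.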
