Read specific rows from a large .csv

Question

0 votes

data.bsp.csv

Hi,

I am looking for a fast way to handle a big .csv file (35 MB). The good part is that I only need a certain part of the file. Basically, I would like to read only the rows that start with a certain name.

Unfortunately the file is composed like this:

Varname_1   timestring(t=0)   valueX   valueY
Varname_2   timestring(t=0)   valueX   valueY
...
Varname_n   timestring(t=0)   valueX   valueY
Varname_1   timestring(t=1)   valueX   valueY
Varname_2   timestring(t=1)   valueX   valueY
...
Varname_n   timestring(t=1)   valueX   valueY
...
... and so on

My idea would be to read the .csv file line by line, check whether Varname equals e.g. Varname_1, and write the matching lines to a cell array (or 4 vectors) like this:

Varname_1   timestring(t=0)   valueX   valueY
Varname_1   timestring(t=1)   valueX   valueY
Varname_1   timestring(t=2)   valueX   valueY
...

Any ideas for smart code? Thank you! (Additional notes: varname = string, time = string, value = number with a comma as the decimal separator)

EDIT: example data

The output would be, e.g.:

var2 10:10:10 16,1010138923

var2 10:10:20 89,1560542863

var2 10:10:30 69,557621819

var2 10:10:40 9,9246195517

3 Comments

Lorenzo on 6 Jul 2016

Sorry! I mean that the decimal delimiter is not a point; it's a comma. Example: 12,34 instead of 12.34

dpb on 6 Jul 2016

That, I think, you'll have to fix up outside MATLAB; I don't think it knows how to handle it. If the file is also comma-delimited, that's a problem for certain.
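For values that have already been read in as text, the decimal comma can be swapped before conversion. This is only a minimal sketch: it uses sample values from the question and assumes the comma is not also the field delimiter. As far as I know, str2double treats a comma as a thousands separator, so the strrep step should not be skipped.

```matlab
% Sample values taken from the question; the variable names are made up.
raw  = {'16,1010138923'; '89,1560542863'; '9,9246195517'};

% Replace the decimal comma with a point, then convert text to double.
vals = str2double(strrep(raw, ',', '.'));
```

The result is a 3-by-1 double vector that can be used directly for postprocessing.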

Answer 1

Image Analyst on 6 Jul 2016

1 vote

Use readtable() and then search column 1 for the filename pattern you want. Attach a small example with wanted and unwanted filenames if you can't figure it out.
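A minimal sketch of that suggestion, assuming a semicolon-delimited file without a header row and with '.' as the decimal separator. The sample file, its contents, and the name 'var2' are made up for illustration.

```matlab
% Write a tiny sample file (hypothetical data) to a temporary location.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'var1;10:10:10;1.5;2.5\n');
fprintf(fid, 'var2;10:10:10;3.5;4.5\n');
fprintf(fid, 'var1;10:10:20;5.5;6.5\n');
fprintf(fid, 'var2;10:10:20;7.5;8.5\n');
fclose(fid);

% Read everything, then keep only the rows whose first column matches.
T    = readtable(fname, 'Delimiter', ';', 'ReadVariableNames', false);
rows = strcmp(T{:,1}, 'var2');   % logical mask over column 1
T2   = T(rows, :);

delete(fname);                   % clean up the sample file
```

Note that this loads the whole file before filtering, so memory use scales with the file size rather than with the number of matching rows.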

Answer 2

dpb on 6 Jul 2016

Edited: dpb on 6 Jul 2016

1 vote

Untested, but check that the pattern matching format string doesn't solve the problem directly...

vName='Varname_1';       % the variable name you're looking for
fmt=[vName '%s %f %f'];  % match vName, string, two numerics
fid=fopen('yourbigfile.csv','r');
data=textscan(fid,fmt,'delimiter',',');
fid=fclose(fid);

As said I'm not positive, but I think there's at least a reasonable chance the pattern-matching will do what you're looking for. Worth a shot methinks...

Well, doggonit, magic doesn't happen, joy didn't ensue... :(

But, the original idea isn't difficult...

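fid=fopen('yourbigfile.csv','r');
data={};
while ~feof(fid)
  l=fgetl(fid);                     % read one line
  if ~isempty(strfind(l,vName))     % line contains the variable name
    data{end+1}=textscan(l,fmt);    % parse the matching line
  end
end
fid=fclose(fid);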

This worked for a sample file, although I used space-delimited data and '.' as the decimal indicator; I think the decimal comma will still be a problem.

I thought

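fid=fopen('yourbigfile.csv','r');
data={};
while ~feof(fid)
  l=fgetl(fid);                     % read the next line
  try
    data{end+1}=textscan(l,fmt);    % hoped this would error on non-matching lines
  catch
  end
end
fid=fclose(fid);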

would work around the issue, but it didn't; textscan simply gave up and quit reading anything once it failed. It doesn't throw an error, it just throws up its hands silently. :(
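Since textscan does not raise an error on a mismatch, one option is to test the start of each line explicitly before parsing it. The following is only a sketch under the same assumptions as the sample above (space-delimited, '.' as decimal indicator). The sample file and its contents are made up. Including the delimiter in the prefix keeps 'Varname_10' from matching 'Varname_1'.

```matlab
% Build a small space-delimited sample file (hypothetical data).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Varname_1 10:10:10 1.5 2.5\n');
fprintf(fid, 'Varname_2 10:10:10 3.5 4.5\n');
fprintf(fid, 'Varname_10 10:10:10 9.5 9.5\n');
fprintf(fid, 'Varname_1 10:10:20 5.5 6.5\n');
fclose(fid);

vName  = 'Varname_1';
prefix = [vName ' '];            % name plus delimiter, so Varname_10 is rejected
t = {}; x = []; y = [];          % time strings and the two value columns

fid = fopen(fname, 'r');
while true
    l = fgetl(fid);
    if ~ischar(l), break; end    % fgetl returns -1 at end of file
    if strncmp(l, prefix, numel(prefix))
        c = textscan(l, '%*s %s %f %f');   % skip the name, read time and two values
        t{end+1,1} = c{1}{1};
        x(end+1,1) = c{2};
        y(end+1,1) = c{3};
    end
end
fclose(fid);
delete(fname);                   % clean up the sample file
```

Growing the arrays inside the loop is simple but not the fastest choice; for a large file, preallocating or collecting lines first may be worth trying.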

3 Comments

dpb on 6 Jul 2016

I used textscan not csvread, IA???

He's also got comma as the decimal indicator and says he's got a .csv file in which case it's indeterminable--which comma is a delimiter and which is a decimal point?

Image Analyst on 6 Jul 2016

Oh, sorry - I didn't notice.

Answer 3

Lorenzo on 6 Jul 2016

0 votes

Got it. readtable() works lightning fast. This is my approach:

1) overwrite "," with "." as the decimal delimiter (not strictly necessary, but I need the values as numbers for postprocessing)

2) readtable

comma2point_overwrite('bigdata.csv')             % replace decimal commas in place
T = readtable('bigdata.csv', 'Delimiter', ';');  % fields are separated by ';'
T2 = T(strcmp('Durchflussmessung-H2-163bar_real', T{:,1}),:)  % logical indexing; find() is not needed
clearvars T;

where comma2point_overwrite() is:

function comma2point_overwrite( filespec )
    % Replaces all occurrences of comma (",") with point (".") in a text file.
    % Note that the file is overwritten, which is the price for high speed.
    file  = memmapfile( filespec, 'writable', true );
    comma = uint8(',');
    point = uint8('.');
    file.Data( transpose( file.Data==comma) ) = point;
end
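If overwriting the original file is not acceptable, the same replacement can be done in memory. This is only a sketch of an alternative, not the approach used above. It assumes a semicolon-delimited file with no other commas in it, and the sample file and its contents are made up.

```matlab
% Write a small sample file with decimal commas (hypothetical data).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'var1;10:10:10;1,5;2,5\n');
fprintf(fid, 'var2;10:10:10;16,1010138923;4,5\n');
fclose(fid);

txt = fileread(fname);           % whole file as one char vector
txt = strrep(txt, ',', '.');     % replace in memory; the file itself is untouched

% Parse the modified text: name, time string, two numeric values.
c     = textscan(txt, '%s %s %f %f', 'Delimiter', ';');
keep  = strcmp(c{1}, 'var2');    % rows for the wanted variable
xVar2 = c{3}(keep);

delete(fname);                   % clean up the sample file
```

The trade-off is memory: the whole file is held as text, which should be fine for tens of megabytes but is slower than the memory-mapped overwrite.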

Thanks for your help!!

1 Comment

Steven Hunsinger on 14 Sep 2022

Not so lightning fast if you get your company network involved. 67.5MB with a breakpoint after readtable. 10 minutes. This might be OK if I need all that data loaded into RAM, but seems excessive for reading the first line or so. Is there a better way?
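If only the first line or so is needed, one possible direction is to read just those lines with fgetl instead of loading the whole table. This is a sketch that is not from the thread: the sample file and the name nWanted are made up, and whether it actually helps on a network share has not been measured here.

```matlab
% Write a small sample file (hypothetical data).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'var1;10:10:10;1,5;2,5\n');
fprintf(fid, 'var2;10:10:10;3,5;4,5\n');
fprintf(fid, 'var1;10:10:20;5,5;6,5\n');
fclose(fid);

nWanted    = 2;                  % how many leading lines to read
firstLines = cell(nWanted, 1);
n = 0;

fid = fopen(fname, 'r');
while n < nWanted
    l = fgetl(fid);
    if ~ischar(l), break; end    % file has fewer lines than requested
    n = n + 1;
    firstLines{n} = l;
end
fclose(fid);

firstLines = firstLines(1:n);    % trim if the file was short
delete(fname);                   % clean up the sample file
```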
