Read CSV into table but not all of the columns

46 ビュー (過去 30 日間)
Ran Peiser
Ran Peiser 2023 年 5 月 9 日
コメント済み: Star Strider 2023 年 5 月 9 日
I have a large CSV of ~200 MB with about 1 milion lines and 50 columns.
I need all of the lines, but only like 4-5 columns.
When I read the entire table with 'readtable' and then filter only the colums I need it takes about 10 seconds and more than 1GB of memory.
Is there a more efficient way to read only the columns I need?
Thanks!

回答 (2 件)

Star Strider
Star Strider 2023 年 5 月 9 日
Without at least an example of your .csv file it is difficult to determine that. One option (for both readtable and readmatrix) is the 'Range' name-value pair. There is no direct link, however you can search for it in the Spreadsheet Files section of the documentation. It is more fully explained in the DataRange documentation section of SpreadsheetImportOptions.
I am not sure if this is more efficient (I never timed it), however it is the only option that appears to exist for what you want to do.
  6 件のコメント
Sarah Gilmore
Sarah Gilmore 2023 年 5 月 9 日
You can can actually read non-consecutive columns with SelectedVariableNames. For example, here's how you could read Param1 and Param3 from the example file:
opts = detectImportOptions("F.csv");
opts.SelectedVariableNames = ["Param1" "Param3"];
T = readtable("F.csv", opts)
T = 4×2 table
Param1 Param3 ______ ______ 0 0 1 3 100 400 7 9
I hope this helps.
Best,
Sarah
Star Strider
Star Strider 2023 年 5 月 9 日
@Sarah Gilmore — It does.
Thank you!

サインインしてコメントする。


Stephen23
Stephen23 2023 年 5 月 9 日
編集済み: Stephen23 2023 年 5 月 9 日
First lets create a fake data file:
writetable(array2table(rand(1e5,50),'Variablenames',"V"+(1:50)),"test.csv")
Now lets try some different ways to import that data:
X = [5,23,42]; % the columns you want
tic
T0 = readtable("test.csv");
T0 = T0(:,X);
toc
Elapsed time is 1.741005 seconds.
tic
C = cell(1,50);
C(:) = {'%*f'};
C(X) = {'%f'};
T1 = readtable("test.csv", "Format",[C{:}]);
toc
Elapsed time is 0.750706 seconds.
tic
opts = detectImportOptions("test.csv", "filetype","delimitedtext");
opts.SelectedVariableNames = "V"+X;
T2 = readtable("test.csv", opts);
toc
Elapsed time is 0.846019 seconds.
Checking:
isequal(T0,T1,T2)
ans = logical
1

カテゴリ

Help Center および File ExchangeLarge Files and Big Data についてさらに検索

タグ

製品


リリース

R2021a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by