Optimising my data importer for large datasets

Question

HC98, 26 March 2023

Direct link to this question: https://jp.mathworks.com/matlabcentral/answers/1935489-optimising-my-data-importer-for-large-datasets

Edited: Matt J, 26 March 2023

So I have this

txtFiles = dir('*.txt') ; %loads txt files
N = length(txtFiles) ; 
Numit = N;
[~, reindex] = sort( str2double( regexp( {txtFiles.name}, '\d+', 'match', 'once' ))); % sorts files
txtFiles = txtFiles(reindex);
for i = 1:N
    data = importdata(txtFiles(i).name);
    x = data(:,1);
    udata(:,i) = data(:,2) ;
end

I have quite a large dataset (well over 200 files) and it takes ages to load things. How can I speed this up? Is there some sort of preprocessing I can do, like merging all the files into one or something? I don't know...

1 Comment

Walter Roberson, 26 March 2023

I wonder if using a datastore would be appropriate for your work?
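
To make the datastore idea concrete, here is a minimal sketch using tabularTextDatastore. It is not from the thread. It assumes the files are purely numeric, comma-delimited, have no header line, and all have the same number of rows; the temporary folder demoDir and the run1.txt to run3.txt files are hypothetical demo data so the snippet runs on its own. Change 'Delimiter' to match your files, and check ds.Files to confirm the datastore kept the order you passed in.

```matlab
% Demo setup (hypothetical data): three small two-column files in a temp folder.
demoDir = tempname;
mkdir(demoDir);
for k = 1:3
    fid = fopen(fullfile(demoDir, sprintf('run%d.txt', k)), 'w');
    fprintf(fid, '%g,%g\n', [1:4; k*(1:4)]);   % rows: 1,k  2,2k  3,3k  4,4k
    fclose(fid);
end

% Same numeric sort on the file names as in the question.
txtFiles = dir(fullfile(demoDir, '*.txt'));
[~, reindex] = sort(str2double(regexp({txtFiles.name}, '\d+', 'match', 'once')));
names = fullfile(demoDir, {txtFiles(reindex).name});

% Assumption: comma-delimited numeric files without a header line.
ds = tabularTextDatastore(names, 'ReadVariableNames', false, 'Delimiter', ',');
ds.SelectedVariableNames = ds.VariableNames(2);   % import only the second column
ds.ReadSize = 'file';                             % each read() returns one whole file

udata = cell(1, numel(names));
k = 0;
while hasdata(ds)
    k = k + 1;
    t = read(ds);          % table with a single variable
    udata{k} = t{:, 1};    % extract it as a numeric column
end
udata = [udata{:}];        % one column per file

rmdir(demoDir, 's');       % remove the demo files
```

Whether this is actually faster than the importdata loop depends on the files, so it is worth timing both with tic and toc on a subset first.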


Answer 1

Matt J, 26 March 2023

Direct link to this answer: https://jp.mathworks.com/matlabcentral/answers/1935489-optimising-my-data-importer-for-large-datasets#answer_1201259

Edited: Matt J, 26 March 2023

I don't see any pre-allocation of udata. Also, nothing is being done with x, so it will cut down on time if you don't create it.

udata=cell(1,N);
for i = 1:N
    data = importdata(txtFiles(i).name);
    %x = data(:,1);
    udata{i} = data(:,2) ;
end
udata=cell2mat(udata);
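
If every file is known to have the same number of rows, a plain numeric matrix can be pre-allocated instead of a cell array, which avoids the cell2mat step. The sketch below is one possible variant under that assumption, not part of the original answer; demoDir and the run1.txt to run3.txt files are hypothetical demo data so that it runs on its own. The numeric sort from the question is left out for brevity and would go right after the dir call.

```matlab
% Demo setup (hypothetical data): three small two-column files in a temp folder.
demoDir = tempname;
mkdir(demoDir);
for k = 1:3
    fid = fopen(fullfile(demoDir, sprintf('run%d.txt', k)), 'w');
    fprintf(fid, '%g %g\n', [1:4; k*(1:4)]);   % rows: 1 k, 2 2k, 3 3k, 4 4k
    fclose(fid);
end

txtFiles = dir(fullfile(demoDir, '*.txt'));
N = numel(txtFiles);

% Read the first file to learn the row count, then allocate udata once.
first = importdata(fullfile(demoDir, txtFiles(1).name));
udata = zeros(size(first, 1), N);
udata(:, 1) = first(:, 2);

for i = 2:N
    data = importdata(fullfile(demoDir, txtFiles(i).name));
    udata(:, i) = data(:, 2);   % errors if a file has a different row count
end

rmdir(demoDir, 's');            % remove the demo files
```

The cell-array version above is the safer choice when the row counts might differ, because the mismatch then only surfaces at the final cell2mat call rather than partway through the loop.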

1 Comment

Matt J, 26 March 2023

Edited: Matt J, 26 March 2023

If the data files have many columns, it will also go faster if you read in only the first two columns, maybe using textscan.

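udata=cell(1,N);
for i = 1:N
    fileID = fopen(txtFiles(i).name);
    % Read two numeric columns and skip the rest of each line
    data = textscan(fileID,'%f %f %*[^\n]');
    fclose(fileID);

    udata{i} = data{2};   % textscan returns a cell array, one cell per column
end
udata=cell2mat(udata);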
