What is the fastest and smartest way to import and manage/plot many text files in matlab?

Question

Giuseppe on 8 Feb 2022

Commented: Walter Roberson on 17 Feb 2022

Hi guys! I've processed my data in Fortran and produced 55 .txt files (as shown in the following image). These files contain asteroid positions stored in three columns (x, y, z coordinates); see the example file attached below.

I'm looking for a smart and quick procedure to import them into MATLAB with only a few lines of code. Then I want to plot my data and maybe perform further computations. My idea is to store the files in a data structure with one entry per asteroid (55 in my case), filled by a loop that runs through all the files in my folder and stops at the last file by determining the loop's upper bound automatically. In short, I would like an automated procedure that imports many files into a single variable that I can then use for further calculations or plots in MATLAB. I found some code that seems to fit my request:

filenames = dir('D:\OneDrive\MSc_Thesis\Projects\NEOs_orbits\OutputFiles\Orbit_asteroid_*.txt');
number_of_files = numel(filenames);
col_values = [];
for ii = 1:number_of_files
    all_values = load(filenames(ii).name); %Error using load Unable to read file 'Orbit_asteroid_01.txt'. No such file or directory. 
    col_values = [col_values; all_values(:,1)]; 
end

But I get the error shown in the comment. Can you help me, please?

Answer 1

Walter Roberson on 9 Feb 2022

Edited: Walter Roberson on 9 Feb 2022

filename = 'https://www.mathworks.com/matlabcentral/answers/uploaded_files/888225/Orbit_asteroid_02.txt';
%"warm up" -- this will likely read the file into memory so we can remove
%the disk speed component
for K = 1 : 3; urlread(filename); end
%done warm-up -- mostly ignore the above, it is disk dominated
tic;
allvalues = readmatrix(filename);
toc
Elapsed time is 0.301401 seconds.
tic;
allvalues = table2array(readtable(filename));
toc
Elapsed time is 0.181070 seconds.
tic;
S = urlread(filename);
allvalues = cell2mat(textscan(S, '%f %f %f', 'HeaderLines', 3));
toc
Elapsed time is 0.052827 seconds.
tic;
L = regexp(urlread(filename), '\r?\n', 'split');
L(1:3) = [];
allvalues = cell2mat(cellfun(@(S) sscanf(S, '%f %f %f', [1 inf]), L, 'uniform', 0));
toc
Elapsed time is 0.058267 seconds.
tic;
L = regexp(urlread(filename), '\r?\n', 'split');
L(1:3) = [];
S = strjoin(L, '\n');
allvalues = sscanf(S, '%f %f %f', [3 inf]).';
toc
Elapsed time is 0.066936 seconds.
tic;
L = regexprep(urlread(filename), '^.*\n.*\n.*\n', '', 'once', 'dotexceptnewline');
allvalues = sscanf(L, '%f %f %f', [3 inf]).';
toc
Elapsed time is 0.041863 seconds.
tic;
L = regexprep(urlread(filename), '^.*\n.*\n.*\n', '', 'once', 'dotexceptnewline');
allvalues = cell2mat(textscan(L, '%f %f %f'));
toc
Elapsed time is 0.037555 seconds.
tic;
L = regexprep(urlread(filename), '^.*\n.*\n.*\n', '', 'once', 'dotexceptnewline');
allvalues = str2num(L);
toc
Elapsed time is 0.046617 seconds.
tic;
L = regexprep(urlread(filename), '^.*\n.*\n.*\n', '', 'once', 'dotexceptnewline');
T = tempname();
fid = fopen(T, 'w');
fwrite(fid, L);
fclose(fid);
allvalues = load(T);
toc
Elapsed time is 0.042073 seconds.

I had a bug in the earlier version of some of the conversions, so these conclusions are revised:

Fastest: read the file as text, use text processing to remove the header, then textscan() the result. Oddly, this was notably faster than textscan() of the full data with the 'HeaderLines' option.

Second fastest: the various text-processing methods are close enough to each other over multiple runs that I cannot rank them. On this test facility the times varied too much for a clear answer, and running on your own system would likely produce different results.
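
Since the ranking depends on the machine, one way to repeat the comparison locally is timeit(), which calls a function handle several times and returns a typical time. The following is only a sketch: the test file, its header text, and the random data are made up for illustration, and the 3 header lines are an assumption taken from the 'HeaderLines', 3 setting used above.

```matlab
% Build a small synthetic test file: 3 header lines, then x y z columns.
% The header text and the data are invented for illustration only.
testfile = [tempname() '.txt'];
data = rand(1000, 3);
fid = fopen(testfile, 'w');
fprintf(fid, 'header line 1\nheader line 2\nheader line 3\n');
fprintf(fid, '%.15g %.15g %.15g\n', data.');
fclose(fid);

% Two candidate readers, each wrapped as a zero-argument function handle.
read_matrix = @() readmatrix(testfile, 'NumHeaderLines', 3);
read_text   = @() cell2mat(textscan( ...
    regexprep(fileread(testfile), '^.*\n.*\n.*\n', '', 'once', 'dotexceptnewline'), ...
    '%f %f %f'));

% Check that both readers return the same matrix before timing them.
A = read_matrix();
B = read_text();

% timeit() runs each handle repeatedly and reports a typical run time.
t_matrix = timeit(read_matrix);
t_text   = timeit(read_text);
fprintf('readmatrix: %.4f s, fileread+textscan: %.4f s\n', t_matrix, t_text);

delete(testfile);
```

Both readers hit a file that is already in the filesystem cache, so this measures parsing cost rather than disk speed.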

Of particular note: reading the file as a string, removing the header, writing the result out to a file, and using load() is still about 5 times faster than using readtable().

On your real system, it might turn out that using fopen() to open the file and then calling textscan() on it is the fastest. It is difficult to benchmark the case where the file is not already in the filesystem cache: as you run several tests, the second and later ones may benefit from the file system having already cached the file.
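
The fopen() plus textscan() variant mentioned above might look like the sketch below. It was not part of the timings, and it assumes 3 header lines followed by three numeric columns; the temporary file is a made-up stand-in for one of the real Orbit_asteroid_*.txt files.

```matlab
% Synthetic stand-in for one data file (invented header text and values).
thisfilename = [tempname() '.txt'];
expected = [1 2 3; 4 5 6; 7.5 8.5 9.5];
fid = fopen(thisfilename, 'w');
fprintf(fid, 'header 1\nheader 2\nheader 3\n');
fprintf(fid, '%g %g %g\n', expected.');
fclose(fid);

% Open the file and let textscan() skip the header lines itself.
fid = fopen(thisfilename, 'r');
if fid < 0
    error('Could not open %s', thisfilename);
end
allvalues = cell2mat(textscan(fid, '%f %f %f', 'HeaderLines', 3));
fclose(fid);

delete(thisfilename);
```

This avoids holding the whole file as a character vector, which may matter once the files become large.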

Walter Roberson on 9 Feb 2022

Edited: Walter Roberson on 9 Feb 2022

You were failing to include the directory name when you tried to read the files: the name field returned by dir() holds only the bare file name, so load() looked in the current folder. The version of the code below automatically includes the directory information.

projectdir = 'D:\OneDrive\MSc_Thesis\Projects\NEOs_orbits\OutputFiles';
dinfo = dir( fullfile(projectdir, 'Orbit_asteroid_*.txt') );
filenames = fullfile( {dinfo.folder}, {dinfo.name});
number_of_files = numel(filenames);
col_values = [];
for ii = 1:number_of_files
    thisfilename = filenames{ii};
    all_values = CodeToLoadTheFile(thisfilename);
    col_values = [col_values; all_values(:,1)]; 
end

where CodeToLoadTheFile is code that will read the file for you.

What the code should look like depends on your choice of convenience compared to performance.

I showed above that the fastest performance, when the file was already in the filesystem cache, was with

L = regexprep(urlread(filename), '^.*\n.*\n.*\n', '', 'once', 'dotexceptnewline');
allvalues = cell2mat(textscan(L, '%f %f %f'));

but since you would be using a local file instead of a remote file, you would instead use

L = regexprep(fileread(thisfilename), '^.*\n.*\n.*\n', '', 'once', 'dotexceptnewline');
allvalues = cell2mat(textscan(L, '%f %f %f'));

... but it would be more convenient to use readmatrix, even though it might be 8-10 times slower.

allvalues = readmatrix(thisfilename);
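
As one possible way to fill in the CodeToLoadTheFile placeholder, the text-processing variant can be wrapped in a function handle. This is a sketch only: stripHeader is a hypothetical helper name, and the code assumes every file starts with exactly 3 header lines followed by three numeric columns.

```matlab
% Hypothetical helper: drop the first three lines of a character vector.
stripHeader = @(txt) regexprep(txt, '^.*\n.*\n.*\n', '', 'once', 'dotexceptnewline');

% CodeToLoadTheFile is the placeholder name used in the loop above.
% It reads the whole file as text, strips the header, and parses 3 columns.
CodeToLoadTheFile = @(fname) cell2mat(textscan(stripHeader(fileread(fname)), '%f %f %f'));

% For the more convenient (but slower) alternative, assuming the same layout:
% CodeToLoadTheFile = @(fname) readmatrix(fname, 'NumHeaderLines', 3);
```

With either definition, the loop body all_values = CodeToLoadTheFile(thisfilename); stays unchanged, so you can switch readers in one place.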

Giuseppe on 9 Feb 2022

Thanks for all the examples. In your opinion, is it a good choice to organize my data in a struct array, or is it better to choose something else (e.g. a cell array)?

I'll try to explain my purpose better: I would like to get a structure that I can easily handle.

For example, if I want to plot the data of some asteroids, I am thinking of something like this:

%Let's assume I want to plot the orbits of the first 10 asteroids
for i = 1:10
    plot3(Asteroid(i).x,Asteroid(i).y,Asteroid(i).z)
end

Can you give me an example to build such a data structure?

Walter Roberson on 9 Feb 2022

struct is certainly do-able.

projectdir = 'D:\OneDrive\MSc_Thesis\Projects\NEOs_orbits\OutputFiles';
dinfo = dir( fullfile(projectdir, 'Orbit_asteroid_*.txt') );
filenames = fullfile( {dinfo.folder}, {dinfo.name});
number_of_files = numel(filenames);
Asteroid(number_of_files) = struct('x', [], 'y', [], 'z', []);   %pre-allocates
for ii = 1:number_of_files
    thisfilename = filenames{ii};
    all_values = CodeToLoadTheFile(thisfilename);
    Asteroid(ii).x = all_values(:,1);
    Asteroid(ii).y = all_values(:,2);
    Asteroid(ii).z = all_values(:,3);
end

but a cell array might have higher performance, perhaps:

projectdir = 'D:\OneDrive\MSc_Thesis\Projects\NEOs_orbits\OutputFiles';
dinfo = dir( fullfile(projectdir, 'Orbit_asteroid_*.txt') );
filenames = fullfile( {dinfo.folder}, {dinfo.name});
number_of_files = numel(filenames);
Asteroid = cell(number_of_files, 1);
for ii = 1:number_of_files
    thisfilename = filenames{ii};
    all_values = CodeToLoadTheFile(thisfilename);
    Asteroid{ii} = all_values;
end

which you would plot with

plot3(Asteroid{ii}(:,1), Asteroid{ii}(:,2), Asteroid{ii}(:,3))
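
To draw several orbits on the same axes from the cell-array layout, something like the following sketch might work. The orbit data here is synthetic (made-up circles) so that the example runs on its own; with real data, Asteroid would come from the loading loop instead.

```matlab
% Synthetic stand-in data: three made-up orbits, one N-by-3 matrix per cell.
theta = linspace(0, 2*pi, 200).';
Asteroid = cell(3, 1);
for ii = 1:numel(Asteroid)
    Asteroid{ii} = ii * [cos(theta), sin(theta), 0.1*sin(theta)];
end

% Plot at most the first 10 orbits on one set of axes.
nplot = min(10, numel(Asteroid));
h = gobjects(nplot, 1);          % pre-allocate the line handles
figure;
hold on                          % keep earlier lines when adding new ones
for ii = 1:nplot
    h(ii) = plot3(Asteroid{ii}(:,1), Asteroid{ii}(:,2), Asteroid{ii}(:,3), ...
        'DisplayName', sprintf('Asteroid %02d', ii));
end
hold off
view(3); grid on; axis equal     % hold on keeps a 2-D view unless told otherwise
xlabel('x'); ylabel('y'); zlabel('z');
legend('show');
```

The explicit view(3) matters because calling hold on before the first plot3() leaves the axes in their default 2-D view.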

Answer 2

David Hill on 8 Feb 2022

As long as you are in the folder you want, you just need the file name.

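for k = 1:55
  % zero-pad the index to two digits and include the .txt extension
  fname = sprintf('Orbit_asteroid_%02d.txt', k);
  % columns 3k-2:3k hold x,y,z of asteroid k (all files need the same number of rows)
  r(:,(k-1)*3+1:3*k) = readmatrix(fname);
end
%then plot what you want
plot3(r(:,4),r(:,5),r(:,6));%2nd asteroid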

Answer 3

HighPhi on 8 Feb 2022

You can't use load() here: it only reads MAT-files or certain ASCII files.

The best way to import these files is:

all_values = readtable(fullfile(filenames(ii).folder, filenames(ii).name));
all_values = table2array(all_values);

Good luck

Walter Roberson on 17 Feb 2022

@Highphi

Yes and no.

You are correct that the original format of the file is something that load() would not process. And in cases where clarity and convenience are important (which is a lot of the time, really), using one of the other possibilities instead of load() would be a better choice.

But as I showed near the end of https://www.mathworks.com/matlabcentral/answers/1645930-what-is-the-fastest-and-smartest-way-to-import-and-manage-plot-many-text-files-in-matlab#answer_891795 it is possible to read the file, strip off the header, write out the revised content, and load() the revised content, and still end up with a processing time that is competitive with nearly all the more convenient approaches. That holds at least until the file starts taking a major fraction of your memory, or until the hard-drive speed starts to dominate.
