Can I use Matlab to read in data that's in an unusual layout

Question

Bob Kane, 1 July 2019


https://jp.mathworks.com/matlabcentral/answers/469656-can-i-use-matlab-to-read-in-data-that-s-in-an-unusual-layout

Commented: dpb, 1 July 2019

I've been using this software called LAMMPS which is a molecular simulator and I want to extract certain pieces of information from it. The data is outputted in one of two ways. The first is in a .dat file and looks like this:

LAMMPS (7 Dec 2018)
Created orthogonal box = (0 -1 -0.25) to (50 11 0.25)
  1 by 1 by 1 MPI processor grid
Created 1 atoms
  Time spent = 6.12736e-05 secs
Created 1 atoms
  Time spent = 0.000876665 secs
1 atoms in group fixed
1 atoms in group free
Per MPI rank memory allocation (min/avg/max) = 4.034 | 4.034 | 4.034 Mbytes
Step Time Temp TotEng E_pair v_2 
       0            0            0  -0.99995177  -0.99995177      29.4142 
     100          0.1    4111601.7  -0.94316771  -0.99993456    29.416675 
     200          0.2    142194.24  -0.99615598  -0.99811919    29.383619 
     300          0.3    3330578.7  -0.94969838  -0.99568203    29.367122 
     400          0.4     12247239   -0.8288457  -0.99793725    29.382028 
     500          0.5      2775369  -0.96146719  -0.99978534    29.405196 
     600          0.6     13813605  -0.80919796  -0.99991556    29.419406 
     700          0.7    3394332.4  -0.95195073  -0.99881459    29.437799 
     800          0.8    3690647.8  -0.94890506     -0.99986    29.407367 
     900          0.9     10817030  -0.85044571  -0.99979107    29.405362 
    1000            1    39449.019  -0.99796461  -0.99850926    29.441106 
Loop time of 33.1504 on 1 procs for 50000000 steps with 2 atoms
Performance: 130315002188.459 ns/day, 0.000 hours/ns, 1508275.488 timesteps/s
91.9% CPU use with 1 MPI tasks x no OpenMP threads
MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 3.4592     | 3.4592     | 3.4592     |   0.0 | 10.43
Neigh   | 0.095202   | 0.095202   | 0.095202   |   0.0 |  0.29
Comm    | 3.3945     | 3.3945     | 3.3945     |   0.0 | 10.24
Output  | 7.3908     | 7.3908     | 7.3908     |   0.0 | 22.29
Modify  | 10.004     | 10.004     | 10.004     |   0.0 | 30.18
Other   |            | 8.806      |            |       | 26.56

Basically there is loads of preamble and postamble text, but the information I want is in the middle. What I have been doing so far is manually cutting off the top and bottom bits of text so that all I am left with in the file is:

Step Time Temp TotEng E_pair v_2 
       0            0            0  -0.99995177  -0.99995177      29.4142 
     100          0.1    4111601.7  -0.94316771  -0.99993456    29.416675 
     200          0.2    142194.24  -0.99615598  -0.99811919    29.383619 
     300          0.3    3330578.7  -0.94969838  -0.99568203    29.367122 
     400          0.4     12247239   -0.8288457  -0.99793725    29.382028 
     500          0.5      2775369  -0.96146719  -0.99978534    29.405196 
     600          0.6     13813605  -0.80919796  -0.99991556    29.419406 
     700          0.7    3394332.4  -0.95195073  -0.99881459    29.437799 
     800          0.8    3690647.8  -0.94890506     -0.99986    29.407367 
     900          0.9     10817030  -0.85044571  -0.99979107    29.405362 
    1000            1    39449.019  -0.99796461  -0.99850926    29.441106 

And then I have used readtable to extract the information I want (which for me is the last column):

T = readtable('corrthermmid.dat');
X = T.v_2(:)-28;

Is there a way I could use something like textscan to get out this information without manually editing it?

What would actually be more useful for me is if anyone has any idea how I could use MATLAB to extract information from the second kind of output, which is presented in this way:

ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
2
ITEM: BOX BOUNDS mm mm pp
0.0000000000000000e+00 5.0000000000000000e+01
-1.0000000000000000e+00 1.1000000000000000e+01
-2.5000000000000000e-01 2.5000000000000000e-01
ITEM: ATOMS id type xs ys zs
1 1 0.3 0.5 0.5
2 2 0.588284 0.5 0.5
ITEM: TIMESTEP
100
ITEM: NUMBER OF ATOMS
2
ITEM: BOX BOUNDS mm mm pp
0.0000000000000000e+00 5.0000000000000000e+01
-1.0000000000000000e+00 1.1000000000000000e+01
-2.5000000000000000e-01 2.5000000000000000e-01
ITEM: ATOMS id type xs ys zs
1 1 0.3 0.5 0.5
2 2 0.588334 0.5 0.5
ITEM: TIMESTEP
200
ITEM: NUMBER OF ATOMS
2
ITEM: BOX BOUNDS mm mm pp
0.0000000000000000e+00 5.0000000000000000e+01
-1.0000000000000000e+00 1.1000000000000000e+01
-2.5000000000000000e-01 2.5000000000000000e-01
ITEM: ATOMS id type xs ys zs
1 1 0.3 0.5 0.5
2 2 0.587672 0.5 0.5

The data points I would want to extract into a matrix, array or something similar are the xs values of atom 2 (0.588284, 0.588334 and 0.587672 above).

Thanks for any help!


Answer 1

dpb, 1 July 2019


A little tedious to set up, but easily-enough handled for a regular file format such as this.

First you have to either know that a fixed number of header lines precedes the data section in question, or scan the file for a marker line whose position relative to the beginning of the desired data is consistent. In this case, for the first data set, it appears the title line Step Time Temp TotEng E_pair v_2 is unique...

fmt1=repmat('%f',1,6);             % first data section format: six numeric columns
fid=fopen('yourfile.dat','r');     % open file
l=fgetl(fid);                      % read first line
while ~feof(fid)                   % loop through file by record
  if ~isempty(strfind(l,'Step Time Temp TotEng E_pair v_2')), break, end    % break when find first section
  l=fgetl(fid);                    % next record
end
data=cell2mat(textscan(fid,fmt1)); % read the data; stops at the subsequent text line 'Loop...'
fclose(fid);                       % close file
data=data(:,6)-28;                 % v_2 is the sixth column; what's the -28 for???

To read the remaining sections, you just rinse and repeat similar logic to find the first timestep section, write a piece of code to parse that section and then place that code in a loop, catenating the desired data as you go.

It would help anybody who wants to actually write a piece of code if you attached a sample data file to work with, but that's the outline.
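The marker-line idea can also be sketched without relying on textscan stopping at the 'Loop...' line. This is only a sketch under assumptions: it swaps the fgetl loop for fileread plus line indexing, it treats 'Loop time' as the first line after the data (as in the log shown in the question), and the sample text and the variable names (txtLines, iStart, iEnd) are made up for illustration.

```matlab
% Shortened copy of the log from the question, held in memory.
% In practice (assumption): txt = fileread('yourfile.dat');
txt = sprintf('%s\n', ...
    'LAMMPS (7 Dec 2018)', ...
    'Created 1 atoms', ...
    'Step Time Temp TotEng E_pair v_2 ', ...
    '       0            0            0  -0.99995177  -0.99995177      29.4142 ', ...
    '     100          0.1    4111601.7  -0.94316771  -0.99993456    29.416675 ', ...
    '     200          0.2    142194.24  -0.99615598  -0.99811919    29.383619 ', ...
    'Loop time of 33.1504 on 1 procs for 50000000 steps with 2 atoms');

txtLines = regexp(txt, '\r?\n', 'split');              % one cell per line
iStart = find(strncmp(txtLines, 'Step Time', 9), 1);   % marker line above the data
iEnd   = find(strncmp(txtLines, 'Loop time', 9), 1);   % first line after the data
block  = sprintf('%s\n', txtLines{iStart+1:iEnd-1});   % numeric rows only
data   = sscanf(block, '%f', [6 Inf]).';               % N-by-6, one row per record
v2     = data(:, 6);                                   % last column (v_2)
```

sscanf fills its [6 Inf] output column by column, so the transpose puts one record on each row.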


Bob Kane, 1 July 2019 (edited 1 July 2019)

Thank you for your help. I think I've ironed out some slight kinks and have ended up with the following script:

fid=fopen('matlab question.dat','r');     % open file
l=fgetl(fid);                      % read first line
while ~feof(fid)                   % loop through file by record
  if strfind(l,'Step Time Temp TotEng E_pair v_2'), break, end    % break when find first section
  l=fgetl(fid);                    % next record
end
data=cell2mat(textscan(fid,'%f'));% read the data; will fail on finding subsequent text line 'Loop...'
data=transpose(reshape(data,6 , [])); 
X = data(:,6)-28;

which does indeed output the information I wanted. I'm not sure if all this reshaping of matrices I've had to do is really inefficient but it was the easiest way I could think of to get it back into a 6 column matrix.

Now I just need to think how I can modify this for the second case...

dpb, 1 July 2019

"...all this reshaping of matrices ... was the easiest way I could think of to get it back into a 6 column matrix."

My code snippet began with

fmt1=repmat('%f',1,6);             % first data section format
...

which is there so that the read itself returns all six columns, with no reshape needed:

data=cell2mat(textscan(fid,fmt1));
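
A small illustration of the difference, as a sketch on an in-memory string rather than the file (the two rows are copied from the question; the names oneCol and sixCol are only for illustration):

```matlab
% Two data rows from the question, joined by a newline.
str = sprintf('%s\n%s', ...
    '0 0 0 -0.99995177 -0.99995177 29.4142', ...
    '100 0.1 4111601.7 -0.94316771 -0.99993456 29.416675');

fmt1   = repmat('%f', 1, 6);               % '%f%f%f%f%f%f'
oneCol = cell2mat(textscan(str, '%f'));    % single field: one long 12-by-1 column
sixCol = cell2mat(textscan(str, fmt1));    % six fields: 2-by-6 matrix

% Reshaping the single column gives the same matrix as the six-field read.
same = isequal(reshape(oneCol, 6, []).', sixCol);
```

So the reshape is not wrong, it just redoes what the six-field format already provides.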


Answer 2

Bob Kane, 1 July 2019

I'm not convinced they're both the most efficient way but here's the solution I have based heavily on dpb's answer.

For the first dataset where all the information is in one block then this works for ultimately extracting all of one column:

fid=fopen('matlab question.dat','r');     % open file
l=fgetl(fid);                      % read first line
while ~feof(fid)                   % loop through file by record
  if strfind(l,'Step Time Temp TotEng E_pair v_2'), break, end    % break when find first section
  l=fgetl(fid);                    % next record
end
data=cell2mat(textscan(fid,'%f'));% read the data; will fail on finding subsequent text line 'Loop...'
data=transpose(reshape(data,6 , [])); 
X = data(:,6)-28;

For the second dataset, where there are loads of little datasets at each timestep separated by text, this works (albeit slowly for large datasets):

fid=fopen('dump.doublewell_mid','r');     % open file
l=fgetl(fid);                      % read first line
i = 1;
while ~feof(fid)                   % loop through file by record
  if strfind(l,'ITEM: ATOMS id type xs ys zs'),
    data=cell2mat(textscan(fid,'%f')); %outputs all numbers after str found
    data=transpose(reshape(data,5 , [])); %reshapes the numbers into the format in original file
    X{i} = data(2,3); %extracts the information we are interested in from each one
    i = i+1;
  end
  l=fgetl(fid);                    % next record
end
X = cell2mat(X);

If anyone knows of any more efficient ways to do the second one then I'd be very interested.

dpb, 1 July 2019

Since the header blocks are duplicated, find the number of lines between successive data sets and then use that count with the 'headerlines' parameter.
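
A variation on the same idea, sketched under assumptions: because every frame has the same number of lines, the wanted row sits at a fixed stride. Note that this swaps the 'headerlines' parameter for fileread plus fixed-stride indexing, so it is not the textscan call suggested above. It assumes the file has no blank lines, that the atom count is the same in every frame, and that atom 2 is always the last row of a frame. The frame helper and the variable names are hypothetical and only build sample text.

```matlab
% Build three frames like those in the question (helper is illustrative only).
% In practice (assumption): txt = fileread('dump.doublewell_mid');
frame = @(step, x) sprintf('%s\n', ...
    'ITEM: TIMESTEP', num2str(step), ...
    'ITEM: NUMBER OF ATOMS', '2', ...
    'ITEM: BOX BOUNDS mm mm pp', ...
    '0.0000000000000000e+00 5.0000000000000000e+01', ...
    '-1.0000000000000000e+00 1.1000000000000000e+01', ...
    '-2.5000000000000000e-01 2.5000000000000000e-01', ...
    'ITEM: ATOMS id type xs ys zs', ...
    '1 1 0.3 0.5 0.5', ...
    sprintf('2 2 %g 0.5 0.5', x));
txt = [frame(0, 0.588284), frame(100, 0.588334), frame(200, 0.587672)];

txtLines = regexp(strtrim(txt), '\r?\n', 'split');  % one cell per line
nHeader  = 9;                                       % lines before the atom rows in each frame
nAtoms   = 2;                                       % atom rows per frame
stride   = nHeader + nAtoms;                        % lines per frame
rows     = txtLines(stride:stride:end);             % last atom row of every frame
vals     = sscanf(sprintf('%s\n', rows{:}), '%f', [5 Inf]).';  % one row per frame
X        = vals(:, 3);                              % xs of atom 2 at each timestep
```

This reads the file once and parses all the selected rows in a single sscanf call, instead of growing a cell array inside the loop.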
