Can I use MATLAB to read in data that's in an unusual layout?

Bob Kane on 1 Jul 2019
Commented: dpb on 1 Jul 2019
I've been using LAMMPS, a molecular simulator, and I want to extract certain pieces of information from its output. The data is output in one of two ways. The first is a .dat file that looks like this:
LAMMPS (7 Dec 2018)
Created orthogonal box = (0 -1 -0.25) to (50 11 0.25)
1 by 1 by 1 MPI processor grid
Created 1 atoms
Time spent = 6.12736e-05 secs
Created 1 atoms
Time spent = 0.000876665 secs
1 atoms in group fixed
1 atoms in group free
Per MPI rank memory allocation (min/avg/max) = 4.034 | 4.034 | 4.034 Mbytes
Step Time Temp TotEng E_pair v_2
0 0 0 -0.99995177 -0.99995177 29.4142
100 0.1 4111601.7 -0.94316771 -0.99993456 29.416675
200 0.2 142194.24 -0.99615598 -0.99811919 29.383619
300 0.3 3330578.7 -0.94969838 -0.99568203 29.367122
400 0.4 12247239 -0.8288457 -0.99793725 29.382028
500 0.5 2775369 -0.96146719 -0.99978534 29.405196
600 0.6 13813605 -0.80919796 -0.99991556 29.419406
700 0.7 3394332.4 -0.95195073 -0.99881459 29.437799
800 0.8 3690647.8 -0.94890506 -0.99986 29.407367
900 0.9 10817030 -0.85044571 -0.99979107 29.405362
1000 1 39449.019 -0.99796461 -0.99850926 29.441106
Loop time of 33.1504 on 1 procs for 50000000 steps with 2 atoms
Performance: 130315002188.459 ns/day, 0.000 hours/ns, 1508275.488 timesteps/s
91.9% CPU use with 1 MPI tasks x no OpenMP threads
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 3.4592 | 3.4592 | 3.4592 | 0.0 | 10.43
Neigh | 0.095202 | 0.095202 | 0.095202 | 0.0 | 0.29
Comm | 3.3945 | 3.3945 | 3.3945 | 0.0 | 10.24
Output | 7.3908 | 7.3908 | 7.3908 | 0.0 | 22.29
Modify | 10.004 | 10.004 | 10.004 | 0.0 | 30.18
Other | | 8.806 | | | 26.56
Basically there is loads of preamble and postamble text, but the information I want is in the middle. What I have been doing so far is manually cutting off the top and bottom bits of text so that all I am left with in the file is:
Step Time Temp TotEng E_pair v_2
0 0 0 -0.99995177 -0.99995177 29.4142
100 0.1 4111601.7 -0.94316771 -0.99993456 29.416675
200 0.2 142194.24 -0.99615598 -0.99811919 29.383619
300 0.3 3330578.7 -0.94969838 -0.99568203 29.367122
400 0.4 12247239 -0.8288457 -0.99793725 29.382028
500 0.5 2775369 -0.96146719 -0.99978534 29.405196
600 0.6 13813605 -0.80919796 -0.99991556 29.419406
700 0.7 3394332.4 -0.95195073 -0.99881459 29.437799
800 0.8 3690647.8 -0.94890506 -0.99986 29.407367
900 0.9 10817030 -0.85044571 -0.99979107 29.405362
1000 1 39449.019 -0.99796461 -0.99850926 29.441106
And then I have used readtable to extract the information I want (which for me is the last column):
T = readtable('corrthermmid.dat');
X = T.v_2(:)-28;
Is there a way I could use something like textscan to get out this information without manually editing it?
What would actually be more useful for me is if anyone has an idea how I could use MATLAB to extract information presented in this way:
ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
2
ITEM: BOX BOUNDS mm mm pp
0.0000000000000000e+00 5.0000000000000000e+01
-1.0000000000000000e+00 1.1000000000000000e+01
-2.5000000000000000e-01 2.5000000000000000e-01
ITEM: ATOMS id type xs ys zs
1 1 0.3 0.5 0.5
2 2 0.588284 0.5 0.5
ITEM: TIMESTEP
100
ITEM: NUMBER OF ATOMS
2
ITEM: BOX BOUNDS mm mm pp
0.0000000000000000e+00 5.0000000000000000e+01
-1.0000000000000000e+00 1.1000000000000000e+01
-2.5000000000000000e-01 2.5000000000000000e-01
ITEM: ATOMS id type xs ys zs
1 1 0.3 0.5 0.5
2 2 0.588334 0.5 0.5
ITEM: TIMESTEP
200
ITEM: NUMBER OF ATOMS
2
ITEM: BOX BOUNDS mm mm pp
0.0000000000000000e+00 5.0000000000000000e+01
-1.0000000000000000e+00 1.1000000000000000e+01
-2.5000000000000000e-01 2.5000000000000000e-01
ITEM: ATOMS id type xs ys zs
1 1 0.3 0.5 0.5
2 2 0.587672 0.5 0.5
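The data points I want to extract (underlined in the original post) are the xs values of atom 2 at each timestep, i.e. 0.588284, 0.588334 and 0.587672 above. I would like them in a matrix, array or something similar.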
Thanks for any help!

Accepted Answer

dpb on 1 Jul 2019
A little tedious to set up, but easily-enough handled for a regular file format such as this.
First you have to either know that there is a fixed number of header lines before the data section in question, or scan the file to find a marker line that is consistent in its position relative to the beginning of the desired data. In this case, for the first data set, it appears the title line Step Time Temp TotEng E_pair v_2 is unique...
fmt1=repmat('%f',1,6); % first data section format
fid=fopen('yourfile.dat','r'); % open file
l=fgetl(fid); % read first line
while ~feof(fid) % loop through file by record
  if strfind(l,'Step Time Temp TotEng E_pair v_2'), break, end % break when find first section
  l=fgetl(fid); % next record
end
data=cell2mat(textscan(fid,'%')); % read the data; will fail on finding subsequent text line 'Loop...'
data=data(:,2)-28; % what's the -28 for???
To read the remaining sections, you just rinse and repeat similar logic to find the first timestep section, write a piece of code to parse that section and then place that code in a loop, catenating the desired data as you go.
It would help anybody who wants to write actual code if you attached a sample data file to work with, but that's the outline.
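The "find the section, parse it, loop and catenate" outline for the timestep file could be sketched roughly as follows. This is only an illustration under assumptions: the stand-in file written at the top contains two frames copied from the question with shortened box bounds, the names fname, tline, nAtoms and frames are made up for the example, and fscanf is used as one possible way to parse each section.

```matlab
% Stand-in dump file: two frames from the question (box bounds shortened)
fname = [tempname '.dump'];
fid = fopen(fname, 'w');
xs2 = [0.588284 0.588334];
for k = 1:2
    fprintf(fid, 'ITEM: TIMESTEP\n%d\n', 100*(k-1));
    fprintf(fid, 'ITEM: NUMBER OF ATOMS\n2\n');
    fprintf(fid, 'ITEM: BOX BOUNDS mm mm pp\n0 50\n-1 11\n-0.25 0.25\n');
    fprintf(fid, 'ITEM: ATOMS id type xs ys zs\n');
    fprintf(fid, '1 1 0.3 0.5 0.5\n2 2 %.6f 0.5 0.5\n', xs2(k));
end
fclose(fid);

fid = fopen(fname, 'r');
nAtoms = 0;
frames = {};                          % one nAtoms-by-5 matrix per timestep
tline = fgetl(fid);
while ischar(tline)                   % fgetl returns -1 at end of file
    if strncmp(tline, 'ITEM: NUMBER OF ATOMS', 21)
        nAtoms = sscanf(fgetl(fid), '%d');
    elseif strncmp(tline, 'ITEM: ATOMS', 11)
        % read exactly 5*nAtoms numbers, one atom per row after transposing
        frames{end+1} = fscanf(fid, '%f', [5, nAtoms]).';
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);

X = cellfun(@(f) f(2, 3), frames).';  % xs of atom 2 in each frame
```

Reading a known count of numbers per section means the parser never has to run into the next text line to know where a section ends, and the atom count is taken from the file rather than hard-coded.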
2 Comments
Bob Kane on 1 Jul 2019
Edited: Bob Kane on 1 Jul 2019
Thank you for your help. I think I've ironed out some slight kinks and have ended up with the following script:
fid=fopen('matlab question.dat','r'); % open file
l=fgetl(fid); % read first line
while ~feof(fid) % loop through file by record
if strfind(l,'Step Time Temp TotEng E_pair v_2'), break, end % break when find first section
l=fgetl(fid); % next record
end
data=cell2mat(textscan(fid,'%f'));% read the data; will fail on finding subsequent text line 'Loop...'
data=transpose(reshape(data,6 , []));
X = data(:,6)-28;
which does indeed output the information I wanted. I'm not sure if all this reshaping of matrices I've had to do is really inefficient but it was the easiest way I could think of to get it back into a 6 column matrix.
Now I just need to think how I can modify this for the second case...
dpb on 1 Jul 2019
"...all this reshaping of matrices ... was the easiest way I could think of to get it back into a 6 column matrix."
My code snippet began with
fmt1=repmat('%f',1,6); % first data section format
...
which I then immediately didn't use where intended--
data=cell2mat(textscan(fid,fmt1));
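A minimal self-contained sketch of that intended usage is given below. It is an illustration only: the stand-in file holds just a few lines copied from the question, the names fname, tline and nCols are hypothetical, and the header is assumed to be the first line that starts with "Step". With one %f per column, textscan returns one cell per column, so cell2mat yields the 6-column matrix directly and no reshape is needed.

```matlab
% Small stand-in for the log file (rows copied from the question)
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, 'LAMMPS (7 Dec 2018)\n');
fprintf(fid, 'Created 1 atoms\n');
fprintf(fid, 'Step Time Temp TotEng E_pair v_2\n');
fprintf(fid, '0 0 0 -0.99995177 -0.99995177 29.4142\n');
fprintf(fid, '100 0.1 4111601.7 -0.94316771 -0.99993456 29.416675\n');
fprintf(fid, '200 0.2 142194.24 -0.99615598 -0.99811919 29.383619\n');
fprintf(fid, 'Loop time of 33.1504 on 1 procs for 50000000 steps with 2 atoms\n');
fclose(fid);

fid = fopen(fname, 'r');
tline = fgetl(fid);
% skip the preamble until the column-title line is found
while ischar(tline) && ~strncmp(strtrim(tline), 'Step', 4)
    tline = fgetl(fid);
end
if ~ischar(tline)
    fclose(fid);
    error('Header line not found');
end
nCols = numel(strsplit(strtrim(tline)));  % count the column titles
fmt1 = repmat('%f', 1, nCols);            % one %f per column
data = cell2mat(textscan(fid, fmt1));     % stops at the 'Loop time...' text
fclose(fid);
delete(fname);

X = data(:, end) - 28;                    % last column (v_2), as in the question
```

Counting the titles on the header line avoids hard-coding the 6, which may help if the thermo output columns change between runs.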


More Answers (1)

Bob Kane on 1 Jul 2019
I'm not convinced these are the most efficient ways, but here are the solutions I have, based heavily on dpb's answer.
For the first dataset, where all the information is in one block, this works for extracting all of one column:
fid=fopen('matlab question.dat','r'); % open file
l=fgetl(fid); % read first line
while ~feof(fid) % loop through file by record
if strfind(l,'Step Time Temp TotEng E_pair v_2'), break, end % break when find first section
l=fgetl(fid); % next record
end
data=cell2mat(textscan(fid,'%f'));% read the data; will fail on finding subsequent text line 'Loop...'
data=transpose(reshape(data,6 , []));
X = data(:,6)-28;
For the second dataset, where there are loads of little datasets at each timestep separated by text, this works (albeit slowly for large datasets):
fid=fopen('dump.doublewell_mid','r'); % open file
l=fgetl(fid); % read first line
i = 1;
while ~feof(fid) % loop through file by record
  if strfind(l,'ITEM: ATOMS id type xs ys zs')
    data=cell2mat(textscan(fid,'%f')); % outputs all numbers after str found
    data=transpose(reshape(data,5 , [])); % reshapes the numbers into the format in original file
    X{i} = data(2,3); % extracts the information we are interested in from each one
    i = i+1;
  end
  l=fgetl(fid); % next record
end
X = cell2mat(X);
If anyone knows of any more efficient ways to do the second one then I'd be very interested.
1 Comment
dpb on 1 Jul 2019
Since the sections are duplicated with the same layout, find the line offset between sets and then use the 'headerlines' parameter.
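The same regularity can also be exploited by reading the whole file once and indexing lines at a fixed stride. The sketch below does that instead of calling textscan with 'headerlines', so it does not depend on how header lines are counted between successive calls; it is a swapped-in variant, not the method named in the comment. The stand-in file (three frames from the question, box bounds shortened) and the names fname, txtLines, hdr and period are assumptions made for the example.

```matlab
% Stand-in dump file: three frames from the question (box bounds shortened)
fname = [tempname '.dump'];
fid = fopen(fname, 'w');
xs2 = [0.588284 0.588334 0.587672];
for k = 1:3
    fprintf(fid, 'ITEM: TIMESTEP\n%d\n', 100*(k-1));
    fprintf(fid, 'ITEM: NUMBER OF ATOMS\n2\n');
    fprintf(fid, 'ITEM: BOX BOUNDS mm mm pp\n0 50\n-1 11\n-0.25 0.25\n');
    fprintf(fid, 'ITEM: ATOMS id type xs ys zs\n');
    fprintf(fid, '1 1 0.3 0.5 0.5\n2 2 %.6f 0.5 0.5\n', xs2(k));
end
fclose(fid);

% Read everything at once and split into lines
txtLines = regexp(fileread(fname), '\r?\n', 'split');
delete(fname);

hdr = find(strncmp(txtLines, 'ITEM: ATOMS', 11)); % line index of each ATOMS header
period = unique(diff(hdr));                       % lines per frame
assert(isscalar(period), 'Frames are not equally spaced');

rows = txtLines(hdr + 2);                         % atom 2 sits two lines below the header
vals = cellfun(@(s) sscanf(s, '%f').', rows, 'UniformOutput', false);
vals = vertcat(vals{:});                          % one row of 5 numbers per frame
X = vals(:, 3);                                   % xs of atom 2 at each timestep
```

Because only one line per frame is converted to numbers, this should scale better than re-parsing every section, although it does hold the whole file in memory.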


Release: R2018a
