Converting a .txt into a series of matrices

Hello, I am a new MATLAB user and I need to convert mol2 files (text documents that store the positions of atoms in a molecule) into multiple matrices so that I can manipulate the data. A sample mol2 is shown below. I've also attached the full mol2 file.
@<TRIPOS>MOLECULE
*****
83 89 0 0 0
SMALL
GASTEIGER
@<TRIPOS>ATOM
1 C -2.1071 -0.8238 0.0543 C.ar 1 LIG1 -0.0157
2 C -0.8284 -1.4433 0.0053 C.ar 1 LIG1 -0.0265
3 C 0.3551 -0.6761 -0.0339 C.ar 1 LIG1 0.0903
4 C 0.2486 0.7084 -0.0225 C.ar 1 LIG1 0.0691
5 C -0.9965 1.3355 0.0209 C.ar 1 LIG1 -0.0355
@<TRIPOS>BOND
1 1 2 ar
2 2 3 ar
3 3 4 ar
4 4 5 ar
5 5 6 ar
I want to convert the mol2 file into 3 arrays holding the information in @<TRIPOS>MOLECULE, @<TRIPOS>ATOM, and @<TRIPOS>BOND. For example, I want the "Molecule" array to look like {[83 89 0 0 0], SMALL, GASTEIGER}, and the "Atom" array to look like {[1, C, -2.1071, -0.8238, 0.0543, C.ar, 1, LIG1, -0.0157]...}.
Any help would be greatly appreciated.

Accepted Answer

Cedric, 6 Nov 2013
Edited: Cedric, 6 Nov 2013

1 vote

Here is one way to achieve this:
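fileLocator = 'Bip_LAMMPS_Pract.txt' ;
content = fileread( fileLocator ) ;   % read the whole file as one string
% MOLECULE: token 1 is the line of counts, token 2 is the text up to the next '@'.
% The pattern expects the molecule name line to consist of asterisks, as in the sample.
tokens = regexp( content, 'MOLECULE\s*\*+([\s\d]+)([^@]+)', 'tokens' ) ;
text = strtrim( regexprep( tokens{1}{2}, '\s+', ' ' )) ;
molecule = { sscanf( tokens{1}{1}, '%d' ), text } ;
% ATOM: capture everything up to the next '@' and parse it as 9 columns.
tokens = regexp( content, 'ATOM([^@]+)', 'tokens' ) ;
atom = textscan( tokens{1}{1}, '%f %s %f %f %f %s %f %s %f' ) ;
% BOND: capture everything up to the end of the file and parse it as 4 columns.
tokens = regexp( content, 'BOND(.+)', 'tokens' ) ;
bond = textscan( tokens{1}{1}, '%f %f %f %s' ) ;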
Have a look at the output and let me know if you have any questions. As you are new to MATLAB, note that regular (numeric) arrays cannot store mixed data (numbers and strings). For this kind of data, we use cell arrays. In short, they have to be indexed with curly brackets to access the content of a cell. In the present case, molecule is a cell array which contains two cells:
>> molecule
molecule =
[5x1 double] 'SMALL GASTEIGER'
For accessing the content of the first cell (which is a numeric array):
>> molecule{1}
ans =
83
89
0
0
0
For accessing element 2 of this numeric array:
>> molecule{1}(2)
ans =
89
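The same curly-bracket indexing applies to atom and bond, which textscan returns as cell arrays with one cell per column. Here is a minimal sketch of how the numeric columns could be gathered into ordinary matrices. It uses a short sample string built from the first records of the question instead of the real file, and the names xyz, firstType, bondPairs and d are only illustrative:

```matlab
% Sample text standing in for the file content (first ATOM and BOND
% records from the question); in practice it would come from fileread.
content = sprintf( '%s\n', ...
    '@<TRIPOS>ATOM', ...
    '1 C -2.1071 -0.8238 0.0543 C.ar 1 LIG1 -0.0157', ...
    '2 C -0.8284 -1.4433 0.0053 C.ar 1 LIG1 -0.0265', ...
    '3 C 0.3551 -0.6761 -0.0339 C.ar 1 LIG1 0.0903', ...
    '@<TRIPOS>BOND', ...
    '1 1 2 ar', ...
    '2 2 3 ar' ) ;

% Same parsing calls as in the answer above.
tokens = regexp( content, 'ATOM([^@]+)', 'tokens' ) ;
atom   = textscan( tokens{1}{1}, '%f %s %f %f %f %s %f %s %f' ) ;
tokens = regexp( content, 'BOND(.+)', 'tokens' ) ;
bond   = textscan( tokens{1}{1}, '%f %f %f %s' ) ;

% Columns 3 to 5 of the ATOM section are numeric, so they can be
% concatenated into a regular N-by-3 matrix of coordinates.
xyz = [atom{3}, atom{4}, atom{5}] ;

% Text columns are cell arrays of strings: a first pair of curly brackets
% selects the column, a second pair selects the row.
firstType = atom{6}{1} ;              % 'C.ar'

% Columns 2 and 3 of the BOND section hold the indices of the bonded atoms.
bondPairs = [bond{2}, bond{3}] ;

% Example of use: length of the first bond.
d = norm( xyz(bondPairs(1,1),:) - xyz(bondPairs(1,2),:) ) ;
```

Once the numeric columns are in regular arrays such as xyz, they can be manipulated with the usual matrix operations, while the text columns stay in cell arrays.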

1 Comment

Eric, 6 Nov 2013
Thanks for the help. The output files were what I was looking for.


Asked: 6 Nov 2013

Edited: 6 Nov 2013
