How to repeatedly extract information from a text file and store it in a table?

Alan Cesar Pilon Miro on 25 Apr 2018
Answered: Sarah Palfreyman on 30 Apr 2018

Hello, I have an SDF file containing structure information for over 150K substances, and I'm interested in extracting some information from it. I had to convert it to .xls so that MATLAB could read it.

You can see an example of this file below. The original text file is available as ONTHOLOGY.XLS.

Each structure starts with a code (in this example: Q2785366) and ends with $$$$.

Q2785366-1 %name of structure

% A set of numbers that represent a .mol file. I'm not interested in them.

%Attributes

"> InChIKey" InChIKey=FPRJHXLFTAYEJH-UHFFFAOYSA-N

"> SMILES" COC1=C2C(OC)=C3C=COC3=NC2=C(OCC(O)C(C)(C)O)C=C1

> Kingdom Organic compounds

"> Superclass" Organoheterocyclic compounds

"> Class" Quinolines and derivatives

"> Subclass" Furanoquinolines

"> Nodes"

"> Parent" Furanoquinolines

"> Parents" Furopyridines

"> Framework" Aromatic heteropolycyclic compounds

"> Substituents" Furanoquinoline

"> description" This compound belongs to the class of organic compounds known as furanoquinolines. These are compounds containing a furan ring fused to a quinoline.

"> Ancestors" 1,2-diols

> Descriptors

$$$$

I'd like to organize this data in a new table, with one row per substance and one column per feature. The features I'm interested in are listed below.

Column 1: InChIKey (using only information after "=")

Column 2: SMILES (Only Code)

Column 3: Kingdom (Only text)

Column 4: Superclass (Only text)

Column 5: Class (Only text)

Column 6: Subclass (Only text)

Column 7: Framework (Only text)

Thank you.

2 Comments
Bob Thompson on 25 Apr 2018
You can find specific strings or values within an array using commands such as strfind(), find(), and strcmp(). I would suggest importing your data into a cell array and then examining cells for these specific values you're looking for.
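As a rough sketch of that suggestion: assume the file has already been imported into a cell array with one line per cell, and that each attribute line has the form `> Name value` shown in the question, possibly with quotes left over from the .xls conversion. The variable names (`lines`, `fields`, `rows`, `current`) and the abbreviated sample record are illustrative only, not the real file.

```matlab
% Sample lines standing in for the imported file (one line per cell).
lines = { ...
    'Q2785366-1', ...
    '"> InChIKey" InChIKey=FPRJHXLFTAYEJH-UHFFFAOYSA-N', ...
    '"> SMILES" COC1=C2C(OC)=C3C=COC3=NC2=C(OCC(O)C(C)(C)O)C=C1', ...
    '> Kingdom Organic compounds', ...
    '"> Superclass" Organoheterocyclic compounds', ...
    '"> Class" Quinolines and derivatives', ...
    '"> Subclass" Furanoquinolines', ...
    '"> Nodes"', ...
    '"> Framework" Aromatic heteropolycyclic compounds', ...
    '$$$$'};

fields  = {'InChIKey','SMILES','Kingdom','Superclass','Class','Subclass','Framework'};
rows    = cell(0, numel(fields));              % one row per substance
current = repmat({''}, 1, numel(fields));      % attributes of the record being read

for i = 1:numel(lines)
    ln = strtrim(strrep(lines{i}, '"', ''));   % drop quotes left by the .xls conversion
    if strcmp(ln, '$$$$')                      % end of record: store the row and reset
        rows(end+1, :) = current;              %#ok<AGROW>
        current = repmat({''}, 1, numel(fields));
    elseif ~isempty(ln) && ln(1) == '>'
        rest = strtrim(ln(2:end));             % e.g. 'Kingdom Organic compounds'
        sp = strfind(rest, ' ');
        if isempty(sp)
            continue;                          % attribute without a value, e.g. '> Nodes'
        end
        name  = rest(1:sp(1)-1);
        value = strtrim(rest(sp(1)+1:end));
        col = find(strcmp(fields, name));      % output column for this attribute, if wanted
        if ~isempty(col)
            if strcmp(name, 'InChIKey')        % keep only the text after '='
                eqPos = strfind(value, '=');
                if ~isempty(eqPos)
                    value = value(eqPos(1)+1:end);
                end
            end
            current{col} = value;
        end
    end
end
```

In MATLAB, `cell2table(rows, 'VariableNames', fields)` should then turn the collected rows into a table. For 150K records, preallocating `rows` instead of growing it inside the loop would likely be faster.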
Alan Cesar Pilon Miro on 26 Apr 2018
Thank you.

Answers (1)

Sarah Palfreyman on 30 Apr 2018
You can also use Text Analytics Toolbox for this workflow.
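For comparison, if that toolbox is not available, a base-MATLAB alternative (not the Text Analytics Toolbox workflow itself) is to hold the file contents as one character vector, split it on `$$$$`, and pull each attribute out with `regexp`. This is a minimal sketch, assuming the `> Name value` layout from the question. The sample text and the variable names (`txt`, `fields`, `records`, `out`) are illustrative.

```matlab
% One abbreviated record standing in for the file contents.
txt = strjoin({ ...
    'Q2785366-1', ...
    '"> InChIKey" InChIKey=FPRJHXLFTAYEJH-UHFFFAOYSA-N', ...
    '"> SMILES" COC1=C2C(OC)=C3C=COC3=NC2=C(OCC(O)C(C)(C)O)C=C1', ...
    '> Kingdom Organic compounds', ...
    '"> Superclass" Organoheterocyclic compounds', ...
    '"> Class" Quinolines and derivatives', ...
    '"> Subclass" Furanoquinolines', ...
    '"> Framework" Aromatic heteropolycyclic compounds', ...
    '$$$$'}, char(10));

fields = {'InChIKey','SMILES','Kingdom','Superclass','Class','Subclass','Framework'};

% Split into records and discard the empty piece after the last '$$$$'.
records = strsplit(txt, '$$$$');
records = records(~cellfun(@isempty, strtrim(records)));

out = cell(numel(records), numel(fields));
for r = 1:numel(records)
    for c = 1:numel(fields)
        % Line start, optional quote, '>', field name, optional quote, then the value.
        pat = ['^"?>[ \t]*' fields{c} '"?[ \t]+(.*)$'];
        tok = regexp(records{r}, pat, 'tokens', 'once', ...
                     'lineanchors', 'dotexceptnewline');
        if isempty(tok)
            out{r, c} = '';                    % attribute missing from this record
        else
            out{r, c} = strtrim(tok{1});
        end
    end
end

% Column 1: keep only the text after 'InChIKey='.
out(:, 1) = regexprep(out(:, 1), '^InChIKey=', '');
```

Anchoring each pattern at the start of a line keeps `Class` from matching inside the `Superclass` or `Subclass` lines.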