How to read specific lines from a text file and store them in an array?

Question

Rasif Ajwad on 20 Oct 2015

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/249803-how-to-read-specific-lines-from-a-text-file-and-store-them-in-an-array

Commented: Rasif Ajwad on 20 Oct 2015

I have a text file containing a multiple sequence alignment (MSA) of protein sequences. The contents of the file look like this:

>gi|73961569|ref|XP_547536.2| osteocalcin [C. lupus familiaris]
MRSLMVLALLAVAALCLCLAGPADAKPSSAESRKGGATFVSKREGSEVVRRLRRYLDSGL
GAPVPYPDPLEPKREVCELNPNCDELADHIGFQEAYQRFYGPV-
>gi|27806301|ref|NP_776674.1| osteocalcin preproprotein
MRTPMLLALLALAT--LCLAGRADAKPGDAESGK-GAAFVSKQEGSEVVKRLRRYLDHWL
GAPAPYPDPLEPKREVCELNPDCDELADHIGFQEAYRRFYGPV-

From this file I want to extract only the lines containing the actual sequences (the ones NOT starting with the '>' symbol) and store them in an array for future use. Note that lines 2 and 3 together form one single sequence, so I also need to join them into a single string and store it in one position of the array. How can I do that?

I considered using 'fileread', but it reads the whole file at once, so it does not seem helpful here.

3 Comments

TastyPastry on 20 Oct 2015

Edited: TastyPastry on 20 Oct 2015

To clarify, are your sequences supposed to be formatted like this, where there are two separate sequences starting with >gi?

>gi|73961569|ref|XP_547536.2| osteocalcin [C. lupus familiaris]

MRSLMVLALLAVAALCLCLAGPADAKPSSAESRKGGATFVSKREGSEVVRRLRRYLDSGL

GAPVPYPDPLEPKREVCELNPNCDELADHIGFQEAYQRFYGPV-

>gi|27806301|ref|NP_776674.1| osteocalcin preproprotein

MRTPMLLALLALAT--LCLAGRADAKPGDAESGK-GAAFVSKQEGSEVVKRLRRYLDHWL

GAPAPYPDPLEPKREVCELNPDCDELADHIGFQEAYRRFYGPV-

Rasif Ajwad on 20 Oct 2015

Yes. Each record starts with a '>gi' header line, but the actual sequence starts on the next line: 'MRSLM...'

Answer 1

per isakson on 20 Oct 2015

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/249803-how-to-read-specific-lines-from-a-text-file-and-store-them-in-an-array#answer_196766

Try

>> out = cssm
out = 
    [1x104 char]    [1x104 char]    [1x104 char]    [1x104 char]    [1x104 char]    [1x104 char]
>> out{3}
ans =
MRSLMVLALLAVAALCLCLAGPADAKPSSAESRKGGATFVSKREGSEVVRRLRRYLDSGLGAPVPYPDPLEPKREVCELNPNCDELADHIGFQEAYQRFYGPV-
>>

where

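function out = cssm
    % Read the whole file into one character vector.
    str = fileread( 'cssm.txt' );
    % Match the text between each '>gi...' header line and the next header (or end of file).
    cac = regexp( str, '(?<=>gi[^\n]+\n).+?(?=\n>gi|$)', 'match' );
    % Join each multi-line sequence into one string by removing line breaks
    % (carriage returns too, in case the file has Windows line endings).
    out = cell( 1, length( cac ) );
    for jj = 1 : length( cac )
        out{jj} = regexprep( cac{jj}, '[\r\n]', '' );
    end
end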

and cssm.txt contains three copies of the sample text from your question.
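
If reading the whole file at once really is a concern, a line-by-line variant is also possible. The following is only a minimal sketch, not part of the tested function above: it assumes every record starts with a header line whose first character is '>', and the temporary file and the variable names (fname, sample, seqs) are made up for illustration. It writes the two sample records from the question to a temporary file, then reads them back with fgetl, starting a new cell at each header and appending every other non-empty line to the current cell.

```matlab
% Write the two sample records from the question to a temporary file.
% (fname and sample are illustrative names, not from the original post.)
fname = [ tempname '.txt' ];
sample = { ...
    '>gi|73961569|ref|XP_547536.2| osteocalcin [C. lupus familiaris]', ...
    'MRSLMVLALLAVAALCLCLAGPADAKPSSAESRKGGATFVSKREGSEVVRRLRRYLDSGL', ...
    'GAPVPYPDPLEPKREVCELNPNCDELADHIGFQEAYQRFYGPV-', ...
    '>gi|27806301|ref|NP_776674.1| osteocalcin preproprotein', ...
    'MRTPMLLALLALAT--LCLAGRADAKPGDAESGK-GAAFVSKQEGSEVVKRLRRYLDHWL', ...
    'GAPAPYPDPLEPKREVCELNPDCDELADHIGFQEAYRRFYGPV-' };
fid = fopen( fname, 'w' );
fprintf( fid, '%s\n', sample{:} );
fclose( fid );

% Read the file one line at a time; collect one string per record.
seqs = {};
fid = fopen( fname, 'r' );
tline = fgetl( fid );
while ischar( tline )              % fgetl returns -1 (not char) at end of file
    tline = strtrim( tline );      % drop surrounding whitespace, including a stray \r
    if ~isempty( tline )
        if tline(1) == '>'
            seqs{end+1} = '';      % header line: start a new, empty sequence
        elseif ~isempty( seqs )
            seqs{end} = [ seqs{end}, tline ];   % sequence line: append to current record
        end
    end
    tline = fgetl( fid );
end
fclose( fid );
delete( fname );                   % remove the temporary file

disp( numel( seqs ) )              % number of sequences found
disp( length( seqs{1} ) )          % length of the first joined sequence
```

With your own data, replace the part that writes the temporary file with the name of your file. Growing the cell array inside the loop is fine for a few hundred records; for very large alignments the regexp approach above is likely to be faster.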
