Read text file lines and analyze

Question

Lmm3 on 24 Jul 2017

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/349955-read-text-file-lines-and-analyze

Answered: OCDER on 9 Sep 2017

Accepted Answer: Lmm3

I would appreciate help with reading and analyzing a text file. The text file (rosalind_gc1.txt) is in this format:

>Rosalind_4949
ACTTCTATGTAGCGCGCTATTTCAAGGGATCGGCCAATAGTACGACGTGTTTCATCTAGT
GCGACAAATGTATATACCGTTTTCATTACGTACCACGATAAGTTGAAGCCCGTATTC
AGACGCGGGAGCCGTCTGCTGGACAAGTACTAGCTGGTCCATCCTCCCCACCAAAGGGAA
>Rosalind_7490
AACTGGGAATTTCTATATTGGGCGGTAAGCTCGGGGCAATCTATTAGTTGAATGCAACAG
TAACAAACTTGCCGTCGGTCGCTGTTCGCGCAGCATTAATAATAACTCTGGCGAGTAGAT
>Rosalind_8337
CCTTGTTGTCTACCCACCAAGTCAGATAGACAGTTGGCTGTCTCCAACGCAGATTTTCTA
CGCTTCATGCTCTTGCGACTCATGTCGCCTGGGTTTATTGCTTCTCTACGGGATAACCGC
CCGGGCTCACTCTACCCGCGGGAAGGCCGCCCTCTCTCCCGTGTGCCTACATAA

I would like to determine the %GC for the data sets between each “>Rosalind” heading. For example, in the example above there are 3 data sets. The %GC for the text between “>Rosalind_4949” and “>Rosalind_7490” is 48.5876% and between “>Rosalind_7490” and “>Rosalind_8337” is 45.000%.

I’m trying to use the following code but I don’t know how to read the lines as blocks between each “>” and I don’t know how to concatenate the lines as I read them. I would appreciate any help.

fid = fopen('rosalind_gc1.txt');
while ~feof(fid)
    templine = fgetl(fid);
    a = strcmp(templine, '>');
    if a == 0
        G = length(strfind(templine,'G'));
        C = length(strfind(templine,'C'));
        z = length(templine);
        %Per = (G+C)*100/z
    end
end
    Per = (G+C)*100/z


Answer 1

Lmm3 on 9 Sep 2017

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/349955-read-text-file-lines-and-analyze#answer_280864

The following code is what I used to read from the data file and determine %GC:

fid = fopen('rosalind_gc.txt');
G = 0;   % running count of 'G' in the current record
C = 0;   % running count of 'C' in the current record
z = 0;   % running length of the current record
while ~feof(fid)
    templine = fgetl(fid);
    if ~ischar(templine), break; end   % fgetl returns -1 at end of file
    if isempty(strfind(templine, '>'))
        % Sequence line: add its counts to the current record
        G = G + length(strfind(templine, 'G'));
        C = C + length(strfind(templine, 'C'));
        z = z + length(templine);
    else
        % Header line: report the previous record (if any), then reset
        if z > 0
            Per = (G + C)*100/z
        end
        disp(templine)
        G = 0; C = 0; z = 0;
    end
end
fclose(fid);
% The last record has no header after it, so report it here
Per = (G + C)*100/z

Answer 2

KSSV on 24 Jul 2017

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/349955-read-text-file-lines-and-analyze#answer_275272

Edited: KSSV on 24 Jul 2017

Let data.txt be your text file. You can count the number of G characters in the whole file as follows:

fid = fopen('data.txt') ;
S = textscan(fid,'%s','delimiter','\n') ;
fclose(fid) ;
S = S{1} ;
N = 0 ;
for i = 1:length(S)
    N = N+length(strfind(S{i}, 'G'));
end

Without a loop:

fid = fopen('data.txt') ;
S = textscan(fid,'%s','delimiter','\n') ;
fclose(fid) ;
S = S{1} ;
Ni = strfind(S,'G') ;
N = sum(cellfun(@numel,Ni)) ;

1 Comment

Lmm3 on 25 Jul 2017

KSSV, thank you for your response. Could you explain what the line S = S{1} is doing? The code returns the total number of "G" occurrences in the data file, but do you have a suggestion for how to get the "G" occurrences between each pair of headers that begin with ">Rosalind"? For example, for the data set above I would like to get 3 values: the number of G occurrences between “>Rosalind_4949” and “>Rosalind_7490”, between “>Rosalind_7490” and “>Rosalind_8337”, and below “>Rosalind_8337”.
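
On the two points raised in this comment, here is a hedged sketch (not KSSV's own reply). textscan returns a cell array with one cell per format specifier, so with the single '%s' the lines sit inside the first cell, and S{1} unwraps them into a cell array with one line per cell. Per-record counts can then be obtained by labelling each line with the index of the header above it. The toy text, the variable names C, isHeader, recordId, gPerLine and nG, and the assumption that the first line is a header are all illustrative.

```matlab
% Toy stand-in for the file contents (hypothetical data, not rosalind_gc1.txt)
txt = sprintf('>Rosalind_1\nGGCA\nTTG\n>Rosalind_2\nACGT\n');

% textscan returns one cell per format specifier; '%s' is the only one here
C = textscan(txt, '%s', 'delimiter', '\n');
S = C{1};                       % unwrap: S is a cell array with one line per cell

isHeader = strncmp(S, '>', 1);  % true for lines that start with '>'
recordId = cumsum(isHeader);    % 1 for lines of record 1, 2 for record 2, ...

% Count 'G' on every line, then zero out the header lines
gPerLine = cellfun(@(s) sum(s == 'G'), S);
gPerLine(isHeader) = 0;

% Sum the per-line counts within each record
nG = accumarray(recordId, gPerLine);
```

For the toy text, nG has one entry per header. Counting 'C' the same way and dividing by the summed line lengths per record would give %GC.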

Answer 3

OCDER on 9 Sep 2017

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/349955-read-text-file-lines-and-analyze#answer_280878

readFasta.m

If you deal with a lot of FASTA files, look into fastaread (MATLAB Bioinformatics Toolbox) or readFasta (the attached function, which I wrote for another project).

Also, cellfun and regexp are handy tools for this kind of task.

To get GC %:

[Header, Seq] = readFasta('Seq.txt');
PercGC = cellfun(@(S)length(regexpi(S, 'G|C'))/length(S)*100, Seq);
PercGC =
   48.5876
   45.0000
   55.1724
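
The contents of readFasta.m are not reproduced in this thread. The following is only a sketch of how Header and Seq could be built with regexp when neither that file nor the toolbox is available; it is not OCDER's implementation. The toy text and the variable names txt and tok are assumptions, and in this sketch Header holds the text after the '>' character.

```matlab
% Toy FASTA text (hypothetical); for a real file use txt = fileread('Seq.txt');
txt = sprintf('>Rosalind_1\nGGCA\nTTG\n>Rosalind_2\nACGT\n');

% Each match: '>' plus header text up to the line end, then everything up to the next '>'
tok = regexp(txt, '>([^\r\n]*)[\r\n]+([^>]*)', 'tokens');

Header = cellfun(@(t) t{1}, tok, 'UniformOutput', false);
% Strip line breaks and other whitespace so each record is one contiguous string
Seq = cellfun(@(t) regexprep(t{2}, '\s', ''), tok, 'UniformOutput', false);

% Same %GC expression as in the snippet above
PercGC = cellfun(@(S) length(regexpi(S, 'G|C'))/length(S)*100, Seq);
```

Removing whitespace before taking length(S) matters: line breaks left inside a record would inflate the denominator and lower the reported %GC.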
