Loop through the DNA array and record all of the locations of the triplets (codons): ‘AAA’, ‘ATC’ and ‘CGG’.

My code so far runs, but I don't think it's correct. I am supposed to loop through the cell array and record the location of each codon, while skipping any triplet that contains a character from the preceding codon. For example, if part of the sequence contains [A,T,C,C,G,G], then the triplet CCG should be skipped because it overlaps ATC. I'm just not sure what the best way to do that would be.
Here is what I have so far:
fid = fopen('sequence_long.txt','r')
A = textscan(fid,'%3s');
DNA = A{1};
fclose(fid);
i = 1;
%loops through array and counts codon occurrences
%finds the index location of individual codons
while i < length(DNA)
    i = i + 1;
    if strcmp(DNA(i),'AAA')
        num_AAA = nnz(strcmp(DNA,'AAA'));
        loc_AAA = find(strcmp(DNA,'AAA'));
    elseif strcmp(DNA(i),'ATC')
        num_ATC = nnz(strcmp(DNA,'ATC'));
        loc_ATC = find(strcmp(DNA,'ATC'));
    elseif strcmp(DNA(i),'CGG')
        num_CGG = nnz(strcmp(DNA,'CGG'));
        loc_CGG = find(strcmp(DNA,'CGG'));
    end
end
fprintf('The number of AAA values is: %.f',num_AAA)
fprintf('The index location of AAA values: %.f\n',loc_AAA(1:10))
fprintf('The number of ATC values is: %.f',num_ATC)
fprintf('The index location of ATC values: %.f\n',loc_ATC(1:10))
fprintf('The number of CGG values is: %.f',num_CGG)
fprintf('The index location of CGG values: %.f\n',loc_CGG(1:10))

Accepted Answer

Sai Veeramachaneni on 17 Nov 2020
Edited: Sai Veeramachaneni on 17 Nov 2020
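One workaround is to iterate over the sequence one character at a time and, whenever a codon is found, jump past it so that the next match cannot reuse any of its characters. Note that the example treats DNA as a character vector rather than a cell array of triplets.
See the code below for reference.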
DNA = 'AAATCATCGGCGGATC'; % Example sequence
i = 1;
% Start indices of each codon (grown as matches are found)
loc_AAA = [];
loc_ATC = [];
loc_CGG = [];
% Number of occurrences of each codon
num_AAA = 0;
num_ATC = 0;
num_CGG = 0;
% Stop at length(DNA)-2 so that DNA(i+2) is always a valid index
while i <= length(DNA)-2
    if DNA(i)=='A' && DNA(i+1)=='A' && DNA(i+2)=='A'
        loc_AAA = [loc_AAA i];
        num_AAA = num_AAA + 1;
        i = i + 3; % Jump past the matched codon so the next match cannot overlap it
    elseif DNA(i)=='A' && DNA(i+1)=='T' && DNA(i+2)=='C'
        loc_ATC = [loc_ATC i];
        num_ATC = num_ATC + 1;
        i = i + 3;
    elseif DNA(i)=='C' && DNA(i+1)=='G' && DNA(i+2)=='G'
        loc_CGG = [loc_CGG i];
        num_CGG = num_CGG + 1;
        i = i + 3;
    else
        i = i + 1; % No codon starts here; move on by one character
    end
end
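
As a possible alternative to the explicit loop (a sketch, not part of the original answer), the same non-overlapping scan can be written with regexp. It relies on regexp resuming its search after the end of each match, so a triplet that shares characters with a previously matched codon is never reported. The sketch assumes DNA is a character vector, as in the example above; the names codons and starts are illustrative.

```matlab
DNA = 'AAATCATCGGCGGATC'; % Same example sequence as in the loop above

% Scan left to right for any of the three codons. regexp continues
% after the end of each match, so the reported matches never overlap.
[codons, starts] = regexp(DNA, 'AAA|ATC|CGG', 'match', 'start');

% Split the start indices by which codon was matched
loc_AAA = starts(strcmp(codons, 'AAA'));
loc_ATC = starts(strcmp(codons, 'ATC'));
loc_CGG = starts(strcmp(codons, 'CGG'));

% Counts follow directly from the number of recorded locations
num_AAA = numel(loc_AAA);
num_ATC = numel(loc_ATC);
num_CGG = numel(loc_CGG);

% Print every location without assuming there are at least 10 of them
fprintf('The number of ATC values is: %d\n', num_ATC);
fprintf('The index location of ATC values: %d\n', loc_ATC);
```

For the example sequence, the ATC at index 3 overlaps the AAA at index 1 and the CGG at index 8 overlaps the ATC at index 6, so neither is recorded, which matches what the loop produces.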
