Fastest way to extract amino acid composition features using MATLAB? My code works fine, but I need to simplify it further.

% TRAIN TG dataset feature extraction

%% Import the data
[~, ~, raw0_0] = xlsread('C:\Users\Amindra\Desktop\EE361\TG dataset-20170922\Train1,taguchi.xlsx','Sheet1','A1:A978');
[~, ~, raw0_1] = xlsread('C:\Users\Amindra\Desktop\EE361\TG dataset-20170922\Train1,taguchi.xlsx','Sheet1','D1:D978');
raw = [raw0_0,raw0_1];
raw(cellfun(@(x) ~isempty(x) && isnumeric(x) && isnan(x),raw)) = {''};
cellVectors = raw(:,[1,2]);

%% Create table
Train1taguchi = table;

%% Allocate imported array to column variable names
FOLDS = cellVectors(:,1);
sequence = cellVectors(:,2);

%% Write the ARFF header
fprintf('@RELATION TESTtg\n');
fprintf('@ATTRIBUTE one NUMERIC\n');
fprintf('@ATTRIBUTE two NUMERIC\n');
fprintf('@ATTRIBUTE three NUMERIC\n');
fprintf('@ATTRIBUTE four NUMERIC\n');
fprintf('@ATTRIBUTE five NUMERIC\n');
fprintf('@ATTRIBUTE six NUMERIC\n');
fprintf('@ATTRIBUTE seven NUMERIC\n');
fprintf('@ATTRIBUTE eight NUMERIC\n');
fprintf('@ATTRIBUTE nine NUMERIC\n');
fprintf('@ATTRIBUTE ten NUMERIC\n');
fprintf('@ATTRIBUTE eleven NUMERIC\n');
fprintf('@ATTRIBUTE twelve NUMERIC\n');
fprintf('@ATTRIBUTE thirteen NUMERIC\n');
fprintf('@ATTRIBUTE fourteen NUMERIC\n');
fprintf('@ATTRIBUTE fifteen NUMERIC\n');
fprintf('@ATTRIBUTE sixteen NUMERIC\n');
fprintf('@ATTRIBUTE seventeen NUMERIC\n');
fprintf('@ATTRIBUTE eighteen NUMERIC\n');
fprintf('@ATTRIBUTE nineteen NUMERIC\n');
fprintf('@ATTRIBUTE twenty NUMERIC\n');
fprintf('@ATTRIBUTE class {fold1,fold2,fold3,fold4,fold5,fold6,fold7,fold8,fold9,fold10,fold11,fold12,fold13,fold14,fold15,fold16,fold17,fold18,fold19,fold20,fold21,fold22,fold23,fold24,fold25,fold26,fold27,fold28,fold29,fold30}\n');
fprintf('@DATA\n');

for i = 1:978 % in this case there are 978 protein sequences
    FOLDS = cellVectors(i,1);
    fold = char(FOLDS);
    sequence = cellVectors(i,2); % read each row of the table
    seq = char(sequence); % NOTE: convert each row of the table to a char array
    %AA = aa2int(seq)
    AA = aacount(seq); % count ALL the amino acids (AA) in the protein sequence
    A = AA.A; % number of A's in the protein sequence
    R = AA.R; % number of R's
    N = AA.N; % number of N's
    D = AA.D; % number of D's
    C = AA.C; % number of C's
    Q = AA.Q; % number of Q's
    E = AA.E; % number of E's
    G = AA.G; % number of G's
    H = AA.H; % number of H's
    I = AA.I; % number of I's
    L = AA.L; % number of L's
    K = AA.K; % number of K's
    M = AA.M; % number of M's
    F = AA.F; % number of F's
    P = AA.P; % number of P's
    S = AA.S; % number of S's
    T = AA.T; % number of T's
    W = AA.W; % number of W's
    Y = AA.Y; % number of Y's
    V = AA.V; % number of V's
    seqLen = (A+R+N+D+C+Q+E+G+H+I+L+K+M+F+P+S+T+W+Y+V); % length of the protein sequence
    %fprintf('\nlength of PROTEIN SEQUENCE = %d\n',seqLen) % display the length of the protein sequence to the user

    % FEATURE EXTRACTION: fraction of each amino acid in the sequence
    f1 = A/seqLen;  %fprintf('feature A = %f\n',f1)
    f2 = R/seqLen;  %fprintf('feature R = %f\n',f2)
    f3 = N/seqLen;  %fprintf('feature N = %f\n',f3)
    f4 = D/seqLen;  %fprintf('feature D = %f\n',f4)
    f5 = C/seqLen;  %fprintf('feature C = %f\n',f5)
    f6 = Q/seqLen;  %fprintf('feature Q = %f\n',f6)
    f7 = E/seqLen;  %fprintf('feature E = %f\n',f7)
    f8 = G/seqLen;  %fprintf('feature G = %f\n',f8)
    f9 = H/seqLen;  %fprintf('feature H = %f\n',f9)
    f10 = I/seqLen; %fprintf('feature I = %f\n',f10)
    f11 = L/seqLen; %fprintf('feature L = %f\n',f11)
    f12 = K/seqLen; %fprintf('feature K = %f\n',f12)
    f13 = M/seqLen; %fprintf('feature M = %f\n',f13)
    f14 = F/seqLen; %fprintf('feature F = %f\n',f14)
    f15 = P/seqLen; %fprintf('feature P = %f\n',f15)
    f16 = S/seqLen; %fprintf('feature S = %f\n',f16)
    f17 = T/seqLen; %fprintf('feature T = %f\n',f17)
    f18 = W/seqLen; %fprintf('feature W = %f\n',f18)
    f19 = Y/seqLen; %fprintf('feature Y = %f\n',f19)
    f20 = V/seqLen;

    % one ARFF data row: 20 features followed by the fold class
    fprintf('%f,',f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f11,f12,f13,f14,f15,f16,f17,f18,f19,f20)
    fprintf('%s',fold)
    fprintf('\n')
end
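Since the question is how to simplify this, here is one possible sketch (not the asker's code and not tested on the TG dataset). It assumes MATLAB R2016b or newer for implicit expansion and does not call aacount, so it also runs without the Bioinformatics Toolbox. The two-sequence data and the names folds, seqs, order, countAA, attrNames and classList are hypothetical; in the real script folds and seqs would be cellVectors(:,1) and cellVectors(:,2).

```matlab
% Hypothetical stand-in data; in the original script these come from the
% xlsread calls (column A = fold label, column D = protein sequence).
folds = {'fold1'; 'fold2'};
seqs  = {'ARNDCQEGHILKMFPSTWYV'; 'AAAAGGGG'};

% Same amino acid order as f1..f20 in the question.
order = 'ARNDCQEGHILKMFPSTWYV';

% Count all 20 amino acids at once: comparing a column of characters with
% the 1-by-20 row ORDER expands to an L-by-20 logical matrix (R2016b+),
% and summing down the columns gives one count per amino acid.
countAA = @(s) sum(upper(char(s(:))) == order, 1);

% One row of counts per sequence -> N-by-20 matrix.
counts = cell2mat(cellfun(countAA, seqs(:), 'UniformOutput', false));

% Divide each row by its total (the sum of the 20 standard counts).
features = counts ./ sum(counts, 2);

% ARFF header written with two format loops instead of 22 fprintf calls.
attrNames = {'one','two','three','four','five','six','seven','eight', ...
             'nine','ten','eleven','twelve','thirteen','fourteen', ...
             'fifteen','sixteen','seventeen','eighteen','nineteen','twenty'};
fprintf('@RELATION TESTtg\n');
fprintf('@ATTRIBUTE %s NUMERIC\n', attrNames{:});
classList = sprintf('fold%d,', 1:30);                 % 'fold1,...,fold30,'
fprintf('@ATTRIBUTE class {%s}\n', classList(1:end-1));  % drop trailing comma
fprintf('@DATA\n');

% One data row per sequence: 20 features followed by the class label.
for k = 1:numel(seqs)
    fprintf('%f,', features(k, :));
    fprintf('%s\n', char(folds{k}));
end
```

Like the original, the denominator is the total of the 20 standard counts, so any other letters (for example X or U) are simply not counted; an empty sequence cell would produce a row of NaN.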
3 Comments
ASMBHAYA NAND on 12 August 2018
NOTE: If the directory of your datasets is not set correctly, the code will not run (read my user manual for further clarification).


Answers (1)

Luuk van Oosten on 20 December 2018
Although your question is poorly formulated, and I agree with Image Analyst about the formatting of your code, here is my answer to "fastest way of amino acid composition" in case it helps someone else as well:
The fastest way to get the amino acid composition is the MATLAB function aacount (part of the Bioinformatics Toolbox).
For example, let's assume your protein sequence is the following:
yoursequence = 'YURPRTEINSEQENCEYUCANPUTHERE'
You can then use
compositionstruct = aacount(yoursequence)
which returns the amino acid composition of your protein sequence in the struct
compositionstruct
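If you want the twenty numbers from the question rather than the struct itself, a possible sketch follows. It needs the Bioinformatics Toolbox for aacount; the sequence 'AARND' is only an illustration, and order, counts and composition are hypothetical names. Reading the fields by name keeps the A R N D ... V order of f1 to f20 and ignores any extra fields the struct might carry.

```matlab
seq = 'AARND';                         % illustrative sequence only
compositionstruct = aacount(seq);      % requires Bioinformatics Toolbox

% Read the 20 standard amino acids by name, in the same order as f1..f20
% in the question (A R N D C Q E G H I L K M F P S T W Y V).
order = num2cell('ARNDCQEGHILKMFPSTWYV');              % 1-by-20 cell of letters
counts = cellfun(@(f) compositionstruct.(f), order);   % 1-by-20 numeric row

% Normalise by the total of those 20 counts to get the composition features.
composition = counts / sum(counts);
fprintf('%f,', composition);
fprintf('\n');
```

This replaces the twenty A = AA.A, R = AA.R, ... lines and the twenty divisions with three lines.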
