Random amino acid sequence generation with the same amino acid counts as a specified sequence

I have the following sequence:
>sp|Q7RTX1|TS1R1_HUMAN Taste receptor type 1 member 1 OS=Homo sapiens OX=9606 GN=TAS1R1 PE=2 SV=1
MLLCTARLVGLQLLISCCWAFACHSTESSPDFTLPGDYLLAGLFPLHSGCLQVRHRPEVT
LCDRSCSFNEHGYHLFQAMRLGVEEINNSTALLPNITLGYQLYDVCSDSANVYATLRVLS
LPGQHHIELQGDLLHYSPTVLAVIGPDSTNRAATTAALLSPFLVPMISYAASSETLSVKR
QYPSFLRTIPNDKYQVETMVLLLQKFGWTWISLVGSSDDYGQLGVQALENQATGQGICIA
FKDIMPFSAQVGDERMQCLMRHLAQAGATVVVVFSSRQLARVFFESVVLTNLTGKVWVAS
EAWALSRHITGVPGIQRIGMVLGVAIQKRAVPGLKAFEEAYARADKKAPRPCHKGSWCSS
NQLCRECQAFMAHTMPKLKAFSMSSAYNAYRAVYAVAHGLHQLLGCASGACSRGRVYPWQ
LLEQIHKVHFLLHKDTVAFNDNRDPLSSYNIIAWDWNGPKWTFTVLGSSTWSPVQLNINE
TKIQWHGKDNQVPKSVCSSDCLEGHQRVVTGFHHCCFECVPCGAGTFLNKSDLYRCQPCG
KEEWAPEGSQTCFPRTVVFLALREHTSWVLLAANTLLLLLLLGTAGLFAWHLDTPVVRSA
GGRLCFLMLGSLAAGSGSLYGFFGEPTRPACLLRQALFALGFTIFLSCLTVRSFQLIIIF
KFSTKVPTFYHAWVQNHGAGLFVMISSAAQLLICLTWLVVWTPLPAREYQRFPHLVMLEC
TETNSLGFILAFLYNGLLSISAFACSYLGKDLPENYNEAKCVTFSLLFNFVSWIAFFTTA
SVYDGKYLPAANMMAGLSSLSSGFGGYFLPKCYVILCRPDLNSTEHFQASIQDYTRRCGS
T
I want to generate a set of 10,000 sequences (in FASTA format) that have exactly the same amino acid counts as the sequence above.
I could not get the randseq function to do what I need.
Any help would be highly appreciated.

Accepted Answer

Tim DeFreitas on 8 Jun 2022
Edited: Tim DeFreitas on 22 Jun 2022
If you want exactly the same amino acid counts, then you want to randomly shuffle the input sequence, which can be done with randperm:
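[~, sequences] = fastaread('pf00002.fa');
% fastaread returns a char vector for a single-entry file, a cell array otherwise
if iscell(sequences)
    targetSeq = sequences{1};   % select a specific sequence from the FASTA file
else
    targetSeq = sequences;
end
randomSeqs = cell(1,10000);
for i = 1:numel(randomSeqs)
    % randperm only reorders the residues, so the amino acid counts are unchanged
    randomSeqs{i} = targetSeq(:, randperm(numel(targetSeq)));
end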
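The question also asks for FASTA output, so here is a minimal sketch that shuffles a sequence, checks that the composition is preserved, and writes the results with plain fprintf (no toolbox needed; the Bioinformatics Toolbox also has fastawrite). Assumptions: targetSeq is only a stand-in (the first 60 residues of the question's sequence), and the file name 'shuffled.fa' and the header text 'shuffled_<k>' are hypothetical choices.

```matlab
% Minimal sketch: shuffle one sequence many times, confirm the composition
% is unchanged, and save the result as FASTA.
% Assumptions: targetSeq is a stand-in (first 60 residues of TAS1R1);
% the file name 'shuffled.fa' and headers 'shuffled_<k>' are hypothetical.
targetSeq = 'MLLCTARLVGLQLLISCCWAFACHSTESSPDFTLPGDYLLAGLFPLHSGCLQVRHRPEVT';
numSeqs = 10000;
lineWidth = 60;   % residues per line in the FASTA body

randomSeqs = cell(1, numSeqs);
for k = 1:numSeqs
    % randperm reorders positions, so every residue is used exactly once
    randomSeqs{k} = targetSeq(randperm(numel(targetSeq)));
end

% Same amino acid counts <=> same letters after sorting
sortedTarget = sort(targetSeq);
allSameCounts = all(cellfun(@(s) isequal(sort(s), sortedTarget), randomSeqs));

% Write each sequence as a FASTA record, wrapped at lineWidth residues
fid = fopen('shuffled.fa', 'w');
for k = 1:numSeqs
    seq = randomSeqs{k};
    fprintf(fid, '>shuffled_%d\n', k);
    for s = 1:lineWidth:numel(seq)
        fprintf(fid, '%s\n', seq(s:min(s + lineWidth - 1, numel(seq))));
    end
end
fclose(fid);
```

Comparing sorted strings is a cheap way to test "identical amino acid counts": two sequences have the same composition exactly when their sorted letters match.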

More Answers (1)

Sam Chak on 8 Jun 2022
I'm no expert in this, but it is possible to generate random numbers that correspond to letters of the Roman alphabet through their ASCII codes.
Here is a simple function that you can modify to generate a long string of characters like the sequence in FASTA format.
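function amino = generateAmino(n)
% Return a random 1-by-n character vector of uppercase letters A-Z.
ASCII_L = 65;   % 'A'
ASCII_U = 90;   % 'Z'
C = randi([ASCII_L, ASCII_U], 1, n);   % uniform integer codes, A and Z included equally
amino = char(C);
end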
In the Command Window:
S1 = generateAmino(60)
S1 =
'TGNRWYODEGVGUGXJFGPMJVPOXHTTKOCBNTXDOMAIEUINEPHQRTLCGXEVNZCL'
You can then use a for loop to generate the number of sequences that you want.
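numOfSeq = 10000;
seqLen = 60;
S = repmat(' ', numOfSeq, seqLen);   % preallocate a char matrix, one sequence per row
for i = 1:numOfSeq
    S(i,:) = generateAmino(seqLen);
end
S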
That's the basic idea. You may need to modify the function so that it draws only certain letters from the ASCII chart.
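One way to restrict the letters, sketched under assumptions: index into a string of the 20 standard amino acid one-letter codes instead of drawing ASCII codes directly. The alphabet string is this example's choice, not something from the thread. Because residues are drawn independently, this approach does not reproduce the exact amino acid counts the question asks for; only a shuffle of the original sequence does that.

```matlab
% Minimal sketch: random sequences drawn only from the 20 standard
% amino acid one-letter codes (alphabet is an assumption of this example).
% Residues are drawn independently and uniformly, so the amino acid
% counts of the original sequence are NOT preserved.
aaAlphabet = 'ACDEFGHIKLMNPQRSTVWY';   % 20 standard amino acids
seqLen = 60;
numOfSeq = 10000;

idx = randi(numel(aaAlphabet), numOfSeq, seqLen);   % indices 1..20
S = aaAlphabet(idx);   % numOfSeq-by-seqLen char matrix, one sequence per row
```

Indexing a character vector with a matrix of indices returns a character matrix of the same size, so no loop or preallocation is needed.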

Category: Protein and Amino Acid Sequence Analysis

Release: R2022a