aa2nt

Convert amino acid sequence to nucleotide sequence

Description

ntSeq = aa2nt(aaSeq) converts an amino acid sequence aaSeq to a nucleotide sequence ntSeq using the standard genetic code.

ntSeq = aa2nt(aaSeq,Name=Value) uses additional options specified by one or more name-value arguments. For example, ntSeq = aa2nt(aaSeq,GeneticCode=2) uses the vertebrate mitochondrial genetic code.

Examples

Create an amino acid sequence.

rng('default') % For reproducibility
seq = randseq(20,Alphabet="amino")
seq = 
'TYNYMRQLVVDVVITNHYSV'

Convert it to a nucleotide sequence using the standard genetic code.

aa2nt(seq)
ans = 
'ACATATAACTACATGAGACAGCTTGTAGTTGACGTTGTCATTACTAACCACTATAGCGTT'

Convert it using the vertebrate mitochondrial genetic code.

aa2nt(seq,GeneticCode=2)
ans = 
'ACCTATAACTACATACGCCAACTCGTAGTGGATGTAGTAATTACTAATCACTATTCGGTT'

Convert using the Echinoderm Mitochondrial genetic code and the RNA alphabet.

aa2nt(seq,GeneticCode="ec",Alphabet="RNA")
ans = 
'ACGUAUAACUACAUGCGGCAGUUAGUUGUCGACGUCGUGAUUACGAACCAUUAUAGUGUC'

Input Arguments

aaSeq: Amino acid sequence, specified as a character vector or string of single-letter amino acid codes.

In general, the mapping from an amino acid to a nucleotide codon is not a one-to-one mapping. For amino acids with multiple possible nucleotide codons, this function randomly selects a codon corresponding to that particular amino acid. For the ambiguous characters B and Z, one of the amino acids corresponding to the letter is selected randomly, and then a codon sequence is selected randomly. For the ambiguous character X, a codon sequence is selected randomly from all possibilities.
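The random codon selection described above can be checked with a short round trip. This is a minimal sketch, assuming R2021a or later (for the Name=Value syntax) and that nt2aa from the same toolbox is available; the test sequence and the seed are hypothetical choices made for illustration.

```matlab
% Sketch: codon choice is random, but back-translation recovers the protein.
aaSeq = 'MKLVDNQE';      % hypothetical sequence with unambiguous letters only
rng(1)                   % arbitrary seed, for reproducibility

nt1 = aa2nt(aaSeq);
nt2 = aa2nt(aaSeq);      % second draw; may differ from nt1

% Each amino acid becomes exactly one codon (three nucleotides).
assert(numel(nt1) == 3*numel(aaSeq))

% Translate back with nt2aa. AlternativeStartCodons=false makes nt2aa
% translate the first codon literally instead of treating it as a start codon.
back1 = nt2aa(nt1, AlternativeStartCodons=false);
back2 = nt2aa(nt2, AlternativeStartCodons=false);
assert(strcmp(back1, aaSeq) && strcmp(back2, aaSeq))

% The ambiguous letter B resolves to a codon for either D or N.
b = nt2aa(aa2nt('B'), AlternativeStartCodons=false);
assert(ismember(b, 'DN'))
```

Because several codons can encode the same amino acid, comparing nt1 and nt2 directly is not a reliable test; comparing their translations is.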

Example: "TYNYMRQLVV"

Data Types: char | string

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: ntseq = aa2nt(seq,GeneticCode=2,Alphabet="RNA")

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: ntseq = aa2nt(seq,'GeneticCode',"ec",'Alphabet',"RNA")

GeneticCode: Genetic code number or name, specified as a positive integer, character vector, or string scalar. The following table lists the genetic code numbers and their corresponding names.

Genetic Code Number  Genetic Code Name
1                    Standard
2                    Vertebrate Mitochondrial
3                    Yeast Mitochondrial
4                    Mold, Protozoan, Coelenterate Mitochondrial, and Mycoplasma/Spiroplasma
5                    Invertebrate Mitochondrial
6                    Ciliate, Dasycladacean, and Hexamita Nuclear
9                    Echinoderm Mitochondrial
10                   Euplotid Nuclear
11                   Bacterial and Plant Plastid
12                   Alternative Yeast Nuclear
13                   Ascidian Mitochondrial
14                   Flatworm Mitochondrial
15                   Blepharisma Nuclear
16                   Chlorophycean Mitochondrial
21                   Trematode Mitochondrial
22                   Scenedesmus Obliquus Mitochondrial
23                   Thraustochytrium Mitochondrial

Tip

If you use a code name, you can truncate the name to the first two letters of the name.
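As an illustration of the tip, the three calls below should all select genetic code 2. This is a sketch, assuming R2021a or later and that the full name is accepted exactly as written in the table; the input sequence is a hypothetical example.

```matlab
aaSeq = 'TYNYMRQLVV';    % hypothetical input sequence

% Three ways to request the vertebrate mitochondrial code.
byNumber = aa2nt(aaSeq, GeneticCode=2);
byName   = aa2nt(aaSeq, GeneticCode="Vertebrate Mitochondrial");
byPrefix = aa2nt(aaSeq, GeneticCode="ve");   % first two letters of the name

% Codons are drawn randomly, so the three outputs may differ from each
% other. Each one should still translate back to aaSeq under the same code.
results = {byNumber, byName, byPrefix};
for k = 1:numel(results)
    back = nt2aa(results{k}, GeneticCode=2, AlternativeStartCodons=false);
    assert(strcmp(back, aaSeq))
end
```

A numeric code is the least ambiguous choice in scripts; the two-letter form is mainly a convenience at the command line.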

The amino acid to nucleotide codon mapping for the standard genetic code is shown next.

Amino Acid Name                                           Amino Acid Code  Nucleotide Codon
Alanine                                                   A                GCT GCC GCA GCG
Arginine                                                  R                CGT CGC CGA CGG AGA AGG
Asparagine                                                N                AAT AAC
Aspartic acid (Aspartate)                                 D                GAT GAC
Cysteine                                                  C                TGT TGC
Glutamine                                                 Q                CAA CAG
Glutamic acid (Glutamate)                                 E                GAA GAG
Glycine                                                   G                GGT GGC GGA GGG
Histidine                                                 H                CAT CAC
Isoleucine                                                I                ATT ATC ATA
Leucine                                                   L                TTA TTG† CTT CTC CTA CTG†
Lysine                                                    K                AAA AAG
Methionine                                                M                ATG
Phenylalanine                                             F                TTT TTC
Proline                                                   P                CCT CCC CCA CCG
Serine                                                    S                TCT TCC TCA TCG AGT AGC
Threonine                                                 T                ACT ACC ACA ACG
Tryptophan                                                W                TGG
Tyrosine                                                  Y                TAT TAC
Valine                                                    V                GTT GTC GTA GTG
Asparagine or Aspartic acid (Aspartate)                   B                Random codon from D and N
Glutamine or Glutamic acid (Glutamate)                    Z                Random codon from E and Q
Unknown amino acid (any amino acid)                       X                Random codon
Translation stop                                          *                TAA TAG TGA
Gap of indeterminate length                               -                ---
Unknown character (any character or symbol not in table)  ?                ???

† indicates an alternative start codon for the standard genetic code as defined here. If you are using nt2aa, alternative start codons are converted to methionine (M) when one of these codons is the first codon of a sequence and AlternativeStartCodons is set to true.
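The alternative start codons marked with † matter when you translate an aa2nt result back with nt2aa. The sketch below assumes R2021a or later; the nucleotide and amino acid sequences are hypothetical examples chosen so that the first codon is TTG or another leucine codon.

```matlab
% TTG encodes leucine but is also marked as an alternative start codon
% in the standard genetic code.
ntSeq = 'TTGAAAGGT';     % hypothetical sequence: Leu, Lys, Gly codons

% With alternative start codon handling on, the leading TTG is treated
% as a start codon and reported as methionine (M).
asStart = nt2aa(ntSeq, AlternativeStartCodons=true);

% With it off, every codon is translated literally.
literal = nt2aa(ntSeq, AlternativeStartCodons=false);

% For a protein that begins with L, turn the option off so that a round
% trip through aa2nt and nt2aa cannot replace the first residue with M.
aaSeq = 'LKG';
back = nt2aa(aa2nt(aaSeq), AlternativeStartCodons=false);
assert(strcmp(back, aaSeq))
```

Comparing asStart and literal shows whether the first residue was reinterpreted.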

Data Types: double | char | string

Alphabet: Nucleotide alphabet, specified as "DNA" or "RNA". With "DNA", the function returns a sequence that uses the letters A, C, G, and T. With "RNA", it uses A, C, G, and U.
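A quick check of the two alphabets might look like the following. It is a sketch, assuming R2021a or later; the input sequence is hypothetical, and strrep is used only as a simple way to map U back to T.

```matlab
aaSeq = 'TYNYMRQLVV';    % hypothetical input sequence

dna = aa2nt(aaSeq, Alphabet="DNA");
rna = aa2nt(aaSeq, Alphabet="RNA");

% Each result should contain only letters from the requested alphabet.
assert(all(ismember(dna, 'ACGT')))
assert(all(ismember(rna, 'ACGU')))

% Mapping U back to T gives a DNA sequence that encodes the same protein.
rnaAsDna = strrep(rna, 'U', 'T');
back = nt2aa(rnaAsDna, AlternativeStartCodons=false);
assert(strcmp(back, aaSeq))
```

The dna and rna results come from separate random draws, so they are not expected to be letter-for-letter transcriptions of each other.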

Data Types: char | string

Output Arguments

ntSeq: Nucleotide sequence, returned as a character vector.

Version History

Introduced before R2006a