basecount

Count nucleotides in sequence

Description

NTStruct = basecount(SeqNT) returns the number of each type of base in SeqNT.

NTStruct = basecount(SeqNT,Name=Value) uses additional options specified by one or more name-value arguments.

Examples

Count the bases in a DNA sequence and return the results in a structure.

bases = basecount('TAGCTGGCCAAGCGAGCTTG')
bases = struct with fields:
    A: 4
    C: 5
    G: 7
    T: 4

Get the number of adenine (A) bases.

bases.A
ans = 
4

Create a bar graph comparing the number of each nucleotide.

basecount('TAGCTGGCCAAGCGAGCTTG',Chart="bar")

[Figure: bar chart comparing the counts of A, C, G, and T]

ans = struct with fields:
    A: 4
    C: 5
    G: 7
    T: 4

Count the bases in a DNA sequence containing ambiguous characters (R, Y, K, M, S, W, B, D, H, V, or N), listing each of them in a separate field.

basecount('ABCDGGCCAAGCGAGCTTG',Ambiguous="individual")
ans = struct with fields:
    A: 4
    C: 5
    G: 6
    T: 2
    R: 0
    Y: 0
    K: 0
    M: 0
    S: 0
    W: 0
    B: 1
    D: 1
    H: 0
    V: 0
    N: 0

Input Arguments

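SeqNT: Nucleotide sequence, specified as a character vector or string of nucleotide letter codes, a row vector of integers that map to nucleotide codes, or a structure containing a Sequence field.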

Example: NTStruct = basecount('CGACTT') counts the number of times each nucleotide occurs in the sequence.

Data Types: double | char | string | struct
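As a brief sketch of the data types listed above, the calls below pass one sequence in four forms. The use of nt2int to build the integer form and the Sequence field name for the structure are assumptions based on common Bioinformatics Toolbox conventions, not details stated in this section.

```matlab
% Same sequence supplied in four forms (illustrative sketch).
seqChar   = 'CGACTT';                     % character vector
seqString = "CGACTT";                     % string scalar
seqInt    = nt2int(seqChar);              % integer codes (assumes the nt2int mapping is accepted)
seqStruct = struct('Sequence', seqChar);  % assumes basecount reads the Sequence field

countsChar   = basecount(seqChar);
countsString = basecount(seqString);
countsInt    = basecount(seqInt);
countsStruct = basecount(seqStruct);

% Compare the four results; they are expected to match.
sameCounts = isequal(countsChar, countsString, countsInt, countsStruct);
```

If sameCounts is true, the input form does not affect the returned counts, so the choice can follow whatever form the sequence already has in your workflow.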

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: NTStruct = basecount("ACGGTC",Ambiguous="individual")

Ambiguous: Method for counting ambiguous nucleotide characters (R, Y, K, M, S, W, B, D, H, V, and N), specified as one of the following.

  • "ignore": basecount skips ambiguous characters.

  • "bundle": basecount counts ambiguous characters and reports the total count in the Ambiguous field.

  • "prorate": basecount counts ambiguous characters and distributes the total number evenly between all possible unambiguous nucleotide fields. For example, the count for the character R is distributed evenly between the A and G fields.

  • "individual": basecount counts ambiguous characters and reports them in individual fields.

  • "warn": basecount skips ambiguous characters and displays a warning.

Example: NTStruct = basecount("CGRTTMSA",Ambiguous="bundle") reports the total number of ambiguous characters in the Ambiguous field of NTStruct.

Data Types: char | string
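To make the difference between the Ambiguous options concrete, the following minimal sketch runs the same sequence through three of them. The variable names are illustrative, and the totals are computed rather than assumed so that you can inspect them yourself.

```matlab
seq = 'CGRTTMSA';   % R, M, and S are ambiguous codes; the other five are not

ignored  = basecount(seq, Ambiguous="ignore");    % ambiguous characters skipped
bundled  = basecount(seq, Ambiguous="bundle");    % adds an Ambiguous field
prorated = basecount(seq, Ambiguous="prorate");   % ambiguous counts split across A, C, G, T

% Sum of the four standard fields under two of the options.
totalIgnored  = ignored.A + ignored.C + ignored.G + ignored.T;
totalProrated = prorated.A + prorated.C + prorated.G + prorated.T;
```

With "ignore", the total covers only the unambiguous characters. With "prorate", the total should equal the full sequence length, because each ambiguous character is divided among the bases it can represent, which also means individual fields can hold fractional values.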

Gaps: Flag to count or ignore gaps, specified as true or false. Gaps are indicated by a hyphen (-).

If you set this option to true, then basecount counts the gaps and reports the total count in the Gaps field.

Data Types: logical
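A short sketch of the Gaps option, using a made-up aligned fragment (the sequence and variable names are hypothetical):

```matlab
alignedSeq = 'TAGC--TGGC-A';   % hypothetical aligned fragment with three hyphens

withoutGaps = basecount(alignedSeq);              % Gaps not specified
withGaps    = basecount(alignedSeq, Gaps=true);   % gaps reported in the Gaps field

numGaps = withGaps.Gaps;       % number of hyphens in the sequence
```

Counting gaps can be useful when the sequence comes from an alignment and you want the reported counts to account for every position.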

Chart: Type of chart to display the proportions of nucleotides, specified as "pie" or "bar".

Data Types: char | string

Output Arguments

NTStruct: Nucleotide counts, returned as a structure containing the fields A, C, G, and T. Uracil nucleotides (U) are added to the T field. Additional fields can be present, depending on the values of Ambiguous and Gaps.
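The sketch below illustrates how the output structure might be used with an RNA sequence, where U is counted in the T field. The sequence is hypothetical, and the GC-fraction calculation is an illustrative use of the fields, not something basecount computes itself.

```matlab
rnaSeq = 'AUGGCUUACG';        % hypothetical RNA fragment
counts = basecount(rnaSeq);   % U is tallied in the T field

% Derive the GC fraction from the returned fields.
total      = counts.A + counts.C + counts.G + counts.T;
gcFraction = (counts.G + counts.C) / total;
```

Because the fields are ordinary numeric values, they can be combined directly in arithmetic like this without any conversion.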

Version History

Introduced before R2006a