
localalign

Return local optimal and suboptimal alignments between two sequences

    Description

    AlignStruct = localalign(Seq1,Seq2) returns information about the first optimal (highest scoring) local alignment between two sequences in a MATLAB® structure.

    AlignStruct = localalign(Seq1,Seq2,Name=Value) specifies options using one or more name-value arguments in addition to the input arguments in the previous syntax.


    Examples


    Limit the number of alignments to return between two sequences by specifying the number of alignments.

    Create variables containing two amino acid sequences.

    Seq1 = "VSPAGMASGYDPGKA";
    Seq2 = "IPGKATREYDVSPAG";

    Use the NumAln argument to return information about the top three local alignments.

    struct1 = localalign(Seq1,Seq2,NumAln=3)
    struct1 = struct with fields:
            Score: [3×1 double]
            Start: [3×2 double]
             Stop: [3×2 double]
        Alignment: {3×1 cell}
    
    

    View the scores of the first and second alignments.

    struct1.Score(1:2)
    ans = 2×1
    
       11.0000
        9.6667
    
    

    View the first alignment.

    struct1.Alignment{1}
    ans = 3×5 char array
        'VSPAG'
        '|||||'
        'VSPAG'
    
    

    Limit the number of alignments to return between two sequences by specifying a minimum score.

    Create variables containing two amino acid sequences.

    Seq1 = "VSPAGMASGYDPGKA";
    Seq2 = "IPGKATREYDVSPAG";

    Use MinScore to return information about only local alignments with a score greater than 8. Use DoAlignment to exclude the actual alignments.

    struct2 = localalign(Seq1,Seq2,MinScore=8,DoAlignment=false)
    struct2 = struct with fields:
        Score: [2×1 double]
        Start: [2×2 double]
         Stop: [2×2 double]
    
    

    Limit the number of alignments to return between two sequences by specifying a percentage of the highest score.

    Create variables containing two amino acid sequences.

    Seq1 = "VSPAGMASGYDPGKA";
    Seq2 = "IPGKATREYDVSPAG";

    Use Percent to return information about only local alignments with a score within 15% of the maximum score.

    struct3 = localalign(Seq1,Seq2,Percent=15)
    struct3 = struct with fields:
            Score: [2×1 double]
            Start: [2×2 double]
             Stop: [2×2 double]
        Alignment: {2×1 cell}
    
    

    Specify a scoring matrix and gap opening penalty when aligning two sequences.

    Create variables containing two nucleotide sequences.

    Seq1 = "CCAATCTACTACTGCTTGCAGTAC";
    Seq2 = "AGTCCGAGGGCTACTCTACTGAAC";

    Create a scoring matrix with a match score of 10 and a mismatch score of -9.

    sm = [10 -9 -9 -9;
          -9 10 -9 -9;
          -9 -9 10 -9;
          -9 -9 -9 10];

    Use ScoringMatrix and GapOpen when returning information about the top three local alignments.

    struct4 = localalign(Seq1,Seq2, ...
                         Alphabet="nt", ...
                         ScoringMatrix=sm, ...
                         GapOpen=20, ...
                         NumAln=3)
    struct4 = struct with fields:
            Score: [3×1 double]
            Start: [3×2 double]
             Stop: [3×2 double]
        Alignment: {3×1 cell}
    
    

    Input Arguments


    Seq1: First amino acid or nucleotide sequence, specified as a character vector or string of letter codes, a numeric vector of integer codes, or a structure.

    For help with letter and integer representations of amino acids and nucleotides, see Amino Acid Lookup or Nucleotide Lookup.

    Data Types: double | char | string | struct

    Seq2: Second amino acid or nucleotide sequence, specified as a character vector or string of letter codes, a numeric vector of integer codes, or a structure.

    For help with letter and integer representations of amino acids and nucleotides, see Amino Acid Lookup or Nucleotide Lookup.

    Data Types: double | char | string | struct

    Name-Value Arguments


    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: NumAln=3

    NumAln: Number of alignments to return, specified as a positive number not exceeding 2^12 (4096). The localalign function returns the top NumAln local, nonintersecting alignments (optimal and suboptimal). If the number of optimal alignments is greater than NumAln, then localalign returns the first NumAln alignments based on their order in the traceback matrix.

    Use NumAln to return multiple alignments when you are aligning low complexity sequences and must consider several local alignments.

    Note

    You cannot combine NumAln with MinScore or Percent.

    Data Types: double

    MinScore: Minimum score of the local, nonintersecting alignments (optimal and suboptimal) to return, specified as a positive number. Use MinScore to return suboptimal alignments, for example when you want to account for sequencing errors or imperfect scoring matrices.

    Note

    You cannot combine MinScore with NumAln or Percent.

    Data Types: double
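    Because NumAln and MinScore cannot be passed in the same call, one way to get "the top N alignments above a cutoff" is to request the top N and filter the scores afterward. The following is a minimal sketch of that idea, not a documented workflow. It assumes the output layout shown in the examples above (Score as an N-by-1 vector, Start and Stop as N-by-2 matrices, Alignment as an N-by-1 cell array); the names top3, cutoff, keep, and filtered are illustrative only.

```matlab
% Requires Bioinformatics Toolbox (localalign).
Seq1 = "VSPAGMASGYDPGKA";
Seq2 = "IPGKATREYDVSPAG";

% Request the top three alignments; MinScore cannot be used in the same call.
top3 = localalign(Seq1,Seq2,NumAln=3);

% Hypothetical cutoff, applied afterward to the returned scores.
cutoff = 8;
keep = top3.Score > cutoff;

% Keep only the rows (alignments) that pass the cutoff.
% Assumes one row per alignment in every field, as in the example output.
filtered.Score     = top3.Score(keep);
filtered.Start     = top3.Start(keep,:);
filtered.Stop      = top3.Stop(keep,:);
filtered.Alignment = top3.Alignment(keep);
```

    Unlike MinScore, this filter never sees alignments ranked below the top N, so the two approaches can return different sets.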

    Percent: Percent of the highest score, specified as a positive number between 0 and 100. This value limits the returned local, nonintersecting alignments (optimal and suboptimal) to those with a score within the specified percentage of the highest score. For example, if the highest score is 10.5 and you specify 5, then localalign determines a minimum score of 10.5 – (10.5 * 0.05) = 9.975 and returns all alignments with a score of 9.975 or higher.

    Use Percent to return optimal and suboptimal alignments when you do not know how similar the two sequences are or how well they score against a given scoring matrix.

    Note

    You cannot combine Percent with NumAln or MinScore.

    Data Types: double
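    The minimum-score arithmetic described for Percent can be reproduced in a few lines. This sketch uses the highest score and percentage from the text; the candidate scores and the variable names are hypothetical and only illustrate which alignments would pass the threshold.

```matlab
highestScore = 10.5;   % highest score, taken from the example in the text
pct = 5;               % value that would be passed as Percent

% Minimum score implied by the percentage.
minScore = highestScore - highestScore*(pct/100);   % 9.975, up to rounding

% Hypothetical candidate scores, for illustration only.
scores = [10.5; 10.0; 9.9];

% Alignments at or above the minimum score are returned.
isReturned = scores >= minScore;   % [true; true; false]
```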

    DoAlignment: Indicator to include the pairwise alignments in the Alignment field of the output structure array, specified as true or false.

    Data Types: logical

    Alphabet: Type of sequences, specified as "AA" or "NT".

    Data Types: char | string

    ScoringMatrix: Scoring matrix to use for the local alignment, specified as one of these values:

    • A string or a corresponding character vector:

      • "BLOSUM62"

      • "BLOSUM30" increasing by 5 up to "BLOSUM90"

      • "BLOSUM100"

      • "PAM10" increasing by 10 up to "PAM500"

      • "DAYHOFF"

      • "GONNET"

      The default values are:

      • "BLOSUM50" — If Alphabet is set to "AA".

      • "NUC44" — If Alphabet is set to "NT".

      The preceding scoring matrices, provided with the software, also include a structure containing a scale factor that converts the units of the output score to bits. You can also use the Scale argument to specify an additional scale factor to convert the output score from bits to another unit.

    • A matrix representing the scoring matrix to use for the local alignment, such as returned by the blosum, pam, dayhoff, gonnet, or nuc44 function.

      If you use a scoring matrix that you created or that one of the preceding functions returned, the matrix does not include a scale factor. The output score is returned in the same units as the scoring matrix. You can use the Scale argument to specify a scale factor to convert the output score to another unit.

    If you need to compile localalign into a stand-alone application or software component using MATLAB Compiler™, use a matrix instead of a character vector or string for ScoringMatrix.

    Data Types: double | char | string
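    Passing a numeric matrix for ScoringMatrix, as recommended when compiling with MATLAB Compiler, might look like the sketch below. It assumes that blosum returns the matrix as its first output and an information structure with a Scale field as its second output; check the blosum reference page before relying on that field. The variable names are illustrative.

```matlab
% Requires Bioinformatics Toolbox (blosum, localalign).
Seq1 = "VSPAGMASGYDPGKA";
Seq2 = "IPGKATREYDVSPAG";

% Numeric scoring matrix; the second output is assumed to carry a Scale field.
[sm,info] = blosum(50);

% Matrix input: scores come back in the units of the matrix itself.
raw = localalign(Seq1,Seq2,ScoringMatrix=sm);

% Supplying the matrix's own scale factor through Scale (assumption: this
% reproduces the units obtained when the matrix is selected by name).
scaled = localalign(Seq1,Seq2,ScoringMatrix=sm,Scale=info.Scale);
```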

    Scale: Scale factor applied to the output scores, which controls their units, specified as a positive number. For example, if the output score is initially determined in bits and you specify the scale factor log(2), then localalign returns Score in nats. The default value 1 does not change the units of the output score.

    If the ScoringMatrix argument also specifies a scale factor, then localalign uses it first to scale the output score. It then applies the scale factor specified by Scale to rescale the output score.

    Before comparing alignment scores from multiple alignments, ensure that the scores are in the same units. Use Scale to control the units of the output scores.

    Data Types: double
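    The bits-to-nats conversion mentioned for Scale is a single multiplication, because one bit equals log(2) nats. The sketch below shows the arithmetic on a hypothetical score; the commented call shows where the same factor would be supplied to localalign.

```matlab
scoreBits = 11;                   % hypothetical alignment score in bits
scoreNats = scoreBits * log(2);   % 1 bit = log(2) nats (natural logarithm)

% The same factor supplied through Scale (requires Bioinformatics Toolbox):
% s = localalign(Seq1,Seq2,Scale=log(2));
```

    Applying the same Scale value to every call keeps scores from different alignments directly comparable.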

    GapOpen: Penalty for opening a gap in the alignment, specified as a positive number.

    Data Types: double

    Output Arguments


    Information about the local optimal and suboptimal alignments between two sequences, returned as a MATLAB structure array or array of structure arrays. Each structure array represents an optimal or suboptimal alignment and contains these fields.

    Field        Description
    Score        Score for the local optimal or suboptimal alignment.
    Start        1-by-2 vector of indices indicating the starting point in each sequence for the alignment.
    Stop         1-by-2 vector of indices indicating the stopping point in each sequence for the alignment.
    Alignment    3-by-N character array showing the two sequences, Seq1 and Seq2, in the first and third rows, and symbols representing the optimal or suboptimal local alignment between the two sequences in the second row.
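    The Start and Stop indices can be used to pull the aligned region out of each input sequence. This is a sketch under two assumptions: that the first column of Start and Stop refers to Seq1 and the second to Seq2, and that results for several alignments are stacked one row per alignment, as in the example output above. The names c1, c2, k, region1, and region2 are illustrative.

```matlab
% Requires Bioinformatics Toolbox (localalign).
Seq1 = "VSPAGMASGYDPGKA";
Seq2 = "IPGKATREYDVSPAG";
s = localalign(Seq1,Seq2,NumAln=3);

% Convert the strings to character vectors so they can be indexed by position.
c1 = char(Seq1);
c2 = char(Seq2);

k = 1;   % index of the alignment to inspect

% Assumed layout: column 1 indexes Seq1, column 2 indexes Seq2.
region1 = c1(s.Start(k,1):s.Stop(k,1));
region2 = c2(s.Start(k,2):s.Stop(k,2));

% Display the corresponding 3-by-N alignment for comparison.
disp(s.Alignment{k})
```

    For an alignment without gaps, region1 and region2 should match the first and third rows of the Alignment entry.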


    References

    [1] Barton, G. (1993). An efficient algorithm to locate all locally optimal alignments between two sequences allowing for gaps. CABIOS 9, 729–734.

    Version History

    Introduced in R2009b