multialign
Align multiple sequences using progressive method
Syntax
SeqsMultiAligned = multialign(Seqs)
SeqsMultiAligned = multialign(Seqs, Tree)
multialign(..., 'PropertyName', PropertyValue, ...)
multialign(..., 'Weights', WeightsValue)
multialign(..., 'ScoringMatrix', ScoringMatrixValue)
multialign(..., 'SMInterp', SMInterpValue)
multialign(..., 'GapOpen', GapOpenValue)
multialign(..., 'ExtendGap', ExtendGapValue)
multialign(..., 'DelayCutoff', DelayCutoffValue)
multialign(..., 'UseParallel', UseParallelValue)
multialign(..., 'Verbose', VerboseValue)
multialign(..., 'ExistingGapAdjust', ExistingGapAdjustValue)
multialign(..., 'TerminalGapAdjust', TerminalGapAdjustValue)
Input Arguments
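Seqs | Vector of structures with sequence fields, such as the Header and Sequence fields returned by fastaread. The examples on this page also pass a cell array of character vectors. |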
Tree | Phylogenetic tree calculated with the seqlinkage or seqneighjoin function. |
WeightsValue | Property to select the sequence weighting
method. Enter 'THG' (default) or 'equal'. |
ScoringMatrixValue | Either of the following:
Note If you need to compile |
SMInterpValue | Property to specify whether linear interpolation of the scoring
matrices is on or off. When false, the scoring matrix is assigned
to a fixed range depending on the distances between the two profiles
(or sequences) being aligned. Default is true. |
GapOpenValue | Scalar or a function specified using @.
If you enter a function, multialign passes four
values to the function: the average score for two matched residues
(sm), the average score for two mismatched residues
(sx), and the lengths of the two profiles or sequences
(len1, len2). Default is @(sm,sx,len1,len2) 5*sm. |
ExtendGapValue | Scalar or a function specified using @.
If you enter a function, multialign passes four
values to the function: the average score for two matched residues
(sm), the average score for two mismatched residues
(sx), and the lengths of the two profiles or sequences
(len1, len2). Default is @(sm,sx,len1,len2) sm/4. |
DelayCutoffValue | Property to specify the threshold for delaying the alignment of divergent sequences. Default is 1 (unity), in which case sequences whose closest neighbor is farther than the median distance are delayed. |
UseParallelValue | Controls the computation of the pairwise alignments using parfor-loops.
When true, and Parallel Computing Toolbox™ is
installed and a parpool is open, computation occurs
in parallel. If no parpool is open, but
automatic creation is enabled in the Parallel Preferences, the default
pool opens automatically and computation occurs in parallel.
If Parallel Computing Toolbox is installed, but no parpool is open and
automatic creation is disabled, then computation uses parfor-loops
in serial mode. If Parallel Computing Toolbox is not installed,
then computation also uses parfor-loops in serial mode.
Default is false, which uses for-loops in serial
mode. |
VerboseValue | Property to control displaying the sequences
with sequence information. Default is false. |
ExistingGapAdjustValue | Property to control automatic adjustment
based on existing gaps. Default is true. |
TerminalGapAdjustValue | Property to adjust the penalty for opening
a gap at the ends of the sequence. Default is false. |
Output Arguments
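SeqsMultiAligned | Vector of structures (same as Seqs) containing the aligned sequences. When Seqs is a cell array of character vectors, as in the nucleotide example on this page, the alignment is returned as a character array. |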
Description
SeqsMultiAligned = multialign(Seqs) performs
a progressive multiple alignment for a set of sequences (Seqs).
Pairwise distances between sequences are computed after pairwise alignment
with the Gonnet scoring matrix and then by counting the proportion
of sites at which each pair of sequences are different (ignoring gaps).
The guide tree is calculated by the neighbor-joining method assuming
equal variance and independence of evolutionary distance estimates.
SeqsMultiAligned = multialign(Seqs, Tree) uses
a tree (Tree) as a guide for the progressive
alignment. The sequences (Seqs) should
have the same order as the leaves in the tree (Tree)
or use a field ('Header' or 'Name')
to identify the sequences.
multialign(..., 'PropertyName', PropertyValue, ...) enters
optional arguments as property name/property value pairs. Specify
one or more properties in any order. Enclose each PropertyName in
single quotation marks. Each PropertyName is
case insensitive. These property name/property value pairs are as
follows:
multialign(..., 'Weights', WeightsValue) selects
the sequence weighting method. Weights emphasize highly divergent
sequences by scaling the scoring matrix and gap penalties. Closer
sequences receive smaller weights.
Values of the property Weights are:
'THG' (default): Thompson-Higgins-Gibson method using the phylogenetic tree branch distances weighted by their thickness.
'equal': Assigns the same weight to every sequence.
multialign(..., 'ScoringMatrix', ScoringMatrixValue) selects
the scoring matrix (ScoringMatrixValue)
for the progressive alignment. Match and mismatch scores are interpolated
from the series of scoring matrices by considering the distances between
the two profiles or sequences being aligned. The first matrix corresponds
to the smallest distance, and the last matrix to the largest distance.
Scores for intermediate distances are calculated using linear interpolation.
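The interpolation between scoring matrices can be illustrated with a short sketch in plain MATLAB. This is not the toolbox's internal code. The two small matrices are made-up values rather than real PAM or BLOSUM data, and the assumption that the supplied matrices are anchored at evenly spaced distances between 0 and 1 is hypothetical; the page does not state how distances map to matrices.

```matlab
% Hypothetical series of two 2-by-2 "scoring matrices" (made-up values).
% The first matrix stands for the smallest distance, the last for the largest.
series = cat(3, [4 -2; -2 4], [2 0; 0 2]);

% ASSUMPTION: matrices are anchored at evenly spaced distances in [0, 1].
n = size(series, 3);              % needs at least two matrices
anchors = linspace(0, 1, n);

d = 0.25;                                     % hypothetical profile distance
d = min(max(d, anchors(1)), anchors(end));    % clamp to the anchored range

% Find the pair of neighboring matrices that bracket d.
k = find(anchors <= d, 1, 'last');
k = min(k, n - 1);
t = (d - anchors(k)) / (anchors(k + 1) - anchors(k));   % position in [0, 1]

% Element-wise linear interpolation between the two bracketing matrices.
scores = (1 - t) * series(:, :, k) + t * series(:, :, k + 1);
disp(scores)
```

With these inputs the match score falls a quarter of the way from 4 to 2, and the mismatch score a quarter of the way from -2 to 0.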
multialign(..., 'SMInterp', SMInterpValue),
when SMInterpValue is false,
turns off the linear interpolation of the scoring matrices. Instead,
each supplied scoring matrix is assigned to a fixed range depending
on the distances between the two profiles or sequences being aligned.
multialign(..., 'GapOpen', GapOpenValue) specifies
the initial penalty for opening a gap.
multialign(..., 'ExtendGap', ExtendGapValue) specifies
the initial penalty for extending a gap.
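Because GapOpen and ExtendGap accept function handles as well as scalars, it can help to see what such a handle returns for concrete inputs. The sketch below evaluates the two documented defaults, plus one custom rule that is purely hypothetical (the name customOpenFcn and its formula are assumptions for illustration). The values of sm, sx, len1, and len2 are made up.

```matlab
% Default penalty functions as stated in the input-argument descriptions.
gapOpenFcn   = @(sm, sx, len1, len2) 5*sm;
extendGapFcn = @(sm, sx, len1, len2) sm/4;

% HYPOTHETICAL custom rule, for illustration only: scale the opening
% penalty with the log of the shorter profile length.
customOpenFcn = @(sm, sx, len1, len2) sm * (1 + log(min(len1, len2)));

% Made-up inputs: average match score, average mismatch score, lengths.
sm = 4; sx = -1; len1 = 120; len2 = 95;

gapOpen   = gapOpenFcn(sm, sx, len1, len2);     % 5*4 = 20
extendGap = extendGapFcn(sm, sx, len1, len2);   % 4/4 = 1
customGap = customOpenFcn(sm, sx, len1, len2);

fprintf('open = %g, extend = %g, custom open = %.4f\n', ...
        gapOpen, extendGap, customGap);
```

A handle of this form would then be supplied as the property value, for example multialign(Seqs, 'GapOpen', customOpenFcn).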
multialign(..., 'DelayCutoff', DelayCutoffValue) specifies
a threshold to delay the alignment of divergent sequences whose closest
neighbor is farther than
(DelayCutoffValue) * (median patristic distance between sequences)
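The threshold rule can be worked through by hand on a small example. The sketch below uses a made-up patristic distance matrix for four sequences and assumes the median is taken over each distinct pair of sequences; that detail is an assumption, since the page does not spell it out. All variable names are illustrative.

```matlab
% Hypothetical symmetric matrix of patristic distances between 4 sequences.
D = [0    0.10 0.20 0.90;
     0.10 0    0.15 0.85;
     0.20 0.15 0    0.80;
     0.90 0.85 0.80 0   ];

delayCutoff = 1;    % the documented default (unity)

% ASSUMPTION: median over each distinct pair (strict upper triangle of D).
pairDist  = D(triu(true(size(D)), 1));
threshold = delayCutoff * median(pairDist);

% Distance from each sequence to its closest neighbor (diagonal ignored).
Dnodiag = D;
Dnodiag(logical(eye(size(D)))) = Inf;
closest = min(Dnodiag, [], 2);

% Sequences whose closest neighbor lies beyond the threshold are delayed.
delayed = closest > threshold;
disp(find(delayed)')
```

Here the median pairwise distance is 0.5, and only the fourth sequence, whose closest neighbor is 0.80 away, exceeds the threshold.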
multialign(..., 'UseParallel', UseParallelValue) specifies
whether to use parfor-loops when computing the
pairwise alignments. When true, and Parallel Computing Toolbox is
installed and a parpool is open, computation occurs
in parallel. If no parpool is open, but
automatic creation is enabled in the Parallel Preferences, the default
pool opens automatically and computation occurs in parallel.
If Parallel Computing Toolbox is installed, but no parpool is open and
automatic creation is disabled, then computation uses parfor-loops
in serial mode. If Parallel Computing Toolbox is not installed,
then computation also uses parfor-loops in serial mode.
Default is false, which uses for-loops in serial
mode.
multialign(..., 'Verbose', VerboseValue),
when VerboseValue is true,
turns on verbosity.
The remaining optional input arguments are analogous to those of the
profalign function and
are used at every step of the progressive alignment of profiles.
multialign(..., 'ExistingGapAdjust', ExistingGapAdjustValue),
when ExistingGapAdjustValue is false,
turns off the automatic adjustment, based on existing gaps, of the position-specific
penalties for opening a gap.
When ExistingGapAdjustValue is true,
for every profile position, profalign proportionally
lowers the penalty for opening a gap toward the penalty of extending
a gap based on the proportion of gaps found in the contiguous symbols
and on the weight of the input profile.
multialign(..., 'TerminalGapAdjust', TerminalGapAdjustValue),
when TerminalGapAdjustValue is true,
adjusts the penalty for opening a gap at the ends of the sequence to be equal to the
penalty for extending a gap.
Examples
Align Multiple Sequences
This example shows how to align multiple protein sequences.
Use the fastaread function to read p53samples.txt, a FASTA-formatted file included with Bioinformatics Toolbox™, which contains p53 protein sequences of seven species.
p53 = fastaread('p53samples.txt')
p53=7×1 struct array with fields:
Header
Sequence
Compute the pairwise distances between each pair of sequences using the 'GONNET' scoring matrix.
dist = seqpdist(p53,'ScoringMatrix','GONNET');
Build a phylogenetic tree using an unweighted average distance (UPGMA) method. This tree will be used as a guiding tree in the next step of progressive alignment.
tree = seqlinkage(dist,'average',p53)
Phylogenetic tree object with 7 leaves (6 branches)
Perform progressive alignment using the PAM family scoring matrices.
ma = multialign(p53,tree,'ScoringMatrix',...
                {'pam150','pam200','pam250'})
ma=7×1 struct array with fields:
Header
Sequence
Align Nucleotide Sequences
Enter an array of sequences.
seqs = {'CACGTAACATCTC','ACGACGTAACATCTTCT','AAACGTAACATCTCGC'};
Align the sequences with terminal gap adjustment enabled, which favors gaps at the ends of the alignment.
multialign(seqs,'terminalGapAdjust',true)
ans =
--CACGTAACATCTC--
ACGACGTAACATCTTCT
-AAACGTAACATCTCGC
Compare with the alignment produced without terminal gap adjustment.
multialign(seqs)
ans =
CA--CGTAACATCT--C
ACGACGTAACATCTTCT
AA-ACGTAACATCTCGC
Version History
Introduced before R2006a
See Also
align2cigar | hmmprofalign | multialignread | multialignwrite | nwalign | profalign | seqprofile | seqconsensus | seqneighjoin