cleave

Cleave amino acid sequence with enzyme

Syntax

Fragments = cleave(SeqAA, Enzyme)
Fragments = cleave(SeqAA, PeptidePattern, Position)
[Fragments, CuttingSites] = cleave(...)
[Fragments, CuttingSites, Lengths] = cleave(...)
[Fragments, CuttingSites, Lengths, Missed] = cleave(...)
cleave(..., 'PartialDigest', PartialDigestValue, ...)
cleave(..., 'MissedSites', MissedSitesValue, ...)
cleave(..., 'Exception', ExceptionValue, ...)

Input Arguments

SeqAA

One of the following:

  • Character vector or string containing single-letter codes specifying an amino acid sequence.

  • Row vector of integers specifying an amino acid sequence.

  • MATLAB® structure containing a Sequence field that contains an amino acid sequence, such as returned by fastaread, getgenpept, genpeptread, getpdb, or pdbread.

Examples: 'ARN' or [1 2 3].

Enzyme

Character vector or string specifying a name or abbreviation code for an enzyme or compound for which the literature specifies a cleavage rule.

Tip

Use the cleavelookup function to display the names of enzymes and compounds in the cleavage rule library.

PeptidePattern

Short amino acid sequence to search for in SeqAA, a larger sequence. PeptidePattern can be a plain sequence or a regular expression such as '[KR](?!P)'.

Position

Integer from 0 to the length of PeptidePattern that specifies the position within PeptidePattern at which to cleave.

Note

Position 0 corresponds to the N terminal end of PeptidePattern.
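As a rough illustration of how a pattern and a position together pick out cut sites, the following sketch uses the base MATLAB regexp function instead of cleave. The sequence and variable names are hypothetical, and the logic is an assumption about the behavior described here, not the toolbox implementation.

```matlab
% Sketch only: emulate PeptidePattern + Position with regexp (not the cleave internals).
seq      = 'MGTGGRRGAKPLLK';   % hypothetical short sequence
pattern  = '[KR](?!P)';        % match K or R unless the next residue is P
position = 1;                  % cut after the 1st residue of each match

matchStart = regexp(seq, pattern);        % 1-based start index of each match
cutAfter   = matchStart + position - 1;   % last residue before each cut
% A cut before the first residue or after the last one creates no new fragment.
cutAfter   = cutAfter(cutAfter > 0 & cutAfter < numel(seq));

starts    = [1, cutAfter + 1];
stops     = [cutAfter, numel(seq)];
fragments = arrayfun(@(a, b) seq(a:b), starts, stops, 'UniformOutput', false);
disp(fragments)
```

In this sketch the K at residue 10 is followed by P, so the pattern does not match there and no cut is made at that position.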

PartialDigestValue

Value from 0 to 1 (default) specifying the probability that a cleavage site will be cleaved.

MissedSitesValue

Nonnegative integer specifying the maximum number of missed cleavage sites. The output includes all possible peptide fragments that can result from missing MissedSitesValue or fewer cleavage sites. Default is 0, which is equivalent to an ideal digestion.

ExceptionValue

Regular expression specifying an exception rule to the cleavage rule associated with Enzyme. By default, exception rules apply only to trypsin. All other enzymes have no exception rule, which is specified as an empty character vector. To prevent the use of the default exceptions for trypsin, use an empty character vector as the exception rule.

To see the regular expression for trypsin’s exception rules, check the Cleave Lookup table.

Output Arguments

Fragments

Cell array of character vectors representing the fragments from the cleavage.

CuttingSites

Numeric vector containing indices representing the cleavage sites.

Note

The cleave function adds a 0 to the start of the list, so numel(CuttingSites)==numel(Fragments). Use CuttingSites + 1 to get the index of the first amino acid of each fragment relative to the original sequence.

Lengths

Numeric vector containing the length of each fragment.

Missed

Numeric vector containing the number of missed cleavage sites for every peptide fragment.

Description

Fragments = cleave(SeqAA, Enzyme) cuts SeqAA, an amino acid sequence, into parts at the cleavage sites specific for Enzyme, a character vector or string specifying a name or abbreviation code for an enzyme or compound for which the literature specifies a cleavage rule. It returns Fragments, a cell array of character vectors representing the fragments from the cleavage.

Tip

Use the cleavelookup function to display the names of enzymes and compounds in the cleavage rule library.

Fragments = cleave(SeqAA, PeptidePattern, Position) cuts SeqAA, an amino acid sequence, into parts at the cleavage sites specified by a peptide pattern and position.

[Fragments, CuttingSites] = cleave(...) returns a numeric vector containing indices representing the cleavage sites.

Note

The cleave function adds a 0 to the start of the list, so numel(CuttingSites)==numel(Fragments). Use CuttingSites + 1 to get the index of the first amino acid of each fragment relative to the original sequence.

[Fragments, CuttingSites, Lengths] = cleave(...) returns a numeric vector containing the length of each fragment.
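The relationship between CuttingSites, Lengths, and the original sequence can be checked by hand. The sketch below copies the first four rows of the trypsin example output on this page into hypothetical variables and rebuilds each fragment by indexing. It uses only base MATLAB and does not call cleave.

```matlab
% Values copied from the first four fragments of the trypsin example on this page.
seq     = 'MGTGGRRGAAAAPLLVAVAALLLGAAGHLYPGEVCPGMDIRNNLTR';
sites   = [0 6 7 41];    % CuttingSites, including the leading 0
lengths = [6 1 34 5];    % Lengths

first = sites + 1;         % index of the first residue of each fragment
last  = sites + lengths;   % index of the last residue of each fragment
for k = 1:numel(sites)
    fprintf('%3d-%3d  %s\n', first(k), last(k), seq(first(k):last(k)));
end
```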

[Fragments, CuttingSites, Lengths, Missed] = cleave(...) returns a numeric vector containing the number of missed cleavage sites for every fragment.

cleave(..., 'PropertyName', PropertyValue, ...) calls cleave with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Enclose each PropertyName in single quotation marks. Each PropertyName is case insensitive. These property name/property value pairs are as follows:

cleave(..., 'PartialDigest', PartialDigestValue, ...) simulates a partial digestion where PartialDigestValue is the probability of a cleavage site being cut. PartialDigestValue is a value from 0 to 1 (default).
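To make the idea of a cleavage probability concrete, here is a minimal sketch that keeps each ideal cut site with probability p. It assumes sites are cut independently of one another. That is an illustrative assumption, and cleave may implement partial digestion differently. All variable names are hypothetical.

```matlab
% Sketch of the idea behind a partial digest (assumption: independent sites).
seq      = 'MGTGGRRGAAAAPLLVAVAALLLGAAGHLYPGEVCPGMDIRNNLTR';
cutAfter = [6 7 41];   % ideal trypsin cut positions inside seq (from the example output)
p        = 0.5;        % plays the role of PartialDigestValue

rng(0);                                        % fixed seed so the run is repeatable
kept   = cutAfter(rand(size(cutAfter)) < p);   % sites that are actually cut
starts = [1, kept + 1];
stops  = [kept, numel(seq)];
frags  = arrayfun(@(a, b) seq(a:b), starts, stops, 'UniformOutput', false);
fprintf('%d of %d sites cut, %d fragments\n', numel(kept), numel(cutAfter), numel(frags));
```

With p = 1 every site is kept, which matches the default of a complete digestion. With p = 0 the sequence is returned as a single fragment.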

This table lists some common proteases and their cleavage sites.

Protease          Peptide Pattern   Position
Aspartic acid N   D                 1
Chymotrypsin      [WYF](?!P)        1
Glutamine C       [ED](?!P)         1
Lysine C          [K](?!P)          1
Trypsin           [KR](?!P)         1
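The patterns in the table above are ordinary regular expressions, so their matches can be inspected with base MATLAB before passing them to cleave. The sketch below applies each row to a hypothetical sequence and reports the residue after which a cut would fall. It is an illustration of the table entries only and does not call cleave.

```matlab
% Sketch: locate candidate cut positions for each table row with regexp.
seq   = 'MKWPDAFRPEKYD';   % hypothetical sequence
rules = {'Aspartic acid N', 'D',          1;
         'Chymotrypsin',    '[WYF](?!P)', 1;
         'Glutamine C',     '[ED](?!P)',  1;
         'Lysine C',        '[K](?!P)',   1;
         'Trypsin',         '[KR](?!P)',  1};

results = cell(size(rules, 1), 1);
for r = 1:size(rules, 1)
    % Match start + Position - 1 gives the last residue before the cut.
    results{r} = regexp(seq, rules{r, 2}) + rules{r, 3} - 1;
    fprintf('%-16s cuts after residue(s): %s\n', rules{r, 1}, mat2str(results{r}));
end
```

The negative lookahead (?!P) is what excludes W at residue 3 and R at residue 8 in this sequence, because each is followed by P.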

cleave(..., 'MissedSites', MissedSitesValue, ...) returns all possible peptide fragments that can result from missing MissedSitesValue or fewer cleavage sites. MissedSitesValue is a nonnegative integer. Default is 0, which is equivalent to an ideal digestion.
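One way to picture missed cleavages is that a fragment with m missed sites is m + 1 consecutive ideal fragments joined together. The sketch below enumerates such fragments from the ideal cut offsets of the trypsin example on this page. It is a hand-written illustration with hypothetical variable names, not the cleave implementation, and the order of its output is not claimed to match the order that cleave returns.

```matlab
% Sketch: enumerate fragments with at most maxMissed missed sites.
seq       = 'MGTGGRRGAAAAPLLVAVAALLLGAAGHLYPGEVCPGMDIRNNLTR';
sites     = [0 6 7 41];                    % start offsets of the ideal fragments
stops     = [sites(2:end), numel(seq)];    % end index of each ideal fragment
maxMissed = 1;                             % plays the role of MissedSitesValue

frags  = {};
missed = [];
for m = 0:maxMissed
    for k = 1:numel(sites) - m
        % Join m + 1 consecutive ideal fragments starting at fragment k.
        frags{end + 1}  = seq(sites(k) + 1 : stops(k + m)); %#ok<SAGROW>
        missed(end + 1) = m;                                %#ok<SAGROW>
        fprintf('%5d%5d%3d   %s\n', sites(k), numel(frags{end}), m, frags{end});
    end
end
```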

cleave(..., 'Exception', ExceptionValue, ...) specifies an exception rule to the cleavage rule associated with Enzyme. ExceptionValue is a regular expression. By default, exception rules apply only to trypsin. All other enzymes have no exception rule, which is specified as an empty character vector. To prevent the use of the default exceptions for trypsin, specify an empty character vector as the exception rule.
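The three ways of handling exceptions described above can be compared side by side. The following sketch calls cleave, so it requires Bioinformatics Toolbox. The sequence is hypothetical, and no output is shown because the fragments depend on the default trypsin exception rules.

```matlab
% Requires Bioinformatics Toolbox. The sequence below is hypothetical.
seq = 'MGTGGRRGAAKDPLLKSTR';

withDefaults = cleave(seq, 'trypsin');                     % default trypsin exceptions apply
noExceptions = cleave(seq, 'trypsin', 'Exception', '');    % empty rule disables the defaults
customRule   = cleave(seq, 'trypsin', 'Exception', 'KD');  % single rule, as in the example on this page

fprintf('defaults: %d, none: %d, custom: %d fragments\n', ...
    numel(withDefaults), numel(noExceptions), numel(customRule));
```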

Examples

This example shows how to cleave a sequence using trypsin.

Retrieve a protein sequence from the GenPept database.

S = getgenpept('AAA59174');

Cleave the sequence using trypsin's cleavage rules and all known exceptions.

parts = cleave(S.Sequence,'trypsin');

Display the first ten fragments.

parts(1:10)
ans = 

    'MGTGGR'
    'R'
    'GAAAAPLLVAVAALLLGAAGHLYPGEVCPGMDIR'
    'NNLTR'
    'LHELENCSVIEGHLQILLMFK'
    'TRPEDFR'
    'DLSFPK'
    'LIMITDYLLLFR'
    'VYGLESLK'
    'DLFPNLTVIR'

Cleave the sequence using trypsin's cleavage rules and a single specific exception rule.

parts = cleave(S.Sequence,'trypsin','exception','KD');
parts(1:10)
ans = 

    'MGTGGR'
    'R'
    'GAAAAPLLVAVAALLLGAAGHLYPGEVCPGMDIR'
    'NNLTR'
    'LHELENCSVIEGHLQILLMFK'
    'TRPEDFR'
    'DLSFPK'
    'LIMITDYLLLFR'
    'VYGLESLKDLFPNLTVIR'
    'GSR'

Cleave the sequence using one of trypsin's cleavage rules, which is to cleave after K or R when the next residue is not P.

[parts, sites, lengths] = cleave(S.Sequence,'[KR](?!P)',1);
for i = 1:10
    fprintf('%5d%5d   %s\n',sites(i),lengths(i),parts{i})
end
    0    6   MGTGGR
    6    1   R
    7   34   GAAAAPLLVAVAALLLGAAGHLYPGEVCPGMDIR
   41    5   NNLTR
   46   21   LHELENCSVIEGHLQILLMFK
   67    7   TRPEDFR
   74    6   DLSFPK
   80   12   LIMITDYLLLFR
   92    8   VYGLESLK
  100   10   DLFPNLTVIR

Cut the sequence using trypsin, allowing for 1 missed cleavage site.

[parts2, sites2, lengths2, missed] = cleave(S.Sequence,'trypsin','missedsites',1);

Display the first 10 fragments that have 1 missed cleavage site.

idx = find(missed);
for i = 1:10
    fprintf('%5d%5d   %s\n',sites2(idx(i)),lengths2(idx(i)),parts2{idx(i)})
end
    0    7   MGTGGRR
    6   35   RGAAAAPLLVAVAALLLGAAGHLYPGEVCPGMDIR
    7   39   GAAAAPLLVAVAALLLGAAGHLYPGEVCPGMDIRNNLTR
   41   26   NNLTRLHELENCSVIEGHLQILLMFK
   46   28   LHELENCSVIEGHLQILLMFKTRPEDFR
   67   13   TRPEDFRDLSFPK
   74   18   DLSFPKLIMITDYLLLFR
   80   20   LIMITDYLLLFRVYGLESLK
   92   18   VYGLESLKDLFPNLTVIR
  100   13   DLFPNLTVIRGSR

Version History

Introduced before R2006a