CompactClassificationTree

Compact classification tree

Description

Compact version of a classification tree (of class ClassificationTree). The compact version does not include the data used to train the classification tree, so you cannot perform some tasks with it, such as cross-validation. Use a compact classification tree to make predictions (classifications) for new data.

Creation

Create a CompactClassificationTree object from a full ClassificationTree model object by using compact.

Properties

CategoricalPredictors

This property is read-only.

Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]).

Data Types: single | double

CategoricalSplit

This property is read-only.

Categorical splits, returned as an n-by-2 cell array, where n is the number of categorical splits in tree. Each row in CategoricalSplit gives the left and right values for a categorical split. For each branch node with categorical split j based on a categorical predictor variable z, the left child is chosen if z is in CategoricalSplit(j,1) and the right child is chosen if z is in CategoricalSplit(j,2). The splits are in the same order as the nodes of the tree. To find the nodes for these splits, select the nodes whose CutType is 'categorical', in order from top to bottom.

Data Types: cell
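The Fisher iris data has no categorical predictors, so the following sketch assumes a hypothetical one: petal length binned into three levels with discretize (the bin edges are arbitrary, chosen only for illustration). It shows the value stored in CategoricalPredictors and prints one CategoricalSplit row for each node whose CutType is 'categorical'.

```matlab
load fisheriris
% Hypothetical categorical predictor: petal length binned into levels 1, 2, 3
petalBin = discretize(meas(:,3), [0 2 5 7]);
X = [meas(:,1:2) petalBin];

% Declare column 3 as categorical when training, then compact the model
ctree = compact(fitctree(X, species, 'CategoricalPredictors', 3));

disp(ctree.CategoricalPredictors)   % index of the categorical predictor

% CategoricalSplit has one row per categorical split, in node order
catNodes = find(strcmp(ctree.CutType, 'categorical'));
for j = 1:numel(catNodes)
    fprintf('Node %d: left = %s, right = %s\n', catNodes(j), ...
        mat2str(ctree.CategoricalSplit{j,1}), mat2str(ctree.CategoricalSplit{j,2}));
end
```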

Children

This property is read-only.

Numbers of the child nodes for each node in tree, returned as an n-by-2 array, where n is the number of nodes. Children(i,1) and Children(i,2) are the numbers of the left and right child nodes of node i. Leaf nodes have child node 0.

Data Types: double
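A minimal sketch, using the Fisher iris model from the example on this page, that reads the children of the root node and confirms that every leaf row of Children contains zeros.

```matlab
load fisheriris
ctree = compact(fitctree(meas, species));

% Row 1 holds the left and right child numbers of the root node
disp(ctree.Children(1,:))

% Leaf nodes have no children, so both entries of their rows are 0
leafRows = ctree.Children(~ctree.IsBranchNode, :);
disp(all(leafRows(:) == 0))
```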

ClassCount

This property is read-only.

Class counts for the nodes in tree, returned as an n-by-k array, where n is the number of nodes and k is the number of classes. For any node number i, the class counts ClassCount(i,:) are counts of observations (from the data used in fitting the tree) from each class satisfying the conditions for node i.

Data Types: double

ClassNames

This property is read-only.

List of the elements in Y with duplicates removed, returned as a categorical array, cell array of character vectors, character array, logical vector, or a numeric vector. ClassNames has the same data type as the data in the argument Y. (The software treats string arrays as cell arrays of character vectors.)

Data Types: double | logical | char | cell | categorical

ClassProbability

This property is read-only.

Class probabilities for the nodes in tree, returned as an n-by-k array, where n is the number of nodes and k is the number of classes. For any node number i, the class probabilities ClassProbability(i,:) are the estimated probabilities for each class for a point satisfying the conditions for node i.

Data Types: double
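The following sketch places ClassCount and ClassProbability side by side for the root node of the Fisher iris tree. Columns follow the order of ClassNames, and each row of ClassProbability is a distribution over the k classes, so its entries sum to 1.

```matlab
load fisheriris
ctree = compact(fitctree(meas, species));

% Counts and estimated probabilities for the root node (node 1)
disp(ctree.ClassNames')
disp(ctree.ClassCount(1,:))
disp(ctree.ClassProbability(1,:))

% Each row of ClassProbability sums to 1
rowSums = sum(ctree.ClassProbability, 2);
```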

Cost

Cost of classifying a point into class j when its true class is i, returned as a square matrix. The rows of Cost correspond to the true class and the columns correspond to the predicted class. The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response.

Data Types: double
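As a quick check, this sketch inspects the cost matrix of a tree trained without a 'Cost' argument. The result should be the default: 0 on the diagonal for correct classifications and 1 everywhere else.

```matlab
load fisheriris
ctree = compact(fitctree(meas, species));

% Rows: true class, columns: predicted class, both in ClassNames order
disp(ctree.Cost)
```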

CutCategories

This property is read-only.

Categories used at branches in tree, returned as an n-by-2 cell array, where n is the number of nodes. For each branch node i based on a categorical predictor variable X, the left child is chosen if X is among the categories listed in CutCategories{i,1}, and the right child is chosen if X is among those listed in CutCategories{i,2}. Both columns of CutCategories are empty for branch nodes based on continuous predictors and for leaf nodes.

CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories.

Data Types: cell
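The sketch below shows which rows of CutCategories are populated. Like the CategoricalPredictors sketch, it assumes a hypothetical categorical predictor made by binning petal length with discretize. Categorical branch nodes list categories in both columns. Continuous branch nodes and leaf nodes leave both columns empty.

```matlab
load fisheriris
petalBin = discretize(meas(:,3), [0 2 5 7]);   % hypothetical categorical predictor
X = [meas(:,1:2) petalBin];
ctree = compact(fitctree(X, species, 'CategoricalPredictors', 3));

isCat = strcmp(ctree.CutType, 'categorical');       % categorical branch nodes
hasCats = ~cellfun(@isempty, ctree.CutCategories);   % n-by-2 logical

% Left and right category sets at each categorical branch node
disp(ctree.CutCategories(isCat, :))
```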

CutPoint

This property is read-only.

Values used as cut points in tree, returned as an n-element vector, where n is the number of nodes. For each branch node i based on a continuous predictor variable X, the left child is chosen if X<CutPoint(i) and the right child is chosen if X>=CutPoint(i). CutPoint is NaN for branch nodes based on categorical predictors and for leaf nodes.

CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories.

Data Types: double

CutPredictor

This property is read-only.

Names of the variables used for branching in each node in tree, returned as an n-element cell array, where n is the number of nodes. These variables are sometimes known as cut variables. For leaf nodes, CutPredictor contains an empty character vector.

CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories.

Data Types: cell

CutPredictorIndex

This property is read-only.

Indices of variables used for branching in each node in tree, returned as an n-element array, where n is the number of nodes. For more information, see CutPredictor.

Data Types: double

CutType

This property is read-only.

Type of cut at each node in tree, returned as an n-element cell array, where n is the number of nodes. For each node i, CutType{i} is:

  • 'continuous' — If the cut is defined in the form X < v for a variable X and cut point v.

  • 'categorical' — If the cut is defined by whether a variable X takes a value in a set of categories.

  • '' — If i is a leaf node.

CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories.

Data Types: cell
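To make the branching rules concrete, the following sketch walks each training observation from the root to a leaf by hand, using IsBranchNode, CutPredictor, CutPoint, and Children. It then compares each leaf's NodeClass with the label that predict returns. It assumes that all predictors are continuous and that the data has no missing values, so surrogate splits never come into play.

```matlab
load fisheriris
ctree = compact(fitctree(meas, species));

predicted = predict(ctree, meas);     % labels from the model
manual = cell(size(meas,1), 1);       % labels from walking the tree by hand

for r = 1:size(meas,1)
    x = meas(r,:);
    node = 1;                                          % start at the root
    while ctree.IsBranchNode(node)
        % Column of x that this node splits on
        j = strcmp(ctree.PredictorNames, ctree.CutPredictor{node});
        if x(j) < ctree.CutPoint(node)
            node = ctree.Children(node,1);             % left child: X < CutPoint
        else
            node = ctree.Children(node,2);             % right child: X >= CutPoint
        end
    end
    manual{r} = ctree.NodeClass{node};                 % most probable class of the leaf
end
disp(isequal(manual, predicted))
```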

ExpandedPredictorNames

This property is read-only.

Expanded predictor names, returned as a cell array of character vectors.

If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.

Data Types: cell

IsBranchNode

This property is read-only.

Indicator of branch nodes, returned as an n-element logical vector that is true for each branch node and false for each leaf node of tree.

Data Types: logical
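A short sketch that uses IsBranchNode to count branch and leaf nodes. Every branch node in these trees has exactly two children, so there is always one more leaf than there are branch nodes.

```matlab
load fisheriris
ctree = compact(fitctree(meas, species));

numBranches = sum(ctree.IsBranchNode);
numLeaves = sum(~ctree.IsBranchNode);
fprintf('%d branch nodes, %d leaves\n', numBranches, numLeaves)
```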

NodeClass

This property is read-only.

Name of the most probable class in each node of tree, returned as a cell array with n elements, where n is the number of nodes in the tree. Each element of this array is a character vector equal to one of the class names in ClassNames.

Data Types: cell

NodeError

This property is read-only.

Misclassification probability for each node in tree, returned as an n-element vector, where n is the number of nodes in the tree.

Data Types: double

NodeProbability

This property is read-only.

Proportion of observations in original data that satisfy the conditions for each node in tree, returned as an n-element vector, where n is the number of nodes in the tree. The NodeProbability values are adjusted for any prior probabilities assigned to each class.

Data Types: double
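Because every observation reaches exactly one leaf, the following sketch checks two consequences for the Fisher iris tree: the root node has probability 1, and the probabilities of the leaf nodes add up to 1.

```matlab
load fisheriris
ctree = compact(fitctree(meas, species));

isLeaf = ~ctree.IsBranchNode;
rootProb = ctree.NodeProbability(1);               % the root covers all the data
leafTotal = sum(ctree.NodeProbability(isLeaf));    % the leaves partition the data
fprintf('Root: %.4f, sum over leaves: %.4f\n', rootProb, leafTotal)
```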

NodeRisk

This property is read-only.

Impurity of each node in tree, weighted by the node probability, returned as an n-element vector, where n is the number of nodes in the tree. The impurity measure is the Gini index or the deviance of the node, depending on the split criterion. If the tree is grown by twoing, the risk for each node is zero.

Data Types: double

NodeSize

This property is read-only.

Size of the nodes in tree, returned as an n-element vector, where n is the number of nodes in the tree. The size of a node is the number of observations from the data used to create the tree that satisfy the conditions for the node.

Data Types: double

NumNodes

This property is read-only.

Number of nodes in tree, returned as a positive integer.

Data Types: double

Parent

This property is read-only.

Number of the parent node of each node in tree, returned as an n-element integer vector, where n is the number of nodes in the tree. The parent of the root node is 0.

Data Types: double
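A minimal sketch that follows Parent from the last node of the Fisher iris tree back up to the root, which is recognized by its parent value of 0. Reversing the collected node numbers gives the root-to-node path.

```matlab
load fisheriris
ctree = compact(fitctree(meas, species));

% Follow Parent from the last node back up to the root (whose parent is 0)
node = ctree.NumNodes;
nodePath = node;
while ctree.Parent(node) ~= 0
    node = ctree.Parent(node);
    nodePath(end+1) = node; %#ok<AGROW>
end
disp(fliplr(nodePath))   % node numbers from the root down to the last node
```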

PredictorNames

This property is read-only.

Predictor names, returned as a cell array of character vectors. The order of the entries in PredictorNames is the same as in the training data.

Data Types: cell

Prior

Prior probabilities for each class, returned as an m-element vector, where m is the number of unique classes in the response. The order of the elements of Prior corresponds to the order of the classes in ClassNames.

Data Types: double

PruneAlpha

Alpha values for pruning the tree, returned as a real vector with one element per pruning level. If the pruning level ranges from 0 to M, then PruneAlpha has M + 1 elements sorted in ascending order. PruneAlpha(1) is for pruning level 0 (no pruning), PruneAlpha(2) is for pruning level 1, and so on.

For the meaning of the α values, see How Decision Trees Create a Pruning Sequence.

Data Types: double

PruneList

Pruning levels of each node in the tree, returned as an integer vector with NumNodes elements. The pruning levels range from 0 (no pruning) to M, where M is the distance between the deepest leaf and the root node.

For details, see Pruning.

Data Types: double
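The sketch below displays the pruning sequence stored in a compact tree. It then predicts with the unpruned tree (level 0) and with the most pruned subtree (level max(PruneList)). It assumes that predict accepts the 'Subtrees' name-value argument for selecting a pruning level. Check the predict reference page for your release before relying on it.

```matlab
load fisheriris
ctree = compact(fitctree(meas, species));

disp(ctree.PruneAlpha')           % one alpha per pruning level, in ascending order
maxLevel = max(ctree.PruneList);  % most aggressive pruning level stored in the tree

% Predict with the unpruned tree (level 0) and with the most pruned subtree
labelsFull = predict(ctree, meas, 'Subtrees', 0);
labelsPruned = predict(ctree, meas, 'Subtrees', maxLevel);
fprintf('Distinct labels: %d (level 0) vs %d (level %d)\n', ...
    numel(unique(labelsFull)), numel(unique(labelsPruned)), maxLevel)
```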

ResponseName

This property is read-only.

Name of the response variable, returned as a character vector.

Data Types: char

ScoreTransform

Function for transforming scores, specified as a function handle or the name of a built-in transformation function. 'none' means no transformation; equivalently, 'none' means @(x)x. For a list of built-in transformation functions and the syntax of custom transformation functions, see fitctree.

Add or change a ScoreTransform function using dot notation:

ctree.ScoreTransform = 'logit';                % name of a built-in transformation
% or
ctree.ScoreTransform = @(x) 1./(1 + exp(-x));  % equivalent function handle

Data Types: char | string | function_handle
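To see the effect of ScoreTransform on predictions, this sketch compares the scores that predict returns before and after the transformation is set to the built-in 'logit' (1./(1 + exp(-x))). The untransformed scores of a tree are the class probabilities of the leaf that each observation reaches.

```matlab
load fisheriris
ctree = compact(fitctree(meas, species));

[~, rawScores] = predict(ctree, meas(1:5,:));     % default ScoreTransform 'none'
ctree.ScoreTransform = 'logit';                   % built-in: 1./(1 + exp(-x))
[~, logitScores] = predict(ctree, meas(1:5,:));

% The transformed scores are the logit transformation applied elementwise
disp([rawScores(1,:); logitScores(1,:)])
```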

SurrogateCutCategories

This property is read-only.

Categories used for surrogate splits, returned as an n-element cell array, where n is the number of nodes in tree. For each node k, SurrogateCutCategories{k} is a cell array. The length of SurrogateCutCategories{k} is equal to the number of surrogate predictors found at this node. Every element of SurrogateCutCategories{k} is either an empty character vector for a continuous surrogate predictor, or a two-element cell array with categories for a categorical surrogate predictor. The first element of this two-element cell array lists the categories assigned to the left child by this surrogate split, and the second element lists the categories assigned to the right child. The order of the surrogate split variables at each node matches the order of variables in SurrogateCutPredictor. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogateCutCategories contains an empty cell.

Data Types: cell

SurrogateCutFlip

This property is read-only.

Numeric cut assignments used for surrogate splits in tree, returned as an n-element cell array, where n is the number of nodes in tree. For each node k, SurrogateCutFlip{k} is a numeric vector. The length of SurrogateCutFlip{k} is equal to the number of surrogate predictors found at this node. Every element of SurrogateCutFlip{k} is either zero for a categorical surrogate predictor, or a numeric cut assignment for a continuous surrogate predictor. The numeric cut assignment can be either –1 or +1. For every surrogate split with a numeric cut C based on a continuous predictor variable Z, the left child is chosen if Z<C and the cut assignment for this surrogate split is +1, or if Z>=C and the cut assignment for this surrogate split is –1. Similarly, the right child is chosen if Z>=C and the cut assignment for this surrogate split is +1, or if Z<C and the cut assignment for this surrogate split is –1. The order of the surrogate split variables at each node matches the order of variables in SurrogateCutPredictor. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogateCutFlip contains an empty array.

Data Types: cell

SurrogateCutPoint

This property is read-only.

Numeric values used for surrogate splits in tree, returned as an n-element cell array, where n is the number of nodes in tree. For each node k, SurrogateCutPoint{k} is a numeric vector. The length of SurrogateCutPoint{k} is equal to the number of surrogate predictors found at this node. Every element of SurrogateCutPoint{k} is either NaN for a categorical surrogate predictor, or a numeric cut for a continuous surrogate predictor. For every surrogate split with a numeric cut C based on a continuous predictor variable Z, the left child is chosen if Z<C and SurrogateCutFlip for this surrogate split is +1, or if Z>=C and SurrogateCutFlip for this surrogate split is –1. Similarly, the right child is chosen if Z>=C and SurrogateCutFlip for this surrogate split is +1, or if Z<C and SurrogateCutFlip for this surrogate split is –1. The order of the surrogate split variables at each node matches the order of variables returned by SurrogateCutPredictor. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogateCutPoint contains an empty cell.

Data Types: cell

SurrogateCutPredictor

This property is read-only.

Names of the variables used for surrogate splits in each node in tree, returned as an n-element cell array, where n is the number of nodes in tree. Every element of SurrogateCutPredictor is a cell array with the names of the surrogate split variables at this node. The variables are sorted in descending order of their predictive measure of association with the optimal predictor, and only variables with a positive predictive measure are included. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogateCutPredictor contains an empty cell.

Data Types: cell

SurrogateCutType

This property is read-only.

Types of surrogate splits at each node in tree, returned as an n-element cell array, where n is the number of nodes in tree. For each node k, SurrogateCutType{k} is a cell array with the types of the surrogate split variables at this node. The variables are sorted in descending order of their predictive measure of association with the optimal predictor, and only variables with a positive predictive measure are included. The order of the surrogate split variables at each node matches the order of variables in SurrogateCutPredictor. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogateCutType contains an empty cell. A surrogate split type is either 'continuous', if the cut is defined in the form Z<V for a variable Z and cut point V, or 'categorical', if the cut is defined by whether Z takes a value in a set of categories.

Data Types: cell

SurrogatePredictorAssociation

This property is read-only.

Predictive measures of association for surrogate splits in tree, returned as an n-element cell array, where n is the number of nodes in tree. For each node k, SurrogatePredictorAssociation{k} is a numeric vector. The length of SurrogatePredictorAssociation{k} is equal to the number of surrogate predictors found at this node. Every element of SurrogatePredictorAssociation{k} gives the predictive measure of association between the optimal split and this surrogate split. The order of the surrogate split variables at each node is the order of variables in SurrogateCutPredictor. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogatePredictorAssociation contains an empty cell.

Data Types: cell
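Surrogate split properties are populated only when the tree is grown with surrogate splits. The following sketch passes the fitctree name-value argument 'Surrogate','on' and then inspects the surrogate predictors found at the root of the Fisher iris tree, together with their predictive measures of association.

```matlab
load fisheriris
% Store surrogate splits at every branch node
ctree = compact(fitctree(meas, species, 'Surrogate', 'on'));

% Surrogate predictors at the root, in descending order of association
disp(ctree.SurrogateCutPredictor{1})
disp(ctree.SurrogatePredictorAssociation{1})
```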

Object Functions

compareHoldout         Compare accuracies of two classification models using new data
edge                   Classification edge for classification tree model
gather                 Gather properties of Statistics and Machine Learning Toolbox object from GPU
lime                   Local interpretable model-agnostic explanations (LIME)
loss                   Classification loss for classification tree model
margin                 Classification margins for classification tree model
nodeVariableRange      Retrieve variable range of decision tree node
partialDependence      Compute partial dependence
plotPartialDependence  Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
predict                Predict labels using classification tree model
predictorImportance    Estimates of predictor importance for classification tree
shapley                Shapley values
surrogateAssociation   Mean predictive measure of association for surrogate splits in classification tree
update                 Update model parameters for code generation
view                   View classification tree

Examples

Construct a compact classification tree for the Fisher iris data.

load fisheriris
tree = fitctree(meas,species);
ctree = compact(tree);

Compare the size of the resulting tree to that of the original tree.

t = whos('tree'); % t.bytes = size of tree in bytes
c = whos('ctree'); % c.bytes = size of ctree in bytes
[c.bytes t.bytes]
ans = 1×2

        5266       11931

The compact tree is smaller than the original tree.

Version History

Introduced in R2011a