templateNaiveBayes

Naive Bayes classifier template

Syntax

t = templateNaiveBayes()
t = templateNaiveBayes(Name,Value)

Description


t = templateNaiveBayes() returns a naive Bayes template suitable for training error-correcting output code (ECOC) multiclass models.

If you specify a default template, then the software uses default values for all input arguments during training.

Specify t as a learner in fitcecoc.


t = templateNaiveBayes(Name,Value) returns a template with additional options specified by one or more name-value pair arguments. All properties of t are empty, except those you specify using Name,Value pair arguments.

For example, you can specify distributions for the predictors.

If you display t in the Command Window, then all options appear empty ([]), except those that you specify using name-value pair arguments. During training, the software uses default values for empty options.

Examples


Use templateNaiveBayes to specify a default naive Bayes template.

t = templateNaiveBayes()
t = 
Fit template for classification NaiveBayes.

    DistributionNames: [1x0 double]
               Kernel: []
              Support: []
                Width: []
              Version: 1
               Method: 'NaiveBayes'
                 Type: 'classification'

All properties of the template object are empty except for Method and Type. When you pass t to the training function, the software fills in the empty properties with their respective default values. For example, the software fills the DistributionNames property with a 1-by-D cell array of character vectors with 'normal' in each cell, where D is the number of predictors. For details on other default values, see fitcnb.

t is a plan for a naive Bayes learner, and no computation occurs when you specify it. You can pass t to fitcecoc to specify naive Bayes binary learners for ECOC multiclass learning.

Create a nondefault naive Bayes template for use in fitcecoc.

Load Fisher's iris data set.

load fisheriris

Create a template for naive Bayes binary classifiers, and specify kernel distributions for all predictors.

t = templateNaiveBayes('DistributionNames','kernel')
t = 
Fit template for classification NaiveBayes.

    DistributionNames: 'kernel'
               Kernel: []
              Support: []
                Width: []
              Version: 1
               Method: 'NaiveBayes'
                 Type: 'classification'

All properties of the template object are empty except for DistributionNames, Method, and Type. When you pass t to the training function, the software fills in the empty properties with their respective default values.

Specify t as a binary learner for an ECOC multiclass model.

Mdl = fitcecoc(meas,species,'Learners',t);

By default, the software trains Mdl using the one-versus-one coding design.

Display the in-sample (resubstitution) misclassification error.

L = resubLoss(Mdl,'LossFun','classiferror')
L = 0.0333

Input Arguments


Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'DistributionNames','mn' specifies to treat all predictors as token counts for a multinomial model.

DistributionNames

Data distributions fitcnb uses to model the data, specified as the comma-separated pair consisting of 'DistributionNames' and a character vector or string scalar, a string array, or a cell array of character vectors with values from this table.

Value       Description
'kernel'    Kernel smoothing density estimate.
'mn'        Multinomial distribution. If you specify 'mn', then all features are components of a multinomial distribution. Therefore, you cannot include 'mn' as an element of a string array or a cell array of character vectors. For details, see Algorithms.
'mvmn'      Multivariate multinomial distribution. For details, see Algorithms.
'normal'    Normal (Gaussian) distribution.

If you specify a character vector or string scalar, then the software models all the features using that distribution. If you specify a 1-by-P string array or cell array of character vectors, then the software models feature j using the distribution in element j of the array.

By default, the software sets all predictors specified as categorical predictors (using the CategoricalPredictors name-value pair argument) to 'mvmn'. Otherwise, the default distribution is 'normal'.

You must specify that at least one predictor has distribution 'kernel' to additionally specify Kernel, Support, or Width.

Example: 'DistributionNames','mn'

Example: 'DistributionNames',{'kernel','normal','kernel'}

Kernel

Kernel smoother type, specified as the comma-separated pair consisting of 'Kernel' and a character vector or string scalar, a string array, or a cell array of character vectors.

This table summarizes the available options for setting the kernel smoother type. Let I{u} denote the indicator function.

Value            Kernel          Formula
'box'            Box (uniform)   f(x) = 0.5 I{|x| ≤ 1}
'epanechnikov'   Epanechnikov    f(x) = 0.75 (1 − x²) I{|x| ≤ 1}
'normal'         Gaussian        f(x) = (1/√(2π)) exp(−0.5 x²)
'triangle'       Triangular      f(x) = (1 − |x|) I{|x| ≤ 1}

If you specify a 1-by-P string array or cell array, with each element of the array containing any value in the table, then the software trains the classifier using the kernel smoother type in element j for feature j in X. The software ignores elements of Kernel not corresponding to a predictor whose distribution is 'kernel'.

You must specify that at least one predictor has distribution 'kernel' to additionally specify Kernel, Support, or Width.

Example: 'Kernel',{'epanechnikov','normal'}
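As a quick numerical sanity check, each kernel in the table above is a probability density and therefore integrates to 1 over its support. This sketch is written in Python for illustration only; it is not part of the MATLAB toolbox.

```python
import math

# The four kernel smoother types from the table above, written as plain
# densities on their support.
def box(x):          return 0.5 if abs(x) <= 1 else 0.0
def epanechnikov(x): return 0.75 * (1 - x**2) if abs(x) <= 1 else 0.0
def normal(x):       return math.exp(-0.5 * x**2) / math.sqrt(2 * math.pi)
def triangle(x):     return (1 - abs(x)) if abs(x) <= 1 else 0.0

def integrate(f, lo=-6.0, hi=6.0, n=20000):
    """Trapezoidal rule; each kernel should integrate to approximately 1."""
    h = (hi - lo) / n
    return h * (0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, n)))

for kernel in (box, epanechnikov, triangle, normal):
    assert abs(integrate(kernel) - 1.0) < 1e-3
```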

Support

Kernel smoothing density support, specified as the comma-separated pair consisting of 'Support' and 'positive', 'unbounded', a string array, a cell array, or a numeric row vector. The software applies the kernel smoothing density to the specified region.

This table summarizes the available options for setting the kernel smoothing density region.

Value                        Description
1-by-2 numeric row vector    For example, [L,U], where L and U are the finite lower and upper bounds, respectively, for the density support.
'positive'                   The density support is all positive real values.
'unbounded'                  The density support is all real values.

If you specify a 1-by-P string array or cell array, with each element containing any value in the table, then the software trains the classifier using the kernel support in element j for feature j in X. The software ignores elements of Support not corresponding to a predictor whose distribution is 'kernel'.

You must specify that at least one predictor has distribution 'kernel' to additionally specify Kernel, Support, or Width.

Example: 'Support',{[-10,20],'unbounded'}

Data Types: char | string | cell | double

Width

Kernel smoothing window width, specified as the comma-separated pair consisting of 'Width' and a numeric matrix, a numeric column vector, a numeric row vector, or a scalar.

Suppose there are K class levels and P predictors. This table summarizes the available options for setting the kernel smoothing window width.

Value                             Description
K-by-P numeric matrix             Element (k,j) specifies the width for predictor j in class k.
K-by-1 numeric column vector      Element k specifies the width for all predictors in class k.
1-by-P numeric row vector         Element j specifies the width in all class levels for predictor j.
Scalar                            Specifies the bandwidth for all features in all classes.

By default, the software selects a default width automatically for each combination of predictor and class by using a value that is optimal for a Gaussian distribution. If you specify Width and it contains NaNs, then the software selects widths for the elements containing NaNs.
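The documentation states only that the default width is "optimal for a Gaussian distribution" without giving the formula. One standard normal-optimal choice is Silverman's rule of thumb; the Python sketch below shows that rule as an assumption, not necessarily the exact per-class, per-predictor rule fitcnb applies.

```python
import statistics

# Illustration only: the exact default-width rule fitcnb uses is not stated
# here. A common "normal-optimal" bandwidth is Silverman's rule of thumb,
# sigma_hat * (4 / (3 n))^(1/5), sketched as an assumption.
def silverman_width(x):
    n = len(x)
    return statistics.stdev(x) * (4.0 / (3.0 * n)) ** 0.2

x = [0.9, 1.1, 1.0, 1.2, 0.8, 1.05, 0.95, 1.15]
w = silverman_width(x)   # a positive bandwidth, smaller than the sample std
```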

You must specify that at least one predictor has distribution 'kernel' to additionally specify Kernel, Support, or Width.

Example: 'Width',[NaN NaN]

Data Types: double

Output Arguments


t

Naive Bayes classification template suitable for training error-correcting output code (ECOC) multiclass models, returned as a template object. Pass t to fitcecoc to specify how to create the naive Bayes classifier for the ECOC model.

If you display t in the Command Window, then all unspecified options appear empty ([]). However, the software replaces empty options with their corresponding default values during training.

More About


Naive Bayes

Naive Bayes is a classification algorithm that applies density estimation to the data.

The algorithm leverages Bayes' theorem and (naively) assumes that the predictors are conditionally independent, given the class. Though the assumption is usually violated in practice, naive Bayes classifiers tend to yield posterior distributions that are robust to biased class density estimates, particularly where the posterior is 0.5 (the decision boundary) [1].

Naive Bayes classifiers assign observations to the most probable class (in other words, the maximum a posteriori decision rule). Explicitly, the algorithm:

  1. Estimates the densities of the predictors within each class.

  2. Models posterior probabilities according to Bayes' rule. That is, for all k = 1,...,K,

    \hat{P}(Y=k \mid X_1,\ldots,X_P) = \frac{\pi(Y=k)\,\prod_{j=1}^{P} P(X_j \mid Y=k)}{\sum_{k=1}^{K} \pi(Y=k)\,\prod_{j=1}^{P} P(X_j \mid Y=k)},

    where:

    • Y is the random variable corresponding to the class index of an observation.

    • X1,...,XP are the random predictors of an observation.

    • π(Y=k) is the prior probability that a class index is k.

  3. Classifies an observation by estimating the posterior probability for each class, and then assigns the observation to the class yielding the maximum posterior probability.
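The three steps above can be sketched for Gaussian ('normal') class-conditional densities. This is a Python illustration of the arithmetic only, not the fitcnb implementation.

```python
import math

# A minimal sketch of the naive Bayes steps above, assuming Gaussian
# ("normal") class-conditional densities for every predictor.
def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior(x, params, priors):
    """params[k]: one (mu, sigma) pair per predictor; priors[k]: pi(Y = k)."""
    unnorm = {}
    for k, dists in params.items():
        p = priors[k]
        for xj, (mu, sigma) in zip(x, dists):   # steps 1-2: prior times product of densities
            p *= normal_pdf(xj, mu, sigma)
        unnorm[k] = p
    z = sum(unnorm.values())                    # denominator of Bayes' rule
    return {k: v / z for k, v in unnorm.items()}

# Toy problem: two classes, two predictors, equal priors.
params = {'a': [(0.0, 1.0), (0.0, 1.0)], 'b': [(2.0, 1.0), (2.0, 1.0)]}
post = posterior([0.1, -0.2], params, {'a': 0.5, 'b': 0.5})
label = max(post, key=post.get)                 # step 3: maximum a posteriori class
```

The observation [0.1, -0.2] lies near the class 'a' means, so the maximum a posteriori rule assigns it to class 'a'.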

If the predictors compose a multinomial distribution, then the posterior probability satisfies

\hat{P}(Y=k \mid X_1,\ldots,X_P) \propto \pi(Y=k)\,P_{mn}(X_1,\ldots,X_P \mid Y=k),

where P_{mn}(X_1,\ldots,X_P \mid Y=k) is the probability mass function of a multinomial distribution.

Algorithms

  • If you specify 'DistributionNames','mn' when training Mdl using fitcnb, then the software fits a multinomial distribution using the bag-of-tokens model. The software stores the probability that token j appears in class k in the property DistributionParameters{k,j}. Using additive smoothing [2], the estimated probability is

    P(\text{token } j \mid \text{class } k) = \frac{1 + c_{j|k}}{P + c_k},

    where:

    • c_{j|k} = n_k \cdot \dfrac{\sum_{i:\,y_i \in \text{class } k} x_{ij} w_i}{\sum_{i:\,y_i \in \text{class } k} w_i} is the weighted number of occurrences of token j in class k.

    • nk is the number of observations in class k.

    • wi is the weight for observation i. The software normalizes weights within a class such that they sum to the prior probability for that class.

    • c_k = \sum_{j=1}^{P} c_{j|k} is the total weighted number of occurrences of all tokens in class k.

  • If you specify 'DistributionNames','mvmn' when training Mdl using fitcnb, then:

    1. For each predictor, the software collects a list of the unique levels, stores the sorted list in CategoricalLevels, and considers each level a bin. Each predictor/class combination is a separate, independent multinomial random variable.

    2. For predictor j in class k, the software counts instances of each categorical level using the list stored in CategoricalLevels{j}.

    3. The software stores the probability that predictor j, in class k, has level L in the property DistributionParameters{k,j}, for all levels in CategoricalLevels{j}. Using additive smoothing [2], the estimated probability is

      P(\text{predictor } j = L \mid \text{class } k) = \frac{1 + m_{j|k}(L)}{m_j + m_k},

      where:

      • m_{j|k}(L) = n_k \cdot \dfrac{\sum_{i:\,y_i \in \text{class } k} I\{x_{ij} = L\} w_i}{\sum_{i:\,y_i \in \text{class } k} w_i} is the weighted number of observations for which predictor j equals L in class k.

      • nk is the number of observations in class k.

      • I\{x_{ij} = L\} = 1 if x_{ij} = L, and 0 otherwise.

      • wi is the weight for observation i. The software normalizes weights within a class such that they sum to the prior probability for that class.

      • mj is the number of distinct levels in predictor j.

      • mk is the weighted number of observations in class k.
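The additive-smoothing formula for the 'mn' case can be checked numerically. This Python sketch assumes unit observation weights, so that c_{j|k} reduces to raw token counts; the counts themselves are toy data.

```python
# Additive (Laplace) smoothing for the 'mn' case, with unit weights so that
# c_{j|k} is just the raw count of token j in class k.
counts_k = [3, 0, 1, 6]        # c_{j|k}: occurrences of each of P = 4 tokens in class k
P = len(counts_k)
c_k = sum(counts_k)            # total token count in class k
probs = [(1 + c) / (P + c_k) for c in counts_k]
# Every token gets a nonzero probability, and the estimates sum to 1:
# sum over j of (1 + c_{j|k}) / (P + c_k) = (P + c_k) / (P + c_k) = 1.
```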

References

[1] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, Second Edition. NY: Springer, 2008.

[2] Manning, C. D., P. Raghavan, and M. Schütze. Introduction to Information Retrieval, NY: Cambridge University Press, 2008.

Introduced in R2014b