
logp

Log unconditional probability density of naive Bayes classification model for incremental learning

Since R2021a

Description

lp = logp(Mdl,X) returns the log unconditional probability densities lp of the observations in the predictor data X using the naive Bayes classification model for incremental learning Mdl. You can use lp to identify outliers in the training data.


Examples


Train a naive Bayes classification model by using fitcnb, convert it to an incremental learner, and then use the incremental model to detect outliers in streaming data.

Load and Preprocess Data

Load the human activity data set. Randomly shuffle the data.

load humanactivity
rng(1); % For reproducibility
n = numel(actid);
idx = randsample(n,n);
X = feat(idx,:);
Y = actid(idx);

For details on the data set, enter Description at the command line.

Train Naive Bayes Classification Model

Fit a naive Bayes classification model to a random sample of about 25% of the data.

idxtt = randsample([true false false false],n,true);
TTMdl = fitcnb(X(idxtt,:),Y(idxtt))
TTMdl = 
  ClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: [1 2 3 4 5]
            ScoreTransform: 'none'
           NumObservations: 6167
         DistributionNames: {1x60 cell}
    DistributionParameters: {5x60 cell}


TTMdl is a ClassificationNaiveBayes model object representing a traditionally trained model.

Convert Trained Model

Convert the traditionally trained model to a naive Bayes classification model for incremental learning.

IncrementalMdl = incrementalLearner(TTMdl)
IncrementalMdl = 
  incrementalClassificationNaiveBayes

                    IsWarm: 1
                   Metrics: [1x2 table]
                ClassNames: [1 2 3 4 5]
            ScoreTransform: 'none'
         DistributionNames: {1x60 cell}
    DistributionParameters: {5x60 cell}


IncrementalMdl is an incrementalClassificationNaiveBayes object that represents a naive Bayes classification model for incremental learning. Its parameter values are the same as those in TTMdl.

Detect Outliers

Determine an unconditional density threshold for outliers by using the traditionally trained model and training data. Outliers are observations in the streaming data that yield densities lower than the threshold.

ttlp = logp(TTMdl,X(idxtt,:));
[~,lower] = isoutlier(ttlp)
lower = 
-336.0424
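
As a rough illustration of where such a threshold comes from: with its default options, isoutlier flags values that lie more than three scaled median absolute deviations (MAD) from the median, and its second output is the lower bound of that range. The sketch below reproduces the lower bound by hand on a small made-up vector. The values in a are hypothetical and are not taken from the human activity data.

```matlab
% Hypothetical log densities; the first value is deliberately extreme.
a = [-340; -120; -110; -105; -100; -98; -95; -90];

% Lower threshold reported by isoutlier (default 'median' method).
[tf,lowerAuto] = isoutlier(a);

% Same threshold by hand: median minus three scaled MADs.
c = -1/(sqrt(2)*erfcinv(3/2));             % scaling constant, about 1.4826
scaledMAD = c*median(abs(a - median(a)));
lowerManual = median(a) - 3*scaledMAD;

fprintf('isoutlier: %.4f, by hand: %.4f\n',lowerAuto,lowerManual);
```

Because the threshold depends only on the median and the MAD, a few extreme training densities do not shift it much.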

Detect these outliers in the rest of the data. Simulate a data stream by processing 1 observation at a time. At each iteration, call logp to compute the log unconditional probability density of the observation and store each value.

% Preallocation
idxil = ~idxtt;
nil = sum(idxil);
numObsPerChunk = 1;
nchunk = floor(nil/numObsPerChunk);
lp = zeros(nchunk,1);
iso = false(nchunk,1);
Xil = X(idxil,:);
Yil = Y(idxil);

% Incremental processing
for j = 1:nchunk
    ibegin = min(nil,numObsPerChunk*(j-1) + 1);
    iend = min(nil,numObsPerChunk*j);
    idx = ibegin:iend;
    lp(j) = logp(IncrementalMdl,Xil(idx,:));
    iso(j) = lp(j) < lower;
end
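
The loop above stores one value per iteration, which works only because each chunk holds a single observation. A minimal sketch of the same pattern with larger chunks follows. To stay self-contained it uses the fisheriris data set instead of the human activity data, and the chunk size of 10 and the odd/even split are arbitrary choices made for illustration.

```matlab
load fisheriris                          % meas (150-by-4), species (150-by-1)

% Train on the odd rows, then treat the even rows as the stream.
Xtrain  = meas(1:2:end,:);
Ytrain  = species(1:2:end);
Xstream = meas(2:2:end,:);
IncrementalMdl = incrementalLearner(fitcnb(Xtrain,Ytrain));

numObsPerChunk = 10;
nstream = size(Xstream,1);
nchunk  = ceil(nstream/numObsPerChunk);
lp = zeros(nstream,1);                   % one entry per observation

for j = 1:nchunk
    ibegin = numObsPerChunk*(j-1) + 1;
    iend   = min(nstream,numObsPerChunk*j);
    idx    = ibegin:iend;
    % logp returns one value per row, so assign the whole chunk at once.
    lp(idx) = logp(IncrementalMdl,Xstream(idx,:));
end

% The model is never updated inside the loop, so one batch call
% should give the same densities.
lpBatch = logp(IncrementalMdl,Xstream);
```

If the loop also called fit to update the model between chunks, the chunked and batch results would generally differ.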

Plot the log unconditional probability densities of the streaming data. Identify the outliers.

figure;
h1 = plot(lp);
hold on
x = 1:nchunk;
h2 = plot(x(iso),lp(iso),'r*');
h3 = yline(lower,'g--');
xlim([0 nchunk]);
ylabel('Unconditional Density')
xlabel('Iteration')
legend([h1 h2 h3],["Log unconditional probabilities" "Outliers" "Threshold"])
hold off

Figure: Plot of the log unconditional probability densities (y-axis, labeled Unconditional Density) against iteration (x-axis), with the outliers shown as markers and the threshold shown as a horizontal line.

Input Arguments


Mdl: Naive Bayes classification model for incremental learning, specified as an incrementalClassificationNaiveBayes model object. You can create Mdl directly or by converting a supported, traditionally trained machine learning model using the incrementalLearner function. For more details, see the corresponding reference page.

You must configure Mdl to compute the log unconditional probability densities on a batch of observations.

  • If Mdl is a converted, traditionally trained model, you can compute the log unconditional probability densities without any modifications.

  • Otherwise, Mdl.DistributionParameters must be a cell matrix with Mdl.NumPredictors > 0 columns and at least one row, where each row corresponds to a class name in Mdl.ClassNames.
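
For a model created directly rather than converted, one way to satisfy this requirement is to fit it to at least one batch of data before calling logp. The following is a minimal sketch under that assumption. It uses the fisheriris data set, and MaxNumClasses is set to 3 only because that data set has three classes.

```matlab
load fisheriris                        % meas (150-by-4), species (150-by-1)

% Create the model directly; no data has been seen at this point.
Mdl = incrementalClassificationNaiveBayes('MaxNumClasses',3);

% One call to fit estimates the conditional distribution parameters.
Mdl = fit(Mdl,meas,species);
[numRows,numCols] = size(Mdl.DistributionParameters);

% The model can now evaluate log densities for a batch of observations.
lp = logp(Mdl,meas(1:5,:));
```

After the call to fit, numCols should equal Mdl.NumPredictors and numRows should be at least 1, which is the condition stated above.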

X: Batch of predictor data with which to compute the log unconditional probability densities, specified as an n-by-Mdl.NumPredictors floating-point matrix.

For each j = 1 through n, if X(j,:) contains at least one NaN, lp(j) is NaN.

Data Types: single | double
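
The NaN behavior described above can be checked with a short sketch. It uses the fisheriris data set as a stand-in, and the missing value is injected by hand for illustration.

```matlab
load fisheriris                        % meas (150-by-4), species (150-by-1)

TTMdl = fitcnb(meas,species);
IncrementalMdl = incrementalLearner(TTMdl);

Xq = meas(1:3,:);                      % three query observations
Xq(2,1) = NaN;                         % make the second row incomplete

lp = logp(IncrementalMdl,Xq);
missingDensity = isnan(lp);            % expected to be true only for row 2
```

In an outlier-detection loop, a NaN density never compares as less than the threshold, so observations with missing predictors are not flagged unless you handle them separately.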

Output Arguments


lp: Log unconditional probability densities, returned as an n-by-1 floating-point vector. lp(j) is the log unconditional probability density of the predictors evaluated at X(j,:).

Data Types: single | double

More About


Unconditional Probability Density

The unconditional probability density of the predictors is the joint density of the predictors and the class, marginalized over the classes.

In other words, the unconditional probability density is

$$P(X_1,\ldots,X_P) = \sum_{k=1}^{K} P(X_1,\ldots,X_P, Y=k) = \sum_{k=1}^{K} P(X_1,\ldots,X_P \mid y=k)\,\pi(Y=k),$$

where $\pi(Y=k)$ is the class prior probability. The conditional distribution of the data given the class ($P(X_1,\ldots,X_P \mid y=k)$) and the class prior probability distributions are training options (that is, you specify them when training the classifier).
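
To make the marginalization concrete, here is a small worked sketch in base MATLAB. It assumes Gaussian conditional distributions and conditionally independent predictors (the naive Bayes assumption). All parameter values are hypothetical, and the code is not the internal implementation of logp.

```matlab
% Hypothetical parameters for K = 2 classes and P = 2 predictors.
mu    = [0 0; 3 3];     % mu(k,p): mean of predictor p in class k
sigma = [1 1; 2 2];     % sigma(k,p): standard deviation
prior = [0.7; 0.3];     % class prior probabilities pi(Y = k)
x     = [0.5 -0.2];     % one observation

% Log conditional density log P(x | Y = k): the per-predictor Gaussian
% log densities add because the predictors are treated as independent.
logCond = sum(-0.5*log(2*pi) - log(sigma) - 0.5*((x - mu)./sigma).^2,2);

% Log joint density log P(x, Y = k) = log P(x | Y = k) + log pi(Y = k).
logJoint = logCond + log(prior);

% Marginalize over the classes in log space (log-sum-exp).
m  = max(logJoint);
lp = m + log(sum(exp(logJoint - m)));
```

Subtracting the maximum before exponentiating keeps the sum from underflowing when the log densities are very negative, as they are in the example on this page.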

Prior Probability

The prior probability of a class is the assumed relative frequency with which observations from that class occur in a population.

Version History

Introduced in R2021a