
Isolation Forest Anomaly Detector

Find anomalies in data using isolation forest

Since R2026a

Block icon: Isolation Forest Anomaly Detector

Libraries:
Statistics and Machine Learning Toolbox / Anomaly Detection

Description

The Isolation Forest Anomaly Detector block finds anomalies in data using a trained isolation forest model object.

Import a trained isolation forest model object (IsolationForest) into the block by specifying the name of a workspace variable that contains the object. The input port x receives a data stream for which anomalies are evaluated. The output port IsAnomaly returns logical values indicating whether each observation is an anomaly, and the optional output port score returns the anomaly scores.

Examples


Train an isolation forest model for anomaly detection using the iforest function. Then detect anomalies in new data by passing the new data to the Isolation Forest Anomaly Detector block.

Load the sample data set NYCHousing2015.

load NYCHousing2015

The data set includes 10 variables with information on the sales of properties in New York City in 2015. Remove the NEIGHBORHOOD variable, which has 254 categories, and the BUILDINGCLASSCATEGORY variable, which is not numeric.

NYCHousing2015.NEIGHBORHOOD = [];
NYCHousing2015.BUILDINGCLASSCATEGORY = [];

The SALEDATE variable is a datetime array, which is not supported by iforest. Create variables for the month and day numbers of the datetime values, and then delete the SALEDATE variable.

[~,NYCHousing2015.MM,NYCHousing2015.DD] = ymd(NYCHousing2015.SALEDATE);
NYCHousing2015.SALEDATE = [];

Shuffle the order of the observations and convert the table into a matrix.

rng(0,"twister") % For reproducibility
NYCHousing2015 = NYCHousing2015(randperm(height(NYCHousing2015)),:);
NYCHousing2015 = table2array(NYCHousing2015);

Create a training data set using the first 40,000 observations, and a test data set using the next 5000 observations.

n_train = 40000;
n_test = 5000;
X_train = NYCHousing2015(1:n_train,:);
X_test = NYCHousing2015(n_train+1:n_train+n_test,:);

Train an isolation forest model using the training data. Assume that the outlier fraction in the training data is 0.5%. Because the first variable (BOROUGH) is not ordinal, treat it as categorical.

[ifMdl,tf,scores_train] = iforest(X_train,ContaminationFraction=0.005,CategoricalPredictors=1);

ifMdl is an IsolationForest model object. The iforest function also returns the anomaly indicators tf and anomaly scores scores_train for the training data.
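The link between ContaminationFraction and ScoreThreshold can be checked on a small self-contained data set. The following is a minimal sketch on synthetic data (the data, the variable names mdl and scores, and the 1% contamination value are illustrative assumptions, not part of this example). Because iforest chooses the score threshold from the training scores, the fraction of training observations it flags should be close to the requested contamination fraction.

```matlab
rng(0,"twister")                               % for reproducibility
X = [randn(2000,2); 4 + randn(20,2)];          % synthetic: 2000 typical rows plus 20 shifted rows
[mdl,tf,scores] = iforest(X,ContaminationFraction=0.01);

fracFlagged = mean(tf)                         % fraction of training rows flagged as anomalies
fracAbove = mean(scores > mdl.ScoreThreshold)  % fraction of training scores above the threshold
```

Both fractions should be near 0.01; small differences can come from ties at the threshold.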

Plot a histogram of the anomaly score values. Create a vertical dashed line at the score threshold corresponding to the specified outlier fraction.

histogram(scores_train)
xlabel("Anomaly Score")
xline(ifMdl.ScoreThreshold,"r--",["Threshold" ifMdl.ScoreThreshold])

Figure: Histogram of the training anomaly scores (x-axis: Anomaly Score), with a red dashed vertical line at the score threshold.

Convert the test data into a time series object to load into the Simulink model.

t = 0:size(X_test,1)-1;
X_ts = timeseries(X_test,t,InterpretSingleRowDataAs3D=true);

This example provides the Simulink model slexIFAnomalyDetector.slx, shown in the figure below. The model is configured to use ifMdl as the initial model for the Isolation Forest Anomaly Detector block. You can double-click the block to access the Block Parameters dialog box.

slName = "slexIFAnomalyDetector";
open_system(slName);

Block diagram showing the Simulink model.

Simulate the Simulink model to perform anomaly detection on the test data. At each time step, the Isolation Forest Anomaly Detector block flags an observation as an anomaly if its anomaly score is above the threshold.

simOut = sim(slName,"StopTime",num2str(numel(t)-1));

Extract the logged outputs from the simulation output object simOut. You can also use the Simulation Data Inspector (Simulink) to view the logged data of an Outport block.

% Extract IsAnomaly values
IsAnomaly_sig = simOut.yout.getElement(1);
IsAnomaly_sl = squeeze(IsAnomaly_sig.Values.Data);

% Extract score values
score_sig = simOut.yout.getElement(2);
score_sl = squeeze(score_sig.Values.Data);

Plot the anomaly scores and indicate the anomalous observations with red circle markers. Create a dashed line at the score threshold.

figure
plot(score_sl,".")
ylabel("Anomaly Score")
xlabel("Observation Number")
hold on
idx = find(IsAnomaly_sl == 1);
plot(idx,score_sl(idx),"ro")
yline(ifMdl.ScoreThreshold,"r--")
hold off

Figure: Anomaly scores of the test observations (x-axis: Observation Number, y-axis: Anomaly Score), with anomalies marked by red circles and a red dashed horizontal line at the score threshold.

Compute the fraction of anomalous observations in the test data set.

frac = sum(IsAnomaly_sl)/n_test
frac = 
0.0042

The anomaly fraction in the test data set (about 0.4%) is close to the contamination fraction of 0.5% assumed for the training data set.
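To cross-check the block against MATLAB, you can score the same test observations with the isanomaly object function of the IsolationForest model. This sketch assumes that ifMdl, X_test, IsAnomaly_sl, and score_sl from the preceding steps are still in the workspace and that the simulation logged one output value per test observation. Small score differences can occur if the block uses a data type other than double.

```matlab
% Score the test observations in MATLAB (isanomaly uses ifMdl.ScoreThreshold by default)
[tf_ml,score_ml] = isanomaly(ifMdl,X_test);

% Compare the MATLAB results with the values logged from the Simulink model
sameFlags = isequal(tf_ml(:),logical(IsAnomaly_sl(:)))
maxScoreDiff = max(abs(score_ml(:) - double(score_sl(:))))
```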

Ports

Input


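Input data for port x, specified as a numeric matrix. The software assumes that the observations in the predictor data are oriented along the rows of x, so each row is one observation and each column is one predictor variable. The columns must correspond to the predictors used to train the model.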

Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point

Output


Anomaly indicators for port IsAnomaly, returned as a logical vector. If an observation has an anomaly score greater than the ScoreThreshold property of the model (or the custom threshold in the ScoreThreshold block parameter, if specified), the corresponding anomaly indicator value is true.

Data Types: Boolean

Anomaly scores for port score, returned as a numeric vector with values in the range [0,1]. A score close to 0 indicates a normal observation, and a score close to 1 indicates an anomaly. For more information, see the Anomaly Scores section of the iforest page.

Dependencies

To enable this port, select the check box for Add output port for anomaly scores on the Main tab of the Block Parameters dialog box.

Data Types: single | double | half | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | fixed point
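As a rough illustration of why the scores lie in [0,1], the original isolation forest formulation normalizes the average path length E[h(x)] of an observation across the trees as s = 2^(-E[h(x)]/c(n)), where c(n) = 2H(n-1) - 2(n-1)/n is the average path length of an unsuccessful search in a binary search tree built from n observations, and the harmonic number H(i) is approximated by ln(i) + 0.5772156649. This sketch evaluates that formula for an assumed n = 256; it is not the block's internal implementation.

```matlab
n = 256;                                  % assumed number of observations per tree
H = @(i) log(i) + 0.5772156649;           % approximate harmonic number H(i)
c = @(m) 2*H(m-1) - 2*(m-1)./m;           % average unsuccessful-search path length (m > 2)
score = @(meanDepth) 2.^(-meanDepth./c(n));

score(0)        % isolated at the root: score 1
score(c(n))     % average depth equal to c(n): score 0.5
score(3*c(n))   % much deeper than average: approximately 0.125
```

Short average paths (easy to isolate) push the score toward 1, and long paths push it toward 0.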

Parameters


Main

Specify the name of a workspace variable that contains an IsolationForest model object.

Programmatic Use

Block Parameter: AnomalyLearner
Type: workspace variable
Values: IsolationForest object
Default: "anomalyMdl"

Select the check box to include the second output port score in the Isolation Forest Anomaly Detector block.

Programmatic Use

Block Parameter: ShowOutputScore
Type: character vector or string
Values: "off" | "on"
Default: "off"

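Specify the threshold for the anomaly score. If an observation has an anomaly score above the threshold, the software identifies the observation as an anomaly. If you leave this parameter empty ([]), the block uses the ScoreThreshold property of the IsolationForest model.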

Programmatic Use

Block Parameter: ScoreThreshold
Type: character vector or string
Values: "[]" | numeric scalar
Default: "[]"
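The following sketch mirrors the decision rule described for this parameter and for the IsAnomaly port: when ScoreThreshold is left at "[]", the threshold stored in the model is used; otherwise the custom value is used, and an observation is flagged only if its score is strictly greater than the threshold. The score values and the model threshold are made-up numbers for illustration.

```matlab
scores = [0.31 0.45 0.62 0.58 0.71];   % hypothetical anomaly scores
modelThreshold = 0.6;                  % stands in for ifMdl.ScoreThreshold
blockThreshold = [];                   % ScoreThreshold block parameter left at "[]"

if isempty(blockThreshold)
    threshold = modelThreshold;        % fall back to the model threshold
else
    threshold = blockThreshold;        % custom threshold overrides the model value
end

isAnomaly = scores > threshold         % logical: 0 0 1 0 1
```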

Specify the discrete interval between sample time hits or specify another type of sample time, such as continuous (0) or inherited (-1). For more options, see Types of Sample Time (Simulink).

By default, the Isolation Forest Anomaly Detector block inherits sample time based on the context of the block within the model.

Programmatic Use

Block Parameter: SystemSampleTime
Type: string scalar or character vector
Values: scalar
Default: "-1"
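You can also set the Main tab parameters from the command line with set_param, using the programmatic names listed above. This sketch assumes that the Isolation Forest Anomaly Detector block is the currently selected block (gcb) and that a trained model is stored in a workspace variable named ifMdl; the threshold 0.6 and the sample time 1 are illustrative choices. All values are passed as text, as the parameter types require.

```matlab
blk = gcb;                                  % path of the currently selected block

set_param(blk,"AnomalyLearner","ifMdl")     % workspace variable holding the IsolationForest model
set_param(blk,"ShowOutputScore","on")       % add the score output port
set_param(blk,"ScoreThreshold","0.6")       % custom threshold; "[]" uses the model threshold
set_param(blk,"SystemSampleTime","1")       % discrete sample time of 1

get_param(blk,"ScoreThreshold")             % display the stored value to confirm the change
```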

Data Types

Fixed-Point Operational Parameters

Specify the rounding mode for fixed-point operations. For more information, see Rounding Modes (Fixed-Point Designer).

Block parameters always round to the nearest representable value. To control the rounding of a block parameter, enter an expression into the mask field using a MATLAB® rounding function.

Programmatic Use

Block Parameter: RndMeth
Type: character vector
Values: "Ceiling" | "Convergent" | "Floor" | "Nearest" | "Round" | "Simplest" | "Zero"
Default: "Floor"
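To see how the rounding modes differ, this sketch rounds a few tie values with base MATLAB expressions that follow the same rules: Floor rounds toward negative infinity, Ceiling toward positive infinity, Zero toward zero, Round to the nearest value with ties away from zero, Nearest to the nearest value with ties toward positive infinity, and Convergent to the nearest value with ties to even. Simplest is omitted because it chooses between Floor and Zero to produce the most efficient code. The tie-to-even expression is a hand-written stand-in, not a Fixed-Point Designer function.

```matlab
x = [-2.5 -1.5 1.5 2.5];

floorMode   = floor(x)        % -3 -2 1 2
ceilingMode = ceil(x)         % -2 -1 2 3
zeroMode    = fix(x)          % -2 -1 1 2
roundMode   = round(x)        % -3 -2 2 3 (ties away from zero)
nearestMode = floor(x + 0.5)  % -2 -1 2 3 (ties toward +Inf)

% Convergent: ties go to the nearest even integer
convergentMode = round(x);
isTie = abs(x - fix(x)) == 0.5;
convergentMode(isTie) = 2*round(x(isTie)/2)   % -2 -2 2 2
```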

Specify whether overflows saturate or wrap.

Action: Select this check box (on).
Rationale: Your model has possible overflow, and you want explicit saturation protection in the generated code.
Impact on overflows: Overflows saturate to either the minimum or maximum value that the data type can represent.
Example: The maximum value that the int8 (signed 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box selected, the block output saturates at 127. Similarly, the block output saturates at a minimum output value of -128.

Action: Clear this check box (off).
Rationale: You want to optimize the efficiency of your generated code. You want to avoid overspecifying how a block handles out-of-range signals. For more information, see Troubleshoot Signal Range Errors (Simulink).
Impact on overflows: Overflows wrap to the appropriate value that the data type can represent.
Example: The maximum value that the int8 (signed 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box cleared, the software interprets the value causing the overflow as int8, which can produce an unintended result. For example, a block result of 130 (binary 1000 0010) expressed as int8 is -126.

Programmatic Use

Block Parameter: SaturateOnIntegerOverflow
Type: character vector
Values: "off" | "on"
Default: "off"
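The int8 examples for this parameter can be reproduced in base MATLAB. Conversion to int8 in MATLAB saturates, which matches the on setting, and the modular expression reproduces two's-complement wrapping, which matches the off setting. This illustrates the arithmetic only, not the block's generated code.

```matlab
result = [130 -130];                      % block results outside the int8 range [-128, 127]

saturated = int8(result)                  % MATLAB integer conversion saturates: 127 -128
wrapped = mod(result + 128, 256) - 128    % two's-complement wrap: -126 126
```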

Select this parameter to prevent the fixed-point tools from overriding the data type you specify for the block. For more information, see Use Lock Output Data Type Setting (Fixed-Point Designer).

Programmatic Use

Block Parameter: LockScale
Type: character vector
Values: "off" | "on"
Default: "off"

Data Type

Specify the data type for the score output. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType.

When you select Inherit: auto, the block uses a rule that inherits a data type.

For more information about data types, see Control Data Types of Signals (Simulink).

Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see Specify Data Types Using Data Type Assistant (Simulink).

Programmatic Use

Block Parameter: ScoreDataTypeStr
Type: character vector or string
Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "<data type expression>"
Default: "Inherit: auto"
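Because the score lies in [0,1], the number of fraction bits in a fixed-point score data type sets its resolution. This sketch quantizes one hypothetical score with the default Floor rounding mode using plain binary-point arithmetic, comparing fixdt(1,16,14) (an assumed choice with 14 fraction bits, range roughly [-2, 2)) against fixdt(1,16,0), which has no fraction bits and therefore cannot represent intermediate scores. Saturation is ignored.

```matlab
score = 0.6371;                         % hypothetical anomaly score

fracLen = 14;                           % fixdt(1,16,14): 14 fraction bits (assumed choice)
q14 = floor(score*2^fracLen)/2^fracLen  % about 0.63708, within 2^-14 of the score

fracLen = 0;                            % fixdt(1,16,0): integer only
q0 = floor(score*2^fracLen)/2^fracLen   % 0, so the score information is lost
```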

Specify the lower value of the score output range that Simulink® checks.

Simulink uses the minimum value for tasks such as simulation range checking and automatic scaling of fixed-point data types.

Note

The Score data type Minimum parameter does not saturate or clip the actual score output. To do so, use the Saturation (Simulink) block instead.

Programmatic Use

Block Parameter: ScoreOutMin
Type: character vector
Values: "[]" | scalar
Default: "[]"

Specify the upper value of the score output range that Simulink checks.

Simulink uses the maximum value for tasks such as simulation range checking and automatic scaling of fixed-point data types.

Note

The Score data type Maximum parameter does not saturate or clip the actual score output. To do so, use the Saturation (Simulink) block instead.

Programmatic Use

Block Parameter: ScoreOutMax
Type: character vector
Values: "[]" | scalar
Default: "[]"
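Since the scores are documented to lie in [0,1], one natural choice is to declare that range on the block so that Simulink can use it for range checking. This sketch assumes the detector block is the currently selected block (gcb) in the example model slexIFAnomalyDetector. SignalRangeChecking is, to our knowledge, the configuration parameter behind the Simulation range checking diagnostic; verify the name in your release.

```matlab
blk = gcb;                                   % currently selected detector block
set_param(blk,"ScoreOutMin","0")             % lower bound of the score range
set_param(blk,"ScoreOutMax","1")             % upper bound of the score range

% Report out-of-range values during simulation (model-level diagnostic)
set_param("slexIFAnomalyDetector","SignalRangeChecking","warning")
```

These settings do not clip the score; as the notes state, use a Saturation block for that.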

Specify the data type of the adjusted tree depth. The type can be inherited, specified directly, or expressed as a data type object such as Simulink.NumericType.

For more information about data types, see Control Data Types of Signals (Simulink).

Click the Show data type assistant button to display the Data Type Assistant, which helps you set the data type attributes. For more information, see Specify Data Types Using Data Type Assistant (Simulink).

Programmatic Use

Block Parameter: TreeDepthDataTypeStr
Type: character vector or string
Values: "Inherit: auto" | "double" | "single" | "half" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "fixdt(1,16,0)" | "fixdt(1,16,2^0,0)" | "<data type expression>"
Default: "Inherit: auto"
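This page does not define the adjusted tree depth. In the original isolation forest formulation, the path length of an observation in one tree is the number of edges e from the root to the leaf it reaches, plus an adjustment c(m) for the m training observations left unsplit in that leaf, with c(2) = 1 and c(m) = 0 for m < 2. Assuming that this is the quantity the parameter refers to, the sketch computes it for hypothetical values of e and m.

```matlab
e = 7;          % hypothetical number of edges from the root to the leaf
m = 5;          % hypothetical number of training observations left in the leaf

% Adjustment c(m): average path length of an unsplit subtree of m observations
if m > 2
    adj = 2*(log(m-1) + 0.5772156649) - 2*(m-1)/m;
elseif m == 2
    adj = 1;
else
    adj = 0;
end

h = e + adj     % adjusted depth, about 9.327
```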

Specify the lower value of the adjusted tree depth range that Simulink checks.

Simulink uses the minimum value for tasks such as range checking and automatic scaling of fixed-point data types.

Note

The Adjusted tree depth data type Minimum parameter does not saturate or clip the actual adjusted tree depth value.

Programmatic Use

Block Parameter: TreeDepthOutMin
Type: character vector
Values: "[]" | scalar
Default: "[]"

Specify the upper value of the adjusted tree depth range that Simulink checks.

Simulink uses the maximum value for tasks such as range checking and automatic scaling of fixed-point data types.

Note

The Adjusted tree depth data type Maximum parameter does not saturate or clip the actual adjusted tree depth value.

Programmatic Use

Block Parameter: TreeDepthOutMax
Type: character vector
Values: "[]" | scalar
Default: "[]"

Block Characteristics

Data Types

Boolean | double | fixed point | half | integer | single

Direct Feedthrough

yes

Multidimensional Signals

no

Variable-Size Signals

no

Zero-Crossing Detection

no

Extended Capabilities


C/C++ Code Generation
Generate C and C++ code using Simulink® Coder™.

Fixed-Point Conversion
Design and simulate fixed-point systems using Fixed-Point Designer™.

Version History

Introduced in R2026a