# Kernel Principal Component Analysis (KPCA)

MATLAB code for dimensionality reduction, fault detection, and fault diagnosis using KPCA.

Version 2.2, 14-MAY-2021

## Main features

• Easy-to-use API for training and testing KPCA models
• Support for dimensionality reduction, data reconstruction, fault detection, and fault diagnosis
• Multiple kinds of kernel functions (linear, gaussian, polynomial, sigmoid, laplacian)
• Visualization of training and test results
• Number of components determined from a given explained level or a given number

## Notices

• Fault diagnosis is supported only for the Gaussian kernel.
• This code is for reference only.

## How to use

### 01. Kernel functions

A class named Kernel is defined to compute the kernel matrix.

```
%{
    type   -  kernel type

        linear      :  k(x,y) = x'*y
        polynomial  :  k(x,y) = (γ*x'*y+c)^d
        gaussian    :  k(x,y) = exp(-γ*||x-y||^2)
        sigmoid     :  k(x,y) = tanh(γ*x'*y+c)
        laplacian   :  k(x,y) = exp(-γ*||x-y||)

    degree -  d
    offset -  c
    gamma  -  γ
%}
% value is a placeholder for the numeric parameter value
kernel = Kernel('type', 'gaussian', 'gamma', value);
kernel = Kernel('type', 'polynomial', 'degree', value);
kernel = Kernel('type', 'linear');
kernel = Kernel('type', 'sigmoid', 'gamma', value);
kernel = Kernel('type', 'laplacian', 'gamma', value);
```
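
As a quick sanity check on the formulas in the comment block, the five kernels can be evaluated by hand for a single pair of vectors. This is a minimal sketch that does not use the Kernel class; the vectors x and y and the values of gamma, c, and d are arbitrary choices made for illustration.

```matlab
% Illustrative inputs (assumptions, not taken from the toolbox)
x = [1; 2];
y = [3; 1];
gamma = 0.5;   % γ
c = 1;         % offset
d = 2;         % degree

kLinear  = x' * y;                         % 1*3 + 2*1 = 5
kPoly    = (gamma * (x' * y) + c)^d;       % (0.5*5 + 1)^2 = 12.25
kGauss   = exp(-gamma * norm(x - y)^2);    % exp(-0.5*5)
kSigmoid = tanh(gamma * (x' * y) + c);     % tanh(3.5)
kLaplace = exp(-gamma * norm(x - y));      % exp(-0.5*sqrt(5))
```

Here x and y are column vectors, matching the x'*y notation used in the formulas.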

For example, compute the kernel matrix between X and Y:

```
% X and Y are random, so the values shown below will differ between runs
X = rand(5, 2);   % 5 samples, 2 features
Y = rand(3, 2);   % 3 samples, 2 features
kernel = Kernel('type', 'gaussian', 'gamma', 2);
kernelMatrix = kernel.computeMatrix(X, Y);
>> kernelMatrix

kernelMatrix =

0.5684    0.5607    0.4007
0.4651    0.8383    0.5091
0.8392    0.7116    0.9834
0.4731    0.8816    0.8052
0.5034    0.9807    0.7274
```
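
The 5-by-3 result suggests that each row of X and Y is one sample and that entry (i, j) is the kernel value between row i of X and row j of Y. Under that assumption, the Gaussian kernel matrix can be reproduced directly from the formula. The sketch below is a hand-written equivalent with small fixed inputs, not the source of computeMatrix.

```matlab
% Small fixed inputs so the result can be checked by hand (assumptions)
X = [0 0; 1 0; 0 2];   % 3 samples, 2 features
Y = [0 0; 1 1];        % 2 samples, 2 features
gamma = 2;

% Squared Euclidean distances via ||x||^2 + ||y||^2 - 2*x*y'
sqDist = bsxfun(@plus, sum(X.^2, 2), sum(Y.^2, 2)') - 2 * (X * Y');
sqDist = max(sqDist, 0);     % guard against tiny negative round-off

% Gaussian kernel: k(x,y) = exp(-gamma*||x-y||^2)
K = exp(-gamma * sqDist);    % 3-by-2 matrix, K(1,1) = 1 because X(1,:) equals Y(1,:)
```

bsxfun is used instead of implicit expansion so the sketch also runs on MATLAB releases older than R2016b.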

### 02. Simple KPCA model for dimensionality reduction

```
clc
clear all
close all

% NOTE: load your samples-by-features matrix into data here
% (the loading step is not shown)

kernel = Kernel('type', 'gaussian', 'gamma', 2);
parameter = struct('numComponents', 2, ...
                   'kernelFunc', kernel);
% build a KPCA object
kpca = KernelPCA(parameter);
% train the KPCA model
kpca.train(data);

% mapped (low-dimensional) data
mappingData = kpca.score;

% visualization
kplot = KernelPCAVisualization();
% visualize the mapped data
kplot.score(kpca)
```

The training results (dimensionality reduction):

```
*** KPCA model training finished ***
running time            = 0.2798 seconds
kernel function         = gaussian
number of samples       = 1000
number of features      = 3
number of components    = 2
number of T2 alarm      = 135
number of SPE alarm     = 0
accuracy of T2          = 86.5000%
accuracy of SPE         = 100.0000%
```
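
What train does internally is not shown here. For orientation, the textbook KPCA procedure (build the kernel matrix, center it in feature space, eigendecompose, project) can be sketched as follows. This is a generic sketch on assumed toy data, not the toolbox's implementation, and all variable names are illustrative.

```matlab
% Toy data: 50 samples, 3 features (an assumption for illustration)
t = linspace(0, 2*pi, 50)';
X = [cos(t), sin(t), t / (2*pi)];
gamma = 2;
numComponents = 2;
N = size(X, 1);

% Gaussian kernel matrix between all training samples
sq = sum(X.^2, 2);
D = max(bsxfun(@plus, sq, sq') - 2 * (X * X'), 0);
K = exp(-gamma * D);

% Center the kernel matrix in feature space
oneN = ones(N) / N;
Kc = K - oneN * K - K * oneN + oneN * K * oneN;
Kc = (Kc + Kc') / 2;                 % remove round-off asymmetry

% Eigendecomposition, sorted by decreasing eigenvalue
[V, L] = eig(Kc);
[lambda, idx] = sort(diag(L), 'descend');
V = V(:, idx);

% Scale coefficients so the feature-space principal axes have unit norm
alpha = bsxfun(@rdivide, V(:, 1:numComponents), sqrt(lambda(1:numComponents))');

% Scores: projections of the training samples onto the principal axes
score = Kc * alpha;                  % N-by-numComponents
```

With this scaling the score columns are mutually orthogonal and have zero mean, which is what one expects from principal component scores.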

Another application uses banana-shaped data.

### 03. Simple KPCA model for reconstruction

```
clc
clear all
close all

% NOTE: load your samples-by-features matrix into data here
% (the loading step is not shown)

kernel = Kernel('type', 'gaussian', 'gamma', 0.2);
parameter = struct('numComponents', 2, ...
                   'kernelFunc', kernel);
% build a KPCA object
kpca = KernelPCA(parameter);
% train the KPCA model
kpca.train(data);

% reconstructed data
reconstructedData = kpca.newData;

% visualization
kplot = KernelPCAVisualization();
kplot.reconstruction(kpca)
```

### 04. Component number determination

The number of components can be determined either from a given explained level or from a given number.

Case 1

The number of components is determined by the given explained level, which must lie strictly between 0 and 1. For example, to use an explained level of 0.75, set the parameter as:

```
parameter = struct('numComponents', 0.75, ...
                   'kernelFunc', kernel);
```

The full script is:

```
clc
clear all
close all

% NOTE: load the training matrix into trainData here
% (the loading step is not shown)

kernel = Kernel('type', 'gaussian', 'gamma', 1/128^2);

parameter = struct('numComponents', 0.75, ...
                   'kernelFunc', kernel);
% build a KPCA object
kpca = KernelPCA(parameter);
% train the KPCA model
kpca.train(trainData);

% visualization
kplot = KernelPCAVisualization();
kplot.cumContribution(kpca)
```

In the resulting plot, with 21 components the cumulative contribution rate is 75.2656%, which exceeds the given explained level (0.75).
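
The selection rule in Case 1 can be illustrated with a few made-up eigenvalues. The sketch below assumes the rule is "take the smallest number of components whose cumulative contribution reaches the explained level"; the eigenvalues are hypothetical, and whether the toolbox uses >= or > at the threshold is not confirmed here.

```matlab
% Hypothetical eigenvalues, sorted in descending order (illustration only)
lambda = [5 3 1 0.6 0.4];
explainedLevel = 0.75;

% Cumulative contribution rate of the first k components
cumContribution = cumsum(lambda) / sum(lambda);   % 0.50 0.80 0.90 0.96 1.00

% Smallest k whose cumulative contribution reaches the explained level
numComponents = find(cumContribution >= explainedLevel, 1);   % 2
```

With these values, one component explains 50% and two explain 80%, so two components are selected for an explained level of 0.75.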

Case 2

The number of components is set directly to the given number. For example, to use 24 components, set the parameter as:

```
parameter = struct('numComponents', 24, ...
                   'kernelFunc', kernel);
```

The full script is:

```
clc
clear all
close all

% NOTE: load the training matrix into trainData here
% (the loading step is not shown)

kernel = Kernel('type', 'gaussian', 'gamma', 1/128^2);

parameter = struct('numComponents', 24, ...
                   'kernelFunc', kernel);
% build a KPCA object
kpca = KernelPCA(parameter);
% train the KPCA model
kpca.train(trainData);

% visualization
kplot = KernelPCAVisualization();
kplot.cumContribution(kpca)
```

In the resulting plot, with 24 components the cumulative contribution rate is 80.2539%.

### 05. Fault detection

Demonstration of fault detection using KPCA on Tennessee Eastman (TE) process data:

```
clc
clear all
close all

% NOTE: load the training and test matrices into trainData and
% testData here (the loading step is not shown)

kernel = Kernel('type', 'gaussian', 'gamma', 1/128^2);
parameter = struct('numComponents', 0.65, ...
                   'kernelFunc', kernel);

% build a KPCA object
kpca = KernelPCA(parameter);
% train the KPCA model
kpca.train(trainData);
% test the KPCA model
results = kpca.test(testData);

% visualization
kplot = KernelPCAVisualization();
kplot.cumContribution(kpca)
kplot.trainResults(kpca)
kplot.testResults(kpca, results)
```

The training results are:

```
*** KPCA model training finished ***
running time            = 0.0986 seconds
kernel function         = gaussian
number of samples       = 500
number of features      = 52
number of components    = 16
number of T2 alarm      = 16
number of SPE alarm     = 17
accuracy of T2          = 96.8000%
accuracy of SPE         = 96.6000%
```
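
The reported accuracies are consistent with treating every training sample as normal, so that each alarm on the training set counts as a false alarm and accuracy = 1 - alarms / samples. This relationship is inferred from the numbers in the log, not taken from the source code. A short check with the values above:

```matlab
% Values copied from the training log above
numSamples  = 500;
numT2Alarm  = 16;
numSPEAlarm = 17;

% Assumed definition: share of training samples that raised no alarm
accuracyT2  = 100 * (1 - numT2Alarm  / numSamples);   % about 96.8
accuracySPE = 100 * (1 - numSPEAlarm / numSamples);   % about 96.6

fprintf('accuracy of T2  = %.4f%%\n', accuracyT2);
fprintf('accuracy of SPE = %.4f%%\n', accuracySPE);
```

The same formula also matches the earlier dimensionality-reduction log, where 135 T2 alarms on 1000 samples give 86.5%.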

The test results are:

```
*** KPCA model test finished ***
running time            = 0.0312 seconds
number of test data     = 960
number of T2 alarm      = 799
number of SPE alarm     = 851
```

### 06. Fault diagnosis

Notice

• To calculate the CPS at a single time point, set the starting time equal to the ending time, for example 'diagnosis', [500, 500].
• To calculate the average CPS over a period of time, set the starting time and ending time separately, for example 'diagnosis', [300, 500].
• The fault diagnosis module supports only the Gaussian kernel function, and it may take a long time when the number of training samples is large.

```
clc
clear all
close all

% NOTE: load the training and test matrices into trainData and
% testData here (the loading step is not shown)

kernel = Kernel('type', 'gaussian', 'gamma', 1/128^2);

% 'diagnosis', [start, end]: diagnose test samples 300 to 500
parameter = struct('numComponents', 0.65, ...
                   'kernelFunc', kernel, ...
                   'diagnosis', [300, 500]);

% build a KPCA object
kpca = KernelPCA(parameter);
% train the KPCA model
kpca.train(trainData);
% test the KPCA model
results = kpca.test(testData);

% visualization
kplot = KernelPCAVisualization();
kplot.cumContribution(kpca)
kplot.trainResults(kpca)
kplot.testResults(kpca, results)
kplot.diagnosis(results)
```

Diagnosis results:

```
*** Fault diagnosis ***
Fault diagnosis start...
Fault diagnosis finished.
running time            = 18.2738 seconds
start point             = 300
ending point            = 500
fault variables (T2)    = 44   1   4
fault variables (SPE)   = 1  44  18
```

