If Deep Learning Toolbox™ does not provide the layer you require for your classification or regression problem, then you can define your own custom layer using this example as a guide. For a list of built-in layers, see List of Deep Learning Layers.
To define a custom deep learning layer, you can use the template provided in this example, which takes you through the following steps:
Name the layer – give the layer a name so that it can be used in MATLAB^{®}.
Declare the layer properties – specify the properties of the layer and which parameters are learned during training.
Create a constructor function (optional) – specify how to construct the layer and initialize its properties. If you do not specify a constructor function, then at creation, the software initializes the Name, Description, and Type properties with [] and sets the number of layer inputs and outputs to 1.
Create forward functions – specify how data passes forward through the layer (forward propagation) at prediction time and at training time.
Create a backward function – specify the derivatives of the loss with respect to the input data and the learnable parameters (backward propagation).
This example shows how to create a PReLU layer, which is a layer with a learnable parameter, and use it in a convolutional neural network. A PReLU layer performs a threshold operation, where for each channel, any input value less than zero is multiplied by a scalar learned at training time [1]. For values less than zero, a PReLU layer applies scaling coefficients $${\alpha}_{i}$$ to each channel of the input. These coefficients form a learnable parameter, which the layer learns during training.
This figure from [1] compares the ReLU and PReLU layer functions.
Copy the layer with learnable parameters template into a new file in MATLAB. This template outlines the structure of a layer with learnable parameters and includes the functions that define the layer behavior.
classdef myLayer < nnet.layer.Layer

    properties
        % (Optional) Layer properties.

        % Layer properties go here.
    end

    properties (Learnable)
        % (Optional) Layer learnable parameters.

        % Layer learnable parameters go here.
    end

    methods
        function layer = myLayer()
            % (Optional) Create a myLayer.
            % This function must have the same name as the class.

            % Layer constructor function goes here.
        end

        function [Z1, …, Zm] = predict(layer, X1, …, Xn)
            % Forward input data through the layer at prediction time and
            % output the result.
            %
            % Inputs:
            %         layer       - Layer to forward propagate through
            %         X1, ..., Xn - Input data
            % Outputs:
            %         Z1, ..., Zm - Outputs of layer forward function

            % Layer forward function for prediction goes here.
        end

        function [Z1, …, Zm, memory] = forward(layer, X1, …, Xn)
            % (Optional) Forward input data through the layer at training
            % time and output the result and a memory value.
            %
            % Inputs:
            %         layer       - Layer to forward propagate through
            %         X1, ..., Xn - Input data
            % Outputs:
            %         Z1, ..., Zm - Outputs of layer forward function
            %         memory      - Memory value for backward propagation

            % Layer forward function for training goes here.
        end

        function [dLdX1, …, dLdXn, dLdW1, …, dLdWk] = ...
                backward(layer, X1, …, Xn, Z1, …, Zm, dLdZ1, …, dLdZm, memory)
            % Backward propagate the derivative of the loss function through
            % the layer.
            %
            % Inputs:
            %         layer             - Layer to backward propagate through
            %         X1, ..., Xn       - Input data
            %         Z1, ..., Zm       - Outputs of layer forward function
            %         dLdZ1, ..., dLdZm - Gradients propagated from the next layers
            %         memory            - Memory value from forward function
            % Outputs:
            %         dLdX1, ..., dLdXn - Derivatives of the loss with respect to the
            %                             inputs
            %         dLdW1, ..., dLdWk - Derivatives of the loss with respect to each
            %                             learnable parameter

            % Layer backward function goes here.
        end
    end
end
First, give the layer a name. In the first line of the class file, replace the existing name myLayer with preluLayer.
classdef preluLayer < nnet.layer.Layer
    ...
end
Next, rename the myLayer constructor function (the first function in the methods section) so that it has the same name as the layer.
    methods
        function layer = preluLayer()
            ...
        end

        ...
    end
Save the layer class file in a new file named preluLayer.m. The file name must match the layer name. To use the layer, you must save the file in the current folder or in a folder on the MATLAB path.
Declare the layer properties in the properties section and declare learnable parameters by listing them in the properties (Learnable) section.
By default, custom intermediate layers have these properties:
| Property | Description |
| --- | --- |
| Name | Layer name, specified as a character vector or a string scalar. To include a layer in a layer graph, you must specify a nonempty unique layer name. If you train a series network with the layer and Name is set to '', then the software automatically assigns a name to the layer at training time. |
| Description | One-line description of the layer, specified as a character vector or a string scalar. This description appears when the layer is displayed in a Layer array. |
| Type | Type of the layer, specified as a character vector or a string scalar. The value of Type appears when the layer is displayed in a Layer array. If you do not specify a layer type, then the software displays the layer class name. |
| NumInputs | Number of inputs of the layer, specified as a positive integer. If you do not specify this value, then the software automatically sets NumInputs to the number of names in InputNames. The default value is 1. |
| InputNames | Input names of the layer, specified as a cell array of character vectors. If you do not specify this value and NumInputs is greater than 1, then the software automatically sets InputNames to {'in1',...,'inN'}, where N is equal to NumInputs. The default value is {'in'}. |
| NumOutputs | Number of outputs of the layer, specified as a positive integer. If you do not specify this value, then the software automatically sets NumOutputs to the number of names in OutputNames. The default value is 1. |
| OutputNames | Output names of the layer, specified as a cell array of character vectors. If you do not specify this value and NumOutputs is greater than 1, then the software automatically sets OutputNames to {'out1',...,'outM'}, where M is equal to NumOutputs. The default value is {'out'}. |
If the layer has no other properties, then you can omit the properties section.
If you are creating a layer with multiple inputs, then you must set either the NumInputs or InputNames properties in the layer constructor. If you are creating a layer with multiple outputs, then you must set either the NumOutputs or OutputNames properties in the layer constructor. For an example, see Define Custom Deep Learning Layer with Multiple Inputs.
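As a minimal sketch (the layer name weightedAdditionLayer is hypothetical and not part of this example), the constructor of a two-input layer, placed inside a classdef that inherits from nnet.layer.Layer, might set these properties as follows:

function layer = weightedAdditionLayer(name)
    % Hypothetical constructor for a layer with two inputs.

    % Set layer name.
    layer.Name = name;

    % Declare two inputs. InputNames then defaults to {'in1','in2'}.
    layer.NumInputs = 2;
end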
A PReLU layer does not require any additional properties, so you can remove the properties section.
A PReLU layer has only one learnable parameter, the scaling coefficient a. Declare this learnable parameter in the properties (Learnable) section and call the parameter Alpha.
    properties (Learnable)
        % Layer learnable parameters

        % Scaling coefficient
        Alpha
    end
Create the function that constructs the layer and initializes the layer properties. Specify any variables required to create the layer as inputs to the constructor function.
The PReLU layer constructor function requires two input arguments: the number of channels of the expected input data and the layer name. The number of channels specifies the size of the learnable parameter Alpha. Specify two input arguments named numChannels and name in the preluLayer function. Add a comment to the top of the function that explains the syntax of the function.
function layer = preluLayer(numChannels, name)
    % layer = preluLayer(numChannels, name) creates a PReLU layer with
    % numChannels channels and specifies the layer name.

    ...
end
Initialize the layer properties, including learnable parameters, in the constructor function. Replace the comment % Layer constructor function goes here with code that initializes the layer properties.
Set the Name property to the input argument name.
% Set layer name.
layer.Name = name;
Give the layer a one-line description by setting the Description property of the layer. Set the description to describe the type of layer and its size.
% Set layer description.
layer.Description = "PReLU with " + numChannels + " channels";
For a PReLU layer, when the input values are negative, the layer multiplies each channel of the input by the corresponding channel of Alpha. Initialize the learnable parameter Alpha to be a random vector of size 1-by-1-by-numChannels. With the third dimension specified as size numChannels, the layer can use element-wise multiplication of the input in the forward function. Alpha is a property of the layer object, so you must assign the vector to layer.Alpha.
% Initialize scaling coefficient.
layer.Alpha = rand([1 1 numChannels]);
View the completed constructor function.
function layer = preluLayer(numChannels, name)
    % layer = preluLayer(numChannels, name) creates a PReLU layer
    % with numChannels channels and specifies the layer name.

    % Set layer name.
    layer.Name = name;

    % Set layer description.
    layer.Description = "PReLU with " + numChannels + " channels";

    % Initialize scaling coefficient.
    layer.Alpha = rand([1 1 numChannels]);
end
With this constructor function, the command preluLayer(3,'prelu') creates a PReLU layer with three channels and the name 'prelu'.
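For example, assuming you have saved preluLayer.m as described above, you can create the layer and inspect its initialized properties:

% Create a PReLU layer with three channels and inspect it.
layer = preluLayer(3,'prelu');
layer.Description     % "PReLU with 3 channels"
size(layer.Alpha)     % 1 1 3, one random coefficient per channel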
Create the layer forward functions to use at prediction time and training time.
Create a function named predict that propagates the data forward through the layer at prediction time and outputs the result.
The syntax for predict is

[Z1,…,Zm] = predict(layer,X1,…,Xn)

X1,…,Xn are the n layer inputs and Z1,…,Zm are the m layer outputs. The values n and m must correspond to the NumInputs and NumOutputs properties of the layer.

If the number of inputs to predict can vary, then use varargin instead of X1,…,Xn. In this case, varargin is a cell array of the inputs, where varargin{i} corresponds to Xi. If the number of outputs can vary, then use varargout instead of Z1,…,Zm. In this case, varargout is a cell array of the outputs, where varargout{j} corresponds to Zj.
Because a PReLU layer has only one input and one output, the syntax for predict for a PReLU layer is Z = predict(layer,X).
By default, the layer uses predict as the forward function at training time. To use a different forward function at training time, or retain a value required for the backward function, you must also create a function named forward.
The dimensions of the inputs depend on the type of data and the output of the connected layers:
| Layer Input | Input Size | Observation Dimension |
| --- | --- | --- |
| 2-D images | h-by-w-by-c-by-N, where h, w, and c correspond to the height, width, and number of channels of the images, respectively, and N is the number of observations. | 4 |
| 3-D images | h-by-w-by-d-by-c-by-N, where h, w, d, and c correspond to the height, width, depth, and number of channels of the 3-D images, respectively, and N is the number of observations. | 5 |
| Vector sequences | c-by-N-by-S, where c is the number of features of the sequences, N is the number of observations, and S is the sequence length. | 2 |
| 2-D image sequences | h-by-w-by-c-by-N-by-S, where h, w, and c correspond to the height, width, and number of channels of the images, respectively, N is the number of observations, and S is the sequence length. | 4 |
| 3-D image sequences | h-by-w-by-d-by-c-by-N-by-S, where h, w, d, and c correspond to the height, width, depth, and number of channels of the 3-D images, respectively, N is the number of observations, and S is the sequence length. | 5 |
The forward function propagates the data forward through the layer at training time and also outputs a memory value.
The syntax for forward is

[Z1,…,Zm,memory] = forward(layer,X1,…,Xn)

X1,…,Xn are the n layer inputs, Z1,…,Zm are the m layer outputs, and memory is the memory of the layer.

If the number of inputs to forward can vary, then use varargin instead of X1,…,Xn. In this case, varargin is a cell array of the inputs, where varargin{i} corresponds to Xi. If the number of outputs can vary, then use varargout instead of Z1,…,Zm. In this case, varargout is a cell array of the outputs, where varargout{j} corresponds to Zj for j = 1,…,NumOutputs and varargout{NumOutputs+1} corresponds to memory.
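As a minimal sketch of when forward is useful (hypothetical; the PReLU layer in this example does not need it), a layer could compute an intermediate result during training and pass it to backward through memory. The expression for Z below is the same one used in the predict function defined later in this example:

function [Z, memory] = forward(layer, X)
    % Hypothetical training-time forward function that saves a memory value.
    memory = X > 0;                               % mask of positive inputs, reused by backward
    Z = max(0, X) + layer.Alpha .* min(0, X);     % same result as predict
end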
The PReLU operation is given by
$$f({x}_{i})=\begin{cases}{x}_{i} & \text{if } {x}_{i}>0\\ {\alpha}_{i}{x}_{i} & \text{if } {x}_{i}\le 0\end{cases}$$
where $${x}_{i}$$ is the input of the nonlinear activation f on channel i, and $${\alpha}_{i}$$ is the coefficient controlling the slope of the negative part. The subscript i in $${\alpha}_{i}$$ indicates that the nonlinear activation can vary on different channels.
Implement this operation in predict. In predict, the input X corresponds to x in the equation. The output Z corresponds to $$f({x}_{i})$$. The PReLU layer does not require memory or a different forward function for training, so you can remove the forward function from the class file. Add a comment to the top of the function that explains the syntaxes of the function.
function Z = predict(layer, X)
    % Z = predict(layer, X) forwards the input data X through the
    % layer and outputs the result Z.

    Z = max(0, X) + layer.Alpha .* min(0, X);
end
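As a quick numeric check (the values and the scalar coefficient below are made up, not the layer's learned parameter), the expression behaves as expected on a small array:

% Quick check of the PReLU expression with made-up values.
X = [-2 1; 3 -4];
Alpha = 0.25;
Z = max(0, X) + Alpha .* min(0, X)
% Z =
%    -0.5000    1.0000
%     3.0000   -1.0000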
Implement the derivatives of the loss with respect to the input data and the learnable parameters in the backward function.
The syntax for backward is

[dLdX1,…,dLdXn,dLdW1,…,dLdWk] = backward(layer,X1,…,Xn,Z1,…,Zm,dLdZ1,…,dLdZm,memory)

X1,…,Xn are the n layer inputs, Z1,…,Zm are the m outputs of forward, dLdZ1,…,dLdZm are the gradients backward propagated from the next layer, and memory is the memory output of forward. For the outputs, dLdX1,…,dLdXn are the derivatives of the loss with respect to the layer inputs and dLdW1,…,dLdWk are the derivatives of the loss with respect to the k learnable parameters. To reduce memory usage by preventing unused variables being saved between the forward and backward pass, replace the corresponding input arguments with ~.

If the number of inputs to backward can vary, then use varargin instead of the input arguments after layer. In this case, varargin is a cell array of the inputs, where varargin{i} corresponds to Xi for i = 1,…,NumInputs, varargin{NumInputs+j} and varargin{NumInputs+NumOutputs+j} correspond to Zj and dLdZj, respectively, for j = 1,…,NumOutputs, and varargin{end} corresponds to memory.

If the number of outputs can vary, then use varargout instead of the output arguments. In this case, varargout is a cell array of the outputs, where varargout{i} corresponds to dLdXi for i = 1,…,NumInputs and varargout{NumInputs+t} corresponds to dLdWt for t = 1,…,k, where k is the number of learnable parameters.
Because a PReLU layer has only one input, one output, and one learnable parameter, the syntax for backward for a PReLU layer is [dLdX,dLdAlpha] = backward(layer,X,Z,dLdZ,memory). The dimensions of X and Z are the same as in the forward functions. The dimensions of dLdZ are the same as the dimensions of Z. The dimensions and data type of dLdX are the same as the dimensions and data type of X. The dimensions and data type of dLdAlpha are the same as the dimensions and data type of the learnable parameter Alpha.
During the backward pass, the layer automatically updates the learnable parameters using the corresponding derivatives.
To include a custom layer in a network, the layer forward functions must accept the outputs of the previous layer and forward propagate arrays with the size expected by the next layer. Similarly, backward must accept inputs with the same size as the corresponding output of the forward function and backward propagate derivatives with the same size.
The derivative of the loss with respect to the input data is
$$\frac{\partial L}{\partial {x}_{i}}=\frac{\partial L}{\partial f({x}_{i})}\frac{\partial f({x}_{i})}{\partial {x}_{i}}$$
where $$\partial L/\partial f({x}_{i})$$ is the gradient propagated from the next layer, and the derivative of the activation is
$$\frac{\partial f({x}_{i})}{\partial {x}_{i}}=\begin{cases}1 & \text{if } {x}_{i}\ge 0\\ {\alpha}_{i} & \text{if } {x}_{i}<0\end{cases}$$
The derivative of the loss with respect to the learnable parameters is
$$\frac{\partial L}{\partial {\alpha}_{i}}=\sum_{j}\frac{\partial L}{\partial f({x}_{ij})}\frac{\partial f({x}_{ij})}{\partial {\alpha}_{i}}$$
where i indexes the channels, j indexes the elements over height, width, and observations, $$\partial L/\partial f({x}_{ij})$$ is the gradient propagated from the deeper layer, and the gradient of the activation is
$$\frac{\partial f({x}_{i})}{\partial {\alpha}_{i}}=\begin{cases}0 & \text{if } {x}_{i}\ge 0\\ {x}_{i} & \text{if } {x}_{i}<0\end{cases}$$
In backward of the layer template, replace the output dLdW with the output dLdAlpha, where dLdAlpha corresponds to $$\partial L/\partial {\alpha}_{i}$$. In backward, the input X corresponds to x. The input Z corresponds to $$f({x}_{i})$$. The input dLdZ corresponds to $$\partial L/\partial f({x}_{i})$$. The output dLdX corresponds to $$\partial L/\partial {x}_{i}$$.
Add a comment to the top of the function that explains the syntaxes of the function.
To reduce memory usage by preventing unused variables being saved between the forward and backward pass, replace the corresponding input arguments with ~. Because the layer function does not require the input arguments Z and memory, replace these arguments with ~.
function [dLdX, dLdAlpha] = backward(layer, X, ~, dLdZ, ~)
    % [dLdX, dLdAlpha] = backward(layer, X, ~, dLdZ, ~)
    % backward propagates the derivative of the loss function
    % through the layer.
    %
    % Inputs:
    %         layer    - Layer to backward propagate through
    %         X        - Input data
    %         dLdZ     - Gradient propagated from the deeper layer
    % Outputs:
    %         dLdX     - Derivative of the loss with respect to the
    %                    input data
    %         dLdAlpha - Derivative of the loss with respect to the
    %                    learnable parameter Alpha

    dLdX = layer.Alpha .* dLdZ;
    dLdX(X>0) = dLdZ(X>0);
    dLdAlpha = min(0,X) .* dLdZ;
    dLdAlpha = sum(sum(dLdAlpha,1),2);

    % Sum over all observations in mini-batch.
    dLdAlpha = sum(dLdAlpha,4);
end
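Optionally, you can spot-check these derivatives against a finite-difference estimate before running the full layer checks. The following is a hypothetical test script (the sizes are illustrative, and the checkLayer function used later in this example runs a fuller set of tests):

% Compare dLdAlpha from backward with a central finite-difference estimate.
layer = preluLayer(2,'prelu');
X = randn(4,4,2,3);                          % small h-by-w-by-c-by-N input
dLdZ = ones(size(X));                        % gradients for the loss L = sum of all elements of Z
[~,dLdAlpha] = backward(layer,X,[],dLdZ,[]);

delta = 1e-6;
layerPlus = layer;   layerPlus.Alpha(1)  = layer.Alpha(1) + delta;
layerMinus = layer;  layerMinus.Alpha(1) = layer.Alpha(1) - delta;
ZPlus = predict(layerPlus,X);
ZMinus = predict(layerMinus,X);
numGrad = (sum(ZPlus(:)) - sum(ZMinus(:)))/(2*delta);
[dLdAlpha(1) numGrad]                        % the two values should agree closely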
View the completed layer class file.
classdef preluLayer < nnet.layer.Layer
    % Example custom PReLU layer.

    properties (Learnable)
        % Layer learnable parameters

        % Scaling coefficient
        Alpha
    end

    methods
        function layer = preluLayer(numChannels, name)
            % layer = preluLayer(numChannels, name) creates a PReLU layer
            % with numChannels channels and specifies the layer name.

            % Set layer name.
            layer.Name = name;

            % Set layer description.
            layer.Description = "PReLU with " + numChannels + " channels";

            % Initialize scaling coefficient.
            layer.Alpha = rand([1 1 numChannels]);
        end

        function Z = predict(layer, X)
            % Z = predict(layer, X) forwards the input data X through the
            % layer and outputs the result Z.

            Z = max(0, X) + layer.Alpha .* min(0, X);
        end

        function [dLdX, dLdAlpha] = backward(layer, X, ~, dLdZ, ~)
            % [dLdX, dLdAlpha] = backward(layer, X, ~, dLdZ, ~)
            % backward propagates the derivative of the loss function
            % through the layer.
            %
            % Inputs:
            %         layer    - Layer to backward propagate through
            %         X        - Input data
            %         dLdZ     - Gradient propagated from the deeper layer
            % Outputs:
            %         dLdX     - Derivative of the loss with respect to the
            %                    input data
            %         dLdAlpha - Derivative of the loss with respect to the
            %                    learnable parameter Alpha

            dLdX = layer.Alpha .* dLdZ;
            dLdX(X>0) = dLdZ(X>0);
            dLdAlpha = min(0,X) .* dLdZ;
            dLdAlpha = sum(sum(dLdAlpha,1),2);

            % Sum over all observations in mini-batch.
            dLdAlpha = sum(dLdAlpha,4);
        end
    end
end
For GPU compatibility, the layer functions must support inputs and return outputs of type gpuArray. Any other functions the layer uses must do the same. Many MATLAB built-in functions support gpuArray input arguments. If you call any of these functions with at least one gpuArray input, then the function executes on the GPU and returns a gpuArray output. For a list of functions that execute on a GPU, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox). To use a GPU for deep learning, you must also have a CUDA^{®} enabled NVIDIA^{®} GPU with compute capability 3.0 or higher. For more information on working with GPUs in MATLAB, see GPU Computing in MATLAB (Parallel Computing Toolbox).
The MATLAB functions used in predict and backward all support gpuArray inputs, so the layer is GPU compatible.
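As a quick check (assuming Parallel Computing Toolbox and a supported GPU are available), you can confirm that predict returns a gpuArray when given gpuArray input:

% Optional GPU smoke test.
layer = preluLayer(20,'prelu');
X = gpuArray(rand(24,24,20,128,'single'));
Z = predict(layer,X);
class(Z)     % 'gpuArray'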
Check the validity of the custom layer preluLayer using checkLayer.
Define a custom PReLU layer. To create this layer, save the file preluLayer.m in the current folder.
Create an instance of the layer and check its validity using checkLayer. Specify the valid input size to be the size of a single observation of typical input to the layer. The layer expects 4-D array inputs, where the first three dimensions correspond to the height, width, and number of channels of the previous layer output, and the fourth dimension corresponds to the observations.
Specify the typical size of the input of an observation and set 'ObservationDimension' to 4.
layer = preluLayer(20,'prelu');
validInputSize = [24 24 20];
checkLayer(layer,validInputSize,'ObservationDimension',4)
Skipping GPU tests. No compatible GPU device found.

Running nnet.checklayer.TestCase
.......... ........
Done nnet.checklayer.TestCase
__________

Test Summary:
     18 Passed, 0 Failed, 0 Incomplete, 6 Skipped.
     Time elapsed: 106.6761 seconds.
Here, the function does not detect any issues with the layer.
You can use a custom layer in the same way as any other layer in Deep Learning Toolbox. This section shows how to create and train a network for digit classification using the PReLU layer you created earlier.
Load the example training data.
[XTrain,YTrain] = digitTrain4DArrayData;
Define a custom PReLU layer. To create this layer, save the file preluLayer.m in the current folder. Create a layer array including the custom layer preluLayer.
layers = [
imageInputLayer([28 28 1])
convolution2dLayer(5,20)
batchNormalizationLayer
preluLayer(20,'prelu')
fullyConnectedLayer(10)
softmaxLayer
classificationLayer];
Set the training options and train the network.
options = trainingOptions('adam','MaxEpochs',10);
net = trainNetwork(XTrain,YTrain,layers,options);
Training on single CPU.
Initializing input data normalization.
|========================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |   Accuracy   |     Loss     |      Rate       |
|========================================================================================|
|       1 |           1 |       00:00:00 |        7.03% |       3.3828 |          0.0010 |
|       2 |          50 |       00:00:11 |       74.22% |       0.7206 |          0.0010 |
|       3 |         100 |       00:00:21 |       89.84% |       0.3583 |          0.0010 |
|       4 |         150 |       00:00:32 |       88.28% |       0.4037 |          0.0010 |
|       6 |         200 |       00:00:45 |       96.88% |       0.2034 |          0.0010 |
|       7 |         250 |       00:00:57 |       96.88% |       0.1370 |          0.0010 |
|       8 |         300 |       00:01:09 |      100.00% |       0.0609 |          0.0010 |
|       9 |         350 |       00:01:22 |      100.00% |       0.0534 |          0.0010 |
|      10 |         390 |       00:01:30 |       99.22% |       0.0527 |          0.0010 |
|========================================================================================|
Evaluate the network performance by predicting on new data and calculating the accuracy.
[XTest,YTest] = digitTest4DArrayData;
YPred = classify(net,XTest);
accuracy = sum(YTest==YPred)/numel(YTest)
accuracy = 0.9194
[1] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification." In Proceedings of the IEEE international conference on computer vision, pp. 1026-1034. 2015.