geluLayer

Gaussian error linear unit (GELU) layer

Description

A Gaussian error linear unit (GELU) layer weights each input element by the standard Gaussian cumulative distribution function evaluated at that element.

This operation is given by

`$\text{GELU}\left(x\right)=\frac{x}{2}\left(1+\text{erf}\left(\frac{x}{\sqrt{2}}\right)\right),$`

where erf denotes the error function.
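As a quick numerical check of this formula, the following minimal sketch evaluates the GELU operation on a few sample values using only the base MATLAB `erf` function. The function handle name `gelu` and the sample vector are illustrative assumptions and are not part of the `geluLayer` interface.

```matlab
% Illustrative sketch: evaluate the GELU formula elementwise.
% The handle name "gelu" is hypothetical; it is not a toolbox function.
gelu = @(x) (x./2).*(1 + erf(x./sqrt(2)));

% Sample inputs chosen for illustration only.
x = [-3 -1 0 1 3];
y = gelu(x)
```

The output is exactly 0 at `x = 0`, approaches `x` for large positive inputs, and approaches 0 for large negative inputs.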

Creation

Syntax

``layer = geluLayer``
``layer = geluLayer(Name=Value)``

Description

`layer = geluLayer` returns a GELU layer.

`layer = geluLayer(Name=Value)` sets the optional `Approximation` and `Name` properties using name-value arguments. For example, `geluLayer(Name="gelu")` creates a GELU layer with the name `"gelu"`.
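Both optional properties can also be set in a single call. The following is a minimal sketch that assumes Deep Learning Toolbox is installed; the layer name `"gelu_tanh"` is an illustrative choice, not a required value.

```matlab
% Create a GELU layer that uses the tanh approximation and a custom name.
% The name "gelu_tanh" is chosen for illustration only.
layer = geluLayer(Approximation="tanh",Name="gelu_tanh");

% Inspect the properties that were set.
disp(layer.Approximation)
disp(layer.Name)
```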

Properties

GELU

`Approximation`: Approximation method for the GELU operation, specified as one of these values (the default is `'none'`):

• `'none'` — Do not use approximation.

• `'tanh'` — Approximate the underlying error function using

`$\text{erf}\left(\frac{x}{\sqrt{2}}\right)\approx \text{tanh}\left(\sqrt{\frac{2}{\pi }}\left(x+0.044715{x}^{3}\right)\right).$`

Tip

In MATLAB®, computing the tanh approximation is typically less accurate than computing the GELU activation without an approximation and, for large input sizes, also slower. Use the tanh approximation when you want to reproduce models that use this approximation, such as BERT and GPT-2.
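To get a feel for how closely the two options agree, the following sketch evaluates both formulas with base MATLAB functions and reports the largest absolute difference on a sample grid. The handle names `geluExact` and `geluTanh` and the grid range are assumptions made for illustration; this is not how the layer is implemented internally.

```matlab
% Illustrative comparison of the exact GELU formula and the tanh approximation.
% Both function handles are hypothetical helpers, not toolbox functions.
geluExact = @(x) (x./2).*(1 + erf(x./sqrt(2)));
geluTanh  = @(x) (x./2).*(1 + tanh(sqrt(2/pi).*(x + 0.044715.*x.^3)));

% Evaluate both on a grid of sample points (range chosen for illustration).
x = linspace(-5,5,1001);
maxAbsDiff = max(abs(geluExact(x) - geluTanh(x)))
```

The difference is small but nonzero, so a model trained with one option may not reproduce its outputs exactly when evaluated with the other.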

Layer

`Name`: Layer name, specified as a character vector or a string scalar. For `Layer` array input, the `trainNetwork`, `assembleNetwork`, `layerGraph`, and `dlnetwork` functions automatically assign names to layers with the name `''`.

Data Types: `char` | `string`

`NumInputs`: Number of inputs of the layer. This layer accepts a single input only.

Data Types: `double`

`InputNames`: Input names of the layer. This layer accepts a single input only.

Data Types: `cell`

`NumOutputs`: Number of outputs of the layer. This layer has a single output only.

Data Types: `double`

`OutputNames`: Output names of the layer. This layer has a single output only.

Data Types: `cell`

Examples

Create a GELU layer.

`layer = geluLayer`
```
layer = 
  GELULayer with properties:

    Name: ''

   Hyperparameters
    Approximation: 'none'
```

Include a GELU layer in a `Layer` array.

```
layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(5,20)
    geluLayer
    maxPooling2dLayer(2,Stride=2)
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer]
```
```
layers = 
  7×1 Layer array with layers:

     1   ''   Image Input             28×28×1 images with 'zerocenter' normalization
     2   ''   Convolution             20 5×5 convolutions with stride [1 1] and padding [0 0 0 0]
     3   ''   GELU                    GELU
     4   ''   Max Pooling             2×2 max pooling with stride [2 2] and padding [0 0 0 0]
     5   ''   Fully Connected         10 fully connected layer
     6   ''   Softmax                 softmax
     7   ''   Classification Output   crossentropyex
```
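To see the layer operate on data, a small network can be evaluated directly. The following is a minimal sketch, assuming Deep Learning Toolbox is installed. It uses a separate two-layer network rather than the array above, because a `dlnetwork` object does not take an output layer such as `classificationLayer`. The input size, sample values, and variable names are illustrative assumptions.

```matlab
% Build a minimal network: a feature input followed by a GELU layer.
% The feature count (5) matches the illustrative sample vector below.
net = dlnetwork([
    featureInputLayer(5)
    geluLayer]);

% Format the sample data as 5 channels ("C") by 1 observation ("B").
x = [-3 -1 0 1 3]';
X = dlarray(x,"CB");

% Evaluate the network and convert the result back to a numeric array.
Y = extractdata(predict(net,X));

% Compare against the GELU formula computed with base MATLAB functions.
expected = (x./2).*(1 + erf(x./sqrt(2)));
maxAbsDiff = max(abs(Y - expected))
```

With the default `Approximation` value of `'none'`, the layer output should agree with the directly computed formula up to floating-point precision.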

References

[1] Hendrycks, Dan, and Kevin Gimpel. "Gaussian error linear units (GELUs)." Preprint, submitted June 27, 2016. https://arxiv.org/abs/1606.08415

Version History

Introduced in R2022b