Main Content

Deep Network Quantizer

Quantize a deep neural network to 8-bit scaled integer data types

Description

Use the Deep Network Quantizer app to reduce the memory requirement of a deep neural network by quantizing weights, biases, and activations of convolution layers to 8-bit scaled integer data types. Using this app you can:

  • Visualize the dynamic ranges of convolution layers in a deep neural network.

  • Select individual network layers to quantize.

  • Assess the performance of a quantized network.

  • Generate GPU code to deploy the quantized network using GPU Coder™.

  • Generate HDL code to deploy the quantized network to an FPGA using Deep Learning HDL Toolbox™.

The Deep Learning Toolbox™ Model Quantization Library support package is a free add-on that you can download using the Add-On Explorer. Alternatively, see Deep Learning Toolbox Model Quantization Library. To learn about the products required to quantize and deploy the deep learning network to an FPGA or GPU environment, see Quantization Workflow Prerequisites.

Deep Network Quantizer app

Open the Deep Network Quantizer App

  • MATLAB® command prompt: Enter deepNetworkQuantizer.

Examples


To explore the behavior of a neural network with quantized convolution layers, use the Deep Network Quantizer app. This example quantizes the learnable parameters of the convolution layers of the squeezenet neural network after the network has been retrained to classify new images, as described in the Train Deep Learning Network to Classify New Images example.

This example uses a DAG network with the GPU execution environment. To use the FPGA execution environment, load a series network. DAG networks are not supported for an FPGA execution environment.

Load the network to quantize into the base workspace.

net
net = 

  DAGNetwork with properties:

         Layers: [68x1 nnet.cnn.layer.Layer]
    Connections: [75x2 table]
     InputNames: {'data'}
    OutputNames: {'new_classoutput'}

Define calibration and validation data.

The app uses calibration data to exercise the network and collect the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. For the best quantization results, the calibration data must be representative of inputs to the network.

The app uses the validation data to test the network after quantization to understand the effects of the limited range and precision of the quantized learnable parameters of the convolution layers in the network.

In this example, use the images in the MerchData data set. Define an augmentedImageDatastore object to resize the data for the network. Then, split the data into calibration and validation data sets.

unzip('MerchData.zip');
imds = imageDatastore('MerchData', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');
[calData, valData] = splitEachLabel(imds, 0.7, 'randomized');
aug_calData = augmentedImageDatastore([227 227], calData);
aug_valData = augmentedImageDatastore([227 227], valData);

At the MATLAB command prompt, open the app.

deepNetworkQuantizer

In the app, click New and select Quantize a network.

The app verifies your execution environment. For more information, see Quantization Workflow Prerequisites.

In the dialog, select the execution environment and the network to quantize from the base workspace. For this example, select a GPU execution environment and the DAG network, net.

Select a network and execution environment

The app displays the layer graph of the selected network.
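
If you want to inspect the same layer graph outside the app, you can plot it at the command line. This optional sketch uses the layerGraph and plot functions and is not part of the app workflow:

% Optional: inspect the layer graph of the DAG network at the command line.
lgraph = layerGraph(net);   % extract the layer graph from the DAGNetwork
figure
plot(lgraph)                % plot the layers and their connections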

In the Calibrate section of the toolstrip, under Calibration Data, select the augmentedImageDatastore object from the base workspace containing the calibration data, aug_calData.

Click Calibrate.

The Deep Network Quantizer uses the calibration data to exercise the network and collect range information for the learnable parameters in the network layers.

When the calibration is complete, the app displays a table listing the weights and biases in the convolution and fully connected layers of the network, the activations in all layers of the network, and the minimum and maximum values each parameter took during calibration. To the right of the table, the app displays histograms of the dynamic ranges of the parameters. The gray regions of the histograms indicate data that cannot be represented by the quantized representation. For more information on how to interpret these histograms, see Quantization of Deep Neural Networks.

Deep Network Quantizer calibration
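
To build intuition for the gray regions, the following minimal sketch shows one way to map values to a scaled 8-bit integer representation, assuming a power-of-two scaling factor derived from the observed dynamic range. It is an illustration only, not the algorithm the toolbox uses.

% Sketch only: quantize values to scaled 8-bit integers using a
% power-of-two scaling factor (an assumption for illustration, not the
% exact toolbox algorithm).
w = single(randn(1,1000))*0.5;                       % example single-precision values
maxAbs = double(max(abs(w)));                        % observed dynamic range
fracLen = floor(log2(127/maxAbs));                   % fraction bits that still cover the range
scale = 2^(-fracLen);                                % power-of-two scale
wInt8 = int8(max(min(round(w/scale), 127), -128));   % quantize with saturation
wRecon = single(wInt8)*scale;                        % reconstruct to inspect the error
maxQuantError = max(abs(w - wRecon))

In this sketch, values outside the range that the chosen scale can represent saturate to the extremes of the int8 range; the app histograms similarly mark in gray the data that the quantized representation cannot capture.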

In the Quantize column of the table, indicate whether to quantize the learnable parameters in the layer. Layers that are not convolution layers cannot be quantized, and therefore cannot be selected. Layers that are not quantized remain in single-precision after quantization.

In the Validate section of the toolstrip, under Validation Data, select the augmentedImageDatastore object from the base workspace containing the validation data, aug_valData.

In the Validate section of the toolstrip, under Quantization Options, select the Default metric function.

Click Quantize and Validate.

The Deep Network Quantizer quantizes the weights, activations, and biases of convolution layers in the network to scaled 8-bit integer data types and uses the validation data to exercise the network. The app determines a metric function to use for the validation based on the type of network that is being quantized.

Type of Network               Metric Function
Classification                Top-1 Accuracy – Accuracy of the network
Object Detection              Average Precision – Average precision over all detection results
Regression                    MSE – Mean squared error of the network
Single Shot Detector (SSD)    WeightedIOU – Average IoU of each class, weighted by the number of pixels in that class

When the validation is complete, the app displays the results of the validation, including:

  • Metric function used for validation

  • Result of the metric function before and after quantization

  • Memory requirement of the network before and after quantization (MB)

Deep Network Quantizer validation

If you want to use a different metric function for validation (for example, the Top-5 accuracy metric instead of the default Top-1 accuracy metric), you can define a custom metric function. Save this function in a local file.

function accuracy = hComputeModelAccuracy(predictionScores, net, dataStore)
%% Computes model-level accuracy statistics
    
    % Load ground truth
    tmp = readall(dataStore);
    groundTruth = tmp.response;
    
    % Compare the predicted label with the ground truth
    predictionError = {};
    for idx=1:numel(groundTruth)
        [~, idy] = max(predictionScores(idx,:)); 
        yActual = net.Layers(end).Classes(idy);
        predictionError{end+1} = (yActual == groundTruth(idx)); %#ok
    end
    
    % Accuracy is the fraction of correct predictions.
    predictionError = [predictionError{:}];
    accuracy = sum(predictionError)/numel(predictionError);
end

To revalidate the network using this custom metric function, under Quantization Options, enter the name of the custom metric function, hComputeModelAccuracy. Select Add to add hComputeModelAccuracy to the list of metric functions available in the app. Select hComputeModelAccuracy as the metric function to use.

The custom metric function must be on the MATLAB path. If the metric function is not on the path, this step produces an error.

Deep Network Quantizer select custom metric function

Click Quantize and Validate.

The app quantizes the network and displays the validation results for the custom metric function.

Deep Network Quantizer validation

The app displays only scalar values in the validation results table. To view the validation results for a custom metric function with non-scalar output, export the dlquantizer object as described below, then validate using the validate function at the MATLAB command window.
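
For example, assuming the exported dlquantizer object is named quantObj and that net, aug_valData, and hComputeModelAccuracy are available, a command-line validation might look like this sketch:

% Sketch: validate at the command line with the custom metric function.
quantOpts = dlquantizationOptions('MetricFcn', ...
    {@(x) hComputeModelAccuracy(x, net, aug_valData)});
valResults = validate(quantObj, aug_valData, quantOpts);
valResults.MetricResults.Result        % inspect the full metric output, including non-scalar values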

After quantizing and validating the network, you can choose to export the quantized network.

Click the Export button. From the drop-down list, select Export Quantizer to create a dlquantizer object in the base workspace. To open the GPU Coder app and generate GPU code from the quantized neural network, select Generate Code instead. Generating GPU code requires a GPU Coder license.
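
If you prefer to configure GPU code generation at the command line rather than in the GPU Coder app, the configuration might look like the following sketch; the entry-point function name predict_int8 and the MAT-file name quantObj.mat are placeholders, and the exact configuration properties can vary by release.

% Sketch only: command-line int8 GPU code generation from the exported quantizer.
% predict_int8 is a hypothetical entry-point function that loads the network
% with coder.loadDeepLearningNetwork and calls predict.
save('quantObj.mat', 'quantObj');                          % save the exported dlquantizer object
cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
cfg.DeepLearningConfig.DataType = 'int8';                  % request 8-bit integer inference
cfg.DeepLearningConfig.CalibrationResultFile = 'quantObj.mat';
codegen -config cfg predict_int8 -args {ones(227,227,3,'single')}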

If the performance of the quantized network is not satisfactory, you can choose not to quantize some layers by clearing their selection in the table. To see the effects, click Quantize and Validate again.

This example shows you how to import a dlquantizer object from the base workspace into the Deep Network Quantizer app. This allows you to begin quantization of a deep neural network using the command line or the app, and resume your work later in the app.

Load the network to quantize into the base workspace.

net
net = 

  DAGNetwork with properties:

         Layers: [68x1 nnet.cnn.layer.Layer]
    Connections: [75x2 table]
     InputNames: {'data'}
    OutputNames: {'new_classoutput'}

Define calibration and validation data to use for quantization.

The calibration data is used to collect the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. For the best quantization results, the calibration data must be representative of inputs to the network.

The validation data is used to test the network after quantization to understand the effects of the limited range and precision of the quantized convolution layers in the network.

In this example, use the images in the MerchData data set. Define an augmentedImageDatastore object to resize the data for the network. Then, split the data into calibration and validation data sets.

unzip('MerchData.zip');
imds = imageDatastore('MerchData', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');
[calData, valData] = splitEachLabel(imds, 0.7, 'randomized');
aug_calData = augmentedImageDatastore([227 227], calData);
aug_valData = augmentedImageDatastore([227 227], valData);

Create a dlquantizer object and specify the network to quantize.

quantObj = dlquantizer(net);
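
You can also specify the target execution environment when you create the dlquantizer object; for example:

% Optionally specify the execution environment when creating the quantizer.
quantObj = dlquantizer(net, 'ExecutionEnvironment', 'GPU');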

Use the calibrate function to exercise the network with sample inputs and collect range information. The calibrate function exercises the network and collects the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. The function returns a table. Each row of the table contains range information for a learnable parameter of the optimized network.

calResults = calibrate(quantObj, aug_calData)
calResults =

  95x5 table

                   Optimized Layer Name                      Network Layer Name        Learnables / Activations     MinValue      MaxValue 
    __________________________________________________    _________________________    ________________________    __________    __________

    {'conv1_relu_conv1_Weights'                      }    {'relu_conv1'           }         "Weights"                -0.91985       0.88489
    {'conv1_relu_conv1_Bias'                         }    {'relu_conv1'           }         "Bias"                   -0.07925       0.26343
    {'fire2-squeeze1x1_fire2-relu_squeeze1x1_Weights'}    {'fire2-relu_squeeze1x1'}         "Weights"                   -1.38        1.2477
    {'fire2-squeeze1x1_fire2-relu_squeeze1x1_Bias'   }    {'fire2-relu_squeeze1x1'}         "Bias"                   -0.11641       0.24273
    {'fire2-expand1x1_fire2-relu_expand1x1_Weights'  }    {'fire2-relu_expand1x1' }         "Weights"                 -0.7406       0.90982
    {'fire2-expand1x1_fire2-relu_expand1x1_Bias'     }    {'fire2-relu_expand1x1' }         "Bias"                  -0.060056       0.14602
    {'fire2-expand3x3_fire2-relu_expand3x3_Weights'  }    {'fire2-relu_expand3x3' }         "Weights"                -0.74397       0.66905
    {'fire2-expand3x3_fire2-relu_expand3x3_Bias'     }    {'fire2-relu_expand3x3' }         "Bias"                  -0.051778      0.074239
    {'fire3-squeeze1x1_fire3-relu_squeeze1x1_Weights'}    {'fire3-relu_squeeze1x1'}         "Weights"                -0.77262       0.68583
    {'fire3-squeeze1x1_fire3-relu_squeeze1x1_Bias'   }    {'fire3-relu_squeeze1x1'}         "Bias"                   -0.10145       0.32669
    {'fire3-expand1x1_fire3-relu_expand1x1_Weights'  }    {'fire3-relu_expand1x1' }         "Weights"                -0.72083       0.97157
    {'fire3-expand1x1_fire3-relu_expand1x1_Bias'     }    {'fire3-relu_expand1x1' }         "Bias"                  -0.067019       0.30422
    {'fire3-expand3x3_fire3-relu_expand3x3_Weights'  }    {'fire3-relu_expand3x3' }         "Weights"                -0.61403       0.77544
    {'fire3-expand3x3_fire3-relu_expand3x3_Bias'     }    {'fire3-relu_expand3x3' }         "Bias"                  -0.053621        0.1033
    {'fire4-squeeze1x1_fire4-relu_squeeze1x1_Weights'}    {'fire4-relu_squeeze1x1'}         "Weights"                -0.74164        1.0865
    {'fire4-squeeze1x1_fire4-relu_squeeze1x1_Bias'   }    {'fire4-relu_squeeze1x1'}         "Bias"                   -0.10885       0.13875
...

Open the Deep Network Quantizer app.

deepNetworkQuantizer

In the app, click New and select Import dlquantizer object.

Deep Network Quantizer import dlquantizer object

In the dialog, select the dlquantizer object to import from the base workspace.

Select a dlquantizer object to import

The app imports any data contained in the dlquantizer object that was collected at the command line. This data can include the network to quantize, calibration data, validation data, and calibration statistics.

The app displays a table containing the calibration data contained in the imported dlquantizer object, quantObj. To the right of the table, the app displays histograms of the dynamic ranges of the parameters. The gray regions of the histograms indicate data that cannot be represented by the quantized representation. For more information on how to interpret these histograms, see Quantization of Deep Neural Networks.

Deep Network Quantizer app


Introduced in R2020a