Code Generation for Quantized Deep Learning Networks

Deep learning uses neural network architectures that contain many processing layers, including convolutional layers. Deep learning models typically work on large sets of labeled data. Performing inference on these models is computationally intensive and consumes a significant amount of memory. Neural networks use memory to store input data, parameters (weights), and activations from each layer as the input propagates through the network. Deep neural networks trained in MATLAB® use single-precision floating-point data types. Even networks that are small in size require a considerable amount of memory and hardware to perform these floating-point arithmetic operations. These restrictions can inhibit deployment of deep learning models to devices that have low computational power and limited memory resources. By using a lower precision to store the weights and activations, you can reduce the memory requirements of the network.
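As a rough illustration of the savings, storing values as 8-bit integers instead of single precision reduces storage by a factor of four. The parameter count below is an assumption chosen for illustration (roughly the size of AlexNet):

```matlab
% Back-of-envelope estimate of weight storage for a network
% with an assumed 61 million learnable parameters.
numParams = 61e6;
singleMB = numParams * 4 / 2^20;  % single precision: 4 bytes per value
int8MB   = numParams * 1 / 2^20;  % int8: 1 byte per value
fprintf('single: %.1f MB, int8: %.1f MB (%.0fx smaller)\n', ...
    singleMB, int8MB, singleMB/int8MB);
```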

You can use Deep Learning Toolbox™ in tandem with the Deep Learning Toolbox Model Quantization Library support package to reduce the memory footprint of a deep neural network by quantizing the weights, biases, and activations of convolution layers to 8-bit scaled integer data types. Then, you can use MATLAB Coder™ to generate optimized code for the quantized network. The generated code takes advantage of the SIMD instructions of ARM® processors by using the ARM Compute Library. You can integrate the generated code into your project as source code, static or dynamic libraries, or executables, and deploy it to a variety of ARM CPU platforms such as Raspberry Pi™.
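For code generation, the network is typically wrapped in an entry-point function. A minimal sketch, assuming the trained, quantized network is saved in a MAT-file named mynet.mat and the function name net_predict is chosen for illustration:

```matlab
function out = net_predict(in) %#codegen
% Entry-point function for MATLAB Coder. The persistent variable
% keeps the network loaded across calls in the generated code.
persistent mynet;
if isempty(mynet)
    mynet = coder.loadDeepLearningNetwork('mynet.mat');
end
out = predict(mynet, in);
end
```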

Supported Layers and Classes

You can generate C++ code that uses the ARM Compute Library and performs inference computations in 8-bit integers for supported convolution layers.

C++ code generation for quantized deep learning networks supports DAGNetwork (Deep Learning Toolbox) and SeriesNetwork (Deep Learning Toolbox) objects.

Generate Code

To generate code that performs inference computations in 8-bit integers, in the coder.ARMNEONConfig object dlcfg, set these additional properties:

dlcfg.CalibrationResultFile = 'dlquantizerObjectMatfile';
dlcfg.DataType = 'int8';
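In context, a complete configuration might look like the following sketch. The entry-point name net_predict, the input size, the ARM Compute version, and the target architecture are assumptions; adjust them for your network and target hardware.

```matlab
% Configure C++ static library generation that targets the ARM Compute Library.
cfg = coder.config('lib');
cfg.TargetLang = 'C++';

dlcfg = coder.DeepLearningConfig('arm-compute'); % returns a coder.ARMNEONConfig object
dlcfg.ArmComputeVersion = '20.02.1';             % assumed library version on the target
dlcfg.ArmArchitecture = 'armv7';                 % assumed architecture, e.g. Raspberry Pi
dlcfg.CalibrationResultFile = 'dlquantizerObjectMatfile';
dlcfg.DataType = 'int8';
cfg.DeepLearningConfig = dlcfg;

% Generate code for a hypothetical entry-point function net_predict
% that takes a 224-by-224 RGB image in single precision.
codegen -config cfg net_predict -args {ones(224,224,3,'single')}
```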

Alternatively, in the MATLAB Coder app, on the Deep Learning tab, set Target library to ARM Compute. Then set the Data type and Calibration result file path parameters.

Here 'dlquantizerObjectMatfile' is the name of the MAT-file that dlquantizer (Deep Learning Toolbox) generates for specific calibration data. To perform calibration, set the ExecutionEnvironment property of the dlquantizer object to 'CPU'.
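The MAT-file itself comes from the calibration step. A sketch, assuming net is the trained network and calData is a datastore of representative calibration images (both names are placeholders):

```matlab
% Quantize for a CPU target; calibration must run with the
% ExecutionEnvironment property set to 'CPU' for ARM code generation.
quantObj = dlquantizer(net, 'ExecutionEnvironment', 'CPU');
calResults = calibrate(quantObj, calData);   % collects dynamic ranges

% Save the calibrated dlquantizer object; pass this file name to
% the CalibrationResultFile property.
save('dlquantizerObjectMatfile', 'quantObj');
```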

Otherwise, follow the steps described in Code Generation for Deep Learning Networks with ARM Compute Library.

For an example, see Code Generation for Quantized Deep Learning Network on Raspberry Pi.

See Also

Apps

Functions

Objects

Related Topics