Learn how to quantize, calibrate, and validate deep neural networks in MATLAB® using a white-box approach to make tradeoffs between performance and accuracy, then deploy the quantized DNN to an embedded GPU and an FPGA hardware board.
Using the Deep Learning Toolbox™ Model Quantization Library, you can quantize deep neural networks such as SqueezeNet. During calibration, the tool collects the required ranges for weights, biases, and activations, then provides a visualization that represents histogram distributions of the calibrated dynamic ranges on a power-of-two scale. You can then deploy the quantized network using GPU Coder™ to an NVIDIA® Jetson® AGX Xavier, achieving a 2x speedup in performance and a 4x reduction in memory use, with only about a 3% top-1 accuracy loss compared with the single-precision implementation.
See how to use the tool to quantize and deploy networks to a Xilinx® ZCU102 board connected to a high-speed camera. The original deep neural network had a throughput of 45 frames per second. Using the Deep Learning Toolbox Model Quantization Library, you can quantize the network to INT8, boosting the throughput to 139 frames per second while maintaining correct prediction results.
In this demo, we show the workflow to quantize a deep learning network and deploy it from MATLAB to GPUs and FPGAs.
Deploying deep learning networks to edge devices is challenging because they can be quite computationally intensive. For example, even a relatively simple network like AlexNet is over 200 MB, while a larger network like VGG-16 is over 500 MB.
Quantization helps to reduce the size of the network by converting floating point values used in the networks to smaller bit-widths while keeping the precision loss to a minimum.
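To make the idea concrete, here is a minimal, illustrative sketch of INT8 quantization with a power-of-two scale, the representation the calibration tool visualizes later. The weight values are made-up examples, not from the actual network.

```matlab
% Illustrative power-of-two INT8 quantization of a few weights.
w = [0.42 -1.37 0.08 2.9];           % example single-precision values
maxAbs = max(abs(w));                % dynamic range found during calibration
expnt = ceil(log2(maxAbs / 127));    % power-of-two scaling exponent for INT8
scale = 2^expnt;
q = int8(round(w / scale));          % 8-bit quantized values
wHat = double(q) * scale;            % dequantized approximation of w
quantError = max(abs(w - wHat));     % precision loss introduced
```

Values that fall outside the range representable at the chosen scale would be clipped; the calibration histograms in the app show exactly this tradeoff.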
Starting in R2020a, we released the ability to quantize deep learning algorithms using a white-box, easy-to-use, iterative workflow. This approach helps you make tradeoffs between performance and accuracy.
To see this workflow in action, let’s take an example of detecting defects in nuts and bolts that you might find in manufacturing.
Let’s say this is part of an inspection production line, so we need to process images from a high-speed camera at 120 frames per second.
Requirements from system engineering will involve metrics like accuracy, latency of the network, and overall hardware cost, …
and they often drive tradeoff choices during the design and implementation of the network.
This application includes ...
1) Preprocessing logic that resizes and selects a region of interest, ...
2) Detection of where the part is defective using a pretrained network, ...
3) And finally postprocessing to annotate the result on the screen.
Let’s get started with the quantization workflow by looking at deployment to embedded GPUs.
Quantizing for GPU and deploying to the NVIDIA Jetson AGX Xavier leads to a 2x speedup in performance and a 4x reduction in memory, with only about a 3% top-1 accuracy loss compared with the single-precision implementation.
This example uses SqueezeNet, which consumes 5 MB of disk memory.
To get started, we first download the deep learning quantization support package from the Add-On Explorer and then launch the app.
Once we load the network to quantize for the GPU target, we calibrate using a datastore that has already been set up. Calibration runs a set of images through the network to collect the required ranges for weights, biases, and activations.
The visualization shows histogram distributions of the calibrated dynamic ranges on a power-of-two scale. Gray in the histograms shows data that cannot be represented by the quantized type, while blue shows what can be represented. Finally, darker colors indicate higher-frequency bins.
If this is acceptable, we quantize the network and load a datastore to validate the accuracy of the quantized network.
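The same calibrate-then-validate steps the app performs can also be scripted with the `dlquantizer` object from the Model Quantization Library. This is a hedged sketch: `calData` and `valData` stand in for your own calibration and validation image datastores.

```matlab
% Script-based version of the quantization workflow shown in the app.
% calData / valData are placeholder imageDatastore objects (your own data).
net = squeezenet;                                   % pretrained network
quantObj = dlquantizer(net,'ExecutionEnvironment','GPU');
calResults = calibrate(quantObj,calData);           % collect dynamic ranges
valResults = validate(quantObj,valData);            % check quantized accuracy
valResults.MetricResults.Result                     % compare original vs. quantized
```

Exporting the `dlquantizer` object from the app gives you this same object for use in the deployment step that follows.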
Here are the results: memory has been reduced by 74%, with no loss in top-1 accuracy compared with the original floating-point network when measured on a desktop GPU.
Once we have validated the results and exported the dlquantizer workflow object, we can use GPU Coder to deploy the quantized network to the NVIDIA Jetson board.
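A sketch of that GPU Coder deployment step might look as follows. The entry-point function name `defect_detect` and the calibration-result MAT-file name are placeholders; the INT8 configuration properties follow GPU Coder's documented INT8 deep learning workflow.

```matlab
% Generate INT8 CUDA code for the quantized network and target a Jetson board.
cfg = coder.gpuConfig('exe');
cfg.Hardware = coder.hardware('NVIDIA Jetson');          % cross-compile for Jetson
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
cfg.DeepLearningConfig.DataType = 'int8';                % use INT8 inference
cfg.DeepLearningConfig.CalibrationResultFile = 'quantObj.mat';  % exported dlquantizer
% defect_detect is a placeholder entry-point function wrapping predict().
codegen -config cfg defect_detect -args {ones(227,227,3,'single')}
```

The calibration-result file is the `dlquantizer` object saved from the earlier step; GPU Coder uses its collected ranges to scale the generated INT8 kernels.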
We run inference on defective.png; we expect this image to be classified as a defective bolt.
Now let’s turn our attention to quantizing and deploying networks to a Xilinx ZCU102 board. The network uses 34 MB of memory for learnable parameters and 200 MB of runtime memory.
With these 5 lines of MATLAB code, we can load the single precision bitstream running on the ZCU102 board. We see that it uses 84 MB of memory with a throughput of 45 frames per second. This is not fast enough for our high-speed camera.
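The five lines referred to are likely along these lines, using the `dlhdl.Workflow` API from Deep Learning HDL Toolbox; `net` and `img` are placeholders for the trained network and a test frame.

```matlab
% Deploy the single-precision network to the ZCU102 board and profile it.
hTarget = dlhdl.Target('Xilinx','Interface','Ethernet');
hW = dlhdl.Workflow('Network',net, ...
    'Bitstream','zcu102_single','Target',hTarget);    % FP32 bitstream
hW.compile;                                           % compile network for the board
hW.deploy;                                            % program FPGA and load weights
[prediction,speed] = hW.predict(img,'Profile','on');  % run and report throughput
```

The profiled output is where the 84 MB memory use and 45 frames per second figures come from.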
Let’s choose to quantize for FPGA.
Once the quantization workflow is completed, we’ll export the quantized network to the MATLAB workspace.
The quantized network needs to run on a deep learning processor quantized to INT8, so we’ll use the INT8 version of our downloaded ZCU102 bitstream.
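Swapping in the INT8 bitstream is a one-line change to the workflow sketched earlier: pass the exported `dlquantizer` object as the network and select the INT8 bitstream. `quantObj` and `hTarget` are the objects assumed from the previous steps.

```matlab
% Redeploy using the INT8 processor bitstream and the quantized network.
hW = dlhdl.Workflow('Network',quantObj, ...           % exported dlquantizer object
    'Bitstream','zcu102_int8','Target',hTarget);      % INT8 bitstream variant
hW.compile;
hW.deploy;
```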
After compiling, the parameters have been reduced to 68 MB, and we can run the network at 139 frames per second. We are getting correct prediction results as well.
So, as you can see, the deep learning quantization app helps you reduce the size of deep learning networks for GPUs and FPGAs while minimizing accuracy loss. If you are interested in learning more, check out the Deep Learning Toolbox Model Quantization Library in R2020a or the latest R2020b.