Deep Learning Toolbox Model Quantization Library

by MathWorks Fixed Point Team

Quantize and compress deep learning models


410 Downloads

Updated 09 Mar 2022

Deep Learning Toolbox Model Quantization Library lets you quantize and compress deep learning models, reducing the memory footprint and computational requirements of your deep neural networks.

INT8 quantization of supported layers is available for CPUs, FPGAs, and NVIDIA GPUs. The library lets you collect layer-level data on the weights, activations, and intermediate computations. Using this data, the library quantizes your model and provides metrics to validate the accuracy of the quantized network against the single-precision baseline. Because the workflow is iterative, you can adjust the quantization strategy and re-validate until the accuracy is acceptable.
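To make the collect, quantize, and validate steps concrete, here is a minimal conceptual sketch in plain Python. It is not the library's API or its actual quantization scheme: the function names, the symmetric single-scale mapping, and the sample weights are all assumptions chosen for illustration.

```python
# Illustrative only: symmetric INT8 quantization driven by a range collected
# from calibration data. All names and values here are hypothetical and do
# not come from the library.

def calibrate_range(values):
    """Collect the largest absolute value seen in the calibration data."""
    return max(abs(v) for v in values)

def quantize_int8(values, max_abs):
    """Map floats to integer codes in [-127, 127] using one scale factor."""
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float values from the INT8 codes."""
    return [c * scale for c in codes]

def max_abs_error(reference, approx):
    """Worst-case difference against the floating-point baseline."""
    return max(abs(r - a) for r, a in zip(reference, approx))

weights = [0.42, -1.27, 0.003, 0.88, -0.51]   # hypothetical layer weights
max_abs = calibrate_range(weights)            # step 1: collect range data
q_weights, scale = quantize_int8(weights, max_abs)  # step 2: quantize
restored = dequantize(q_weights, scale)
error = max_abs_error(weights, restored)      # step 3: validate vs. baseline
```

In this toy setup the rounding error of any in-range value is at most half the scale factor, so a narrower calibrated range gives finer resolution. That trade-off is one reason a quantization strategy is typically tuned over several iterations.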

The library also supports pruning, which reduces network size by removing the network elements that have the smallest impact on inference accuracy.
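The selection step behind pruning can be sketched as follows. This is a conceptual illustration only, not the library's pruning algorithm: the importance scores are hypothetical placeholders, and how such scores are computed for a real network is outside the scope of this sketch.

```python
# Illustrative only: drop the elements with the lowest importance scores.
# The scores below are made-up placeholders, not output from the library.

def prune_lowest(scores, num_to_remove):
    """Return the indices of the elements kept after pruning, in order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    removed = set(ranked[:num_to_remove])
    return [i for i in range(len(scores)) if i not in removed]

filter_scores = [0.90, 0.05, 0.40, 0.01, 0.75]  # hypothetical score per filter
kept = prune_lowest(filter_scores, 2)           # remove the two lowest-scoring
```

Here the filters at indices 1 and 3 have the lowest scores, so they are the ones removed. In practice a pruned network is usually re-evaluated, and often fine-tuned, to confirm that accuracy has held up.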

Please refer to the documentation here: //www.tatmou.com/help/deeplearning/quantization.html

Quantization Workflow Prerequisites can be found here:

//www.tatmou.com/help/deeplearning/ug/quantization-workflow-prerequisites.html

If you have download or installation problems, please contact Technical Support - www.tatmou.com/contact_ts

https://www.youtube.com/watch?v=jufOpBeSvHM