Code Generation for Deep Learning Networks Targeting ARM Mali GPUs

With GPU Coder™, you can generate optimized code for prediction of a variety of trained deep learning networks from Deep Learning Toolbox™. The generated code implements the deep convolutional neural network (CNN) by using the architecture, the layers, and parameters that you specify in the input SeriesNetwork (Deep Learning Toolbox) or DAGNetwork (Deep Learning Toolbox) object. The code generator takes advantage of the ARM® Compute Library for computer vision and machine learning. To perform deep learning on ARM Mali GPU targets, you generate code on the host development computer. Then, to build and run the executable program, you move the generated code to the ARM target platform. For example, HiKey960 is one of the target platforms that can execute the generated code.

Requirements

  1. Deep Learning Toolbox.

  2. Deep Learning Toolbox Model for MobileNet-v2 Network support package.

  3. GPU Coder Interface for Deep Learning Libraries support package. To install the support packages, select the support package from the MATLAB® Add-Ons menu.

  4. ARM Compute Library for computer vision and machine learning must be installed on the target hardware. For information on the supported versions of the compilers and libraries, see Installing Prerequisite Products.

  5. Environment variables for the compilers and libraries. For more information, see Environment Variables.

Load Pretrained Network

  1. Load the pretrained MobileNet-v2 network. You can choose to load a different pretrained network for image classification. If you do not have the required support packages installed, the software provides a download link.

    net = mobilenetv2;

  2. The object net contains the DAGNetwork object. Use the analyzeNetwork (Deep Learning Toolbox) function to display an interactive visualization of the network architecture, to detect errors and issues in the network, and to display detailed information about the network layers. The layer information includes the sizes of layer activations and learnable parameters, the total number of learnable parameters, and the sizes of state parameters of recurrent layers.

    analyzeNetwork(net);

  3. The image that you want to classify must have the same size as the input size of the network. For MobileNet-v2, the size of the imageInputLayer (Deep Learning Toolbox) is 224-by-224-by-3. The Classes property of the output classificationLayer (Deep Learning Toolbox) contains the names of the classes learned by the network. View 10 random class names out of the total of 1000.

    classNames = net.Layers(end).Classes;
    numClasses = numel(classNames);
    disp(classNames(randperm(numClasses,10)))

         cock
         apiary
         soap dispenser
         titi
         car wheel
         guenon
         muzzle
         agaric
         buckeye
         megalith

    For more information, see List of Deep Learning Layers (Deep Learning Toolbox).
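One way to meet the input size requirement in step 3 is to resize the image before prediction. The following is a minimal sketch, not part of the documented workflow. It assumes the 224-by-224-by-3 input size stated above and uses peppers.png, a sample image that ships with MATLAB, as a stand-in for your own image.

```matlab
% Assumed input size (224-by-224-by-3, as stated in step 3). With the
% network loaded, the same value is available as net.Layers(1).InputSize.
inputSize = [224 224 3];

% peppers.png is a sample image included with MATLAB; substitute your own.
im = imread('peppers.png');

% Resize the rows and columns to match the network input size.
im = imresize(im, inputSize(1:2));

% Confirm the resulting dimensions.
disp(size(im))
```

Resizing this way changes the aspect ratio of a non-square image. Cropping to a square first is an alternative if distortion matters for your data.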

Code Generation by Using cnncodegen

To generate code with the ARM Compute Library, use the targetlib option of the cnncodegen command. The cnncodegen command generates C++ code for the SeriesNetwork or DAGNetwork network object.

  1. Call cnncodegen with 'targetlib' specified as 'arm-compute-mali'. For example:

    net = googlenet;
    cnncodegen(net,'targetlib','arm-compute-mali','batchsize',1);

    For 'arm-compute-mali', the value of batchsize must be 1.

    The 'targetparams' name-value pair argument, which enables you to specify library-specific parameters for the ARM Compute Library, is not applicable when targeting ARM Mali GPUs.

  2. The cnncodegen command generates code, a makefile, cnnbuild_rtw.mk, and other supporting files to build the generated code on the target hardware. The command places all the generated files in the codegen folder.

  3. Write a C++ main function that calls predict. For an example main file that interfaces with the generated code, see Deep Learning Prediction on ARM Mali GPU.

  4. Move the generated codegen folder and other files from the host development computer to the ARM hardware by using your preferred Secure File Copy (SCP) and Secure Shell (SSH) client. Build the executable program on the target.
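The copy in step 4 can also be scripted from the MATLAB session on the host. The following is a sketch under stated assumptions: an scp client is on the system path, the destination folder already exists on the target, and the user name, address, destination folder, and main.cpp file name are hypothetical placeholders. Any SCP client works equally well.

```matlab
% All target details below are hypothetical placeholders.
targetUser = 'user';
targetHost = '192.168.0.10';
targetDir  = '/home/user/mali_demo';
doCopy     = false;   % set to true after replacing the placeholders

% codegen is the folder that cnncodegen writes to. main.cpp stands for
% your own C++ main file (hypothetical name).
cmd = sprintf('scp -r codegen main.cpp %s@%s:%s', ...
    targetUser, targetHost, targetDir);
disp(cmd)

% Run the copy only when explicitly enabled, and stop on failure.
if doCopy
    status = system(cmd);
    if status ~= 0
        error('Copy to target failed with exit status %d.', status);
    end
end
```

After the copy, log in to the target over SSH and build with the generated makefile, cnnbuild_rtw.mk.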

Generated Code

The DAG network is generated as a C++ class (CnnMain) containing an array of 103 layer classes. The code generator reduces the number of layers by applying layer fusion optimization to convolutional and batch normalization layers. A snippet of the class declaration from the cnn_exec.hpp file is shown.

cnn_exec.hpp File

  • The setup() method of the class sets up handles and allocates memory for each layer of the network object.

  • The predict() method invokes prediction for each of the 103 layers in the network.

  • The cnn_exec.cpp file contains the definitions of the object functions for the CnnMain class.

Binary files are exported for layers with parameters, such as the fully connected and convolution layers in the network. For instance, the files cnn_CnnMain_Conv*_w and cnn_CnnMain_Conv*_b correspond to the weights and bias parameters for the convolutional layers in the network. The code generator places these binary files in the codegen folder. The code generator builds the library file cnnbuild and places all the generated files in the codegen folder.
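To check which parameter files were exported, you can list them from MATLAB on the host. The following is a minimal sketch, assuming cnncodegen has already run in the current folder. The trailing wildcard in each pattern allows for a file extension, which is an assumption and not something stated above.

```matlab
% Match the weight and bias files for the convolution layers.
% The trailing * (possible file extension) is an assumption.
weightFiles = dir(fullfile('codegen', 'cnn_CnnMain_Conv*_w*'));
biasFiles   = dir(fullfile('codegen', 'cnn_CnnMain_Conv*_b*'));

fprintf('%d weight files, %d bias files\n', ...
    numel(weightFiles), numel(biasFiles));

% Print the name and size of each weight file.
for k = 1:numel(weightFiles)
    fprintf('%s  (%d bytes)\n', weightFiles(k).name, weightFiles(k).bytes);
end
```

If both counts are zero, confirm that the current folder is the one in which cnncodegen ran.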

Limitations

  • Code generation for the ARM Mali GPU is not supported for a 2-D grouped convolution layer that has the NumGroups property set to 'channel-wise' or a value greater than two.
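You can screen a network for this limitation before generating code. The following is a sketch and not a documented check. It builds a small layer array only for illustration; to inspect a loaded network, replace it with net.Layers. The variable names are hypothetical.

```matlab
% Illustrative layer array. Replace with net.Layers for a real network.
layers = [
    imageInputLayer([224 224 3])
    convolution2dLayer(3, 16)
    groupedConvolution2dLayer(3, 8, 2)                % two groups
    groupedConvolution2dLayer(3, 1, 'channel-wise')   % channel-wise
    ];

% Flag grouped convolution layers whose NumGroups is 'channel-wise'
% or a number greater than two.
unsupported = false(numel(layers), 1);
for k = 1:numel(layers)
    layer = layers(k);
    if isa(layer, 'nnet.cnn.layer.GroupedConvolution2DLayer')
        g = layer.NumGroups;
        unsupported(k) = ischar(g) || isstring(g) || g > 2;
    end
end

% Report the index and name of each flagged layer.
idx = find(unsupported);
for k = idx'
    fprintf('Layer %d (%s): NumGroups falls under the Mali limitation.\n', ...
        k, layers(k).Name);
end
```

Running the check before code generation surfaces the affected layers by index, so you can decide whether to choose a different network.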
