Deep Learning Prediction by Using NVIDIA TensorRT
This example shows code generation for a deep learning application by using the NVIDIA TensorRT™ library. It uses the codegen command to generate a MEX file that performs prediction with a ResNet-50 image classification network by using TensorRT. A second example uses the codegen command to generate a MEX file that performs 8-bit integer prediction by using TensorRT for a logo classification network.
Third-Party Prerequisites
Required
This example generates CUDA® MEX and has the following third-party requirements.
CUDA-enabled NVIDIA® GPU and compatible driver.
Optional
For non-MEX builds such as static libraries, dynamic libraries, or executables, this example has the following additional requirements.
NVIDIA toolkit.
NVIDIA cuDNN and TensorRT library.
Environment variables for the compilers and libraries. For more information, see Third-Party Hardware (GPU Coder) and Setting Up the Prerequisite Products (GPU Coder).
Verify GPU Environment
Use the coder.checkGpuInstall (GPU Coder) function to verify that the compilers and libraries necessary for running this example are set up correctly.
envCfg = coder.gpuEnvConfig('host');
envCfg.DeepLibTarget = 'tensorrt';
envCfg.DeepCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);
The resnet_predict Entry-Point Function
This example uses the DAG network ResNet-50 to show image classification by using TensorRT. A pretrained ResNet-50 model for MATLAB® is available in the ResNet-50 support package of Deep Learning Toolbox. To download and install the support package, use the Add-On Explorer.
The resnet_predict.m function loads the ResNet-50 network into a persistent network object and reuses the persistent object on subsequent prediction calls.
type('resnet_predict.m')
% Copyright 2020 The MathWorks, Inc.

function out = resnet_predict(in)
%#codegen

% A persistent object mynet is used to load the series network object. At
% the first call to this function, the persistent object is constructed and
% setup. When the function is called subsequent times, the same object is
% reused to call predict on inputs, avoiding reconstructing and reloading
% the network object.

persistent mynet;

if isempty(mynet)
    % Call the function resnet50 that returns a DAG network
    % for ResNet-50 model.
    mynet = coder.loadDeepLearningNetwork('resnet50','resnet');
end

% pass in input
out = mynet.predict(in);
Run MEX Code Generation
To generate CUDA code for the resnet_predict entry-point function, create a GPU code configuration object for a MEX target and set the target language to C++. Use the coder.DeepLearningConfig (GPU Coder) function to create a TensorRT deep learning configuration object and assign it to the DeepLearningConfig property of the GPU code configuration object. Run the codegen command, specifying an input size of [224,224,3]. This value corresponds to the input layer size of the ResNet-50 network.
cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('tensorrt');
codegen -config cfg resnet_predict -args {ones(224,224,3)} -report
Code generation successful: View report
Perform Prediction on Test Image
im = imread('peppers.png');
im = imresize(im, [224,224]);
predict_scores = resnet_predict_mex(double(im));

%% get top 5 probability scores and their labels
[val,indx] = sort(predict_scores, 'descend');
scores = val(1:5)*100;
net = resnet50;
classnames = net.Layers(end).ClassNames;
labels = classnames(indx(1:5));
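The top-5 selection step can be tried on its own, without a GPU or the generated MEX file. The following is a minimal sketch that assumes a made-up score vector and hypothetical class names in place of the real network output. Only the sort and indexing logic mirrors the example.

```matlab
% Synthetic stand-in for the score vector returned by the MEX function.
predict_scores = [0.02 0.40 0.05 0.25 0.10 0.03 0.15];
% Hypothetical class names; the example reads them from net.Layers(end).ClassNames.
classnames = {'apple'; 'bell pepper'; 'cucumber'; 'orange'; 'lemon'; 'fig'; 'zucchini'};

k = 5;
% Sort scores from highest to lowest; indx maps sorted positions back to classes.
[val, indx] = sort(predict_scores, 'descend');
scores = val(1:k) * 100;          % convert to percentages
labels = classnames(indx(1:k));   % class names for the k highest scores

for i = 1:k
    fprintf('%-12s %5.1f%%\n', labels{i}, scores(i));
end
```

The second output of sort is what links each score back to its class name, so the scores and labels stay aligned after sorting.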
Clear the static network object that was loaded in memory.
clear mex;
Generate TensorRT Code for INT8 Prediction
Generate TensorRT code that runs inference in int8 precision. Use a pretrained logo classification network to classify logos in images. Download the pretrained LogoNet network and save it as the logonet.mat file. The network was developed in MATLAB. This network can recognize 32 logos under various lighting conditions and camera angles. The network is pretrained in single-precision floating-point format.
net = getLogonet();
Code generation by using the NVIDIA TensorRT Library with inference computation in 8-bit integer precision supports these additional networks:
Object detector networks such as YOLOv2 and SSD.
Regression and semantic segmentation networks.
TensorRT requires a calibration data set to calibrate a network that is trained in floating-point to compute inference in 8-bit integer precision. Set the data type to int8 and the path to the calibration data set by using the DeepLearningConfig property. logos_dataset is a subfolder containing images grouped by their corresponding classification labels. For int8 support, GPU compute capability must be 6.1 or higher.
Note: For semantic segmentation networks, the calibration data images must be of a format supported by the imread function.
unzip('logos_dataset.zip');
cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.GpuConfig.ComputeCapability = '6.1';
cfg.DeepLearningConfig = coder.DeepLearningConfig('tensorrt');
cfg.DeepLearningConfig.DataType = 'int8';
cfg.DeepLearningConfig.DataPath = 'logos_dataset';
cfg.DeepLearningConfig.NumCalibrationBatches = 50;
codegen -config cfg logonet_predict -args {ones(227,227,3,'int8')} -report
Code generation successful: View report
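The calibration folder is described above as one subfolder per classification label. The following sketch shows one way to check such a layout before code generation. It builds a small throwaway folder so that it runs anywhere. The folder, the label names brandA and brandB, and the empty placeholder files are all hypothetical and stand in for the real logos_dataset contents.

```matlab
% Build a tiny throwaway folder that mimics the described layout.
rootDir = tempname;                  % hypothetical stand-in for 'logos_dataset'
classes = {'brandA', 'brandB'};      % hypothetical label names
for c = 1:numel(classes)
    mkdir(fullfile(rootDir, classes{c}));
    for n = 1:3
        % Empty placeholder files stand in for real images.
        fclose(fopen(fullfile(rootDir, classes{c}, sprintf('img%d.png', n)), 'w'));
    end
end

% List the label subfolders and count the files in each one.
entries = dir(rootDir);
entries = entries([entries.isdir] & ~ismember({entries.name}, {'.', '..'}));
labelNames = {entries.name};
fileCounts = zeros(1, numel(labelNames));
for c = 1:numel(labelNames)
    files = dir(fullfile(rootDir, labelNames{c}));
    fileCounts(c) = sum(~[files.isdir]);
    fprintf('%s: %d files\n', labelNames{c}, fileCounts(c));
end

rmdir(rootDir, 's');                 % remove the throwaway folder
```

To inspect a real data set, point rootDir at the unzipped folder and drop the setup and cleanup steps.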
Run INT8 Prediction on Test Image
im = imread('gpucoder_tensorrt_test.png');
im = imresize(im, [227,227]);
predict_scores = logonet_predict_mex(int8(im));

%% get top 5 probability scores and their labels
[val,indx] = sort(predict_scores, 'descend');
scores = val(1:5)*100;
classnames = net.Layers(end).ClassNames;
labels = classnames(indx(1:5));
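The call above casts the image with int8(im). As background on that cast, MATLAB integer conversion saturates rather than wraps, so uint8 values above 127 map to 127. The sketch below uses a few made-up pixel values to show the behavior. It does not reproduce the example's image, and whether saturation matters for a particular network is not covered here.

```matlab
% Made-up uint8 pixel values spanning the full 0..255 range.
pixels = uint8([0 100 127 128 200 255]);

% Converting to int8 saturates at the int8 maximum (127) instead of wrapping.
asInt8 = int8(pixels);
disp(asInt8);
```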
Clear the static network object that was loaded in memory.
clear mex;