Deep Learning

Understanding and using deep learning networks

Machine Learning with Simulink and NVIDIA Jetson

The following post is from Bill Chou, product manager for AI deployment with GPU Coder.
The latest Jetson AGX Orin packs some incredible processing power into a small package and opens the door to running more computationally intensive AI algorithms outside the lab. As NVIDIA has pointed out, the Jetson AGX Orin is capable of delivering up to 8 times the AI performance of the previous Jetson AGX Xavier. We were eager to try out some AI applications developed in Simulink and see how quickly we could get the AI algorithms onto the board and test them on the go.

Showing the lane following example we'll put onto Jetson AGX Orin

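Users like Airbus have been using Simulink and GPU Coder to deploy AI applications onto various generations of Jetson boards to quickly prototype and test them. They can first test an AI application on their desktop developer machine, then migrate it onto a Jetson board for use outside the lab under a variety of conditions: inside an aircraft, on the road in a vehicle, or in an autonomous underwater vehicle.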
To illustrate this approach, we'll use a highway lane following example that processes video from a dashcam. Once we verify the AI application with the test video input, we can unhook the Jetson from our desktop developer machine, switch out the input test video for live video feeds, and take the Jetson out of the lab for additional testing.

Running the lane and vehicle detection Simulink model on a desktop developer GPU

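The Simulink model we are using takes an input video stream and detects the left and right lane markers as well as the vehicles in each video frame. It uses two deep learning networks, based on YOLO v2 and AlexNet, to achieve this. Some pre- and postprocessing, including drawing annotations for the left and right lanes and bounding boxes around the vehicles, completes the application.
We were able to prototype this application quickly by starting from two out-of-the-box examples, described in more detail here and here. Running the Simulink model on our desktop developer machine, which is equipped with a powerful NVIDIA desktop-class GPU, we see the AI application run smoothly and correctly identify lane markers and vehicles. Under the hood, Simulink automatically identifies the compute-intensive parts of the model and, working with the NVIDIA CUDA Toolkit, offloads those computations from the CPU onto the desktop GPU cores, which gives us the smooth processing seen in the output video.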
Next, let's focus on the deployment portion of the workflow to see how we can embed this onto the newest Jetson AGX Orin.

Generating CUDA code from the Simulink model

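To generate CUDA code and deploy the AI application onto the Jetson AGX Orin, we can use GPU Coder. Starting from the same Simulink model used in the desktop simulation, we need to replace the output viewer block with an SDL Video Display block so that the video appears on the Jetson board's desktop for us to view.
We also need to set up the code generation configuration for the Jetson AGX Orin. In the configuration parameters for code generation, we can choose between NVIDIA's cuDNN and TensorRT for the deep learning networks. For the non-deep-learning parts of our Simulink model, GPU Coder automatically integrates calls to CUDA-optimized libraries such as cuBLAS and cuFFT.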
We can also set the hardware configuration settings for the Jetson board, including the NVIDIA toolchain, board login/password, and build options.
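Once configured, we can start generating code. GPU Coder first automatically identifies the compute-intensive parts of the Simulink model and translates them into CUDA kernels that execute on the GPU cores for best performance. The rest of the AI application runs as C/C++ code on the Jetson board's ARM cores.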
Looking at snippets of the generated CUDA code, we can see cudaMalloc() calls that allocate memory on the GPU in preparation for running kernels on the GPU cores. We can also spot cudaMemcpy() calls that move data between the CPU and GPU at the appropriate parts of the algorithms, and several CUDA kernel launches through calls such as laneAndVehicleD_Outputs_kernel1().
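The allocate, copy, launch, and copy-back pattern described above can be sketched in a few lines of hand-written CUDA. This is not the code GPU Coder generates: the kernel `scalePixelsKernel` and the host helper `scalePixelsOnGpu` are hypothetical names chosen for illustration, and the per-pixel scaling merely stands in for the real lane and vehicle detection computations.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies every pixel value by a constant gain.
// It stands in for a generated kernel; it is not GPU Coder output.
__global__ void scalePixelsKernel(const float* in, float* out, int n, float gain)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {                       // guard threads past the end of the buffer
        out[i] = in[i] * gain;
    }
}

// Hypothetical host wrapper showing the four steps in order.
// Returns true on success, false if any CUDA call reports an error.
bool scalePixelsOnGpu(const std::vector<float>& input,
                      std::vector<float>& output,
                      float gain)
{
    const int n = static_cast<int>(input.size());
    const size_t bytes = input.size() * sizeof(float);
    output.assign(input.size(), 0.0f);
    if (n == 0) {
        return true;                   // nothing to process
    }

    float* dIn = nullptr;
    float* dOut = nullptr;

    // 1. Allocate device buffers with cudaMalloc().
    if (cudaMalloc(&dIn, bytes) != cudaSuccess) {
        return false;
    }
    if (cudaMalloc(&dOut, bytes) != cudaSuccess) {
        cudaFree(dIn);
        return false;
    }

    // 2. Copy the input from CPU memory to GPU memory.
    bool ok = cudaMemcpy(dIn, input.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        // 3. Launch the kernel with enough blocks to cover all n elements.
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        scalePixelsKernel<<<blocks, threads>>>(dIn, dOut, n, gain);
        ok = cudaGetLastError() == cudaSuccess;
    }

    if (ok) {
        // 4. Copy the result back; this blocking copy waits for the kernel.
        ok = cudaMemcpy(output.data(), dOut, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(dIn);
    cudaFree(dOut);
    return ok;
}
```

Compiled with nvcc and called from a host program on a machine with a CUDA-capable GPU, `scalePixelsOnGpu({1.0f, 2.0f, 3.0f}, out, 2.0f)` fills `out` with the doubled values. Generated code typically allocates its buffers once and reuses them across frames rather than allocating per call as this sketch does.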
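We can also poke into the code that represents the two deep learning networks. Looking at the setup function for the YOLO v2 network, which executes once at the beginning of the AI application, we can see that it initializes each layer in sequence and loads into memory all of the weights and biases, which are stored as files on disk.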
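A one-time, layer-by-layer setup of this kind might look roughly like the sketch below. Everything in it is an assumption made for illustration: the `LayerParams` struct, the `loadLayerToGpu`, `setupNetwork`, and `releaseNetwork` helpers, and the file format (raw 32-bit floats, one file per layer) are hypothetical and do not reflect the actual layout of the generated code or its parameter files.

```cuda
#include <cstdio>
#include <string>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical record for one layer's parameters (not a GPU Coder type).
struct LayerParams {
    std::string name;            // e.g. "conv1"
    std::string weightsFile;     // assumed format: raw float32 values
    float* dWeights = nullptr;   // device copy of the weights
    size_t count = 0;            // number of floats loaded
};

// Reads one layer's weights from disk and copies them into GPU memory.
bool loadLayerToGpu(LayerParams& layer)
{
    std::FILE* f = std::fopen(layer.weightsFile.c_str(), "rb");
    if (!f) {
        return false;
    }

    // Determine the file size to know how many floats to read.
    std::fseek(f, 0, SEEK_END);
    const long fileBytes = std::ftell(f);
    std::rewind(f);
    if (fileBytes <= 0 || static_cast<size_t>(fileBytes) % sizeof(float) != 0) {
        std::fclose(f);
        return false;
    }

    std::vector<float> host(static_cast<size_t>(fileBytes) / sizeof(float));
    const size_t readCount = std::fread(host.data(), sizeof(float), host.size(), f);
    std::fclose(f);
    if (readCount != host.size()) {
        return false;
    }

    // Allocate device memory and upload the weights once, at setup time.
    const size_t bytes = host.size() * sizeof(float);
    if (cudaMalloc(&layer.dWeights, bytes) != cudaSuccess) {
        layer.dWeights = nullptr;
        return false;
    }
    if (cudaMemcpy(layer.dWeights, host.data(), bytes,
                   cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(layer.dWeights);
        layer.dWeights = nullptr;
        return false;
    }
    layer.count = host.size();
    return true;
}

// Initializes the layers in order; stops at the first failure.
bool setupNetwork(std::vector<LayerParams>& layers)
{
    for (LayerParams& layer : layers) {
        if (!loadLayerToGpu(layer)) {
            return false;
        }
    }
    return true;
}

// Frees the device memory held by every layer.
void releaseNetwork(std::vector<LayerParams>& layers)
{
    for (LayerParams& layer : layers) {
        cudaFree(layer.dWeights);    // cudaFree(nullptr) is a no-op
        layer.dWeights = nullptr;
        layer.count = 0;
    }
}
```

Loading every parameter up front keeps file I/O and host-to-device copies out of the per-frame inference path, which is consistent with a setup function that runs only once at startup.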
Finally, while the Simulink model and CUDA code generation settings are configured for the Jetson AGX Orin, it’s worth noting that the generated CUDA code is portable and can run on all modern NVIDIA GPUs including the Jetson & DRIVE platforms, not to mention desktop and server class GPUs.
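Once the CUDA code is generated, GPU Coder automatically invokes the CUDA toolchain to compile, download, and launch the executable on the Jetson AGX Orin. For our application, we also copy the input video file onto the Jetson board to serve as the input video for the AI application. Because we are using the SDL Video Display block, the processed output video appears in an SDL window on the Jetson board, and we can see that the output matches our desktop GPU simulation, albeit at a lower frame rate, as expected given the difference in processing power.
At this point, we can unplug the Jetson AGX Orin from our host developer machine and move it out of our lab for further testing in the field. We can also take the generated CUDA code and manually integrate it into a larger application in another project by using the packNGo function, which neatly zips up all of the necessary source code. Given the way CUDA is architected, the generated CUDA code is portable and can run on all modern NVIDIA platforms, from desktop and server-class GPUs to the embedded Jetson and DRIVE boards.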

Summary

It has been fun running various AI applications on the Jetson AGX Orin and seeing the performance boost over the previous Jetson AGX Xavier. The workflow we described above has helped a variety of users move faster when exploring and prototyping AI applications in the field. Take the new Jetson AGX Orin for a spin and see what kinds of AI applications you can design for the field.
We'll be presenting this demo on the Jetson AGX Orin and going through more details on this workflow at our upcoming MATLAB Expo 2022 talk, Machine Learning with Simulink and NVIDIA Jetson, on May 17, 2022. Join the session to see the workflow in action, and visit the NVIDIA booth to ask more questions about everything NVIDIA, including their newest board, the Jetson AGX Orin.
Here is the link to the lane and vehicle detection example:
To run this and other AI applications on the Jetson, you need the MATLAB Coder Support Package for NVIDIA Jetson and NVIDIA DRIVE Platforms. Finally, the example runs on any of the recent Jetson boards, though for best performance, you'll want to grab the latest Jetson AGX Orin.