GPU Coder

Generate CUDA code for NVIDIA GPUs

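GPU Coder™ generates optimized CUDA® code from MATLAB® code and Simulink® models. The generated code includes CUDA kernels for the parallelizable parts of your deep learning, embedded vision, and signal processing algorithms. For high performance, the generated code calls optimized NVIDIA® CUDA libraries, including TensorRT™, cuDNN, cuFFT, cuSolver, and cuBLAS. The code can be integrated into your project as source code, static libraries, or dynamic libraries, and it can be compiled for desktops, servers, and GPUs embedded on NVIDIA Jetson™, NVIDIA DRIVE™, and other platforms. You can use the generated CUDA code within MATLAB to accelerate deep learning networks and other computationally intensive portions of your algorithm. GPU Coder also lets you incorporate handwritten CUDA code into your algorithms and into the generated code.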

When used with Embedded Coder®, GPU Coder lets you verify the numerical behavior of the generated code via software-in-the-loop (SIL) and processor-in-the-loop (PIL) testing.

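What Is GPU Coder?

Generate Fast, Flexible CUDA Code

Generate optimized CUDA code. Deploy code royalty-free.

Deploy Algorithms Royalty-Free

Compile and run your generated code on popular NVIDIA GPUs, from desktop systems to data centers to embedded hardware. The generated code is royalty-free, so you can deploy it in commercial applications to your customers at no charge.

Generating CUDA Code for a Fog Rectification Algorithm (2:22)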

GPU Code Generation: The Mandelbrot Set

GPU Coder Success Stories

Learn how engineers and scientists in a variety of industries use GPU Coder to generate CUDA code for their applications.

DRASS Deploys a Maritime Optical Tracking and Obstacle Awareness System Using a YOLO v2 Network in a Visual Studio Application Running on NVIDIA GPUs

Airbus Prototypes Aircraft Inspection Demonstrator Running on NVIDIA Jetson TX2 to Automate Detection of Defects

Airbus prototypes automated detection of defects on NVIDIA Jetson TX2.

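Generate Code from Supported Toolboxes and Functions

GPU Coder generates code from a broad range of MATLAB language features that design engineers use to develop algorithms as components of larger systems. This includes hundreds of operators and functions from MATLAB and companion toolboxes.

Supported Toolboxes and Functions

MATLAB Language Features Support

MATLAB language and toolbox support for code generation.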

Incorporate Legacy Code

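Use legacy code integration capabilities to incorporate trusted or highly optimized CUDA code into your MATLAB algorithms for testing in MATLAB. Then call the same CUDA code from the generated code as well.

Legacy Code Integration

Incorporating existing CUDA code into generated code.

Generate CUDA Code from Simulink Models

Create models in Simulink and generate optimized CUDA code.

Run Simulations and Generate Optimized Code for NVIDIA GPUs

When used with Simulink Coder™, GPU Coder accelerates compute-intensive portions of MATLAB Function blocks in your Simulink models on NVIDIA GPUs. You can then generate optimized CUDA code from the Simulink model and deploy it to your NVIDIA GPU target.

Simulation Acceleration Using GPU Coder

Code Generation from Simulink Models Using GPU Coder

Targeting NVIDIA Embedded Boards

Simulink model of a Sobel edge detector running on a GPU.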

Deploy End-to-End Deep Learning Algorithms

Use a variety of trained deep learning networks (including ResNet-50, SegNet, and LSTM) from Deep Learning Toolbox™ in your Simulink model and deploy to NVIDIA GPUs. Generate code for preprocessing and postprocessing along with your trained deep learning networks to deploy complete algorithms.

Supported Networks and Layers

Deep Learning in Simulink Using MATLAB Function Block

Deep Learning in Simulink for NVIDIA GPUs: Generate CUDA Code Using GPU Coder

Log Signals, Tune Parameters, and Numerically Verify Code Behavior

When used with Simulink Coder, GPU Coder enables you to log signals and tune parameters in real time using external mode simulations. Use Embedded Coder with GPU Coder to run software-in-the-loop and processor-in-the-loop tests that numerically verify the generated code matches the behavior of the simulation.
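The idea behind a numerical equivalence check can be sketched in a few lines of Python. This is only an illustration of the comparison step, not the SIL/PIL workflow itself: the function names, the tolerances, and the sample data are all assumptions made for the example.

```python
import math


def max_abs_error(reference, generated):
    """Largest absolute difference between two equal-length output sequences."""
    if len(reference) != len(generated):
        raise ValueError("outputs must have the same length")
    return max((abs(r - g) for r, g in zip(reference, generated)), default=0.0)


def numerically_equivalent(reference, generated, rel_tol=1e-5, abs_tol=1e-6):
    """True when every generated sample matches its reference within tolerance."""
    if len(reference) != len(generated):
        return False
    return all(
        math.isclose(r, g, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, g in zip(reference, generated)
    )


# Hypothetical data: simulation output vs. output logged from the generated code.
simulation_output = [0.0, 0.5, 1.0, 1.5]
generated_output = [0.0, 0.5000001, 0.9999999, 1.5]

print("equivalent:", numerically_equivalent(simulation_output, generated_output))
print("max abs error:", max_abs_error(simulation_output, generated_output))
```

Tolerances matter here because floating-point results computed on a GPU can differ slightly from results computed on a CPU, for example when operations are reordered. Suitable tolerance values depend on the application.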

Parameter Tuning and Signal Monitoring Using External Mode

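Numerical Equivalence Testing

Deep Learning in Simulink for NVIDIA GPUs: Classification of ECG Signals

Generate CUDA Code from Deep Learning Networks

Deploy trained deep learning networks with Deep Learning Toolbox.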

Deploy End-to-End Deep Learning Algorithms

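Deploy a variety of trained deep learning networks (including ResNet-50, SegNet, and LSTM) from Deep Learning Toolbox to NVIDIA GPUs. Use predefined deep learning layers or define custom layers for your specific application. Generate code for preprocessing and postprocessing along with your trained deep learning networks to deploy complete algorithms.

Supported Networks and Layers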

Real-Time Object Detection with YOLO v2 Using GPU Coder (4:24)

Code Generation for Object Detection Using YOLO v3 Deep Learning

Code Generation for Semantic Segmentation Network by Using U-net

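How to Generate CUDA Code for a Keras-TensorFlow Model

Generate Optimized Code for Inference

GPU Coder generates code with a smaller footprint than other deep learning solutions because it generates only the code needed to run inference with your specific algorithm. The generated code calls optimized libraries, including TensorRT and cuDNN.

Lane Detection Optimized with GPU Coder

Single-image inference with VGG-16 on a Titan V GPU using cuDNN.

Optimize Further Using TensorRT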

Generate code that integrates with NVIDIA TensorRT, a high-performance deep learning inference optimizer and runtime. Use INT8 or FP16 data types for an additional performance boost over the standard FP32 data type.

TensorRT (1:34)

Deep Learning Prediction with NVIDIA TensorRT

Deep Learning on Jetson AGX Xavier Using MATLAB, GPU Coder, and TensorRT (24:40)

Using MATLAB and TensorRT on NVIDIA GPUs

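Improving execution speed with TensorRT and INT8 data types.

Deep Learning Quantization

Quantize your deep learning network to reduce memory usage and increase inference performance. Analyze and visualize the tradeoff between increased performance and inference accuracy using the Deep Network Quantizer app.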
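To make the tradeoff concrete, here is a minimal Python sketch of symmetric per-tensor INT8 quantization. It is a generic textbook scheme chosen for illustration and is not a description of how the Deep Network Quantizer app works internally. The function names and the sample weights are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # One scale factor for the whole tensor; fall back to 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


# Hypothetical layer weights, used only to show the round trip.
weights = [0.02, -1.27, 0.635, 0.9]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
worst_error = max(abs(w - r) for w, r in zip(weights, restored))

print("int8 codes:", codes)
print("worst-case round-trip error:", worst_error)
```

Each value is stored in one byte instead of the four bytes of a single-precision float, which is where the memory saving comes from. The cost is a rounding error of at most about half the scale factor per value, which is the accuracy side of the tradeoff.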

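INT8 Quantization Using Deep Network Quantizer

Quantization of Deep Neural Networks

What Is INT8 Quantization and Why Is It Popular for Deep Neural Networks?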

Deep Network Quantization and Deployment Using Deep Learning Toolbox Model Quantization Library

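Optimize the Generated Code

GPU Coder automatically optimizes the generated code. Use design patterns to further increase performance.

Minimize CPU-GPU Memory Transfers and Optimize Memory Usage

GPU Coder automatically analyzes, identifies, and partitions segments of MATLAB code to run on either the CPU or GPU. It also minimizes the number of data copies between CPU and GPU. Use profiling tools to identify other potential bottlenecks.

GPU Programming Paradigm

Kernel Creation

GPU Memory Allocation and Minimization

GPU Execution Profiling of the Generated Code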

Profile reports identifying potential bottlenecks.

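Invoke Optimized Libraries

Code generated with GPU Coder calls optimized NVIDIA CUDA libraries, including TensorRT, cuDNN, cuSolver, cuFFT, cuBLAS, and Thrust. Code generated from MATLAB toolbox functions is mapped to optimized libraries whenever possible.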

Kernels from Library Calls

NVIDIA TensorRT

NVIDIA cuDNN

NVIDIA cuFFT

Generated code calling functions in the optimized cuFFT CUDA library.

Use Design Patterns for Further Acceleration

Design patterns such as stencil processing use shared memory to improve memory bandwidth. They are applied automatically when using certain functions such as convolution. You can also manually invoke them using specific pragmas.
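The tiling idea behind stencil processing can be emulated on the CPU. The Python sketch below is an assumption-laden illustration, not the code GPU Coder generates: a local list stands in for GPU shared memory, the outer loop stands in for thread blocks, and the names `stencil_direct`, `stencil_tiled`, and `tile_size` are hypothetical. It assumes an odd number of weights and zero padding at the borders.

```python
def stencil_direct(signal, weights):
    """Reference version: each output reads its neighbours directly from `signal`."""
    radius = len(weights) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(weights):
            j = i + k - radius
            if 0 <= j < n:  # out-of-range neighbours count as zero
                acc += w * signal[j]
        out.append(acc)
    return out


def stencil_tiled(signal, weights, tile_size=4):
    """Tiled version: stage each tile plus its halo once, then compute from the copy."""
    radius = len(weights) // 2
    n = len(signal)
    out = [0.0] * n
    for start in range(0, n, tile_size):  # one "thread block" per tile
        stop = min(start + tile_size, n)
        # Local buffer standing in for shared memory: the tile plus halo cells.
        tile = [
            signal[j] if 0 <= j < n else 0.0
            for j in range(start - radius, stop + radius)
        ]
        for i in range(start, stop):  # one "thread" per output element
            local = i - start  # left edge of this output's window inside `tile`
            out[i] = sum(w * tile[local + k] for k, w in enumerate(weights))
    return out


samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
smoothing = [0.25, 0.5, 0.25]
print(stencil_tiled(samples, smoothing))
```

In the direct version, neighbouring outputs read overlapping input elements separately. In the tiled version, each input element is loaded into the local buffer once per tile and then reused by every output in that tile, which is the access pattern that shared memory is meant to speed up on a GPU.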

Design Patterns

Stencil Processing on GPU