GPU Coder

Generate CUDA code for NVIDIA GPUs

GPU Coder™ generates optimized CUDA® code from MATLAB® code and Simulink® models. The generated code includes CUDA kernels for parallelizable parts of your deep learning, embedded vision, and signal processing algorithms. For high performance, the generated code calls optimized NVIDIA® CUDA libraries, including TensorRT™, cuDNN, cuFFT, cuSolver, and cuBLAS. The code can be integrated into your project as source code, static libraries, or dynamic libraries, and it can be compiled for desktops, servers, and GPUs embedded on NVIDIA Jetson™, NVIDIA DRIVE™, and other platforms. You can use the generated CUDA within MATLAB to accelerate deep learning networks and other computationally intensive portions of your algorithms. GPU Coder lets you incorporate handwritten CUDA code into your algorithms and into the generated code.

When used with Embedded Coder®, GPU Coder lets you verify the numerical behavior of the generated code via software-in-the-loop (SIL) and processor-in-the-loop (PIL) testing.

Get Started:

Generate Fast, Flexible CUDA Code

Generate optimized CUDA code. Deploy the code royalty-free.

Deploy Algorithms Royalty-Free

Compile and run your generated code on popular NVIDIA GPUs, from desktop systems to data centers to embedded hardware. The generated code is royalty-free—deploy it in commercial applications to your customers at no charge.


GPU Coder Success Stories

Learn how engineers and scientists in a variety of industries use GPU Coder to generate CUDA code for their applications.

Airbus prototypes automated detection of defects on NVIDIA Jetson TX2.

Generate Code from Supported Toolboxes and Functions

GPU Coder generates code from a broad range of the MATLAB language features that design engineers use to develop algorithms as components of larger systems. This includes hundreds of operators and functions from MATLAB and companion toolboxes.

MATLAB language and toolbox support for code generation.
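As an illustrative sketch, code generation from a MATLAB function is typically driven with `codegen` and a GPU configuration object; the function name `myBlur` and the input size here are hypothetical:

```matlab
% Hypothetical example: myBlur.m is a code-generation-compatible
% MATLAB function (e.g., an image filter).
cfg = coder.gpuConfig('mex');             % target: CUDA MEX callable from MATLAB
cfg.GpuConfig.ComputeCapability = '6.1';  % assumption: minimum GPU compute capability

% Generate CUDA code for a single-precision 1080x1920 input
codegen -config cfg myBlur -args {ones(1080,1920,'single')}
```

The same function can instead target `'lib'`, `'dll'`, or `'exe'` in `coder.gpuConfig` to produce source code, libraries, or a standalone executable.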

Incorporate Legacy Code

Use legacy code integration capabilities to incorporate trusted or highly optimized CUDA code into your MATLAB algorithms for testing in MATLAB. Then call the same CUDA code from the generated code as well.

Incorporating existing CUDA code into the generated code.
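A minimal sketch of this workflow uses `coder.ceval` to call a handwritten CUDA wrapper from code-generation-ready MATLAB; the wrapper name `legacy_scale` and the source/header file names are hypothetical:

```matlab
function y = callLegacyKernel(x) %#codegen
% Call a handwritten CUDA kernel wrapper from generated code,
% with a plain MATLAB fallback for simulation.
y = coder.nullcopy(zeros(size(x), 'like', x));
if coder.target('MATLAB')
    y = 2 * x;   % simulation behavior in MATLAB (assumed to match the kernel)
else
    coder.cinclude('legacy_kernels.h');                       % hypothetical header
    coder.updateBuildInfo('addSourceFiles', 'legacy_kernels.cu'); % hypothetical .cu file
    coder.ceval('legacy_scale', coder.rref(x), coder.wref(y), int32(numel(x)));
end
end
```

Because the MATLAB branch runs during simulation, you can test the algorithm in MATLAB first and link the trusted CUDA implementation only in the generated code.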

Generate CUDA Code from Simulink Models

Create models in Simulink and generate optimized CUDA code.

Run Simulations and Generate Optimized Code for NVIDIA GPUs

When used with Simulink Coder™, GPU Coder accelerates compute-intensive portions of MATLAB Function blocks in your Simulink models on NVIDIA GPUs. You can then generate optimized CUDA code from the Simulink model and deploy it to your NVIDIA GPU target.

Simulink model of a Sobel edge detector running on a GPU.

Deploy End-to-End Deep Learning Algorithms

Use a variety of trained deep learning networks (including ResNet-50, SegNet, and LSTM) from Deep Learning Toolbox™ in your Simulink model and deploy to NVIDIA GPUs. Generate code for preprocessing and postprocessing along with your trained deep learning networks to deploy complete algorithms.

Log Signals, Tune Parameters, and Numerically Verify Code Behavior

When used with Simulink Coder, GPU Coder enables you to log signals and tune parameters in real time using external mode simulations. Use Embedded Coder with GPU Coder to run software-in-the-loop and processor-in-the-loop tests that numerically verify the generated code matches the behavior of the simulation.

Logging signals and tuning parameters in Simulink using external mode.

Generate CUDA Code from Deep Learning Networks

Deploy deep learning networks trained with Deep Learning Toolbox.

Deploy End-to-End Deep Learning Algorithms

Deploy a variety of trained deep learning networks (including ResNet-50, SegNet, and LSTM) from Deep Learning Toolbox to NVIDIA GPUs. Use predefined deep learning layers or define custom layers for your specific application. Generate code for preprocessing and postprocessing along with your trained deep learning networks to deploy complete algorithms.
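As a hedged sketch of this workflow (it assumes the GPU Coder Interface for Deep Learning support package is installed; the entry-point name is illustrative), a pretrained network is wrapped in a code-generation-ready function and compiled:

```matlab
function out = resnetPredict(in) %#codegen
% Entry-point function for inference code generation.
% The network is loaded once into a persistent variable.
persistent net;
if isempty(net)
    net = coder.loadDeepLearningNetwork('resnet50');
end
out = predict(net, in);
end
```

Code generation then targets a deep learning library, for example:

```matlab
cfg = coder.gpuConfig('lib');
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');  % or 'tensorrt'
codegen -config cfg resnetPredict -args {ones(224,224,3,'single')}
```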

Generate Optimized Code for Inference

Compared with other deep learning solutions, GPU Coder generates code with a smaller footprint because it generates only the code needed to run inference with your specific algorithm. The generated code calls optimized libraries, including TensorRT and cuDNN.

Single image inference with VGG-16 on a Titan V GPU using cuDNN.

Optimize Further with TensorRT

Generate code that integrates with NVIDIA TensorRT, a high-performance deep learning inference optimizer and runtime. Use INT8 or FP16 data types for an additional performance boost over the standard FP32 data type.

Improving execution speed using TensorRT with INT8 data types.
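A minimal configuration sketch for TensorRT with reduced precision looks like the following; the calibration-data folder name is a placeholder (INT8 requires calibration images):

```matlab
cfg = coder.gpuConfig('exe');
cfg.DeepLearningConfig = coder.DeepLearningConfig('tensorrt');
cfg.DeepLearningConfig.DataType = 'int8';               % or 'fp16' / 'fp32'
cfg.DeepLearningConfig.DataPath = 'calibrationImages';  % hypothetical folder of calibration data
```

With `'fp16'`, the calibration step is not needed; TensorRT simply runs the network in half precision where supported.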

Deep Learning Quantization

Quantize your deep learning network to reduce memory usage and increase inference performance. Use the Deep Network Quantizer app to analyze and visualize the tradeoff between performance and inference accuracy.

Optimize the Generated Code

GPU Coder automatically optimizes the generated code. Use design patterns to increase performance further.

Minimize CPU-GPU Memory Transfers and Optimize Memory Usage

GPU Coder automatically analyzes, identifies, and partitions segments of MATLAB code to run on either the CPU or GPU. It also minimizes the number of data copies between CPU and GPU. Use profiling tools to identify other potential bottlenecks.

Profile reports identifying potential bottlenecks.

Invoke Optimized Libraries

Code generated with GPU Coder calls optimized NVIDIA CUDA libraries, including TensorRT, cuDNN, cuSolver, cuFFT, cuBLAS, and Thrust. Code generated from MATLAB toolbox functions is mapped to optimized libraries whenever possible.

Generated code calling functions in the optimized cuFFT CUDA library.
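As a sketch of how this mapping works, an `fft` call in a code-generation-ready MATLAB function lowers to cuFFT in the generated CUDA; the function name here is illustrative:

```matlab
function y = powerSpectrum(x) %#codegen
% The fft call below maps to the cuFFT library in the generated code.
y = abs(fft(x)).^2;
end
```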

Use Design Patterns for Further Acceleration

Design patterns such as stencil processing use shared memory to improve memory bandwidth. They are applied automatically when using certain functions such as convolution. You can also manually invoke them using specific pragmas.

The stencil processing design pattern.
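A hedged sketch of invoking the stencil pattern explicitly uses `gpucoder.stencilKernel`; the 3x3 mean filter and function names are illustrative:

```matlab
function B = meanFilter(A) %#codegen
% Apply a 3x3 mean filter using the stencil design pattern.
% The generated kernel stages neighborhoods in shared memory.
B = gpucoder.stencilKernel(@windowMean, A, [3 3], 'same');
end

function out = windowMean(window)
% Called once per output element with its 3x3 neighborhood.
out = mean(window(:));
end
```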

Hardware Prototyping

Get to hardware fast with automatic conversion of your algorithm to CUDA code.

Prototype on NVIDIA Jetson and DRIVE Platforms

Automate cross-compilation and deployment of generated code onto NVIDIA Jetson and DRIVE platforms using GPU Coder Support Package for NVIDIA GPUs.

Prototyping on the NVIDIA Jetson platform.
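A minimal deployment sketch with the support package looks like this; the board address, credentials, build directory, and entry-point function are placeholders:

```matlab
% Connect to the board (address and credentials are placeholders)
hwobj = jetson('192.168.1.15', 'ubuntu', 'ubuntu');

% Configure code generation for the Jetson target
cfg = coder.gpuConfig('exe');
cfg.Hardware = coder.hardware('NVIDIA Jetson');
cfg.Hardware.BuildDir = '~/remoteBuildDir';   % remote build location on the board

% Cross-compile and deploy a hypothetical entry-point function
codegen -config cfg myDetector -args {ones(480,640,3,'uint8')}
```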

Access Peripherals and Sensors from MATLAB and Generated Code

Communicate remotely with the NVIDIA target from MATLAB to acquire data from webcams and other supported peripherals for early prototyping. Deploy your algorithm together with peripheral interface code to the board for standalone execution.

Accessing peripherals and sensors from MATLAB and generated code.

Move from Prototyping to Production

Use GPU Coder with Embedded Coder to interactively trace your MATLAB code side-by-side with the generated CUDA code. Verify the numerical behavior of the generated code running on the hardware using software-in-the-loop (SIL) and processor-in-the-loop (PIL) testing.

Interactive traceability report using GPU Coder with Embedded Coder.

Accelerate Algorithms

Generate CUDA code and compile it for use inside MATLAB and Simulink.

Accelerate Algorithms Using GPUs in MATLAB

Call generated CUDA code as a MEX function from your MATLAB code to speed execution, though performance will vary depending on the nature of your MATLAB code. Profile generated MEX functions to identify bottlenecks and focus your optimization efforts.
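A minimal sketch of the MEX acceleration loop, with a hypothetical function `myFilter`, is:

```matlab
% Build a CUDA MEX version of myFilter (code-generation-compatible)
cfg = coder.gpuConfig('mex');
codegen -config cfg myFilter -args {ones(4096,4096,'single')}

% Compare the original MATLAB function with the generated MEX
% (codegen appends _mex to the generated function name)
x = rand(4096, 'single');
tMATLAB = timeit(@() myFilter(x));
tMEX    = timeit(@() myFilter_mex(x));
```

The measured speedup depends on how much of `myFilter` parallelizes onto the GPU, which is why profiling the MEX function is worthwhile before optimizing further.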

Accelerate Simulink Simulations Using NVIDIA GPUs

When used with Simulink Coder, GPU Coder accelerates compute-intensive portions of MATLAB Function blocks in your Simulink models on NVIDIA GPUs.

Latest Features

Simulink Support

Generate, build, and deploy Simulink models to NVIDIA GPUs

Deep Learning Simulink Support

Generate, build, and deploy deep learning networks in Simulink models to NVIDIA GPUs

Persistent Variables

Create persistent memory on the GPU

Wavelet Toolbox Code Generation

Generate code for FFT-based FIR filtering and the short-time Fourier transform, and for dwt, dwt2, modwt, and modwtmra

Deep Learning

Generate code for custom layers

Multi-Input Networks

Generate code for networks that have multiple inputs

Long Short-Term Memory (LSTM) Networks

Generate code for convolutional LSTM networks and network activations

IO Block Library for NVIDIA Hardware

Access NVIDIA hardware peripherals using GPU Coder Support Package for NVIDIA GPUs

See the release notes for details on any of these features and corresponding functions.