GPU Coder

Generate CUDA code for NVIDIA GPUs

GPU Coder™ generates optimized CUDA® code from MATLAB® code and Simulink® models. The generated code includes CUDA kernels for the parallelizable parts of deep learning, embedded vision, and signal processing algorithms. For high performance, the generated code calls optimized NVIDIA® CUDA libraries, including TensorRT™, cuDNN, cuFFT, cuSOLVER, and cuBLAS. The code can be integrated into your project as source code, static libraries, or dynamic libraries, and can be compiled for desktops, servers, and GPUs embedded on NVIDIA Jetson™, NVIDIA DRIVE™, and other platforms. You can use the generated CUDA within MATLAB to accelerate deep learning networks and other computationally intensive portions of your algorithms. GPU Coder lets you incorporate handwritten CUDA code into your algorithms and into the generated code.

When used with Embedded Coder®, GPU Coder lets you verify the numerical behavior of the generated code via software-in-the-loop (SIL) and processor-in-the-loop (PIL) testing.

Get Started:

Generate Fast, Flexible CUDA Code

Generate optimized CUDA code. Deploy code royalty-free.

Deploy Algorithms Royalty-Free

Compile and run your generated code on popular NVIDIA GPUs, from desktop systems to data centers to embedded hardware. The generated code is royalty-free—deploy it in commercial applications to your customers at no charge.


GPU Coder Success Stories

Learn how engineers and scientists in a variety of industries use GPU Coder to generate CUDA code for their applications.

Airbus prototypes automated detection of defects on NVIDIA Jetson TX2.

Generate Code from Supported Toolboxes and Functions

GPU Coder generates code from a broad range of MATLAB language features that design engineers use to develop algorithms as components of larger systems. This includes hundreds of operators and functions from MATLAB and companion toolboxes.

MATLAB language and toolbox support for code generation.
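As a minimal sketch of this workflow (the function name, values, and sizes below are illustrative, not from this page), an ordinary MATLAB function can serve as the entry point for CUDA code generation:

```matlab
% saxpy.m -- illustrative entry-point function (names are assumptions).
% Element-wise arithmetic like this maps to a CUDA kernel in the generated code.
function out = saxpy(a, x, y) %#codegen
out = a .* x + y;
end
```

It can then be compiled into a CUDA MEX function with, for example, `cfg = coder.gpuConfig('mex'); codegen -config cfg saxpy -args {single(2), rand(1e6,1,'single'), rand(1e6,1,'single')}`.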

Incorporate Legacy Code

Use legacy code integration capabilities to incorporate trusted or highly optimized CUDA code into your MATLAB algorithms for testing in MATLAB. Then call the same CUDA code from the generated code as well.

Incorporating existing CUDA code into generated code.
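One common pattern for this (sketched here; `myFilter` and `myFilter.cu` are hypothetical placeholders for your own CUDA wrapper) uses `coder.ceval` to call the existing CUDA code from a MATLAB entry point:

```matlab
% Hedged sketch: call hand-written CUDA code from generated code.
% "myFilter" / "myFilter.cu" are hypothetical; substitute your own kernel wrapper.
function y = callLegacyFilter(x) %#codegen
y = coder.nullcopy(zeros(size(x), 'like', x));
if coder.target('MATLAB')
    y = x;                                  % simple fallback when run in MATLAB
else
    coder.updateBuildInfo('addSourceFile', 'myFilter.cu');
    coder.ceval('myFilter', coder.rref(x), coder.wref(y), int32(numel(x)));
end
end
```

The `coder.target('MATLAB')` branch lets the same file run both in MATLAB simulation and in generated code.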

Generate CUDA Code from Simulink Models

Create models in Simulink and generate optimized CUDA code.

Run Simulations and Generate Optimized Code for NVIDIA GPUs

When used with Simulink Coder™, GPU Coder accelerates compute-intensive portions of MATLAB Function blocks in your Simulink models on NVIDIA GPUs. You can then generate optimized CUDA code from the Simulink model and deploy it to your NVIDIA GPU target.

Simulink model of a Sobel edge detector running on a GPU.

Deploy End-to-End Deep Learning Algorithms

Use a variety of trained deep learning networks (including ResNet-50, SegNet, and LSTM) from Deep Learning Toolbox™ in your Simulink model and deploy to NVIDIA GPUs. Generate code for preprocessing and postprocessing along with your trained deep learning networks to deploy complete algorithms.

Log Signals, Tune Parameters, and Numerically Verify Code Behavior

When used with Simulink Coder, GPU Coder enables you to log signals and tune parameters in real time using external mode simulations. Use Embedded Coder with GPU Coder to run software-in-the-loop and processor-in-the-loop tests that numerically verify the generated code matches the behavior of the simulation.

Use External Mode to log signals and tune parameters in Simulink.

Generate CUDA Code from Deep Learning Networks

Deploy trained deep learning networks with Deep Learning Toolbox.

Deploy End-to-End Deep Learning Algorithms

Deploy a variety of trained deep learning networks (including ResNet-50, SegNet, and LSTM) from Deep Learning Toolbox to NVIDIA GPUs. Use predefined deep learning layers or define custom layers for your specific application. Generate code for preprocessing and postprocessing along with your trained deep learning networks to deploy complete algorithms.
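A typical entry-point function for this workflow (a sketch; the function name is illustrative, and `resnet50` requires its pretrained-model support package) loads the network once into a persistent variable:

```matlab
% Sketch of a deep learning entry point for code generation.
function out = classifyImage(img) %#codegen
persistent net;
if isempty(net)
    % Load a pretrained network; GPU Coder regenerates this as optimized
    % CUDA that calls cuDNN or TensorRT, depending on the configuration.
    net = coder.loadDeepLearningNetwork('resnet50');
end
out = net.predict(img);
end
```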

Generate Optimized Inference Code

GPU Coder generates code with a smaller footprint compared with other deep learning solutions because it only generates the code needed to run inference with your specific algorithm. The generated code calls optimized libraries, including TensorRT and cuDNN.

Single-image inference with VGG-16 on a TITAN V GPU using cuDNN.
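Selecting the library is a configuration choice. A sketch (assuming a hypothetical entry-point function `classifyImage` that takes a 224x224x3 image):

```matlab
% Sketch: generate inference code that calls cuDNN.
cfg = coder.gpuConfig('lib');                               % build a static library
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn'); % target cuDNN
codegen -config cfg classifyImage -args {ones(224,224,3,'single')}
```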

Optimize Further Using TensorRT

Generate code that integrates with NVIDIA TensorRT, a high-performance deep learning inference optimizer and runtime. Use INT8 or FP16 data types for an additional performance boost over the standard FP32 data type.

Improving execution speed with TensorRT and INT8 data types.
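The precision switch is again a configuration property; a sketch (the calibration folder name is a placeholder):

```matlab
% Sketch: target TensorRT with reduced-precision inference.
dlcfg = coder.DeepLearningConfig('tensorrt');
dlcfg.DataType = 'int8';              % or 'fp16' for half precision
dlcfg.DataPath = 'calibrationData';   % folder of calibration images (INT8 only)
cfg = coder.gpuConfig('exe');
cfg.DeepLearningConfig = dlcfg;
```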

Deep Learning Quantization

Quantize your deep learning network to reduce memory usage and increase inference performance. Analyze and visualize the tradeoff between increased performance and inference accuracy using the Deep Network Quantizer app.

Optimize the Generated Code

GPU Coder automatically optimizes the generated code. Use design patterns to further increase performance.

Minimize CPU-GPU Memory Transfers and Optimize Memory Usage

GPU Coder automatically analyzes, identifies, and partitions segments of MATLAB code to run on either the CPU or GPU. It also minimizes the number of data copies between CPU and GPU. Use profiling tools to identify other potential bottlenecks.

Profile reports identifying potential bottlenecks.
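One way to look for such bottlenecks (sketched with a hypothetical entry point `myAlgo`) is `gpucoder.profile`, which builds an instrumented version of the design, runs it, and produces a kernel- and memory-level report:

```matlab
% Sketch: profile a hypothetical entry-point function on the GPU.
inputData = rand(2048, 'single');
gpucoder.profile('myAlgo', {inputData}, 'NumCalls', 10);
```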

Invoke Optimized Libraries

Code generated with GPU Coder calls optimized NVIDIA CUDA libraries, including TensorRT, cuDNN, cuSOLVER, cuFFT, cuBLAS, and Thrust. Code generated from MATLAB toolbox functions is mapped to optimized libraries whenever possible.

Generated code calling functions in the optimized cuFFT CUDA library.

Use Design Patterns for Further Acceleration

Design patterns such as stencil processing use shared memory to improve memory bandwidth. They are applied automatically when using certain functions such as convolution. You can also manually invoke them using specific pragmas.

The stencil processing design pattern.
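The stencil pattern can be requested explicitly with `gpucoder.stencilKernel`; a sketch (function names are illustrative) that computes a 3x3 mean filter, where each output element is produced from a sliding window passed to the kernel function:

```matlab
% Sketch: explicitly invoke the stencil design pattern.
function B = meanFilter3x3(A) %#codegen
B = gpucoder.stencilKernel(@windowMean, A, [3 3], 'same');
end

function out = windowMean(w)
out = mean(w(:));                 % applied to each 3x3 window of A
end
```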

Prototype on Hardware

Get to hardware fast with automatic conversion of your algorithm to CUDA code.

Prototype on NVIDIA Jetson and DRIVE Platforms

Automate cross-compilation and deployment of generated code onto NVIDIA Jetson and DRIVE platforms using GPU Coder Support Package for NVIDIA GPUs.

Prototyping on the NVIDIA Jetson platform.
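With the support package installed, deployment is driven from MATLAB; a sketch (board address, credentials, and the entry-point name are placeholders):

```matlab
% Sketch: cross-compile and deploy to a Jetson board.
hwobj = jetson('192.168.1.15', 'ubuntu', 'ubuntu');   % placeholder credentials
cfg = coder.gpuConfig('exe');
cfg.Hardware = coder.hardware('NVIDIA Jetson');
cfg.Hardware.BuildDir = '~/remoteBuildDir';
codegen -config cfg myEntryPoint -args {ones(480,640,3,'uint8')}
```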

Access Peripherals and Sensors from MATLAB and Generated Code

Remotely communicate with the NVIDIA target from MATLAB to acquire data from webcams and other supported peripherals for early prototyping. Deploy your algorithm along with peripheral interface code to the board for standalone execution.

Access peripherals and sensors from MATLAB and generated code.
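For example, frames can be pulled from a camera on the board into MATLAB for early prototyping (a sketch; the device path and credentials are placeholders):

```matlab
% Sketch: acquire a frame from a camera attached to the Jetson board.
hwobj = jetson('192.168.1.15', 'ubuntu', 'ubuntu');   % placeholder credentials
cam = camera(hwobj, '/dev/video0', [640 480]);        % support package camera object
img = snapshot(cam);                                  % one frame into MATLAB
```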

Move from Prototyping to Production

Use GPU Coder with Embedded Coder to interactively trace your MATLAB code side by side with the generated CUDA code. Verify the numerical behavior of the generated code running on hardware using software-in-the-loop (SIL) and processor-in-the-loop (PIL) testing.

Interactive traceability report using GPU Coder with Embedded Coder.

Accelerate Algorithms

Generate CUDA code and compile it for use inside MATLAB and Simulink.

Accelerate Algorithms Using GPUs in MATLAB

Call generated CUDA code as a MEX function from your MATLAB code to speed up execution, though performance gains will vary with the nature of your MATLAB code. Profile generated MEX functions to identify bottlenecks and focus your optimization efforts.
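The generated MEX function is a drop-in replacement for the original function; a sketch (`myAlgo` is a hypothetical entry point):

```matlab
% Sketch: build a CUDA MEX and call it in place of the MATLAB function.
cfg = coder.gpuConfig('mex');
codegen -config cfg myAlgo -args {rand(2048, 'single')}
out = myAlgo_mex(rand(2048, 'single'));   % accelerated call from MATLAB
```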

Accelerate Simulink Simulations Using NVIDIA GPUs

When used with Simulink Coder, GPU Coder accelerates compute-intensive portions of MATLAB Function blocks in your Simulink models on NVIDIA GPUs.

Latest Features

Simulink Support

Generate, build, and deploy Simulink models to NVIDIA GPUs

Deep Learning Simulink Support

Generate, build, and deploy deep learning networks in Simulink models to NVIDIA GPUs

Persistent Variables

Create persistent memory on the GPU

Wavelet Toolbox Code Generation

Generate code for discrete and maximal overlap discrete wavelet transforms using dwt, dwt2, modwt, and modwtmra

Deep Learning Custom Layers

Generate code for custom layers

Multi-Input Networks

Generate code for networks that have multiple inputs

Long Short-Term Memory (LSTM) Networks

Generate code for convolutional LSTM and network activations

IO Block Library for NVIDIA Hardware

Access NVIDIA hardware peripherals using GPU Coder Support Package for NVIDIA GPUs

See the release notes for details on any of these features and corresponding functions.