GPU Coder

Generate CUDA code for NVIDIA GPUs

GPU Coder™ generates optimized CUDA® code from MATLAB® code and Simulink® models. The generated code includes CUDA kernels for the parallelizable parts of deep learning, embedded vision, and signal processing algorithms. For high performance, the generated code calls optimized NVIDIA® CUDA libraries, including TensorRT™, cuDNN, cuFFT, cuSOLVER, and cuBLAS. The code can be integrated into your project as source code, static libraries, or dynamic libraries, and can be compiled for desktops, servers, and GPUs embedded on NVIDIA Jetson™, NVIDIA DRIVE™, and other platforms. You can use the generated CUDA within MATLAB to accelerate deep learning networks and other computationally intensive portions of your algorithms. GPU Coder lets you incorporate handwritten CUDA code into your algorithms and into the generated code.

When used with Embedded Coder®, GPU Coder lets you verify the numerical behavior of the generated code via software-in-the-loop (SIL) and processor-in-the-loop (PIL) testing.

Get Started:

Generate Fast, Flexible CUDA Code

Generate optimized CUDA code. Deploy code royalty-free.

Deploy Algorithms Royalty-Free

Compile and run your generated code on popular NVIDIA GPUs, from desktop systems to data centers to embedded hardware. The generated code is royalty-free—deploy it in commercial applications to your customers at no charge.


GPU Coder Success Stories

Learn how engineers and scientists in a variety of industries use GPU Coder to generate CUDA code for their applications.

Airbus prototypes automated detection of defects on NVIDIA Jetson TX2.

Generate Code from Supported Toolboxes and Functions

GPU Coder generates code from a broad range of MATLAB language features that design engineers use to develop algorithms as components of larger systems. This includes hundreds of operators and functions from MATLAB and companion toolboxes.

MATLAB language and toolbox support for code generation.
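As a minimal sketch of this workflow (the function name, values, and sizes below are illustrative, not from this page), an ordinary MATLAB function can serve as the entry point for CUDA code generation:

```matlab
% saxpy.m -- illustrative entry-point function (names are assumptions).
% Element-wise arithmetic like this maps to a CUDA kernel in the generated code.
function out = saxpy(a, x, y) %#codegen
out = a .* x + y;
end
```

It can then be compiled into a CUDA MEX function with, for example, `cfg = coder.gpuConfig('mex'); codegen -config cfg saxpy -args {single(2), rand(1e6,1,'single'), rand(1e6,1,'single')}`.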

Incorporate Legacy Code

Use legacy code integration capabilities to incorporate trusted or highly optimized CUDA code into your MATLAB algorithms for testing in MATLAB. Then call the same CUDA code from the generated code as well.

Incorporating existing CUDA code into generated code.
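One common pattern for this (sketched here; `myFilter` and `myFilter.cu` are hypothetical placeholders for your own CUDA wrapper) uses `coder.ceval` to call the existing CUDA code from a MATLAB entry point:

```matlab
% Hedged sketch: call hand-written CUDA code from generated code.
% "myFilter" / "myFilter.cu" are hypothetical; substitute your own kernel wrapper.
function y = callLegacyFilter(x) %#codegen
y = coder.nullcopy(zeros(size(x), 'like', x));
if coder.target('MATLAB')
    y = x;                                  % simple fallback when run in MATLAB
else
    coder.updateBuildInfo('addSourceFile', 'myFilter.cu');
    coder.ceval('myFilter', coder.rref(x), coder.wref(y), int32(numel(x)));
end
end
```

The `coder.target('MATLAB')` branch lets the same file run both in MATLAB simulation and in generated code.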

Generate CUDA Code from Simulink Models

Create models in Simulink and generate optimized CUDA code.

Run Simulations and Generate Optimized Code for NVIDIA GPUs

When used with Simulink Coder™, GPU Coder accelerates compute-intensive portions of MATLAB Function blocks in your Simulink models on NVIDIA GPUs. You can then generate optimized CUDA code from the Simulink model and deploy it to your NVIDIA GPU target.

Simulink model of a Sobel edge detector running on a GPU.

Deploy End-to-End Deep Learning Algorithms

Use a variety of trained deep learning networks (including ResNet-50, SegNet, and LSTM) from Deep Learning Toolbox™ in your Simulink model and deploy to NVIDIA GPUs. Generate code for preprocessing and postprocessing along with your trained deep learning networks to deploy complete algorithms.

Log Signals, Tune Parameters, and Numerically Verify Code Behavior

When used with Simulink Coder, GPU Coder enables you to log signals and tune parameters in real time using external mode simulations. Use Embedded Coder with GPU Coder to run software-in-the-loop and processor-in-the-loop tests that numerically verify the generated code matches the behavior of the simulation.

Use External Mode to log signals and tune parameters in Simulink.

Generate CUDA Code from Deep Learning Networks

Deploy trained deep learning networks with Deep Learning Toolbox.

Deploy End-to-End Deep Learning Algorithms

Deploy a variety of trained deep learning networks (including ResNet-50, SegNet, and LSTM) from Deep Learning Toolbox to NVIDIA GPUs. Use predefined deep learning layers or define custom layers for your specific application. Generate code for preprocessing and postprocessing along with your trained deep learning networks to deploy complete algorithms.
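A typical entry-point function for this workflow (a sketch; the function name is illustrative, and `resnet50` requires its pretrained-model support package) loads the network once into a persistent variable:

```matlab
% Sketch of a deep learning entry point for code generation.
function out = classifyImage(img) %#codegen
persistent net;
if isempty(net)
    % Load a pretrained network; GPU Coder regenerates this as optimized
    % CUDA that calls cuDNN or TensorRT, depending on the configuration.
    net = coder.loadDeepLearningNetwork('resnet50');
end
out = net.predict(img);
end
```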

Generate Optimized Inference Code

GPU Coder generates code with a smaller footprint compared with other deep learning solutions because it only generates the code needed to run inference with your specific algorithm. The generated code calls optimized libraries, including TensorRT and cuDNN.

Single-image inference with VGG-16 on a TITAN V GPU using cuDNN.
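Selecting the library is a configuration choice. A sketch (assuming a hypothetical entry-point function `classifyImage` that takes a 224x224x3 image):

```matlab
% Sketch: generate inference code that calls cuDNN.
cfg = coder.gpuConfig('lib');                               % build a static library
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn'); % target cuDNN
codegen -config cfg classifyImage -args {ones(224,224,3,'single')}
```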

Optimize Further Using TensorRT

Generate code that integrates with NVIDIA TensorRT, a high-performance deep learning inference optimizer and runtime. Use INT8 or FP16 data types for an additional performance boost over the standard FP32 data type.

Improving execution speed with TensorRT and INT8 data types.
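The precision switch is again a configuration property; a sketch (the calibration folder name is a placeholder):

```matlab
% Sketch: target TensorRT with reduced-precision inference.
dlcfg = coder.DeepLearningConfig('tensorrt');
dlcfg.DataType = 'int8';              % or 'fp16' for half precision
dlcfg.DataPath = 'calibrationData';   % folder of calibration images (INT8 only)
cfg = coder.gpuConfig('exe');
cfg.DeepLearningConfig = dlcfg;
```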

Deep Learning Quantization

Quantize your deep learning network to reduce memory usage and increase inference performance. Analyze and visualize the tradeoff between increased performance and inference accuracy using the Deep Network Quantizer app.

Optimize the Generated Code

GPU Coder automatically optimizes the generated code. Use design patterns to further increase performance.

Minimize CPU-GPU Memory Transfers and Optimize Memory Usage

GPU Coder automatically analyzes, identifies, and partitions segments of MATLAB code to run on either the CPU or GPU. It also minimizes the number of data copies between CPU and GPU. Use profiling tools to identify other potential bottlenecks.

Profile reports identifying potential bottlenecks.
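One way to look for such bottlenecks (sketched with a hypothetical entry point `myAlgo`) is `gpucoder.profile`, which builds an instrumented version of the design, runs it, and produces a kernel- and memory-level report:

```matlab
% Sketch: profile a hypothetical entry-point function on the GPU.
inputData = rand(2048, 'single');
gpucoder.profile('myAlgo', {inputData}, 'NumCalls', 10);
```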

Invoke Optimized Libraries

Code generated with GPU Coder calls optimized NVIDIA CUDA libraries, including TensorRT, cuDNN, cuSOLVER, cuFFT, cuBLAS, and Thrust. Code generated from MATLAB toolbox functions is mapped to optimized libraries whenever possible.

Generated code calling functions in the optimized cuFFT CUDA library.

Use Design Patterns for Further Acceleration

Design patterns such as stencil processing use shared memory to improve memory bandwidth. They are applied automatically when using certain functions such as convolution. You can also manually invoke them using specific pragmas.

The stencil processing design pattern.
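The stencil pattern can be requested explicitly with `gpucoder.stencilKernel`; a sketch (function names are illustrative) that computes a 3x3 mean filter, where each output element is produced from a sliding window passed to the kernel function:

```matlab
% Sketch: explicitly invoke the stencil design pattern.
function B = meanFilter3x3(A) %#codegen
B = gpucoder.stencilKernel(@windowMean, A, [3 3], 'same');
end

function out = windowMean(w)
out = mean(w(:));                 % applied to each 3x3 window of A
end
```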

Prototype on Hardware

Get to hardware fast with automatic conversion of your algorithm to CUDA code.

Prototype on NVIDIA Jetson and DRIVE Platforms

Automate cross-compilation and deployment of generated code onto NVIDIA Jetson and DRIVE platforms using GPU Coder Support Package for NVIDIA GPUs.

Prototyping on the NVIDIA Jetson platform.
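With the support package installed, deployment is driven from MATLAB; a sketch (board address, credentials, and the entry-point name are placeholders):

```matlab
% Sketch: cross-compile and deploy to a Jetson board.
hwobj = jetson('192.168.1.15', 'ubuntu', 'ubuntu');   % placeholder credentials
cfg = coder.gpuConfig('exe');
cfg.Hardware = coder.hardware('NVIDIA Jetson');
cfg.Hardware.BuildDir = '~/remoteBuildDir';
codegen -config cfg myEntryPoint -args {ones(480,640,3,'uint8')}
```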

Access Peripherals and Sensors from MATLAB and Generated Code

Remotely communicate with the NVIDIA target from MATLAB to acquire data from webcams and other supported peripherals for early prototyping. Deploy your algorithm along with peripheral interface code to the board for standalone execution.

Access peripherals and sensors from MATLAB and generated code.
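For example, frames can be pulled from a camera on the board into MATLAB for early prototyping (a sketch; the device path and credentials are placeholders):

```matlab
% Sketch: acquire a frame from a camera attached to the Jetson board.
hwobj = jetson('192.168.1.15', 'ubuntu', 'ubuntu');   % placeholder credentials
cam = camera(hwobj, '/dev/video0', [640 480]);        % support package camera object
img = snapshot(cam);                                  % one frame into MATLAB
```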

Move from Prototyping to Production

Use GPU Coder with Embedded Coder to interactively trace your MATLAB code side by side with the generated CUDA code. Verify the numerical behavior of the generated code running on hardware using software-in-the-loop (SIL) and processor-in-the-loop (PIL) testing.

Interactive traceability report using GPU Coder with Embedded Coder.

Accelerate Algorithms

Generate CUDA code and compile it for use inside MATLAB and Simulink.

Accelerate Algorithms Using GPUs in MATLAB

Call generated CUDA code as a MEX function from your MATLAB code to speed up execution, though performance gains will vary with the nature of your MATLAB code. Profile generated MEX functions to identify bottlenecks and focus your optimization efforts.
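The generated MEX function is a drop-in replacement for the original function; a sketch (`myAlgo` is a hypothetical entry point):

```matlab
% Sketch: build a CUDA MEX and call it in place of the MATLAB function.
cfg = coder.gpuConfig('mex');
codegen -config cfg myAlgo -args {rand(2048, 'single')}
out = myAlgo_mex(rand(2048, 'single'));   % accelerated call from MATLAB
```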

Accelerate Simulink Simulations Using NVIDIA GPUs

When used with Simulink Coder, GPU Coder accelerates compute-intensive portions of MATLAB Function blocks in your Simulink models on NVIDIA GPUs.

Latest Features

Simulink Support

Generate, build, and deploy Simulink models to NVIDIA GPUs

Deep Learning Simulink Support

Generate, build, and deploy deep learning networks in Simulink models to NVIDIA GPUs

Persistent Variables

Create persistent memory on the GPU

Wavelet Toolbox Code Generation

Generate code for discrete and maximal overlap discrete wavelet transforms using dwt, dwt2, modwt, and modwtmra

Deep Learning Custom Layers

Generate code for custom layers

Multi-Input Networks

Generate code for networks that have multiple inputs

Long Short-Term Memory (LSTM) Networks

Generate code for convolutional LSTM and network activations

IO Block Library for NVIDIA Hardware

Access NVIDIA hardware peripherals using GPU Coder Support Package for NVIDIA GPUs

See the release notes for details on any of these features and corresponding functions.