Implementing Deep Learning Applications on NVIDIA GPUs with GPU Coder

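GPU Coder™ generates readable and portable CUDA® code from MATLAB algorithms, leveraging CUDA libraries such as cuBLAS and cuDNN®. The generated code is then cross-compiled and deployed to NVIDIA® GPUs, from Tesla® to the embedded Jetson™ platform.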

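The first part of this talk describes how MATLAB is used to design and prototype end-to-end systems that include a deep learning network augmented with computer vision algorithms. You will learn about the capabilities in MATLAB for accessing and managing large data sets, and about the pretrained models that let you get started quickly with deep learning design. Then you will see how the distributed and GPU computing capabilities integrated with MATLAB are used during training, debugging, and verification of the network. Finally, most end-to-end systems need more than classification: data must be preprocessed before classification and postprocessed after it, and the results are often inputs to a downstream control system. These traditional computer vision and control algorithms, written in MATLAB, interface with the deep learning network to build the end-to-end system.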

The second part of this talk focuses on the embedded deployment phase. Using representative examples from automated driving to illustrate the entire workflow, you will see how GPU Coder automatically analyzes your MATLAB algorithm to (a) partition the algorithm between CPU and GPU execution; (b) infer memory dependencies; (c) allocate data to the GPU memory hierarchy (including global, local, shared, and constant memory); (d) minimize data transfers and device synchronizations between the CPU and GPU; and (e) finally generate CUDA code that leverages optimized CUDA libraries such as cuBLAS and cuDNN to deliver high performance.

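Finally, you will see benchmarks showing that the generated code is highly optimized: the deep learning inference performance of the automatically generated CUDA code is ~2.5x faster than MXNet, ~5x faster than Caffe2, and ~7x faster than TensorFlow®.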

Watch this talk to learn how to:

1. Access and manage large image sets

2. Visualize networks and gain insight into the training process

3. Import reference networks such as AlexNet and GoogLeNet

4. Automatically generate portable, optimized CUDA code from MATLAB algorithms

The code examples used in this webinar are available as shipping examples for GPU Coder.