Gabriel Ha, MathWorks
This video accompanies a hands-on workshop introducing you to parallel computing with MATLAB® and Simulink®, so that you can solve computationally intensive and data-intensive problems using multicore processors, GPUs, and computer clusters. By working through common scenarios to parallelize MATLAB algorithms and run multiple Simulink simulations in parallel, you will gain an understanding of parallel computing with MATLAB and Simulink and learn about best practices.
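Exercises and examples are provided along with the video to reinforce how to use parallel computing with MATLAB and Simulink. The workshop exercises and examples range in difficulty from simple parallel concepts to more advanced techniques.
Highlights
• Speeding up MATLAB applications with parallel computing
• Running multiple Simulink simulations in parallel
• GPU computing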
• Offloading computations and cluster computing
• Working with large data sets
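Hi, everyone, and welcome to this workshop on parallel computing with MATLAB and Simulink. Parallel computing is an important topic because the problems that engineers and researchers face keep getting larger and more complex. In addition, as technology advances, expectations rise for faster and more efficient results.
Parallel computing in MATLAB and Simulink enables engineers, scientists, and researchers in any department or industry to take advantage of readily available computing resources without needing to be experts in parallel computing. Here are some examples of real performance gains experienced by MATLAB customers who used the MATLAB parallel computing tools to speed up their work. This workshop will guide you through the same steps and tips. Keep in mind that parallel-capable hardware is becoming more and more widespread.
Multicore processors are the norm, and compute-capable GPU devices are becoming commonplace. In addition, access to clusters and cloud environments has grown, offering the ability to use computing resources beyond what is available on a typical workstation. Whether you plan to take advantage of multicore processors, compute clusters, or GPUs, it pays to know how to optimize your code so that you see better performance improvements when you bring in additional computing power.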
Let's highlight some steps you can take to optimize your code. Before modifying your code, you need to determine where to focus your efforts. Perhaps the most critical tool to support this process is the profiler, which can help you find bottlenecks by telling you where your code is spending most of its execution time. Improving those areas gives you the biggest performance boost for your efforts.
Once you've located the areas worth investing in, you can use effective programming techniques like preallocation and vectorization to accelerate the execution of your MATLAB code. The MATLAB Code Analyzer can advise you on this, in addition to bringing issues and errors in your code to your attention. Finally, you may also obtain speedups by replacing parts of your MATLAB code with an automatically generated MATLAB executable, known as a MEX function. You can do this using a separate product called MATLAB Coder.
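As a rough illustration of the preallocation and vectorization techniques mentioned above, here is a minimal sketch. The workload (squaring a vector and adding one) and all variable names are made up for illustration and do not come from the workshop material.

```matlab
n = 1e5;
x = linspace(0, 1, n);

% Version 1: the output array grows on every iteration (the Code Analyzer flags this).
tic
yGrow = [];
for k = 1:n
    yGrow(k) = x(k)^2 + 1; %#ok<SAGROW>
end
tGrow = toc;

% Version 2: preallocate the output before the loop.
tic
yPre = zeros(1, n);
for k = 1:n
    yPre(k) = x(k)^2 + 1;
end
tPre = toc;

% Version 3: vectorized, with no explicit loop.
tic
yVec = x.^2 + 1;
tVec = toc;

fprintf('grow: %.4f s, preallocated: %.4f s, vectorized: %.4f s\n', tGrow, tPre, tVec);
```

Timings vary by machine and release, so treat the printed numbers as indicative only. To find out where a real program spends its time, run it under the profiler first (`profile on`, run the code, then `profile viewer`).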
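It is worth noting that even without Parallel Computing Toolbox, MATLAB provides implicit multicore support through its built-in multithreading. A growing number of core MATLAB functions take advantage of underlying multithreaded library support, and other toolboxes gain these benefits when they use core MATLAB functions. However, not every MATLAB function can be multithreaded, and any speedup is limited to your local workstation. The parallel computing tools therefore let you obtain benefits beyond these limits.
Put another way, Parallel Computing Toolbox enables direct control of your parallel resources. For example, parallel constructs like parfor let you control which portions of your workflow are distributed to multiple cores. Later on, we'll talk about how you can extend that level of control to resources on compute clusters by using MATLAB Parallel Server.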
The following video clip demonstrates the performance improvements obtained using parallel computing in MATLAB, specifically with parfor. We run the same parameter sweep code in three different computing environments: a single desktop workstation, a cluster of 200 cores, and a cluster of 1,000 cores. As you saw for this problem, using 1,000 cores provided a very significant speedup.
That being said, we should mention that throwing more cores at a problem doesn't always give you proportionally faster results. As a general rule of thumb, if your model or application is computationally intensive and you have a large number of independent iterations to complete, you can likely make efficient use of a large number of cores to speed up your overall execution time. Having discussed the motivation for and usefulness of parallel computing, let's talk about how to utilize it in MATLAB.
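We'll first talk about using multiple cores on a desktop computer, and we'll also learn some parallel computing fundamentals, such as the concept of workers and how parallel for loops work. After that, we'll discuss using GPUs and then scaling up to a cluster or cloud environment. Finally, we'll cover some tips for using parallel computing with big data.
MathWorks offers two parallel computing tools. We've referred to Parallel Computing Toolbox several times, and we'll cover it now. Later, we'll discuss MATLAB Parallel Server. Your licensed copy of Parallel Computing Toolbox is installed along with MATLAB. We'll use the term MATLAB client to refer to the computer on which the toolbox is installed.
The toolbox lets you use your multicore processor more efficiently by leveraging MATLAB computational engines called workers. These workers are controlled by your MATLAB session and allow you to use the full potential of your hardware to speed up your workflow. You can use workers interactively or send work to them to run in the background. Workers form the basis of CPU-based parallel workflows.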
When you have a collection of workers with interprocess communication, we call that a parallel pool. You can initialize and manage a parallel pool programmatically using MATLAB code, or interactively from this icon in the MATLAB desktop environment. Parallel Computing Toolbox handles the work involved in dividing up tasks and computations and assigning them to workers in the parallel pool, thereby enabling your resources to perform parallel computing.
The behind-the-scenes work is all encapsulated in easy-to-use syntax, sometimes as simple as changing one word, and you never have to leave the familiarity of the MATLAB desktop environment. In general, you should not run more MATLAB workers than the number of physical cores available on your machine; otherwise, you are likely to have resource contention.
Now that we've covered the basics of enabling parallel computing through workers, let's talk about what we can do with them. Some parallel constructs in MATLAB are easier to get started with but offer less control. Others require more knowledge of parallel computing but offer more granular control. We'll start with the easiest to use and work our way down. A large number of MATLAB toolboxes have automatic parallel support built into them.
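If you find that a function with parallel support is part of your bottleneck, you can speed up your code with very little effort. Here are examples of functions with automatic parallel support across different application areas. The link at the bottom takes you to the full list of toolboxes and functions with automatic parallel support. Likewise, many parallel-enabled Simulink toolboxes and blocksets can help you speed up your workflows with very little effort.
For example, Simulink Design Optimization has one of the best integrations with Parallel Computing Toolbox. You just enable a single checkbox, "Use parallel pool during optimization," and it will immediately speed up workflows like sensitivity analysis, response optimization, and parameter estimation. Once again, you can use the link at the bottom to see the full list of automatic parallel support.
Let's move on to the next level. If your bottleneck doesn't involve a function with automatic parallel support, MATLAB has a large number of parallel constructs that give you more control over what gets parallelized and how. The parallel computing team at MathWorks is actively adding more constructs and improving existing ones. As we mentioned before, depending on the problem, parallel computing doesn't always give you proportional improvements. However, some problems are ideal for parallel computing: computationally intensive problems that consist simply of multiple tasks, iterations, or simulations that do not depend on one another to complete their computations.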
Real world examples of such problems are Monte Carlo simulations, parameter sweeps, and design optimization, and the easiest way to address this challenge is to use parallel for loops. For example, let's say you want to run five iterations of your code. If you run it in a for loop, they run serially, one after the other. You wait for one to complete before moving to the next iteration. However, if they're all independent tasks with no dependencies or communication needed between individual iterations, you can distribute these tasks to separate workers and compute them in parallel at the same time.
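This maximizes the utilization of your machine and gets you the results sooner. Parallel for loops are implemented using the parfor command. While parfor requires Parallel Computing Toolbox to be able to leverage workers for parallel processing, it will actually still run without it. That means you can share code that uses parfor with colleagues and collaborators who might not have access to Parallel Computing Toolbox.
In that case, parfor will behave like a traditional for loop, albeit with a different iteration order. In this example, we want to take a typical serial for loop and run it using our multicore processor. The iterations in the loop do not depend on one another and do not need to pass information between each other. All we have to do here is change the for loop to a parfor loop, which will automatically run the iterations across multiple workers.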
Parfor will automatically distribute the tasks to the available workers and collect results upon completion. When changing for to parfor, you may need to make some adjustments to your code. The code analyzer will help to guide you through this process by informing you of what changes need to be made in order to run the parfor loop.
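To make the for-to-parfor change concrete, here is a minimal sketch of a small parameter sweep, separate from the example shown in the video. The damped-sine workload and all variable names are invented for illustration. The code assumes only that the iterations are independent; without Parallel Computing Toolbox the parfor loop still runs, just not in parallel.

```matlab
nSweep = 100;
param = linspace(0.1, 2, nSweep);    % hypothetical damping values to sweep over

% Serial version: iterations run one after another.
peakSerial = zeros(1, nSweep);
for k = 1:nSweep
    t = linspace(0, 10, 2000);
    peakSerial(k) = max(exp(-param(k)*t) .* sin(5*t));
end

% Parallel version: only the loop keyword changes.
peakParallel = zeros(1, nSweep);
parfor k = 1:nSweep
    t = linspace(0, 10, 2000);       % temporary variable, private to each iteration
    peakParallel(k) = max(exp(-param(k)*t) .* sin(5*t));
end
```

For a workload this small, the overhead of starting a pool and moving data will likely outweigh any gain. The benefit appears when each iteration does substantial work.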
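In the example shown here, no warnings appear, and no additional code changes are needed. In a second example with slightly different code, the code analyzer determines that there is a problem and brings it to our attention. The following illustration gives you more insight into what happens when parfor executes. In this example, MATLAB has access to three workers. They are assigned tasks to run, and as soon as a worker finishes its current task, it can be assigned additional work.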
Finally, the results are collected and can be displayed in MATLAB. When MATLAB recognizes a name in a parfor loop as a variable, the variable is classified in one of several categories shown in the table on the right. Two variable types that can have a significant impact on your runtime are sliced variables and broadcast variables. A sliced variable is a variable whose value can be broken up into segments or slices, which are then operated on separately by different workers.
Each iteration of the loop works on a different slice of the array. Using sliced variables can reduce the amount of required communication between the client and the workers. A broadcast variable is any variable, other than the loop variable or a sliced variable, that does not change inside the loop. At the start of a parfor loop, the values of any broadcast variables are sent to all workers. That means that large broadcast variables can cost significant overhead in having to transfer them between the client and workers.
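The distinction between sliced and broadcast variables can be sketched as follows. The matrix sizes and variable names are arbitrary choices for illustration, not taken from the presentation.

```matlab
nRows = 500;
data = rand(nRows, 1000);      % sliced input: iteration k reads only row k
weights = rand(1000, 1);       % broadcast: the whole vector is sent to every worker
rowScore = zeros(nRows, 1);    % sliced output: iteration k writes only element k

parfor k = 1:nRows
    rowScore(k) = data(k, :) * weights;
end
```

Here `data` is indexed by the loop variable in one dimension, so each worker receives only the rows it needs, while `weights` is used whole and must be copied to every worker. Had the loop body indexed `data` in a way that did not depend on `k` alone, MATLAB would have treated it as a broadcast variable and sent the full matrix to each worker.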
Therefore, optimize parfor loops by using sliced variables where you can and keeping any necessary broadcast variables small to reduce parallel overhead. Another common parallel construct is parfeval, short for parallel feval. This construct is similar to parfor loops in that it uses parallel workers to run multiple tasks in parallel.
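The difference is that it runs only on functions, and that it is asynchronous, or non-blocking, with respect to MATLAB. Unlike parfor loops, parfeval lets you continue executing commands in MATLAB while the parallel work completes. Parfeval creates a queue of tasks, each of which executes a function on a parallel worker.
The queue is such that the next item in the queue is always executed on the next available worker in the pool, thereby preserving order of execution. After tasks are queued up for execution, you are free to use MATLAB on other tasks without having to wait on the queued tasks. When they are done, you can retrieve results of the computation using fetchNext.
You can also add tasks to the queue or remove them from it. Parfeval also distributes work to the parallel workers differently than parfor does. As you can see, instead of transferring a group of tasks to a parallel worker, parfeval transfers one task at a time. If your tasks or iterations have significantly different run times, parfeval helps avoid the idle workers that grouping tasks can cause.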
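A minimal sketch of the parfeval queue described above might look like this. It assumes Parallel Computing Toolbox is installed and that a pool can be started with `gcp`. The sum-of-squares task and the variable names are placeholders for real work.

```matlab
pool = gcp;                                  % current pool; starts one with default settings if none exists
nTasks = 8;

% Queue the tasks. Each call returns immediately with a future.
futures(1:nTasks) = parallel.FevalFuture;    % preallocate the array of futures
for k = 1:nTasks
    futures(k) = parfeval(pool, @(n) sum((1:n).^2), 1, 1000*k);
end

% The client is free to do other work here while the workers compute.

% Collect results as they become available, in completion order.
results = zeros(1, nTasks);
for k = 1:nTasks
    [idx, value] = fetchNext(futures);       % idx identifies which future finished
    results(idx) = value;
end
```

Tasks that are no longer needed can be removed from the queue with `cancel(futures(k))`.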
That being said, parfor will still most likely be your go-to solution, but keep parfeval in mind if you need to preserve the order of execution, if you need a parallel queue, or if you would like to keep using MATLAB while it performs computations in the background. The data queue allows you to pass data from the parallel workers back to the MATLAB client. One useful application of this is being able to view the progress of your parallel computation.
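To start, we construct a data queue and create a wait bar, which we will use to view our progress. Then we define what happens each time the data queue is triggered. In this case, we want to run a function, update wait bar, which updates the wait bar. We use afterEach to trigger the update wait bar function after each iteration of the parfor loop, as signaled by the send construct.
You can also use this approach with parfeval. As the parfor loop runs, the workers notify the client when their computations are complete. This triggers the update wait bar function, which updates the displayed progress of the parallel workers.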
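The progress-reporting pattern described above can be sketched as follows. This version assumes Parallel Computing Toolbox and, as a deliberate simplification, prints a text message instead of updating a wait bar so that it also runs without a display. The singular-value workload is a stand-in for real computation.

```matlab
nIter = 20;
q = parallel.pool.DataQueue;      % channel from the workers back to the client

% Runs on the client each time a worker sends data to the queue.
afterEach(q, @(k) fprintf('Finished iteration %d of %d\n', k, nIter));

out = zeros(1, nIter);
parfor k = 1:nIter
    out(k) = sum(svd(rand(100)));  % stand-in for real work
    send(q, k);                    % notify the client that iteration k is done
end
```

Messages arrive in completion order, not loop order. To drive a wait bar instead, the callback would need to keep a running count of completed iterations, for example in a nested function, and pass the fraction done to `waitbar`.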
The key components of this workflow are DataQueue, afterEach, send, and the function you provide to run after each iteration. While you can technically use parfor with Simulink, parfor was designed primarily for MATLAB and is not recommended for use with Simulink. Instead, use parsim to run multiple simulations in parallel.
Parsim distributes multiple simulations to multicore CPUs to speed up overall simulation time. It automates the creation of parallel pools, identifies file dependencies, and manages build artifacts. It also works in conjunction with a new simulation input object, which helps you set up all your simulation inputs in a convenient way, including variables, block parameters, and simulation configurations.
At this point, we've talked about automatic parallel support and common programming constructs. Parallel Computing Toolbox also offers advanced parallel constructs for the most control of your resources, such as configuring parallel workers to communicate with one another and pass data, splitting large matrices across the memory of multiple machines, and working with large repositories of data. Since these topics may not have broad appeal, if you'd like to learn more, you can check the resources at the end of this presentation.
Product documentation and technical support can also help you with your questions. Parallel Computing Toolbox lets you use NVIDIA GPUs to accelerate AI, deep learning, and other computationally intensive analytics without needing to be a CUDA programmer. MATLAB has hundreds of functions that support NVIDIA GPUs, and you can access multiple GPUs on a desktop or cluster, generate CUDA code, and more.
To use a GPU on your workstation, you simply need MATLAB and Parallel Computing Toolbox. You also need to make sure you have a supported NVIDIA GPU device with a recent graphics driver. It's best practice to ensure you have the latest driver for your device. GPUs have hundreds, sometimes thousands, of cores with a very focused instruction set.
A MATLAB desktop session or a single worker is all that is needed to take advantage of an entire GPU. In Deep Learning Toolbox, functions like trainNetwork can use a GPU if you set a flag and have a suitable GPU. In addition, hundreds of functions in MATLAB and other toolboxes are overloaded to use a GPU if you supply a gpuArray argument, which we will cover shortly.
The link at the bottom takes you to the documentation for running MATLAB functions on a GPU, where you can also find a list of GPU supported functions. Not all problems are suited for the GPU. Ideal problems for GPU computing are massively parallel and/or computationally intensive. Massively parallel means that the computations can be broken down into hundreds or thousands of independent units of work.
You will see the best performance when all of the cores are kept busy, exploiting the inherent parallel nature of the GPU. Computationally intensive means that the time spent on computation significantly exceeds the amount of time spent on transferring data to and from GPU memory. GPUs have a high-speed memory bus for data transfer within the GPU. However, the GPU has to use the much slower PCI Express bus to communicate with the CPU.
This means that your overall computational speedup will be reduced by the amount of time it takes to transfer data between devices, as required for your algorithm. MATLAB developers have written CUDA versions of key MATLAB and toolbox functions, which are presented as overloaded functions. The GPU version of a function will run when the input is in GPU memory. In the example to the right, we initially create a matrix on the CPU.
Using gpuArray, we send a copy of that matrix to the GPU. We then execute an fft function on that matrix. Note that even though there is no explicit instruction to use a GPU, MATLAB will see that the matrix resides on the GPU and will use the GPU instead of the CPU to perform the computation. This means that you can use your GPU for faster computation while still using the same underlying code. After the computation is complete, you can gather the results and view them in MATLAB as normal. You can further accelerate your code using advanced GPU CUDA and MEX programming.
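The gpuArray, fft, and gather sequence described above can be sketched like this. The matrix size is arbitrary, and the `canUseGPU` guard is an addition for illustration so that the snippet falls back to the CPU on machines without a supported GPU or without Parallel Computing Toolbox.

```matlab
A = rand(2048);            % matrix created in CPU memory

if canUseGPU               % true only if a supported GPU is available for computation
    G = gpuArray(A);       % copy the matrix to GPU memory
    F = fft(G);            % same fft call; it runs on the GPU because G resides there
    result = gather(F);    % copy the result back to CPU memory
else
    result = fft(A);       % identical code path on the CPU
end
```

Whether the GPU version is faster depends on the matrix size and on how much time the transfers in `gpuArray` and `gather` take relative to the computation itself.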
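You can learn more about these advanced GPU techniques by consulting the product documentation and technical support. Now that we've established a basic understanding of parallel computing in MATLAB, let's discuss how we can start migrating workflows to a cluster or the cloud for more computing power. After all, the problem or challenge you are working on may require additional computing resources or memory beyond what is available on a single multicore desktop machine.
MATLAB Parallel Server lets you scale your desktop workflows to access the additional computing power and memory of multiple computers in a cluster, whether on premises in your organization or in the cloud. You may need support from your IT staff when setting up the infrastructure, but you can send jobs to the cluster without leaving the MATLAB desktop environment. In the same spirit as the other parallel computing tools in MATLAB, code you develop on your desktop machine can run on the cluster without recoding your underlying algorithm. You can use the cluster for additional computing power, or simply to free up your desktop computer for other work, which we will discuss shortly with batch workflows.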
To scale to a cluster, you need MATLAB and Parallel Computing Toolbox on your MATLAB client workstation, along with the licenses for any other toolbox required by your code. On the cluster side, you only need MATLAB Parallel Server. Instead of checking out toolbox licenses, each MATLAB Parallel Server worker dynamically licenses toolboxes and blocksets to match the licenses from the submitting client.
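You can choose where your code runs by using cluster profiles, which you can define programmatically in code or through the MATLAB UI. By default, you will have a local profile, which runs workers on the MATLAB client. You can create or import other profiles, which point to workers on remote hardware.
You can have multiple profiles to access different cluster environments, and you can submit jobs to different profiles from the same MATLAB session. You can even have an interactive parallel pool on the cluster, which is useful for debugging and prototyping. Note that your MATLAB client can interact with only one parallel pool at a time.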
Parallel computing in MATLAB supports cross-platform submissions, which means that the operating system on which your MATLAB client runs can be different from the cluster operating system. Since MATLAB syntax is the same on all platforms, there is no need to rewrite your algorithms. Of course, you'll need to ensure that your code does not use hardcoded, operating-system-specific file references. Parallel Computing Toolbox includes features like additional paths and attached files that help to resolve potential issues of sharing code and data with workers on a cluster.
Depending on your system and network configuration, you can use workers on the cluster interactively with parpool, which is useful for prototyping and debugging. For long-running jobs, you will want to transition to batch workflows, which we previously mentioned in passing. You can send your code using the batch command to run on remote hardware where MATLAB Parallel Server has been installed and configured.
Using batch offloads work from your computer so that your machine is no longer tied to the computation. That means you can do something else, put your machine to sleep, or even turn it off. By default, the command will request a single worker for serial computation, but you can include the Pool argument to request multiple workers for parallel computation.
You can submit multiple batch jobs, and the scheduler on the cluster side will schedule work as resources become available. You can check the state and progress of your job with the state and diary commands, which you can use programmatically in the MATLAB Command Window or interactively through the Job Monitor. After the job is finished, you can retrieve the results in MATLAB or view artifacts generated from the job on the file system of the cluster.
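A minimal sketch of the batch workflow follows. It assumes Parallel Computing Toolbox and uses whatever cluster profile is set as the default, which on a fresh installation runs workers on the local machine. The submitted function, `cumsum`, is only a placeholder for real work.

```matlab
c = parcluster;                        % cluster object for the default profile;
                                       % pass a profile name to target another cluster

% Submit a function with one output and one input. To request additional
% workers for parfor loops inside the submitted code, add 'Pool', n.
job = batch(c, @cumsum, 1, {1:10});

disp(job.State)                        % for example queued, running, or finished

wait(job);                             % optional: block the client until the job finishes
diary(job)                             % show any Command Window output captured from the job

out = fetchOutputs(job);               % cell array with one cell per output argument
total = out{1};

delete(job);                           % remove the job data from the cluster's job storage
```

The call to `wait` is there only so the sketch runs end to end. In practice you could close MATLAB after submitting and later find the job again through the cluster object or the Job Monitor.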
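You can also take advantage of batch processing with Simulink. Batchsim frees up your desktop resources by offloading simulations to MATLAB Parallel Server on remote hardware, similar to the batch command. Note that if you already use parsim in your code, you can change parsim to batchsim, specify the Pool argument, and run your simulations in batch. When they finish, you can retrieve the simulation results on your desktop at your convenience.
With a more complete understanding of the parallel tools in MATLAB, we can see how MATLAB provides a single, high-performance environment for working with big data, on the desktop or on a cluster. There are MATLAB capabilities tailored for both beginners and power users of big data applications.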
You'll be able to use constructs like datastores and tall arrays to access data that does not fit in memory, use data from Hadoop Distributed File System, or HDFS, access cloud-based storage, and create repositories of large amounts of images, spreadsheets, and custom files. While you'll have to learn about a few more functions, they all use the same intuitive MATLAB syntax with which you're already familiar. You can prototype algorithms quickly using small data sets and then scale up, using the same code, to big data sets stored and processed on large clusters.
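Tall arrays provide a way to visualize, parse, and analyze data that is made up of millions or billions of rows and does not fit in your machine's random access memory, or RAM. They are backed by a datastore, which is a repository for a large collection of files located on your computer, in the cloud, or on a cluster. Many operations and functions are overloaded to work with tall arrays, using the same syntax you would normally use with in-memory MATLAB arrays. A tall array wraps around one of these datastores and treats the entire data set as one continuous table or array.
The underlying datastore enables calculations to work through the array one piece at a time. All this is done behind the scenes, and as the user, you simply write what looks like normal MATLAB code. For example, if we build a tall array from multiple CSV files containing tabular data, the resulting tall array is a tall table. Even though this table doesn't fit in memory, we can use standard table functions like summary or dot references to access the columns and then use max, min, plus, minus, just as we would for a regular, in-memory table.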
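The tall table workflow described above can be sketched on a small scale. The three CSV files written to a temporary folder, and the column names `Id` and `Value`, are made up so that the snippet is self-contained. Real use would point the datastore at existing files that are too large to load at once.

```matlab
% Create a few small CSV files to stand in for a large collection.
folder = tempname;
mkdir(folder);
for f = 1:3
    T = table((1:100)' + 100*(f-1), rand(100, 1), 'VariableNames', {'Id', 'Value'});
    writetable(T, fullfile(folder, sprintf('part%d.csv', f)));
end

ds = tabularTextDatastore(folder);    % datastore over all CSV files in the folder
tt = tall(ds);                        % tall table; no data has been read yet

maxId = max(tt.Id);                   % operations are deferred until gather is called
meanValue = mean(tt.Value);
[maxId, meanValue] = gather(maxId, meanValue);   % evaluate both in a single pass

rmdir(folder, 's');                   % clean up the temporary files
```

Gathering several results in one call lets MATLAB combine the deferred operations into a single pass through the files, which matters when the data is large.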
Since the pieces are processed independently, you can parallelize this using Parallel Computing Toolbox and process several pieces at a time. Naturally, you can scale up tall arrays across multiple computers using MATLAB Parallel Server. Distributed arrays are a parallel data type that uses the memory of multiple machines to store variables that are too large to store on a single machine.
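With distributed arrays, you can distribute a matrix across multiple computers and go beyond the capabilities of a single machine. You can prototype distributed array workflows on your desktop using Parallel Computing Toolbox and then scale up to a cluster using MATLAB Parallel Server. In the same spirit of keeping things simple, a large number of standard MATLAB functions work with distributed arrays as overloaded functions, using the same syntax as for ordinary arrays.
That means you can program a distributed array algorithm in the same manner as an in-memory algorithm, and MATLAB will run the right version of the code based on the input data type. This enables you to take advantage of distributed computing without needing to be an expert in message passing. In this example, we develop and prototype an algorithm using distributed arrays on a local machine with a small data sample.
Once we are confident that our algorithm works, we need only change our cluster profile to run the same algorithm on the entire data set, scaled up on the cluster. Using a datastore, we can access multiple files, each of which contains a portion of the matrix that we will use in our calculation. We use distributed with a datastore to allow the data to be spread across the pool of workers in a way that allows the matrix to be processed as a single entity.
We can test locally with a small matrix stored in a single file or a set of files, then scale up easily by changing the profile to access the cluster and using a datastore that accesses the entire data set, which will contain a much larger matrix. Note that outside of the circled code, the rest of the code is exactly the same. As we wrap up, we will also leave you with some resources for more information. Hopefully, you have been able to see a recurring theme in MATLAB: making it easy to use powerful computing resources and techniques so that you can focus on your algorithms and research.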
You don't need to be a parallel programming expert to get started, and you can always dig deeper into more advanced techniques if you want to get even more performance out of your resources. Those resources include the hardware already available to your machine, as well as additional computational power from GPUs or a cluster of machines. And whether you're developing serial or parallel algorithms, you can develop and prototype them locally using the familiar MATLAB syntax, then scale up to clusters or to the cloud without having to rewrite your underlying code.
Here are some resources regarding the topics mentioned in this presentation. Please feel free to reach out to us with any questions regarding these topics and more. Technical support or your account manager will also be glad to help answer any questions you may have.