Run MATLAB Functions on Multiple GPUs

This example shows how to run MATLAB code on multiple GPUs in parallel, first on your local machine, then scaling up to a cluster. As a sample problem, the example uses the logistic map, an equation that models the growth of a population.
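For reference, the logistic map iterated by the code in this example can be written as the recurrence below. The notation is an assumption chosen to match the variable names used in the code: x_n is the population (as a fraction of its maximum) at step n, and r is the growth rate.

```latex
x_{n+1} = r \, x_n \, (1 - x_n), \qquad 0 \le r \le 4, \quad 0 \le x_n \le 1
```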

A growing number of features in MATLAB offer automatic parallel support, including multi-GPU support, without requiring any extra coding. For details, see Run MATLAB Functions with Automatic Parallel Support. For example, the trainNetwork function offers multi-GPU support for training of neural networks and inference. For more information, see Scale Up Deep Learning in Parallel, on GPUs, and in the Cloud (Deep Learning Toolbox).

Use a Single GPU

To run computations on a single GPU, use gpuArray objects as inputs to GPU-enabled MATLAB functions. To learn more about GPU-enabled functions, see Run MATLAB Functions on a GPU.

Create gpuArrays for the growth rate, r, and the population, x. For more information on creating gpuArrays, see Establish Arrays on a GPU.

N = 1000;
r = gpuArray.linspace(0,4,N);
x = rand(1,N,'gpuArray');

Use a simple algorithm to iterate the logistic map. Because the algorithm uses GPU-enabled operators on gpuArrays, the computations run on the GPU.

numIterations = 1000;
for n=1:numIterations
    x = r.*x.*(1-x);
end

When the computations are done, plot the population against the growth rate.

plot(r,x,'.');

If you need more performance, gpuArrays support several options. For a list, see the gpuArray function page. For example, the algorithm in this example performs only element-wise operations on gpuArrays, so you can use the arrayfun function to precompile them for the GPU.
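As a minimal sketch of the arrayfun approach, the script below moves the whole iteration into a scalar helper function and applies it element-wise on the GPU. The helper name logisticMapElement is hypothetical and not part of the original example. The sketch assumes a MATLAB release that allows local functions at the end of a script; otherwise, save the helper in its own file.

```matlab
% Sketch: run all iterations of the logistic map inside one arrayfun call.
N = 1000;
numIterations = 1000;
r  = gpuArray.linspace(0,4,N);     % growth rates on the GPU
x0 = rand(1,N,'gpuArray');         % initial populations on the GPU

% The scalar numIterations is expanded to match the size of r and x0.
x = arrayfun(@logisticMapElement, r, x0, numIterations);

plot(r,x,'.');

function x = logisticMapElement(r,x,numIterations)
% Hypothetical helper: iterates the logistic map for one scalar (r,x) pair.
for n = 1:numIterations
    x = r*x*(1-x);
end
end
```

The first call includes some setup overhead while MATLAB prepares the function for GPU execution, so timing comparisons are more meaningful on a second call.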

Use Multiple GPUs with `parfor`

You can use parfor-loops to distribute for-loop iterations among parallel workers. If your computations use GPU-enabled functions, then the computations run on the GPU of the worker. For example, if you use the Monte Carlo method to randomly simulate the evolution of populations, simulations are computed with multiple GPUs in parallel using a parfor-loop.

Create a parallel pool with as many workers as GPUs available. To determine the number of GPUs available, use the gpuDeviceCount function. By default, MATLAB assigns a different GPU to each worker for best performance. For more information on selecting GPUs in a parallel pool, see Use Multiple GPUs in Parallel Pool.

numGPUs = gpuDeviceCount("available");
parpool(numGPUs);

Starting parallel pool (parpool) using the 'local' profile ... connected to 2 workers.
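To confirm which GPU each worker selected, one option is to query gpuDevice inside an spmd block and print the results on the client. This is a sketch, not part of the original example. It calls gcp, which reuses the current pool or starts a default pool if none is open.

```matlab
% Sketch: report the GPU selected by each worker in the current pool.
pool = gcp;                       % current pool (started if none exists)

spmd
    d = gpuDevice;                % GPU currently selected on this worker
    deviceIndex = d.Index;
    deviceName  = d.Name;
end

% deviceIndex and deviceName are Composite objects, one entry per worker.
for w = 1:pool.NumWorkers
    fprintf('Worker %d uses GPU %d (%s)\n', w, deviceIndex{w}, deviceName{w});
end
```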

Define the number of simulations, and create an array on the GPU to store the population vector for each simulation.

numSimulations = 100;
X = zeros(numSimulations,N,'gpuArray');

Use a parfor loop to distribute simulations to workers in the pool. The code inside the loop creates a random gpuArray for the initial population, and iterates the logistic map on it. Because the code uses GPU-enabled operators on gpuArrays, the computations automatically run on the GPU of the worker.

parfor i = 1:numSimulations
    X(i,:) = rand(1,N,'gpuArray');
    for n=1:numIterations
        X(i,:) = r.*X(i,:).*(1-X(i,:));
    end
end

When the computations are done, plot the results of all simulations. Each color represents a different simulation.

figure
plot(r,X,'.');

If you need greater control over your calculations, you can use more advanced parallel functionality. For example, you can use a parallel.pool.DataQueue to send data from the workers during computations. For an example, see Plot During Parameter Sweep with parfor.
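As a rough illustration of the DataQueue idea, the sketch below reports progress from the workers while the simulations run. The progress message and the variable name q are illustrative choices, not part of the original example. The sketch assumes a parallel pool is open or can be started automatically.

```matlab
% Sketch: send a progress message to the client after each simulation.
N = 1000;
numIterations = 1000;
numSimulations = 100;
r = gpuArray.linspace(0,4,N);
X = zeros(numSimulations,N,'gpuArray');

q = parallel.pool.DataQueue;
% The callback runs on the client each time a worker calls send.
afterEach(q, @(i) fprintf('Simulation %d finished\n', i));

parfor i = 1:numSimulations
    x = rand(1,N,'gpuArray');
    for n = 1:numIterations
        x = r.*x.*(1-x);
    end
    X(i,:) = x;
    send(q, i);                   % notify the client that simulation i is done
end
```

Messages can arrive in any order, because iterations complete at different times on different workers.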

If you want to generate a reproducible set of random numbers, you can control the random number generation on the worker GPU. For more information, see Control Random Number Streams on Workers.
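One simple way to get repeatable initial populations is to seed the GPU random number generator of the worker at the start of each iteration, using the loop index as the seed. This is a sketch under that assumption. It is not the only approach and might not be the one that the linked page recommends. The variable name X0 is illustrative.

```matlab
% Sketch: reproducible initial populations, independent of which worker
% runs which iteration.
N = 1000;
numSimulations = 100;
X0 = zeros(numSimulations,N,'gpuArray');

parfor i = 1:numSimulations
    gpurng(i);                            % seed the worker's GPU generator
    X0(i,:) = rand(1,N,'gpuArray');       % same values on every run for this i
end
```

Seeding per iteration trades some statistical rigor for simplicity. For stronger guarantees about independence between streams, consult the random number stream documentation.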

Use Multiple GPUs Asynchronously with `parfeval`

You can use parfeval to run computations asynchronously on parallel pool workers. If your computations use GPU-enabled functions, then the computations run on the GPU of the worker. As an example, you run Monte Carlo simulations on multiple GPUs asynchronously.

To hold the results of computations after the workers complete them, use future objects. Preallocate an array of future objects for the result of each simulation.

f(numSimulations) = parallel.FevalFuture;

To run computations with parfeval, you must place them inside a function. For example, myParallelFcn contains the code of a single simulation.

type myParallelFcn

function x = myParallelFcn(r)
    N = 1000;
    x = gpuArray.rand(1,N);
    numIterations = 1000;
    for n=1:numIterations
        x = r.*x.*(1-x);
    end
end

Use a for loop to loop over simulations, and use parfeval to run them asynchronously on a worker in the parallel pool. myParallelFcn uses GPU-enabled functions on gpuArrays, so they run on the GPU of the worker. Because parfeval performs the computations asynchronously, it does not block MATLAB, and you can continue working while computations happen.

for i=1:numSimulations
    f(i) = parfeval(@myParallelFcn,1,r);
end

To collect the results from parfeval when they are ready, you can use fetchOutputs or fetchNext on the future objects. Also, you can use afterEach or afterAll to invoke functions on the results automatically when they are ready. For example, to plot the result of each simulation immediately after it completes, use afterEach on the future objects. Each color represents a different simulation.

figure
hold on
afterEach(f,@(x) plot(r,x,'.'),0);
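As an alternative to afterEach, fetchNext can collect results in completion order and store them in a regular array on the client. The sketch below assumes that myParallelFcn.m, as listed in this example, is on the MATLAB path. The variable names futures and results are illustrative.

```matlab
% Sketch: collect parfeval results with fetchNext as they complete.
N = 1000;                          % must match N inside myParallelFcn
numSimulations = 100;
r = gpuArray.linspace(0,4,N);

% Submit the simulations asynchronously.
futures(numSimulations) = parallel.FevalFuture;
for i = 1:numSimulations
    futures(i) = parfeval(@myParallelFcn,1,r);
end

% fetchNext blocks until the next unread future finishes and returns
% the index of that future together with its output.
results = zeros(numSimulations,N);
for k = 1:numSimulations
    [idx,x] = fetchNext(futures);
    results(idx,:) = gather(x);    % copy the result into host memory
end
```

Unlike afterEach, this loop blocks the client until all results arrive, which is convenient when later code needs the complete result array.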

Use Multiple GPUs in a Cluster

If you have access to a cluster with multiple GPUs, then you can scale up your computations. Use the parpool function to start a parallel pool on the cluster. When you do so, parallel features, such as parfor loops or parfeval, run on the cluster workers. If your computations use GPU-enabled functions on gpuArrays, then those functions run on the GPU of the cluster worker. To learn more about running parallel features on a cluster, see Scale Up from Desktop to Cluster.
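A minimal sketch of starting a pool on a cluster might look like the following. The profile name 'MyGPUCluster' and the worker count are hypothetical placeholders. Substitute a cluster profile that exists in your installation and a worker count that your cluster can provide.

```matlab
% Sketch: open a pool on a cluster and count the GPUs each worker can see.
profileName = 'MyGPUCluster';      % hypothetical cluster profile name
numWorkers  = 8;                   % assumed number of workers to request

delete(gcp('nocreate'));           % close any pool that is already open
c = parcluster(profileName);
pool = parpool(c,numWorkers);

spmd
    numVisibleGPUs = gpuDeviceCount("available");
end

for w = 1:pool.NumWorkers
    fprintf('Worker %d sees %d available GPU(s)\n', w, numVisibleGPUs{w});
end
```

After the pool is open, the parfor and parfeval code from the earlier sections runs on the cluster workers without modification.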

Advanced Support for Fast Multi-Node GPU Communication

Some multi-GPU features in MATLAB®, including trainNetwork, are optimized for direct communication via fast interconnects for improved performance.

If you have appropriate hardware connections, then data transfer between multiple GPUs uses fast peer-to-peer communication, including NVLink, if available.

If you are using a Linux compute cluster with fast interconnects between machines, such as InfiniBand, or fast interconnects between GPUs on different machines, such as GPUDirect RDMA, you might be able to take advantage of fast multi-node support in MATLAB. Enable this support on all the workers in your pool by setting the environment variable PARALLEL_SERVER_FAST_MULTINODE_GPU_COMMUNICATION to 1. Set this environment variable in the Cluster Profile Manager.
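To check that the variable actually reached the workers, one option is to read it back with getenv inside an spmd block. This sketch only confirms that the variable is visible to each worker process. It does not confirm that fast multi-node communication is in use.

```matlab
% Sketch: read the environment variable back from every worker.
pool = gcp;                        % current pool (started if none exists)
varName = 'PARALLEL_SERVER_FAST_MULTINODE_GPU_COMMUNICATION';

spmd
    value = getenv(varName);       % empty if the variable is not set
end

for w = 1:pool.NumWorkers
    fprintf('Worker %d: %s = "%s"\n', w, varName, value{w});
end
```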

This feature is part of the NVIDIA NCCL library for GPU communication. To configure it, you must set additional environment variables to define the network interface protocol, especially NCCL_SOCKET_IFNAME. For more information, see the NCCL documentation and in particular the section on NCCL Environment Variables.