Decide When to Use parfor

parfor-Loops in MATLAB

A parfor-loop in MATLAB® executes a series of statements in the loop body in parallel. The MATLAB client issues the parfor command and coordinates with MATLAB workers to execute the loop iterations in parallel on the workers in a parallel pool. The client sends the necessary data on which parfor operates to workers, where most of the computation is executed. The results are sent back to the client and assembled.

A parfor-loop can provide significantly better performance than its analogous for-loop, because several MATLAB workers can compute simultaneously on the same loop.

Each execution of the body of a parfor-loop is an iteration. MATLAB workers evaluate iterations in no particular order and independently of each other. Because each iteration is independent, there is no guarantee that the iterations are synchronized in any way, nor is there any need for this. If the number of workers is equal to the number of loop iterations, each worker performs one iteration of the loop. If there are more iterations than workers, some workers perform more than one loop iteration; in this case, a worker might receive multiple iterations at once to reduce communication time.
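The following minimal sketch illustrates the point about ordering. It assumes Parallel Computing Toolbox is available; to my understanding, without it parfor runs the iterations serially on the client. Each iteration writes only to its own element of the output, so the assembled result does not depend on which worker ran which iteration or in what order. The variable names are illustrative.

```matlab
n = 8;
squares = zeros(1, n);      % preallocate the output, one element per iteration
parfor i = 1:n
    % Iterations may run on any worker, in any order.
    squares(i) = i^2;       % each iteration writes only its own element
end
disp(squares)
```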

Deciding When to Use parfor

A parfor-loop can be useful if you have a slow for-loop. Consider parfor if you have:

  • Some loop iterations that take a long time to execute. In this case, the workers can execute the long iterations simultaneously. Make sure that the number of iterations exceeds the number of workers. Otherwise, you will not use all workers available.

  • Many loop iterations of a simple calculation, such as a Monte Carlo simulation or a parameter sweep. parfor divides the loop iterations into groups so that each worker executes some portion of the total number of iterations.
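As a sketch of the Monte Carlo case, the loop below estimates pi from many independent trials. The trial count, the number of points per trial, and all variable names are assumptions chosen for illustration, not values from this page.

```matlab
nTrials = 200;              % number of independent trials (illustrative)
nPoints = 1e5;              % random points per trial (illustrative)
piEstimates = zeros(1, nTrials);
parfor t = 1:nTrials
    xy = rand(nPoints, 2);                       % points in the unit square
    inside = sum(xy(:,1).^2 + xy(:,2).^2 <= 1);  % points inside the quarter circle
    piEstimates(t) = 4 * inside / nPoints;       % estimate from this trial
end
piMean = mean(piEstimates);
fprintf('Estimated pi: %.4f\n', piMean);
```

Each trial is independent of the others, which is what makes this kind of loop a natural fit for parfor.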

A parfor-loop might not be useful if you have:

  • Code that has vectorized out the for-loops. Generally, if you want to make code run faster, first try to vectorize it. For details on how to do this, see Vectorization. Vectorizing code allows you to benefit from the built-in parallelism provided by the multithreaded nature of many of the underlying MATLAB libraries. However, if you have vectorized code and you have access only to local workers, then parfor-loops may run slower than for-loops. Do not devectorize code to allow for parfor; in general, this solution does not work well.

  • Loop iterations that take a short time to execute. In this case, parallel overhead dominates your calculation.
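To illustrate the advice to vectorize first, here is a minimal sketch that fills a matrix row by row and then computes the same matrix with a single vectorized expression. The variable names are illustrative, and the timings depend on your machine, so none are quoted here.

```matlab
n = 1024;

% Loop version: fill the matrix one row at a time.
Aloop = zeros(n);
tic
for i = 1:n
    Aloop(i,:) = (1:n) .* sin(i*2*pi/n);
end
tLoop = toc;

% Vectorized version: outer product of a column of sines and the row 1:n.
tic
Avec = sin((1:n)' * 2*pi/n) * (1:n);
tVec = toc;

fprintf('Loop: %.4f s, vectorized: %.4f s\n', tLoop, tVec);
fprintf('Largest difference: %g\n', max(abs(Aloop(:) - Avec(:))));
```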

You cannot use a parfor-loop when an iteration in your loop depends on the results of other iterations. Each iteration must be independent of all others. For help dealing with independent loops, see Ensure That parfor-Loop Iterations are Independent. The exception to this rule is to accumulate values in a loop using reduction variables.

In deciding when to use parfor, consider parallel overhead. Parallel overhead includes the time required for communication, coordination, and data transfer (sending and receiving data) from client to workers and back. If iteration evaluations are fast, this overhead could be a significant part of the total time. Consider two different types of loop iterations:
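A minimal sketch of accumulating a value with a reduction variable follows. The data and variable names are illustrative assumptions.

```matlab
x = 1:1000;
total = 0;                  % reduction variable
parfor i = 1:numel(x)
    % Accumulation of this form is allowed: addition is associative, so the
    % order in which workers contribute does not change the result.
    total = total + x(i)^2;
end

% By contrast, a loop such as f(i) = f(i-1) + f(i-2) cannot be converted,
% because each iteration reads values written by earlier iterations.
disp(total)
```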

  • for-loops with a computationally demanding task. These loops are generally good candidates for conversion into a parfor-loop, because the time needed for computation dominates the time required for data transfer.

  • for-loops with a simple computational task. These loops generally do not benefit from conversion into a parfor-loop, because the time needed for data transfer is significant compared with the time needed for computation.
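One rough way to tell the two cases apart is to time a single iteration of the loop body before converting anything. The sketch below uses timeit on two function handles; treating per-iteration time as a proxy for whether computation will dominate is an assumption, and the handle names are hypothetical.

```matlab
% Time one iteration of a computationally demanding loop body.
heavyBody = @() max(abs(eig(rand(500))));
tHeavy = timeit(heavyBody);

% Time one iteration of a simple loop body.
n = 1024;
lightBody = @() (1:n) .* sin(2*pi/1024);
tLight = timeit(lightBody);

fprintf('Demanding iteration: %.6f s\n', tHeavy);
fprintf('Simple iteration:    %.6f s\n', tLight);
```

If a single iteration takes only microseconds, the cost of sending data to workers and back is likely to outweigh any gain.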

Example of parfor With Low Parallel Overhead

In this example, you start with a computationally demanding task inside a for-loop. The for-loops are slow, and you speed up the calculation using parfor-loops instead. parfor splits the execution of for-loop iterations over the workers in a parallel pool.

This example calculates the spectral radius of a matrix and converts a for-loop into a parfor-loop. Find out how to measure the resulting speedup and how much data is transferred to and from the workers in the parallel pool.

  1. In the MATLAB Editor, enter the following for-loop. Add tic and toc to measure the computation time.

    tic
    n = 200;
    A = 500;
    a = zeros(n);
    for i = 1:n
        a(i) = max(abs(eig(rand(A))));
    end
    toc

  2. Run the script, and note the elapsed time.

    Elapsed time is 31.935373 seconds.

  3. In the script, replace the for-loop with a parfor-loop. Add ticBytes and tocBytes to measure how much data is transferred to and from the workers in the parallel pool.

    tic
    ticBytes(gcp);
    n = 200;
    A = 500;
    a = zeros(n);
    parfor i = 1:n
        a(i) = max(abs(eig(rand(A))));
    end
    tocBytes(gcp)
    toc

  4. Run the new script on four workers, and run it again. Note that the first run is slower than the second run, because the parallel pool takes some time to start and make the code available to the workers. Note the data transfer and elapsed time for the second run.

    By default, MATLAB automatically opens a parallel pool of workers on your local machine.

    Starting parallel pool (parpool) using the 'local' profile ... connected to 4 workers.
    ...
                 BytesSentToWorkers    BytesReceivedFromWorkers
                 __________________    ________________________

        1              15340                    7024
        2              13328                    5712
        3              13328                    5704
        4              13328                    5728
        Total          55324                   24168

    Elapsed time is 10.760068 seconds.

    The parfor run on four workers is about three times faster than the corresponding for-loop calculation. The speed-up is smaller than the ideal speed-up of a factor of four on four workers. This is due to parallel overhead, including the time required to transfer data from the client to the workers and back. Use the ticBytes and tocBytes results to examine the amount of data transferred. Assume that the time required for data transfer is proportional to the size of the data. This approximation gives you an indication of the time required for data transfer and lets you compare the parallel overhead with that of other parfor-loop iterations. In this example, the data transfer and parallel overhead are small in comparison with the next example.
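The arithmetic behind "about three times faster" can be checked from the two elapsed times reported above. Attributing all of the time beyond ideal scaling to parallel overhead is a simplifying assumption; run-to-run variation also contributes. The variable names are illustrative.

```matlab
tFor     = 31.935373;       % elapsed time of the for-loop, in seconds
tParfor  = 10.760068;       % elapsed time of the parfor-loop on four workers
nWorkers = 4;

speedup    = tFor / tParfor;        % about 2.97
efficiency = speedup / nWorkers;    % about 0.74
tIdeal     = tFor / nWorkers;       % time with perfect scaling, about 7.98 s
tExtra     = tParfor - tIdeal;      % time beyond ideal scaling, about 2.78 s

fprintf('Speed-up: %.2f, efficiency: %.2f, extra time: %.2f s\n', ...
    speedup, efficiency, tExtra);
```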

The current example has a low parallel overhead and benefits from conversion into a parfor-loop. Compare this example with the simple loop iteration in the next example; see Example of parfor With High Parallel Overhead.

For another example of a parfor-loop with computationally demanding tasks, see Nested parfor and for-Loops and Other parfor Requirements.

Example of parfor With High Parallel Overhead

In this example, you write a loop to create a simple sine wave. Replacing the for-loop with a parfor-loop does not speed up your calculation. This loop does not have many iterations and does not take long to execute, so you do not notice an increase in execution speed. This example has a high parallel overhead and does not benefit from conversion into a parfor-loop.

  1. Write a loop to create a sine wave. Use tic and toc to measure the time elapsed.

    tic
    n = 1024;
    A = zeros(n);
    for i = 1:n
        A(i,:) = (1:n) .* sin(i*2*pi/1024);
    end
    toc

    Elapsed time is 0.012501 seconds.

  2. Replace the for-loop with a parfor-loop. Add ticBytes and tocBytes to measure how much data is transferred to and from the workers in the parallel pool.

    tic
    ticBytes(gcp);
    n = 1024;
    A = zeros(n);
    parfor (i = 1:n)
        A(i,:) = (1:n) .* sin(i*2*pi/1024);
    end
    tocBytes(gcp)
    toc

  3. Run the script on four workers and run the code again. Note that the first run is slower than the second run, because the parallel pool takes some time to start and make the code available to the workers. Note the data transfer and elapsed time for the second run.

                 BytesSentToWorkers    BytesReceivedFromWorkers
                 __________________    ________________________

        1              13176                 2.0615e+06
        2              15188                 2.0874e+06
        3              13176                 2.4056e+06
        4              13176                 1.8567e+06
        Total          54716                 8.4112e+06

    Elapsed time is 0.743855 seconds.

    Note that the elapsed time is much smaller for the serial for-loop than for the parfor-loop on four workers. In this case, you do not benefit from turning your for-loop into a parfor-loop. The reason is that much more data is transferred than in the previous example; see Example of parfor With Low Parallel Overhead. In the current example, the parallel overhead dominates the computing time. Therefore, the sine wave iteration does not benefit from conversion into a parfor-loop.

This example illustrates why calculations with high parallel overhead do not benefit from conversion into a parfor-loop. To learn more about speeding up your code, see Convert for-Loops Into parfor-Loops.
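A short calculation, using only the figures reported in the two examples, suggests where the received data comes from. The interpretation that nearly all of it is the matrix A being returned to the client is an inference from the sizes, not a statement from the measurement itself. The variable names are illustrative.

```matlab
n = 1024;
bytesA = n * n * 8;                 % A is n-by-n doubles, 8 bytes each: 8388608

receivedHigh = 8.4112e6;            % bytes received from workers, this example
receivedLow  = 24168;               % bytes received from workers, previous example

fprintf('Size of A: %d bytes\n', bytesA);
fprintf('Share of received data explained by A: %.1f%%\n', ...
    100 * bytesA / receivedHigh);
fprintf('Received data relative to previous example: %.0fx\n', ...
    receivedHigh / receivedLow);

slowdown = 0.743855 / 0.012501;     % parfor time divided by for time
fprintf('parfor is about %.1f times slower here\n', slowdown);
```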
