Preparing to Run Code in Parallel

By Loren Shure, April 15, 2021


            In a recent post, I talked about for-loops in MATLAB and how to optimize their use knowing how MATLAB stores arrays in memory. Today I want to talk about getting ready for parallel computation, specifically using parallel for-loops, via parfor.
           

            En route to creating code suitable for running in parallel, sometimes we take code with a for-loop and simply replace it with a parallel loop, using parfor. That is, if we can't vectorize the code well first. This transformation from for to parfor works really well sometimes, but it does not always work, and for very good reasons.
           

            For example, you can't simply replace for with parfor if the loop iterations are not completely independent. You can find out more conditions here. There is a notable exception to this rule: reduction variables. (A reduction variable accumulates a value that depends on all the iterations together, but is independent of the iteration order.) The documentation for this is great. Here's some quick pseudo-code that would parallelize just fine, even though the variable s appears on both the right- and left-hand sides of the assignment. As long as the result is not numerically sensitive to the order in which the accumulation occurs, you will get the "right" answer here whether you use a for or a parfor loop.
           

             s = 0;
             parfor ind = 1:100
                 s = s + fun(data(ind));
             end
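
            To make the pseudo-code above concrete, here is a minimal runnable sketch. The random data vector and the squaring operation are my own stand-ins for data and fun, not anything from a real workload. Because floating-point addition is not associative, the sketch compares the two sums with a tolerance rather than with isequal.

```matlab
% Stand-in input; the post's pseudo-code leaves "data" and "fun" unspecified.
data = rand(1, 100);

% Ordinary for-loop accumulation.
sFor = 0;
for ind = 1:100
    sFor = sFor + data(ind)^2;      % squaring is an assumed stand-in for fun
end

% Same accumulation with parfor; sPar is used as a reduction variable.
sPar = 0;
parfor ind = 1:100
    sPar = sPar + data(ind)^2;
end

% The accumulation order may differ, so compare with a relative tolerance.
closeEnough = abs(sFor - sPar) <= sqrt(eps) * max(1, abs(sFor));
```

            If closeEnough comes back false for your own accumulation, that is a hint the result is sensitive to summation order, which is exactly the caveat mentioned above.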

Testing for Parallelizability

            I doubt parallelizability is a word, but I think you know what I mean!
           

            One simple way to see if your loop might be suitable for parallelization is to try running the loop in another order to see if you get the same result. If so, it is a candidate for parfor. The easiest version of this might be running the loop backwards. If you get the same results as the standard sweep, you should be good to go. Instead of going backwards, you can also try running the loop with the counter in random order; this can be achieved easily using randperm.
           

            Note: clearly this calculation can be vectorized easily. That's not my focus here.
           

              k = randn(100,1);
              meank = mean(k);
              tol = sqrt(eps);
              qforward = zeros(size(k));
              for ind = 1:100
                  qforward(ind) = k(ind) - meank;
              end

              meank = mean(k);
              qbackward = zeros(size(k));
              for ind = 100:-1:1
                  qbackward(ind) = k(ind) - meank;
              end

              meank = mean(k);
              qrandom = zeros(size(k));
              for ind = randperm(100)
                  qrandom(ind) = k(ind) - meank;
              end

              aretheyequal = isequal(qforward, qbackward, qrandom)

                 aretheyequal =
                   logical
                    1

            In case the answers aren't exactly equal, you could compare them in a way similar to this.
           

              nearlyfb = norm(qforward-qbackward) < tol

                 nearlyfb =
                   logical
                    1

              nearlyfr = norm(qforward-qrandom) < tol

                 nearlyfr =
                   logical
                    1

              nearlybr = norm(qbackward-qrandom) < tol

                 nearlybr =
                   logical
                    1

            So this calculation is ready to try with parfor.
           

              qpar = zeros(size(k));
              parfor ind = 1:100
                  qpar(ind) = k(ind) - meank;
              end

              aretheyequal = isequal(qpar, qforward)

                 aretheyequal =
                   logical
                    1
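
            If you want to check that a loop body passes parfor's variable classification before involving any workers, one option is the two-argument form of parfor, where the second argument is the maximum number of workers. The sketch below passes 0, which asks MATLAB to run the iterations in the client session. The variable names qserial and qref are mine, and the vectorized reference is only there as a cross-check.

```matlab
k = randn(100,1);
meank = mean(k);

% Same loop body as above, but limited to 0 workers, so no pool is needed.
qserial = zeros(size(k));
parfor (ind = 1:100, 0)
    qserial(ind) = k(ind) - meank;
end

% Vectorized reference for comparison.
qref = k - meank;
sameAsReference = isequal(qserial, qref);
```

            This does not tell you anything about speed, but it does confirm that the loop is written in a form parfor accepts and that it produces the expected values.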

What if I Have an Expression with Iteration Dependency?

            You may not always be able to parallelize your code. However, there are some conditions under which you can convert your code to take advantage of parallelization by refactoring your calculations a little. Let me show you an example. In this case, I will do some of the accumulation after the loop is complete.
           

             %% The Code Analyzer gives this first chunk of code two red error messages.
             x = [17 zeros(1,99)];
             k = [0 randn(1,99)];
             parfor ind = 2:100
                 x(ind) = x(ind-1) + k(ind)^2;
             end

             %% The error messages from the above code:
             The PARFOR loop cannot run due to the way variable 'x' is used.
             In a PARFOR loop, variable 'x' is indexed in different ways, potentially
             causing dependencies between iterations.
            

            Here's the for-loop equivalent, which produces no warning.
           

              x = [17 zeros(1,99)];
              k = [0 randn(1,99)];
              for ind = 2:100
                  x(ind) = x(ind-1) + k(ind)^2;
              end
              xx = x;
             

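            And here's how you might parallelize it. The independent part of each iteration (squaring k) goes in the parfor loop, and the order-dependent accumulation happens afterwards with cumsum.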
           

              x = [17 zeros(1,99)];
              tmp = zeros(1,100);
              parfor ind = 2:100
                  tmp(ind) = k(ind)^2;
              end
              x = cumsum(x+tmp);
             

              aretheyequal = isequal(x,xx)

                 aretheyequal =
                   logical
                    1
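
            The reason this refactoring works is that the recurrence unrolls: each x(n) is just the starting value 17 plus the sum of k(j)^2 for j from 2 to n. Here is a small sketch that spot-checks that closed form against the sequential loop at a single index. The index n = 40 is an arbitrary choice of mine, and I compare with a tolerance because sum may accumulate in a different order than the loop does.

```matlab
k = [0 randn(1,99)];

% Sequential recurrence, as in the for-loop version.
x = [17 zeros(1,99)];
for ind = 2:100
    x(ind) = x(ind-1) + k(ind)^2;
end

% Closed form for one element: start value plus the partial sum of squares.
n = 40;                                  % arbitrary index for illustration
closedForm = 17 + sum(k(2:n).^2);

agrees = abs(x(n) - closedForm) <= sqrt(eps) * abs(x(n));
```

            When a dependency between iterations can be unrolled like this, the per-iteration work is usually a candidate for parfor, with the accumulation done once at the end.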

Thoughts?

            Do you often work with large datasets or simulations where getting access to hardware suitable for running code in parallel helps your work scale? Did your code fall into any of the simple categories I mentioned in this post? I would love to hear from you about any issues you have wrangling your code into a form suitable for parallelization. Post your thoughts here.
           
