Simulation Acceleration Using MATLAB Coder and Parallel Computing Toolbox


This example shows two ways to accelerate the simulation of communications algorithms in MATLAB®. It showcases the runtime performance effects of using MATLAB to C code generation and parallel processing runs (using the MATLAB parfor (Parallel Computing Toolbox) function). For a comprehensive look at all possible acceleration techniques, see the Accelerating MATLAB Algorithms and Applications article.

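The combined effect of using these methods can speed up a typical simulation time by an order of magnitude. The difference is comparable to running the simulation overnight versus within just a few hours.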

To run the MATLAB to C code generation section of this example, you must have the MATLAB Coder™ product. To run the parallel processing section of this example, you must have the Parallel Computing Toolbox™ product.

Example Structure

This example examines various implementations of this transceiver system in MATLAB.

This system is composed of a transmitter, a channel model, and a receiver. The transmitter processes the input bit stream with a convolutional encoder, an interleaver, a modulator, and a MIMO space-time block encoder (see [1], [2]). The transmitted signal is then processed by a 2x2 MIMO block fading channel and an additive white Gaussian noise (AWGN) channel. The receiver processes its input signal with a 2x2 MIMO space-time block decoder, a demodulator, a deinterleaver, and a Viterbi decoder to recover the best estimate of the input bit stream at the receiver.

This example follows this workflow:

Create a function that runs the simulation algorithms
Identify the speed bottlenecks by using the MATLAB Profiler app
Accelerate the simulation with MATLAB to C code generation
Achieve even faster simulation using parallel processing runs

Create Function that Runs Simulation Algorithms

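Start with a function that represents the first version, or baseline implementation, of this algorithm. The inputs to the helperAccelBaseline function are the $E_{b} / N_{0}$ value of the current frame (EbNo), the minimum number of errors (minNumErr), and the maximum number of bits processed (maxNumBits). $E_{b} / N_{0}$ is the ratio of energy per bit to noise power spectral density. The function output is the bit error rate (BER) information for each $E_{b} / N_{0}$ point.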

type helperAccelBaseline

function ber = helperAccelBaseline(EbNo, minNumErr, maxNumBits)
%helperAccelBaseline Simulate a communications link
%   BER = helperAccelBaseline(EBNO,MINERR,MAXBIT) returns the bit error
%   rate (BER) of a communications link that includes convolutional coding,
%   interleaving, QAM modulation, an Alamouti space-time block code, and a
%   MIMO block fading channel with AWGN. EBNO is the energy per bit to
%   noise power spectral density ratio (Eb/No) of the AWGN channel in dB,
%   MINERR is the minimum number of errors to collect, and MAXBIT is the
%   maximum number of simulated bits so that the simulations do not run
%   indefinitely if the Eb/No value is too high.

%   Copyright 2011-2021 The MathWorks, Inc.

M = 16;         % Modulation Order
k = log2(M);    % Bits per Symbol
codeRate = 1/2; % Coding Rate
adjSNR = convertSNR(EbNo,"ebno","BitsPerSymbol",k,"CodingRate",codeRate);
trellis = poly2trellis(7,[171 133]);
tblen = 32;
dataFrameLen = 1998;
% Add 6 zeros to terminate the convolutional code
chanFrameLen=(dataFrameLen+6)/codeRate;
permvec=[1:3:chanFrameLen 2:3:chanFrameLen 3:3:chanFrameLen]';

ostbcEnc = comm.OSTBCEncoder(NumTransmitAntennas=2);
ostbcComb = comm.OSTBCCombiner(NumTransmitAntennas=2,NumReceiveAntennas=2);
mimoChan = comm.MIMOChannel(MaximumDopplerShift=0,PathGainsOutputPort=true);
berCalc = comm.ErrorRate;

% Run Simulation
ber = zeros(3,1);
while (ber(3) <= maxNumBits) && (ber(2) < minNumErr)
    data = [randi([0 1],dataFrameLen,1);false(6,1)];
    encOut = convenc(data,trellis);           % Convolutional Encoder
    intOut = intrlv(double(encOut),permvec'); % Interleaver
    modOut = qammod(intOut,M,...
        'InputType','bit');                   % QAM Modulator
    stbcOut = ostbcEnc(modOut);               % Alamouti Space-Time Block Encoder
    [chanOut, pathGains] = mimoChan(stbcOut); % 2x2 MIMO Channel
    chEst = squeeze(sum(pathGains,2));
    rcvd = awgn(chanOut,adjSNR,'measured');   % AWGN channel
    stbcDec = ostbcComb(rcvd,chEst);          % Alamouti Space-Time Block Decoder
    demodOut = qamdemod(stbcDec,M,...
        'OutputType','bit');                  % QAM Demodulator
    deintOut = deintrlv(demodOut,permvec');   % Deinterleaver
    decOut = vitdec(deintOut(:),trellis, ...  % Viterbi Decoder
        tblen,'term','hard');
    ber = berCalc(decOut(1:dataFrameLen),data(1:dataFrameLen));
end
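The helper function calls convertSNR to turn the Eb/No input into the SNR value that awgn expects. As a rough sketch of that conversion, assuming one sample per symbol (no oversampling) and with all quantities in dB, where $k$ is the number of bits per symbol and $R$ is the code rate:

$$\mathrm{SNR} = \frac{E_{b}}{N_{0}} + 10\log_{10}(k) + 10\log_{10}(R)$$

With $k = \log_{2} 16 = 4$ and $R = 1/2$, the offset is $10\log_{10}(4 \cdot 1/2) \approx 3.01$ dB, so an $E_{b} / N_{0}$ of 0 dB corresponds to an SNR of about 3.01 dB in this link.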

As a starting point, measure the time it takes to run this baseline algorithm in MATLAB. Use the MATLAB timing functions (tic and toc) to record the elapsed run time to complete processing of a for-loop that iterates over $E_{b} / N_{0}$ values from 0 to 7 dB.

minEbNodB=0;
maxEbNodB=7;
EbNoVec = minEbNodB:maxEbNodB;
minNumErr=100;
maxNumBits=1e6;
N=1;
str='Baseline';
% Run the function once to load it into memory and remove overhead from
% runtime measurements
helperAccelBaseline(3,10,1e4);
berBaseline=zeros(size(minEbNodB:maxEbNodB));
disp('Processing the baseline algorithm.');

Processing the baseline algorithm.

tic;
for EbNoIdx=1:length(EbNoVec)
    EbNo = EbNoVec(EbNoIdx);
    y=helperAccelBaseline(EbNo,minNumErr,maxNumBits);
    berBaseline(EbNoIdx)=y(1);
end
rtBaseline=toc;

The result shows the simulation time (in seconds) of the baseline algorithm. Use this timing measurement to compare with subsequent accelerated simulation runtimes.

helperAccelReportResults(N,rtBaseline,rtBaseline,str,str);

-------------------------------------------------------------------------------------------
Version                                           | Elapsed Time (sec) | Acceleration Ratio
1. Baseline                                       |             5.5712 |             1.0000
-------------------------------------------------------------------------------------------

Identify Speed Bottlenecks by Using MATLAB Profiler App

Identify the processing bottlenecks and problem areas of the baseline algorithm by using the MATLAB Profiler. Obtain the profiler information by executing the following script:

profile on
y=helperAccelBaseline(6,100,1e6);
profile off
profile viewer

The Profiler report presents the execution time for each function call of the algorithm. You can sort the functions according to their self-time in descending order. The first few functions that the Profiler window lists represent the speed bottleneck of the algorithm. In this case, the vitdec function is identified as the major speed bottleneck.

Accelerate Simulation with MATLAB to C Code Generation

MATLAB Coder generates portable and readable C code from algorithms that are part of the MATLAB code generation subset. You can create a MATLAB executable (MEX) of the helperAccelBaseline function because it uses functions and System objects that support code generation. Use the codegen (MATLAB Coder) function to compile the helperAccelBaseline function into a MEX function. After successful code generation by codegen, you will see a MEX file in the current working folder whose name appends '_mex' to the function name: helperAccelBaseline_mex.

codegen('helperAccelBaseline.m','-args',{EbNo,minNumErr,maxNumBits})

Code generation successful.

Measure the simulation time for the MEX version of the algorithm. Record the elapsed time for running this function in the same for-loop as before.

N=N+1;
str='MATLAB to C code generation';
tag='Codegen';
helperAccelBaseline_mex(3,10,1e4);
berCodegen=zeros(size(berBaseline));
disp('Processing the MEX function of the algorithm.');

Processing the MEX function of the algorithm.

tic;
for EbNoIdx=1:length(EbNoVec)
    EbNo = EbNoVec(EbNoIdx);
    y=helperAccelBaseline_mex(EbNo,minNumErr,maxNumBits);
    berCodegen(EbNoIdx)=y(1);
end
rt=toc;

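The results here show that the MEX version of the algorithm runs faster than the baseline version of the algorithm. The amount of acceleration achieved depends on the nature of the algorithm. The best way to determine the acceleration is to generate a MEX function by using MATLAB Coder and test the speedup firsthand. If your algorithm contains single-precision data types, fixed-point data types, loops with states, or code that cannot be vectorized, you are likely to see speedups. On the other hand, speedups are less likely if your algorithm contains MATLAB implicitly multithreaded computations such as fft and svd, functions that call IPP or BLAS libraries, functions optimized for execution in MATLAB on a PC such as FFTs, or algorithms where you can vectorize the code.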

helperAccelReportResults(N,rtBaseline,rt,str,tag);

-------------------------------------------------------------------------------------------
Version                                           | Elapsed Time (sec) | Acceleration Ratio
1. Baseline                                       |             5.5712 |             1.0000
2. MATLAB to C code generation                    |             1.6952 |             3.2864
-------------------------------------------------------------------------------------------
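The source of helperAccelReportResults is not shown here, so the following is only a sketch of how the acceleration ratio appears to be formed. It assumes the ratio is the baseline elapsed time divided by the elapsed time of the accelerated version, which is consistent with the reported values. The variable names rtCodegen and accelRatio are introduced for illustration.

```matlab
% Elapsed times (in seconds) copied from the reported results
rtBaseline = 5.5712;
rtCodegen  = 1.6952;

% Assumed definition: acceleration ratio = baseline time / accelerated time
accelRatio = rtBaseline / rtCodegen;
fprintf('Acceleration ratio: %.4f\n', accelRatio);
```

The value computed from these rounded times is about 3.29; it can differ from the reported ratio in the last digit because the report works from unrounded timings.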

Achieve Even Faster Simulation Using Parallel Processing Runs

Utilize multiple cores to increase simulation acceleration by running tasks in parallel. Use parallel processing runs (parfor-loops) in MATLAB to distribute the work over the available workers. Parallel Computing Toolbox enables you to run different iterations of the simulation in parallel. Use the gcp (Parallel Computing Toolbox) function to get the current parallel pool. If a pool is available but not open, gcp opens the pool and reserves several MATLAB workers to execute iterations of a subsequent parfor-loop. In this example, six workers run locally on a MATLAB client machine.

pool = gcp

Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 6).

pool =
 ProcessPool with properties:

            Connected: true
           NumWorkers: 6
                 Busy: false
              Cluster: local
        AttachedFiles: {}
    AutoAddClientPath: true
          IdleTimeout: 30 minutes (30 minutes remaining)
          SpmdEnabled: true

Run Parallel Over Eb/No Values

Run the $E_{b} / N_{0}$ points in parallel on six workers by using a parfor-loop instead of the for-loop used in the previous cases. Measure the simulation time.

N=N+1;
str='Parallel runs with parfor over Eb/No';
tag='Parfor Eb/No';
helperAccelBaseline_mex(3,10,1e4);
berParfor1=zeros(size(berBaseline));
disp('Processing the MEX function of the algorithm within a parfor-loop.');

Processing the MEX function of the algorithm within a parfor-loop.

tic;
parfor EbNoIdx=1:length(EbNoVec)
    EbNo = EbNoVec(EbNoIdx);
    y=helperAccelBaseline_mex(EbNo,minNumErr,maxNumBits);
    berParfor1(EbNoIdx)=y(1);
end
rt=toc;

The result adds the simulation time of the MEX version of the algorithm executing within a parfor-loop to the previous results. Note that by running the algorithm within a parfor-loop, the elapsed time to complete the simulation is shorter. The basic concept of a parfor-loop is the same as that of the standard MATLAB for-loop. The difference is that parfor divides the loop iterations into groups so that each worker executes some portion of the total number of iterations. Because several MATLAB workers can compute concurrently on the same loop, a parfor-loop can outperform a normal serial for-loop. In this run, the elapsed time drops from 1.6952 seconds to 1.4367 seconds.
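The way parfor distributes independent iterations can be sketched without the transceiver itself. The loop below is a hypothetical stand-in, not the example's algorithm: it estimates the BER of uncoded BPSK over AWGN with base MATLAB functions only, and the names berSketch and numBits are introduced for illustration. Each iteration depends only on its own index, which is what lets parfor assign iterations to different workers.

```matlab
% Hypothetical stand-in for the MEX call: uncoded BPSK over AWGN
EbNoVec   = 0:7;                     % Eb/No points in dB
numBits   = 1e5;                     % bits simulated per point
berSketch = zeros(size(EbNoVec));    % sliced output variable

parfor idx = 1:numel(EbNoVec)
    bits  = randi([0 1], numBits, 1);
    sym   = 1 - 2*bits;                            % map 0 -> +1, 1 -> -1
    sigma = sqrt(1/(2*10^(EbNoVec(idx)/10)));      % noise std for unit bit energy
    rx    = sym + sigma*randn(numBits, 1);         % AWGN channel
    berSketch(idx) = mean((rx < 0) ~= bits);       % hard decision and error rate
end
disp(berSketch);
```

Without Parallel Computing Toolbox, MATLAB runs the same parfor-loop serially, so the sketch still executes but shows no speedup.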

helperAccelReportResults(N,rtBaseline,rt,str,tag);

-------------------------------------------------------------------------------------------
Version                                           | Elapsed Time (sec) | Acceleration Ratio
1. Baseline                                       |             5.5712 |             1.0000
2. MATLAB to C code generation                    |             1.6952 |             3.2864
3. Parallel runs with parfor over Eb/No           |             1.4367 |             3.8779
-------------------------------------------------------------------------------------------

Run Parallel Over Number of Bits

In the previous section, the total simulation time is mainly determined by the highest $E_{b} / N_{0}$ point. You can further accelerate the simulations by dividing up the number of bits simulated for each $E_{b} / N_{0}$ point over the workers. Run each $E_{b} / N_{0}$ point in parallel on six workers by using a parfor-loop. Measure the simulation time.

N=N+1;
str='Parallel runs with parfor over number of bits';
tag='Parfor # Bits';
helperAccelBaseline_mex(3,10,1e4);
berParfor2=zeros(size(berBaseline));
disp('Processing the MEX function of the second version of the algorithm within a parfor-loop.');

Processing the MEX function of the second version of the algorithm within a parfor-loop.

tic;
% Calculate number of bits to be simulated on each worker
minNumErrPerWorker = minNumErr / pool.NumWorkers;
maxNumBitsPerWorker = maxNumBits / pool.NumWorkers;
for EbNoIdx=1:length(EbNoVec)
    EbNo = EbNoVec(EbNoIdx);
    numErr = zeros(pool.NumWorkers,1);
    parfor w=1:pool.NumWorkers
        y=helperAccelBaseline_mex(EbNo,minNumErrPerWorker,maxNumBitsPerWorker);
        numErr(w)=y(2);
        numBits(w)=y(3);
    end
    berParfor2(EbNoIdx)=sum(numErr)/sum(numBits);
end
rt=toc;

The result adds the simulation time of the MEX version of the algorithm executing within a parfor-loop where, this time, each worker simulates the same $E_{b} / N_{0}$ point. Note that running this version within a parfor-loop gives the fastest simulation performance. The difference is that parfor divides the number of bits that need to be simulated over the workers. This approach reduces the simulation time of even the highest $E_{b} / N_{0}$ value by evenly distributing the load (specifically, the number of bits to be simulated) over the workers.
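The way the loop combines the per-worker results can be written out as a formula. The symbols here are introduced for illustration: $W$ is the number of workers (pool.NumWorkers), $e_{w}$ is the error count returned by worker $w$ (y(2)), and $b_{w}$ is the number of bits that worker processed (y(3)).

$$\mathrm{BER} = \frac{\sum_{w=1}^{W} e_{w}}{\sum_{w=1}^{W} b_{w}}$$

This pooled ratio weights each worker by the number of bits it actually simulated. It is not the same as averaging the per-worker BER values (y(1)), which would give equal weight to workers that stopped after different numbers of bits.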

helperAccelReportResults(N,rtBaseline,rt,str,tag);

-------------------------------------------------------------------------------------------
Version                                           | Elapsed Time (sec) | Acceleration Ratio
1. Baseline                                       |             5.5712 |             1.0000
2. MATLAB to C code generation                    |             1.6952 |             3.2864
3. Parallel runs with parfor over Eb/No           |             1.4367 |             3.8779
4. Parallel runs with parfor over number of bits  |             0.9522 |             5.8507
-------------------------------------------------------------------------------------------

Summary

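You can significantly speed up simulations of your communications algorithms with the combined effects of MATLAB to C code generation and parallel processing runs. In this example, the combination reduces the run time from 5.5712 seconds to 0.9522 seconds, an acceleration ratio of about 5.85.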

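MATLAB to C code generation accelerates the simulation by locking down the data types and sizes of every variable and by reducing the overhead of the interpreted language, which checks the size and data type of variables in every line of the code.
Parallel processing runs can substantially accelerate simulation by computing different iterations of your algorithm concurrently across a number of MATLAB workers.
Parallelizing each $E_{b} / N_{0}$ point individually can accelerate the simulation further by speeding up even the longest running $E_{b} / N_{0}$ point.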

The following shows the run time of all four approaches as a bar graph. The results may vary based on the specific algorithm, available workers, and selection of minimum number of errors and maximum number of bits.

results = helperAccelReportResults;

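This plot shows that the BER curves of the different simulation processing approaches match closely. For each plotted $E_{b} / N_{0}$ point, each of the four versions of the algorithm ran with the maximum number of input bits set to ten million (maxNumBits=1e7) and the minimum number of bit errors set to five thousand (minNumErr=5000).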

Further Exploration

This example uses the gcp function to reserve several MATLAB workers that run locally on your MATLAB client machine. By modifying the parallel configurations, you can accelerate the simulation even further by running the algorithm on a larger cluster of workers that are not on your MATLAB client machine. For a description of how to manage and use parallel configurations, see the Discover Clusters and Use Cluster Profiles (Parallel Computing Toolbox) topic.

The following helper functions are used in this example: helperAccelBaseline and helperAccelReportResults.

Selected References

[1] S. M. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE® Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451-1458, Oct. 1998.
[2] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, "Space-time block codes from orthogonal designs," IEEE Transactions on Information Theory, vol. 45, no. 5, pp. 1456-1467, Jul. 1999.