
Signal Processing Acceleration Through Code Generation

Note

The benchmarks in this example have been measured on a machine with four physical cores.

This example shows how to accelerate a signal processing algorithm in MATLAB® using the codegen (MATLAB Coder) and dspunfold functions. You can generate a MATLAB executable (MEX function) from an entire MATLAB function or from specific parts of it. When you run the MEX function instead of the original MATLAB code, simulation speed can increase significantly. To generate the MEX equivalent, the algorithm must support code generation.

To use codegen (MATLAB Coder), you must have MATLAB Coder™ installed. To use dspunfold, you must have MATLAB Coder and DSP System Toolbox™ installed.

To use dspunfold on Windows® and Linux®, you must use a compiler that supports the Open Multi-Processing (OpenMP) application interface. See Supported Compilers.

FIR Filter Algorithm

Consider a simple FIR filter algorithm to accelerate. Copy the firfilter function code into the firfilter.m file.

function [y,z1] = firfilter(b,x)
% Inputs:
%   b - 1xNTaps row vector of coefficients
%   x - A frame of noisy input
% States:
%   z, z1 - NTapsx1 column vector of states
% Output:
%   y - A frame of filtered output

persistent z;
if (isempty(z))
    z = zeros(length(b),1);
end
Lx = size(x,1);
y = zeros(size(x),'like',x);
z1 = z;
for m = 1:Lx
    % Load next input sample
    z1(1,:) = x(m,:);
    % Compute output
    y(m,:) = b*z1;
    % Update states
    z1(2:end,:) = z1(1:end-1,:);
    z = z1;
end
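As a minimal sketch of what the loop in firfilter computes, the same shift-register update can be written with an explicit state vector instead of a persistent variable and compared against the built-in filter function. The three-tap coefficients and the short input frame are hypothetical values chosen only for illustration.

```matlab
% Sketch: direct-form FIR update with an explicit state vector.
% b and x below are hypothetical illustration values, not from the example.
b = [0.25 0.5 0.25];          % 1xNTaps row vector of coefficients
x = (1:8)';                   % one input frame (column vector)

z = zeros(length(b),1);       % filter state, newest sample first
y = zeros(size(x));
for m = 1:size(x,1)
    z(1) = x(m);              % load next input sample
    y(m) = b*z;               % output is the inner product of b and the state
    z(2:end) = z(1:end-1);    % shift the states by one sample
end

yRef = filter(b,1,x);         % reference result from the built-in function
maxErr = max(abs(y - yRef));
fprintf('Maximum deviation from filter(): %g\n',maxErr);
```

In firfilter, the persistent variable z plays the role of the explicit state here, so the filter state carries over from one frame to the next.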

The firfilter function accepts a vector of filter coefficients, b, and a noisy input signal, x, as inputs. Generate the filter coefficients using the fir1 function.

NTaps = 250;
Fp = 4e3/(44.1e3/2);
b = fir1(NTaps-1,Fp);

Filter a stream of a noisy sine wave signal using the firfilter function. The sine wave has a frame size of 4000 samples and a sample rate of 19200 Hz. Generate the sine wave using the dsp.SineWave System object™. The noise is white Gaussian with a mean of 0 and a variance of 0.02. Name this function firfilter_sim. The firfilter_sim function calls the firfilter function on the noisy input.

function totVal = firfilter_sim(b)
% Create the signal source
Sig = dsp.SineWave('SamplesPerFrame',4000,'SampleRate',19200);
totVal = zeros(4000,500);
R = 0.02;
clear firfilter;
% Iteration loop. Each iteration filters a frame of the noisy signal.
for i = 1 : 500
    trueVal = Sig();                        % Original sine wave
    noisyVal = trueVal + sqrt(R)*randn;     % Noisy sine wave
    filteredVal = firfilter(b,noisyVal);    % Filtered sine wave
    totVal(:,i) = filteredVal;              % Store the entire sine wave
end
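The factor sqrt(R) scales unit-variance Gaussian samples to a variance of R. The following sketch estimates the mean and variance empirically. The sample count N is an arbitrary choice for illustration. Note that the listing calls randn with no size arguments, which returns a single value per call; the sketch draws one value per sample only so that the statistics can be estimated.

```matlab
% Sketch: scaling randn by sqrt(R) gives zero-mean noise with variance R.
R = 0.02;                        % target noise variance (from the example)
N = 1e5;                         % number of samples (arbitrary, for illustration)
noise = sqrt(R)*randn(N,1);      % one noise value per sample
sampleMean = mean(noise);
sampleVar = var(noise);
fprintf('Sample mean: %.4f, sample variance: %.4f\n',sampleMean,sampleVar);
```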

Run firfilter_sim and measure the speed of execution. The execution speed varies depending on your machine.

tic;
totVal = firfilter_sim(b);
t1 = toc;
fprintf('Original Algorithm Simulation Time: %4.1f seconds\n',t1);
Original Algorithm Simulation Time: 7.8 seconds

Accelerate the FIR Filter Using codegen

Call codegen on firfilter, and generate its MEX equivalent, firfilter_mex. Generate the filter coefficients and a frame of the noisy sine wave signal, and pass them with the -args option as example inputs to the firfilter function.

Ntaps = 250;
Sig = dsp.SineWave('SamplesPerFrame',4000,'SampleRate',19200);  % Create the Signal Source
R = 0.02;
trueVal = Sig();                        % Original sine wave
noisyVal = trueVal + sqrt(R)*randn;     % Noisy sine wave
Fp = 4e3/(44.1e3/2);
b = fir1(Ntaps-1,Fp);                   % Filter coefficients
codegen firfilter -args {b,noisyVal}

In the firfilter_sim function, replace the firfilter(b,noisyVal) function call with firfilter_mex(b,noisyVal). Name this function firfilter_codegen.

function totVal = firfilter_codegen(b)
% Create the signal source
Sig = dsp.SineWave('SamplesPerFrame',4000,'SampleRate',19200);
totVal = zeros(4000,500);
R = 0.02;
clear firfilter_mex;
% Iteration loop. Each iteration filters a frame of the noisy signal.
for i = 1 : 500
    trueVal = Sig();                          % Original sine wave
    noisyVal = trueVal + sqrt(R)*randn;       % Noisy sine wave
    filteredVal = firfilter_mex(b,noisyVal);  % Filtered sine wave
    totVal(:,i) = filteredVal;                % Store the entire sine wave
end

Run firfilter_codegen and measure the speed of execution. The execution speed varies depending on your machine.

tic;
totValcodegen = firfilter_codegen(b);
t2 = toc;
fprintf('Algorithm Simulation Time with codegen: %5f seconds\n',t2);
fprintf('Speedup factor with codegen: %5f\n',(t1/t2));
Algorithm Simulation Time with codegen: 0.923683 seconds
Speedup factor with codegen: 8.5531

The speedup gain is approximately 8.5.

Accelerate the FIR Filter Using dspunfold

The dspunfold function generates a multithreaded MEX file, which can improve the speedup gain even further.

dspunfold also generates a single-threaded MEX file and a self-diagnostic analyzer function. The multithreaded MEX file leverages the multicore CPU architecture of the host computer. The single-threaded MEX file is similar to the MEX file that the codegen function generates. The analyzer function measures the speedup gain of the multithreaded MEX file over the single-threaded MEX file.

Call dspunfold on firfilter and generate its multithreaded MEX equivalent, firfilter_mt. Use the -f option to indicate which inputs are frames so that the state length is detected in samples, which can improve the speedup gain further. -s auto triggers automatic state length detection. For more information on using the -f and -s options, see dspunfold.

dspunfold firfilter -args {b,noisyVal} -s auto -f [false,true]
State length: [autodetect] samples, Repetition: 1, Output latency: 8 frames, Threads: 4
Analyzing: firfilter.m
Creating single-threaded MEX file: firfilter_st.mexw64
Searching for minimal state length (this might take a while)
Checking stateless ... Insufficient
Checking 4000 samples ... Sufficient
Checking 2000 samples ... Sufficient
Checking 1000 samples ... Sufficient
Checking 500 samples ... Sufficient
Checking 250 samples ... Sufficient
Checking 125 samples ... Insufficient
Checking 187 samples ... Insufficient
Checking 218 samples ... Insufficient
Checking 234 samples ... Insufficient
Checking 242 samples ... Insufficient
Checking 246 samples ... Insufficient
Checking 248 samples ... Insufficient
Checking 249 samples ... Sufficient
Minimal state length is 249 samples
Creating multi-threaded MEX file: firfilter_mt.mexw64
Creating analyzer file: firfilter_analyzer.p

The automatic state length detection tool detects an exact state length of 249 samples, which matches the NTaps - 1 = 249 past samples that a 250-tap FIR filter needs.
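The sequence of "Checking ..." lines in the log is consistent with a halving phase followed by a bisection between the last insufficient and the last sufficient length. The following sketch reproduces that sequence under this assumption. It is not the dspunfold implementation, and the isSufficient predicate is a hypothetical stand-in for the verification that dspunfold performs internally.

```matlab
% Sketch: halving followed by bisection, assuming sufficiency is monotonic
% in the state length. isSufficient is a hypothetical stand-in predicate.
isSufficient = @(n) n >= 249;

hi = 4000;        % frame size, the first length checked
lo = 0;           % the stateless case was reported as insufficient
checked = [];     % lengths checked, in order
n = hi;
while n > lo
    checked(end+1) = n;
    if isSufficient(n)
        hi = n;                   % n works, so try something shorter
    else
        lo = n;                   % n fails, so the answer is longer
    end
    if lo == 0
        n = floor(hi/2);          % halving phase
    else
        n = floor((lo + hi)/2);   % bisection phase
    end
end
minimalStateLength = hi;
fprintf('Checked: %s\n',mat2str(checked));
fprintf('Minimal state length is %d samples\n',minimalStateLength);
```

With a frame size of 4000, this search needs 13 checks instead of the thousands that a linear scan over every candidate length would require.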

Call the analyzer function and measure the speedup gain of the multithreaded MEX file with respect to the single-threaded MEX file. Provide at least two different frames for each input argument of the analyzer. The frames are appended along the first dimension. The analyzer alternates between these frames while verifying that the outputs match. Failure to provide multiple frames for each input can decrease the effectiveness of the analyzer and can lead to false positive verification results.

firfilter_analyzer([b;0.5*b;0.6*b],[noisyVal;0.5*noisyVal;0.6*noisyVal]);
Analyzing multi-threaded MEX file firfilter_mt.mexw64. For best results, please refrain from interacting with the computer and stop other processes until the analyzer is done.
Latency = 8 frames
Speedup = 3.2x
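To illustrate how the analyzer inputs are assembled, the following sketch stacks three frames of each input along the first dimension and checks the resulting sizes. The values assigned to b and noisyVal are hypothetical stand-ins so that the sketch runs without fir1 or dsp.SineWave.

```matlab
% Sketch: stacking several frames per input along the first dimension.
% b and noisyVal are hypothetical stand-ins with the sizes used in the example.
NTaps = 250;
frameSize = 4000;
b = ones(1,NTaps)/NTaps;                     % 1x250 coefficient row vector
noisyVal = sin(2*pi*(0:frameSize-1)'/40);    % 4000x1 input frame

bFrames = [b; 0.5*b; 0.6*b];                         % 3x250
xFrames = [noisyVal; 0.5*noisyVal; 0.6*noisyVal];    % 12000x1

numFramesB = size(bFrames,1)/size(b,1);
numFramesX = size(xFrames,1)/size(noisyVal,1);
fprintf('Frames per input: %d and %d\n',numFramesB,numFramesX);
```

Each input carries the same number of frames, and the scaled copies make the frames differ so that the analyzer has distinct data to alternate between.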

firfilter_mt has a speedup gain factor of 3.2 when compared to the single-threaded MEX file, firfilter_st. To increase the speedup further, increase the repetition factor using the -r option. The tradeoff is that the output latency increases. Use a repetition factor of 3. Specify the exact state length to reduce the overhead and increase the speedup further.

dspunfold firfilter -args {b,noisyVal} -s 249 -f [false,true] -r 3
State length: 249 samples, Repetition: 3, Output latency: 24 frames, Threads: 4
Analyzing: firfilter.m
Creating single-threaded MEX file: firfilter_st.mexw64
Creating multi-threaded MEX file: firfilter_mt.mexw64
Creating analyzer file: firfilter_analyzer.p
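Both dspunfold logs in this example are consistent with an output latency of 2 × Threads × Repetition frames (8 frames with a repetition factor of 1, and 24 frames with a repetition factor of 3, on 4 threads). The following sketch works through that arithmetic. Treat the formula as an observation that fits these two logs, and confirm it against the dspunfold documentation before relying on it.

```matlab
% Sketch: output latency implied by the two logs, assuming
% latency = 2*Threads*Repetition frames.
threads = 4;
frameSize = 4000;                       % samples per frame (from the example)

latencyR1 = 2*threads*1;                % repetition factor 1
latencyR3 = 2*threads*3;                % repetition factor 3
latencySamplesR3 = latencyR3*frameSize; % latency expressed in samples

fprintf('Latency: %d frames (r = 1), %d frames (r = 3)\n',latencyR1,latencyR3);
fprintf('Latency at r = 3: %d samples\n',latencySamplesR3);
```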

Call the analyzer function.

firfilter_analyzer([b;0.5*b;0.6*b],[noisyVal;0.5*noisyVal;0.6*noisyVal]);
Analyzing multi-threaded MEX file firfilter_mt.mexw64. For best results, please refrain from interacting with the computer and stop other processes until the analyzer is done.
Latency = 24 frames
Speedup = 3.8x

The speedup gain factor over the single-threaded MEX file is 3.8. Combined with the codegen speedup of approximately 8.5, this is approximately 32 times the speed of execution of the original simulation.
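The overall figure follows from multiplying the two measured factors. This sketch assumes that the single-threaded MEX file from dspunfold performs about the same as the MEX file from codegen, which the example describes as similar but does not measure directly.

```matlab
% Sketch: combining the measured speedup factors from this example.
% Assumes firfilter_st runs about as fast as firfilter_mex.
codegenSpeedup = 8.5531;    % original MATLAB code vs. codegen MEX
unfoldSpeedup = 3.8;        % single-threaded MEX vs. multithreaded MEX
overallSpeedup = codegenSpeedup*unfoldSpeedup;
fprintf('Estimated overall speedup: %.1fx\n',overallSpeedup);
```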

For this particular algorithm, dspunfold generates highly optimized code without requiring you to write any C or C++ code. The speedup gain scales with the number of cores on your host machine.

The FIR filter function in this example is only an illustrative algorithm that is easy to understand. You can apply this workflow to any of your custom algorithms. If you want to use an FIR filter, use the dsp.FIRFilter System object in DSP System Toolbox. This object runs much faster than the benchmark numbers presented in this example, without the need for code generation.
