gpucoder.stencilKernel

Create CUDA code for stencil functions

Syntax

B = gpucoder.stencilKernel(FUN,A,[M N],shape,param1,param2...)

Description

B = gpucoder.stencilKernel(FUN,A,[M N],shape,param1,param2...) applies the function FUN to each [M,N] sliding window of the input A. Function FUN is called for each [M,N] submatrix of A and computes an element of output B. The index of this element corresponds to the center of the [M,N] window.

FUN is the handle to a user-defined function that returns a scalar output of the same type as the input.

C = FUN(X,param1,param2, ...)

X is the [M,N] submatrix of the original input A. X can be zero-padded when necessary, for instance at the boundaries of input A. X and the window can also be 1-D.

C is a scalar-valued output of FUN. It is the output computed for the center element of the [M,N] array X and is assigned to the corresponding element of the output array B.
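The sliding-window behavior described above can be sketched in plain MATLAB, without GPU Coder. This is only an illustration of the 'same' case under stated assumptions: the input (magic(4)), the stand-in for FUN (a window sum), and the odd 3-by-3 window are chosen for the example, and the loop is not how the generated CUDA code is organized.

```matlab
% Plain-MATLAB sketch of the 'same' sliding-window semantics (no GPU needed).
A   = magic(4);              % small illustrative input (assumption)
fun = @(X) sum(X(:));        % stand-in for FUN: one scalar per window
M = 3; N = 3;                % odd window so the center is unambiguous

[x, y] = size(A);
pr = floor(M/2);             % rows of padding above and below
pc = floor(N/2);             % columns of padding left and right

% Zero-pad so every element of A has a complete M-by-N neighborhood.
P = zeros(x + 2*pr, y + 2*pc, 'like', A);
P(pr+1:pr+x, pc+1:pc+y) = A;

B = zeros(x, y, 'like', A);
for i = 1:x
    for j = 1:y
        X = P(i:i+M-1, j:j+N-1);   % window centered on A(i,j)
        B(i,j) = fun(X);           % scalar result stored at the center index
    end
end
```

For this particular stand-in function, the result coincides with conv2(A, ones(3), 'same'), which gives a quick way to check the loop.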

param1,param2 are optional arguments. Pass these arguments if FUN requires any parameters in addition to the input window.
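As a minimal sketch of passing an extra parameter, the following entry-point function forwards one additional argument to the stencil function, following the syntax shown above. The names scaledMeanFilt, scaledMean, and gain are hypothetical and chosen only for this illustration.

```matlab
function B = scaledMeanFilt(A, gain) %#codegen
% Hypothetical entry point: forwards one extra argument (gain) to FUN.
B = gpucoder.stencilKernel(@scaledMean, A, [3 3], 'same', gain);
end

function out = scaledMean(X, gain)
% X is the 3-by-3 window; gain arrives as param1.
% Cast so the scalar output has the same type as the input window.
out = cast(gain * mean(X(:)), class(X));
end
```

For example, scaledMeanFilt(ones(5), 2) should return 2 at interior positions, because every interior window holds nine ones.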

The window [M,N] must be less than or equal to the size of A, with the same shape as A.

If A is a 1-D row vector, the window must be [1,N].

If A is a 1-D column vector, the window must be [N,1].
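The window orientation rule for vectors can be illustrated with a three-point moving sum over a row vector. This is a sketch under assumptions: the names movingSum3 and sum3 are hypothetical, and the window length 3 is arbitrary.

```matlab
function B = movingSum3(A) %#codegen
% Hypothetical entry point: A is a row vector, so the window is [1,N].
% For a column vector, the window would be [3 1] instead.
B = gpucoder.stencilKernel(@sum3, A, [1 3], 'same');
end

function out = sum3(X)
% X is the 1-by-3 window, zero-padded at the ends of the vector.
out = cast(sum(X(:)), class(X));
end
```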

shape determines the size of the output array B. It can have one of three possible values:

'same' - Returns output B that is the same size as A.
'full' - (default) Returns the full output. The size of B is greater than the size of A: if A is of size (x,y), the size of B = [x + floor(M/2), y + floor(N/2)].
'valid' - Returns only those parts of the output that are computed without the zero-padded edges of A. If A is of size (x,y), the size of B = [x - floor(M/2), y - floor(N/2)].
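As a worked example of the size formulas listed above, the following script evaluates them for a hypothetical 256-by-256 input and a 3-by-3 window. The formulas are transcribed as stated on this page; the script has not been checked against the size of the array that generated code actually returns.

```matlab
% Output sizes implied by the stated formulas (illustrative values only).
x = 256; y = 256;            % size of A (assumption)
M = 3;   N = 3;              % window size (assumption)

sizeSame  = [x, y];                               % same size as A
sizeFull  = [x + floor(M/2), y + floor(N/2)];     % larger than A
sizeValid = [x - floor(M/2), y - floor(N/2)];     % smaller than A

fprintf('same:  %d-by-%d\n', sizeSame);
fprintf('full:  %d-by-%d\n', sizeFull);
fprintf('valid: %d-by-%d\n', sizeValid);
```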

The input A must be a vector or matrix with a numeric type supported by FUN. The class of B is the same as the class of A.

Code generation is supported only for fixed size outputs. Shape and window must be compile-time constants because they determine the size of the output.

Examples


Mean Filter Using Stencil Kernel

This example shows how to use gpucoder.stencilKernel and generate CUDA® kernels that perform filtering of an image by using stencil operations.

This example performs mean filtering of a 2-D image. In one file, write the entry-point function meanImgFilt that accepts an image matrix A. Create a subfunction my_mean that computes the mean of the 3x3 submatrix.

function B = meanImgFilt(A) %#codegen
  B = gpucoder.stencilKernel(@my_mean,A,[3 3],'same');

  function out = my_mean(A)
    out = cast(mean(A(:)), class(A));
  end
end

Set up the test input image for the meanImgFilt function.

inImage = im2double(imread('cameraman.tif'));

Use the codegen function to generate a CUDA MEX function.

codegen -config coder.gpuConfig('mex') -args {inImage} -report meanImgFilt

GPU Coder creates three kernels: meanImgFilt_kernel1 for initializing memory, meanImgFilt_kernel2 for optimizing the input memory structure, and meanImgFilt_kernel3 for the mean filtering operation. The following is a snippet of the generated code.

  cudaMalloc(&gpu_B, 524288ULL);
  cudaMalloc(&gpu_A, 524288ULL);
  cudaMalloc(&gpu_expanded, 532512ULL);
  meanImgFilt_kernel1<<<...>>>(gpu_expanded);
  cudaMemcpy((void *)gpu_A, (void *)&A[0], 524288ULL, cudaMemcpyHostToDevice);
  meanImgFilt_kernel2<<<...>>>(gpu_A, gpu_expanded);
  meanImgFilt_kernel3<<<...>>>(gpu_expanded, gpu_B);
  cudaMemcpy((void *)&B[0], (void *)gpu_B, 524288ULL, cudaMemcpyDeviceToHost);

meanImgFilt_kernel3 uses shared memory (__shared__ qualifier) to improve memory bandwidth and data locality.
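After code generation, one way to sanity-check the result is to run the generated MEX function and compare it with a CPU reference. The sketch below assumes that codegen succeeded and produced the default MEX name meanImgFilt_mex, that a supported GPU is available, and that the window is zero-padded at the image borders as described for X. Under that last assumption, a 3-by-3 mean equals a convolution with a uniform kernel of weight 1/9.

```matlab
% Run the generated MEX function on the test image.
inImage = im2double(imread('cameraman.tif'));
outMex  = meanImgFilt_mex(inImage);

% CPU reference: uniform 3-by-3 kernel, zero-padded borders.
outRef = conv2(inImage, ones(3)/9, 'same');

% Report the largest element-wise deviation between the two results.
maxErr = max(abs(outMex(:) - outRef(:)));
fprintf('Maximum absolute difference: %g\n', maxErr);
```

Small floating-point differences are to be expected, because the order of summation on the GPU can differ from the CPU reference.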

Limitations

For very large input sizes, the gpucoder.stencilKernel function may produce CUDA code that does not numerically match the MATLAB® simulation. In such cases, consider reducing the size of the input to produce accurate results.

Version History

Introduced in R2017b
