gpucoder.batchedMatrixMultiply

Optimized GPU implementation of batched matrix multiply operation

Syntax

[D1,D2] = gpucoder.batchedMatrixMultiply(A1,B1,A2,B2)

[D1,...,DN] = gpucoder.batchedMatrixMultiply(A1,B1,...,AN,BN)

___ = gpucoder.batchedMatrixMultiply(___,Name,Value)

Description

`[D1,D2] = gpucoder.batchedMatrixMultiply(A1,B1,A2,B2)` performs matrix-matrix multiplication of a batch of matrices `A1`,`B1` and `A2`,`B2`. The `gpucoder.batchedMatrixMultiply` function performs matrix-matrix multiplication of the form:

$D = α A B$

where $α$ is a scalar multiplication factor, and `A`, `B`, and `D` are matrices with dimensions `m`-by-`k`, `k`-by-`n`, and `m`-by-`n`, respectively. You can optionally transpose or Hermitian-conjugate `A` and `B`. By default, $α$ is set to one and the matrices are not transposed. To specify a different scalar multiplication factor or to perform transpose operations on the input matrices, use the `Name,Value` pair arguments.

All the batches passed to the `gpucoder.batchedMatrixMultiply` function must be uniform. That is, all instances must have the same dimensions `m`, `n`, and `k`.
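The uniformity requirement can be checked before calling the function. The following is a minimal sketch in plain MATLAB; the cell arrays `A` and `B` and the variable `isValidBatch` are hypothetical names used only for illustration and are not part of GPU Coder.

```matlab
% Hypothetical pre-check, for illustration only; not part of GPU Coder.
% Collect the operands of each pair in cell arrays.
A = {zeros(15,42), zeros(15,42)};
B = {zeros(42,30), zeros(42,30)};

% Gather the size of every operand.
sizesA = cellfun(@size, A, 'UniformOutput', false);
sizesB = cellfun(@size, B, 'UniformOutput', false);

% All A matrices must share one size, and all B matrices must share one size.
uniformA = all(cellfun(@(s) isequal(s, sizesA{1}), sizesA));
uniformB = all(cellfun(@(s) isequal(s, sizesB{1}), sizesB));

% The inner dimension k must agree: columns of A equal rows of B.
innerMatch = sizesA{1}(2) == sizesB{1}(1);

isValidBatch = uniformA && uniformB && innerMatch;
```

This sketch assumes no transpose option is in effect; with a transpose option, the same check would apply to the operand sizes after the transpose.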

`[D1,...,DN] = gpucoder.batchedMatrixMultiply(A1,B1,...,AN,BN)` performs matrix-matrix multiplication of multiple `A`,`B` pairs of the form:

$D_{i} = α A_{i} B_{i}, \quad i = 1, \dots, N$

`___ = gpucoder.batchedMatrixMultiply(___,Name,Value)` performs the batched matrix multiply operation by using the options specified by one or more `Name,Value` pair arguments.
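As a point of reference, the result for each pair can be reproduced with ordinary MATLAB operators. The sketch below runs on the CPU without GPU Coder and only illustrates the formula $D_{i} = α A_{i} B_{i}$ for the default (non-transposed) case; it is not the GPU implementation, and the random test matrices are illustrative.

```matlab
% Reference semantics sketch in plain MATLAB (CPU only, no GPU Coder needed).
A1 = rand(15,42); B1 = rand(42,30);
A2 = rand(15,42); B2 = rand(42,30);
alpha = 0.3;

% Equivalent of the default 'NN' setting: neither operand is transposed.
D1 = alpha * (A1 * B1);
D2 = alpha * (A2 * B2);

% The pairs are uniform, so every product is m-by-n = 15-by-30.
disp(size(D1));
disp(size(D2));
```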

Examples

Batched Matrix-Matrix Multiplication

Perform a simple batched matrix-matrix multiplication and use the `gpucoder.batchedMatrixMultiply` function to generate CUDA® code that calls the appropriate `cublas<t>gemmBatched` APIs.

In one file, write an entry-point function `myBatchMatMul` that accepts matrix inputs `A1`, `B1`, `A2`, and `B2`. Because the input matrices are not transposed, use the `'nn'` option.

    function [D1,D2] = myBatchMatMul(A1,B1,A2,B2,alpha)
        [D1,D2] = gpucoder.batchedMatrixMultiply(A1,B1,A2,B2, ...
            'alpha',alpha,'transpose','nn');
    end

To create a type for a matrix of doubles for use in code generation, use the `coder.newtype` function.

    A1 = coder.newtype('double',[15,42],[0 0]);
    A2 = coder.newtype('double',[15,42],[0 0]);
    B1 = coder.newtype('double',[42,30],[0 0]);
    B2 = coder.newtype('double',[42,30],[0 0]);
    alpha = 0.3;
    inputs = {A1,B1,A2,B2,alpha};

To generate a CUDA library, use the `codegen` function.

    cfg = coder.gpuConfig('lib');
    cfg.GpuConfig.EnableCUBLAS = true;
    cfg.GpuConfig.EnableCUSOLVER = true;
    cfg.GenerateReport = true;
    codegen -config cfg -args inputs myBatchMatMul

The generated CUDA code contains kernels `myBatchMatMul_kernelNN` for initializing the input and output matrices. The code also contains the `cublasDgemmBatched` API calls to the cuBLAS library. The following code is a snippet of the generated code.

    //
    // File: myBatchMatMul.cu
    //
    ...
    void myBatchMatMul(const double A1[630], const double B1[1260],
                       const double A2[630], const double B2[1260],
                       double alpha, double D1[450], double D2[450])
    {
      double alpha1;
      ...
      myBatchMatMul_kernel1<<<...>>>(*gpu_A2, *gpu_A1, *gpu_input_cell_f2,
                                     *gpu_input_cell_f1);
      cudaMemcpy(gpu_B2, (void *)&B2[0], 10080UL, cudaMemcpyHostToDevice);
      cudaMemcpy(gpu_B1, (void *)&B1[0], 10080UL, cudaMemcpyHostToDevice);
      myBatchMatMul_kernel2<<<...>>>(*gpu_B2, *gpu_B1, *gpu_input_cell_f4,
                                     *gpu_input_cell_f3);
      myBatchMatMul_kernel3<<<...>>>(gpu_r3, gpu_r2);
      myBatchMatMul_kernel4<<<...>>>(gpu_r2, *gpu_out_cell);
      myBatchMatMul_kernel5<<<...>>>(gpu_r3, *gpu_out_cell);
      ...
      cublasDgemmBatched(getCublasGlobalHandle(), CUBLAS_OP_N, CUBLAS_OP_N,
                         15, 30, 42, (double *)gpu_alpha1,
                         (double **)gpu_Aarray, 15,
                         (double **)gpu_Barray, 42,
                         (double *)gpu_beta1,
                         (double **)gpu_Carray, 15, 2);
      myBatchMatMul_kernel6<<<...>>>(*gpu_D2, *gpu_out_cell, *gpu_D1);
      ...
    }
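The array extents, byte counts, and integer arguments in the generated snippet follow from the input types declared earlier (`m = 15`, `k = 42`, `n = 30`, two pairs of `double` matrices). The short calculation below reproduces them; the variable names are chosen for illustration and do not appear in the generated code.

```matlab
% Dimensions taken from the coder.newtype declarations in this example.
m = 15; k = 42; n = 30;

numelA = m * k;        % 630 elements, matches A1[630] and A2[630]
numelB = k * n;        % 1260 elements, matches B1[1260] and B2[1260]
numelD = m * n;        % 450 elements, matches D1[450] and D2[450]

bytesPerDouble = 8;    % size of one double-precision value
bytesB = numelB * bytesPerDouble;   % 10080, the cudaMemcpy size for B1 and B2

% Column-major storage: the leading dimension is the number of rows.
lda = m;               % 15, leading dimension of A
ldb = k;               % 42, leading dimension of B
ldc = m;               % 15, leading dimension of the output
batchCount = 2;        % two A,B pairs, the last cublasDgemmBatched argument
```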

Input Arguments

`A`, `B` — Operands
vectors | matrices

Operands, specified as vectors or matrices. `A` and `B` must be 2-D arrays. The number of columns in `A` must be equal to the number of rows in `B`.

Name-Value Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `[D1,D2] = gpucoder.batchedMatrixMultiply(A1,B1,A2,B2,'alpha',0.3,'transpose','CC');`

`alpha` — Scalar multiplication factor
1.0 (default) | scalar

Value of the scalar used for multiplication with `A`. The default value is one.

`transpose` — Operation performed on input matrices
'NN' (default) | character vector | string

Character vector or string composed of two characters, indicating the operation performed on the matrices `A` and `B` prior to matrix multiplication. Possible values are normal ('N'), transposed ('T'), or complex conjugate transpose ('C').
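The effect of the two-character option can be sketched with ordinary MATLAB operators. The loop below is an illustration only and is not part of GPU Coder. It assumes that the first character applies to `A` and the second to `B`, and that letter case is ignored (this page uses both `'nn'` and `'CC'`).

```matlab
% Illustrative reference semantics for the 'transpose' option (CPU only).
A = [1+2i, 3; 4, 5-1i; 0, 2i];                % 3-by-2 complex matrix
B = [1, 0, 2; 1i, 1, 0; 3, 1, 1; 0, 2, 1];    % 4-by-3 complex matrix
alpha = 0.3;
opts = 'CT';   % assumed order: first character for A, second for B

operands = {A, B};
for idx = 1:2
    switch upper(opts(idx))
        case 'N'
            % Normal: use the matrix as is.
        case 'T'
            operands{idx} = operands{idx}.';   % transpose without conjugation
        case 'C'
            operands{idx} = operands{idx}';    % complex conjugate transpose
    end
end

% op(A) is 2-by-3 and op(B) is 3-by-4, so D is 2-by-4.
D = alpha * operands{1} * operands{2};
```

Note that the inner dimensions must agree after the operations are applied, which is why `A` is 3-by-2 and `B` is 4-by-3 in this sketch.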

Output Arguments

`D` — Product
scalar | vector | matrix

Product, returned as a scalar, vector, or matrix. Array `D` has the same number of rows as input `A` and the same number of columns as input `B`.
