Stereo Disparity

This example shows how to generate a CUDA® MEX function from a MATLAB® function that computes the stereo disparity of two images.

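Third-Party Prerequisites

Required

This example generates CUDA MEX and has the following third-party requirements.

CUDA-enabled NVIDIA® GPU and compatible driver. For half-precision code generation, the GPU device must have a minimum compute capability of 6.0.

Optional

For non-MEX builds such as static libraries, dynamic libraries, or executables, this example has the following additional requirements.

NVIDIA toolkit.
Environment variables for the compilers and libraries. For more information, see Third-Party Hardware and Setting Up the Prerequisite Products.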

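Verify GPU Environment

To verify that the compilers and libraries needed to run this example are set up correctly, use the coder.checkGpuInstall function.

envCfg = coder.gpuEnvConfig('host');
envCfg.BasicCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);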

Stereo Disparity Calculation

The stereoDisparity.m entry-point function takes two images and returns a stereo disparity map computed from them.

type stereoDisparity

%% Modified Algorithm for Stereo Disparity Block Matching
% In this implementation, instead of finding the shifted image, indices are
% mapped accordingly to save memory and some processing. RGBA column-major
% packed data is used as input for compatibility with CUDA intrinsics.
% Convolution is performed using separable filters (horizontal and then
% vertical).

% Copyright 2017-2021 The MathWorks, Inc.

function [out_disp] = stereoDisparity(img0,img1) %#codegen

% Copyright 2017-2019 The MathWorks, Inc.

% GPU code generation pragma
coder.gpu.kernelfun;

%% Stereo Disparity Parameters
% |WIN_RAD| is the radius of the window to be operated on.
% |min_disparity| is the minimum disparity level the search continues for.
% |max_disparity| is the maximum disparity level the search continues for.
WIN_RAD = 8;
min_disparity = -16;
max_disparity = 0;

%% Image Dimensions for Loop Control
% Four channels (RGBA) are packed, so nChannels is 4.
[imgHeight,imgWidth] = size(img0);
nChannels = 4;
imgHeight = imgHeight/nChannels;

%% Store the Raw Differences
diff_img = zeros([imgHeight+2*WIN_RAD,imgWidth+2*WIN_RAD],'int32');

% Store the minimum cost
min_cost = zeros([imgHeight,imgWidth],'int32');
min_cost(:,:) = 99999999;

% Store the final disparity
out_disp = zeros([imgHeight,imgWidth],'int16');

%% Filters for Aggregating the Differences
% |filt_h| is the horizontal filter used in separable convolution.
% |filt_v| is the vertical filter used in separable convolution, which
% operates on the output of the row convolution.
filt_h = ones([1 17],'int32');
filt_v = ones([17 1],'int32');

% Main loop that runs for all the disparity levels. This loop is
% expected to run on the CPU.
for d = min_disparity:max_disparity

    % Find the difference matrix for the current disparity level. Expect
    % this to generate a kernel function.
    coder.gpu.kernel;
    for colIdx = 1:imgWidth+2*WIN_RAD
        coder.gpu.kernel;
        for rowIdx = 1:imgHeight+2*WIN_RAD
            % Row index calculation.
            ind_h = rowIdx - WIN_RAD;

            % Column index calculation for the left image.
            ind_w1 = colIdx - WIN_RAD;

            % Column index calculation for the right image.
            ind_w2 = colIdx + d - WIN_RAD;

            % Border clamping for the row index.
            if ind_h <= 0
                ind_h = 1;
            end
            if ind_h > imgHeight
                ind_h = imgHeight;
            end

            % Border clamping for the column index of the left image.
            if ind_w1 <= 0
                ind_w1 = 1;
            end
            if ind_w1 > imgWidth
                ind_w1 = imgWidth;
            end

            % Border clamping for the column index of the right image.
            if ind_w2 <= 0
                ind_w2 = 1;
            end
            if ind_w2 > imgWidth
                ind_w2 = imgWidth;
            end

            % In this step, the sum of absolute differences (SAD) is
            % computed across the four channels.
            tDiff = int32(0);
            for chIdx = 1:nChannels
                tDiff = tDiff + abs(int32(img0((ind_h-1)*(nChannels)+...
                    chIdx,ind_w1))-int32(img1((ind_h-1)*(nChannels)+...
                    chIdx,ind_w2)));
            end

            % Store the SAD cost in a matrix.
            diff_img(rowIdx,colIdx) = tDiff;
        end
    end

    % Aggregate the differences using separable convolution. Expect this
    % to generate two kernels using shared memory. The first kernel is the
    % convolution with the horizontal filter, and the second kernel applies
    % the column-wise convolution to its output.
    cost_v = conv2(diff_img,filt_h,'valid');
    cost = conv2(cost_v,filt_v,'valid');

    % Update the min_cost matrix by comparing its values with the cost at
    % the current disparity level.
    for ll = 1:imgWidth
        for kk = 1:imgHeight
            % Load the cost.
            temp_cost = int32(cost(kk,ll));

            % Compare against the minimum cost available and store the
            % disparity value.
            if min_cost(kk,ll) > temp_cost
                min_cost(kk,ll) = temp_cost;
                out_disp(kk,ll) = abs(d) + 8;
            end
        end
    end

end
end
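The index mapping, border clamping, and separable aggregation in the listing can be tried on the CPU without GPU Coder. The following is a minimal sketch, not the shipped implementation: it assumes a single-channel synthetic image instead of packed RGBA data, uses double precision, stores d directly instead of abs(d) + 8, and the names clampIdx, trueD, and dispMap are hypothetical.

```matlab
% CPU-only, single-channel sketch of the block-matching idea (illustrative).
rng(0);                                   % reproducible synthetic data
H = 32; W = 64; R = 8;                    % image size and window radius
trueD = -5;                               % hypothetical ground-truth disparity
left  = randi([0 255],[H W]);             % synthetic "left" image (double)
right = circshift(left,[0 trueD]);        % right(:,c) = left(:,c+5), wrapping

clampIdx = @(idx,hi) min(max(idx,1),hi);  % border clamping, as in the listing
rows  = clampIdx((1:H+2*R) - R, H);
colsL = clampIdx((1:W+2*R) - R, W);

minCost = inf(H,W);
dispMap = zeros(H,W);
for d = -16:0
    colsR = clampIdx((1:W+2*R) + d - R, W);
    % Absolute difference through index mapping; no shifted copy is built.
    diffImg = abs(left(rows,colsL) - right(rows,colsR));
    % Separable box filter: horizontal pass, then vertical pass.
    costH = conv2(diffImg,ones(1,2*R+1),'valid');
    cost  = conv2(costH,ones(2*R+1,1),'valid');
    better = cost < minCost;
    minCost(better) = cost(better);
    dispMap(better) = d;                  % store d itself, not abs(d)+8
end

% The two 1-D passes match one full 2-D box filter (checked for the last d).
costFull = conv2(diffImg,ones(2*R+1),'valid');
sameCost = isequal(cost,costFull);

% Columns whose whole window avoids the clamped and wrapped borders.
interior = dispMap(:,R+abs(trueD)+1:W-R);
recovered = all(interior(:) == trueD);
```

The separable form needs 2 x 17 multiply-adds per output pixel instead of 17 x 17, which is why the listing aggregates with two one-dimensional filters.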

Read Images and Pack Data into RGBA Packed Column-Major Order

img0 = imread('scene_left.png');
img1 = imread('scene_right.png');
[imgRGB0] = pack_rgbData(img0);
[imgRGB1] = pack_rgbData(img1);

Left Image

Right Image
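The pack_rgbData helper is not listed on this page. The entry-point function reads channel chIdx of pixel (ind_h, ind_w) at row (ind_h-1)*nChannels + chIdx and column ind_w, so the packed array has four rows per image row. The sketch below produces that layout for a small random image. It is an assumption about what the helper does, not its actual source, and the zero-filled alpha channel and variable names are hypothetical.

```matlab
% Hypothetical packing sketch; the real pack_rgbData may differ.
rng(0);
rgb = uint8(randi([0 255],[3 5 3]));        % small 3-by-5 RGB test image
[h,w,~] = size(rgb);

% Append an alpha channel (assumed to be zero here).
rgba = cat(3,rgb,zeros(h,w,'uint8'));

% Move the channel dimension first, then fold it into the rows so that
% packed((r-1)*4 + ch, c) holds channel ch of pixel (r,c).
packed = reshape(permute(rgba,[3 1 2]),[4*h w]);

% Spot check: blue channel (ch = 3) of pixel (2,4).
blueOk = packed((2-1)*4 + 3, 4) == rgb(2,4,3);
```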

Generate GPU Code

cfg = coder.gpuConfig('mex');
codegen -config cfg -args {imgRGB0, imgRGB1} stereoDisparity;

Code generation successful: View report

Run Generated MEX and Show the Output Disparity

out_disp = stereoDisparity_mex(imgRGB0,imgRGB1);
imagesc(out_disp);
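When reading the displayed image, note that the entry-point function stores abs(d) + 8 rather than d itself. With the search range of -16 to 0, the values in out_disp therefore fall between 8 and 24. A short check of that arithmetic, with illustrative variable names:

```matlab
d = -16:0;                               % search range used by stereoDisparity
stored = abs(d) + 8;                     % value written to out_disp per level
valueRange = [min(stored) max(stored)];  % smallest and largest stored value
```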

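Half Precision

The computations in this example can also be done in half-precision floating-point numbers by using the stereoDisparityHalfPrecision.m entry-point function. To generate and execute code with half-precision data types, a CUDA compute capability of 6.0 or higher is required. Set the ComputeCapability property of the code configuration object to '6.0'. For half precision, the memory allocation (malloc) mode for generating CUDA code must be set to 'Discrete'.

cfg.GpuConfig.ComputeCapability = '6.0';
cfg.GpuConfig.MallocMode = 'Discrete';

The standard imread command represents the RGB channels of an image with integers, one for each pixel. The integers range from 0 to 255. Simply casting the inputs to the half type might result in overflow during the convolutions. In this case, you can scale the image values to lie between 0 and 1.

img0 = imread('scene_left.png');
img1 = imread('scene_right.png');
[imgRGB0] = half(pack_rgbData(img0))/255;
[imgRGB1] = half(pack_rgbData(img1))/255;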
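The overflow concern can be checked with a worst-case bound. The largest finite half-precision value is 65504. The sketch below assumes the half-precision entry point uses the same four channels and 17-by-17 aggregation window as stereoDisparity.m, which this page does not show, and it bounds the aggregated cost rather than measuring real image data.

```matlab
halfMax = 65504;                        % largest finite half-precision value
nChannels = 4;                          % RGBA, as in stereoDisparity.m
winSize = 17;                           % 2*WIN_RAD + 1 (assumed unchanged)

% Worst-case aggregated SAD cost per pixel, before and after scaling by 255.
worstRaw    = nChannels*255*winSize^2;  % inputs in [0, 255]
worstScaled = nChannels*1*winSize^2;    % inputs in [0, 1]

rawOverflows = worstRaw > halfMax;
scaledFits   = worstScaled < halfMax;
```

With unscaled inputs the bound is 294780, well above the half-precision limit, while scaled inputs give at most 1156.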

Generate CUDA MEX for the Function

Generate code for the stereoDisparityHalfPrecision.m function.

codegen -config cfg -args {imgRGB0, imgRGB1} stereoDisparityHalfPrecision;

Code generation successful: View report

See Also

Functions

codegen|coder.gpu.kernel|coder.gpu.kernelfun|gpucoder.matrixMatrixKernel|coder.gpu.constantMemory|gpucoder.stencilKernel|coder.checkGpuInstall

Objects

coder.gpuConfig|coder.CodeConfig|coder.EmbeddedCodeConfig|coder.gpuEnvConfig