加速与gpu的相关性

这个示例使用:

打开脚本

这个例子展示了如何使用GPU来加速互相关联。许多相关问题涉及大量数据集，使用GPU可以更快地解决。此示例需要并行计算工具箱™用户许可。指GPU支金宝app持版本（并行计算工具箱）看看支持的GPU。金宝app

介绍

从学习一些关于你机器中GPU的基本信息开始。要访问GPU，请使用并行计算工具箱。

fprintf（“基准GPU-accelerated互相关。\ n”）;如果~ (parallel.gpu.GPUDevice.isAvailable)流(['\ n \ t ** gpu不可用。停止。** \ n']）;返回;其他的dev = gpudevice;fprintf（......'检测到gpu（％s，％d多处理器，计算能力％s）'那......dev.Name、dev.MultiprocessorCount dev.ComputeCapability);结束

基准测试GPU-accelerated互相关。检测到GPU (TITAN Xp, 30多处理器，计算能力6.1)

基准函数

由于为CPU编写的代码可以移植到GPU上运行，因此单个功能可用于基准测试CPU和GPU。但是，由于GPU上的代码与CPU异步执行，因此应在测量性能时采取特殊的预防措施。在测量执行函数的时间之前，请确保通过在设备上执行“等待”方法完成所有GPU处理。此额外呼叫对CPU性能没有影响。

此示例基准三种不同类型的互相关。

基准简单的互相关

对于第一种情况，使用语法交叉相关的两个相同大小的向量xcorr (u, v)．CPU执行时间与GPU执行时间的比值是根据向量的大小绘制的。

fprintf（'\ n \ n ***基准矢量 - 矢量 - 矢量互联*** \ n \ n'）;fprintf（“基准测试函数:\ n”）;类型（“benchXcorrVec”）;fprintf（'\ n \ n'）;尺寸= [2000 1e4 1e5 5e5 1e6];tc = 0(1,元素个数(大小));tg = 0(1,元素个数(大小));numruns = 10;为s = 1:元素个数(大小);fprintf（'运行xcorr of %d elements…\n'、大小(s));delchar = repmat ('\ b'1、numruns);a = rand（尺寸，1）;b =兰特（大小，1）;tc（s）= benchxcorrvec（a，b，numruns）;fprintf（[delchar'\ t \ tcpu时间：％.2f ms \ n'), 1000 * tc (s));tg(s) = benchXcorrVec(gpuArray(a)， gpuArray(b)， numruns);fprintf（[delchar时间:%。2 f \ n '女士), 1000 * tg (s));结束％绘制结果无花果=图;斧头=轴（“父”图);semilogx (ax、大小tc. / tg,'r *  - '）;ylabel（斧头，'加速'）;包含(ax,向量的大小的）;标题(ax,'GPU加速Xcorr'）;粗暴;

***评测向量-向量互相关***评测函数:function t = benchXcorrVec(u,v, numruns) %用于评测xcorr在CPU和GPU上的矢量输入。% Copyright 2012 The MathWorks, Inc. timevec =零(1,numruns);gdev = gpuDevice;For ii=1:numruns ts = tic;o = xcorr (u, v);%#ok wait(gdev) timevec(ii) = toc(ts);流('。');结束t = min(timevec);运行xcorr的2000个元素…CPU时间:0.21 ms GPU时间:4.26 ms CPU time : 1.03 ms GPU time : 4.37 ms Running xcorr of 100000 elements... CPU time : 14.04 ms GPU time : 6.28 ms Running xcorr of 500000 elements... CPU time : 55.98 ms GPU time : 16.09 ms Running xcorr of 1000000 elements... CPU time : 169.00 ms GPU time : 25.60 ms

基准矩阵列互相关

对于第二种情况，矩阵a的列是成对交叉相关的，以使用语法xcorr(a)产生所有相关的大矩阵输出。CPU执行时间与GPU执行时间的比值根据矩阵A的大小绘制。

fprintf（'\n\n ***基准矩阵列互相关*** \n\n'）;fprintf（“基准测试函数:\ n”）;类型（“benchXcorrMatrix”）;fprintf（'\ n \ n'）;size = floor(linspace(0,100, 11));大小(1)= [];tc = 0(1,元素个数(大小));tg = 0(1,元素个数(大小));numruns = 10;为s = 1:元素个数(大小);fprintf（'运行xcorr (matrix)的一个%d x %d矩阵…\n'、大小(s)、大小(s));delchar = repmat ('\ b'1、numruns);a = rand（大小）;tc（s）= benchxcorrmatrix（a，numruns）;fprintf（[delchar'\ t \ tcpu时间：％.2f ms \ n'), 1000 * tc (s));tg(s) = benchXcorrMatrix(gpuArray(a)， numruns);fprintf（[delchar时间:%。2 f \ n '女士), 1000 * tg (s));结束％绘制结果无花果=图;斧头=轴（“父”图);绘图（斧头，大小。^ 2，Tc./tg，'r *  - '）;ylabel（斧头，'加速'）;包含(ax,'矩阵元素'）;标题(ax,XCORR(矩阵)的GPU加速）;粗暴;

***基准测试矩阵柱互联***基准函数：函数t = benchxcorrmatrix（a，numruns）％用于基于CPU和GPU的矩阵输入矩阵。% Copyright 2012 The MathWorks, Inc. timevec =零(1,numruns);gdev = gpuDevice;对于II = 1：numruns，ts = tic;o = Xcorr（a）;%#ok wait(gdev) timevec(ii) = toc(ts);流('。');结束t = min(timevec);结束运行Xcorr（矩阵）的10 x 10矩阵... CPU时间：0.18 ms GPU时间：5.00 ms运行Xcorr（矩阵）的20 x 20矩阵... CPU时间：0.48 ms GPU时间：4.83 ms运行Xcorr（矩阵）为30 x 30矩阵... CPU时间：0.85 ms GPU时间：4.84 MS运行XCorr（矩阵）的40 x 40矩阵... CPU时间：3.38 MS GPU时间：5.57 MS运行XCORR（矩阵）为50 x 50矩阵... CPU时间：5.60 MS GPU时间：5.22 MS运行XCorr（矩阵）的60 x 60矩阵... CPU时间：8.49 MS GPU时间：5.39 MS运行XCorR（矩阵） of a 70 x 70 matrix... CPU time : 20.43 ms GPU time : 5.92 ms Running xcorr (matrix) of a 80 x 80 matrix... CPU time : 26.79 ms GPU time : 6.24 ms Running xcorr (matrix) of a 90 x 90 matrix... CPU time : 40.04 ms GPU time : 6.89 ms Running xcorr (matrix) of a 100 x 100 matrix... CPU time : 49.69 ms GPU time : 7.32 ms

基准二维交叉相关

对于最后一种情况，两个矩阵X和Y使用xcorr2(X,Y)互相关联。X的大小是固定的，而Y可以变化。加速值是根据第二个矩阵的大小绘制的。

fprintf（'\ n \ n ***基准测试2-d交叉相关*** \ n \ n'）;fprintf（“基准测试函数:\ n”）;类型（'benchxcorr2'）;fprintf（'\ n \ n'）;尺寸= [100,200,500,1000,1500,2000];tc = 0(1,元素个数(大小));tg = 0(1,元素个数(大小));numruns = 4;a = rand（100）;为s = 1:元素个数(大小);fprintf（'运行xcorr2的100x100矩阵和%d x %d矩阵…\n'、大小(s)、大小(s));delchar = repmat ('\ b'1、numruns);b =兰德(大小(s));tc(s) = benchXcorr2(a, b, numruns);fprintf（[delchar'\ t \ tcpu时间：％.2f ms \ n'), 1000 * tc (s));tg(s) = benchXcorr2(gpuArray(a)， gpuArray(b)， numruns);fprintf（[delchar时间:%。2 f \ n '女士), 1000 * tg (s));结束％绘制结果无花果=图;ax =轴(“父”图);semilogx (ax,大小。^ 2, tc. / tg,'r *  - '）;ylabel（斧头，'加速'）;包含(ax,'矩阵元素'）;标题(ax,'GPU加速Xcorr2'）;粗暴;fprintf（' \ n \ nBenchmarking完成。\ n \ n”）;

***基准测试2-D互联***基准测试功能：函数t = BENCHXCORR2（x，y，numruns）％用于CPU和GPU上的基于XCORR2。% Copyright 2012 The MathWorks, Inc. timevec =零(1,numruns);gdev = gpuDevice;对于II = 1：numruns，ts = tic;o = Xcorr2（x，y）;%#ok wait(gdev) timevec(ii) = toc(ts);流('。');结束t = min(timevec);结束运行100x100矩阵的XcorR2和100 x 100矩阵... CPU时间：20.35 MS GPU时间：6.96 MS运行100x100矩阵的XcorR2和200 x 200矩阵... CPU时间：42.87 MS GPU时间：11.72毫秒运行Xcorr2的100x100矩阵和500 x 500矩阵... CPU时间：125.23 MS GPU时间：39.67 MS运行Xcorr2的100x100矩阵和1000 x 1000矩阵... CPU时间：386.59 MS GPU时间：88.46 MS运行Xcorr2 a 100x100 matrix and 1500 x 1500 matrix... CPU time : 788.38 ms GPU time : 165.04 ms Running xcorr2 of a 100x100 matrix and 2000 x 2000 matrix... CPU time : 1523.05 ms GPU time : 279.55 ms Benchmarking completed.