GPU编码器により最適化された車線検出

この例では次を使用します:

この例では，SeriesNetworkオブジェクトによって表された深層学習ネットワークからCUDA®コードを生成する方法を説明します。この例では，系列ネットワークはイメージから車線マーカーの境界を検出して出力できる畳み込みニューラルネットワークです。

必要条件

Cuda対応nvidia®gpu。
Nvidia cudaルキットおよびドラバ。
NVIDIA cuDNNラaaplブラリ。
OpenCVラaapl . cerブラリ(ビデオ読み取りとaapl . cerメ. aapl . cerジ表示操作用)。
コンパ▪▪ラおよびラ▪▪ブラリの環境変数。サポトされているコンパイラおよびライブラリのバージョンの詳細は、サ，ドパ，ティハ，ドウェアを参照してください。環境変数の設定は，前提条件となる製品の設定を参照してください。

Gpu環境の検証

関数coder.checkGpuInstallを使用して,この例を実行するのに必要なコンパイラおよびライブラリが正しく設定されていることを検証します。

envCfg = code . gpuenvconfig (“主机”）;envCfg。DeepLibTarget =“cudnn”；envCfg。DeepCodegen = 1;envCfg。安静= 1;coder.checkGpuInstall (envCfg);

事前学習済みのSeriesNetworkの取得

[laneNet, coeffMeans, coeffStds] = getLaneDetectionNetworkGPU();

このネットワークは入力としてイメージを取り,自車の左右の車線に対応する2つの車線境界線を出力します。各車線境界線は，放物線方程式によって表すことができます。ここでyは横方向オフセットxは車両からの縦方向の距離です。このネットワ，クは，車線ごとに3のパラメ，タ，a, b, cを出力します。ネットワ，クア，キテクチャはAlexNetに似ていますが，最後の数層は，規模の小さい全結合層と回帰出力層に置き換えられています。

laneNet。层

ans = 23×1带有图层的图层数组:1的数据图像输入227×227×3图片2的zerocenter正常化conv1卷积96年11×11×3旋转步[4 4]和填充[0 0 0 0]3‘relu1 ReLU ReLU 4 norm1的横通道正常化横通道正常化与5频道/元素5“pool1”马克斯池3×3马克斯池步(2 - 2)和填充[0 0 0 0]6 conv2卷积256 5×5×48旋转步[1]和填充(2 2 2 2)7的relu2 ReLU ReLU 8 norm2横通道正常化横渠道规范化与5频道/元素9“pool2”马克斯池3×3马克斯池步(2 - 2)和填充[0 0 0 0]10 conv3卷积384 3×3×256旋转步[1]和填充[1 1 1 1]11的relu3 ReLU ReLU 12 conv4卷积384 3×3×192旋转步[1]和填充[1 1 1 1]13的relu4 ReLU ReLU 14 conv5卷积256 3×3×192旋转步[1]和填充[1 1 1 1]15 ' relu5 ReLU ReLU 16“pool5”马克斯池3×3马克斯池stride[2 2]和padding [0 0 0 0] 17 'fc6'全连接4096全连接层18 'relu6' ReLU ReLU 19 'drop6' Dropout 50% Dropout 20 ' fclone1 '全连接16全连接层21 ' fclone1relu ' ReLU ReLU 22 ' fclone2 '全连接6全连接层23 'output'回归输出均方错误与'leftLane_a'， 'leftLane_b'，和4个其他响应

メ▪▪ンエントリポ▪▪ント関数の確認

类型detect_lane.m

function [laneFound, ltPts, rtPts] = detect_lane(frame, laneCoeffMeans, laneCoeffStds) %从网络输出中，计算%图像坐标中的左右车道点。摄像机坐标由caltech %单摄像机模型描述。一个持久化对象mynet用于加载系列网络对象。在第一次调用此函数时，将构造持久对象，并且% setup。在以后多次调用该函数时，将重用相同的对象%，以对输入调用predict，从而避免重构和重新加载% network对象。持久lanenet;if isempty(lanenet) lanenet = code . loaddeeplearningnetwork (' lanenet . loaddeeplearningnetwork ')席”、“lanenet”);end lanecoeffsNetworkOutput = lanenet。预测(permute(frame， [2 1 3]));通过反向归一化步骤恢复原始coeffs params = lanecoeffsNetworkOutput .* laneCoeffStds + laneCoeffMeans;isRightLaneFound = abs(params(6)) > 0.5; %c should be more than 0.5 for it to be a right lane isLeftLaneFound = abs(params(3)) > 0.5; vehicleXPoints = 3:30; %meters, ahead of the sensor ltPts = coder.nullcopy(zeros(28,2,'single')); rtPts = coder.nullcopy(zeros(28,2,'single')); if isRightLaneFound && isLeftLaneFound rtBoundary = params(4:6); rt_y = computeBoundaryModel(rtBoundary, vehicleXPoints); ltBoundary = params(1:3); lt_y = computeBoundaryModel(ltBoundary, vehicleXPoints); % Visualize lane boundaries of the ego vehicle tform = get_tformToImage; % map vehicle to image coordinates ltPts = tform.transformPointsInverse([vehicleXPoints', lt_y']); rtPts = tform.transformPointsInverse([vehicleXPoints', rt_y']); laneFound = true; else laneFound = false; end end function yWorld = computeBoundaryModel(model, xWorld) yWorld = polyval(model, xWorld); end function tform = get_tformToImage % Compute extrinsics based on camera setup yaw = 0; pitch = 14; % pitch of the camera in degrees roll = 0; translation = translationVector(yaw, pitch, roll); rotation = rotationMatrix(yaw, pitch, roll); % Construct a camera matrix focalLength = [309.4362, 344.2161]; principalPoint = [318.9034, 257.5352]; Skew = 0; camMatrix = [rotation; translation] * intrinsicMatrix(focalLength, ... Skew, principalPoint); % Turn camMatrix into 2-D homography tform2D = [camMatrix(1,:); camMatrix(2,:); camMatrix(4,:)]; % drop Z tform = projective2d(tform2D); tform = tform.invert(); end function translation = translationVector(yaw, pitch, roll) SensorLocation = [0 0]; Height = 2.1798; % mounting height in meters from the ground rotationMatrix = (... rotZ(yaw)*... % last rotation rotX(90-pitch)*... rotZ(roll)... % first rotation ); % Adjust for the SensorLocation by adding a translation sl = SensorLocation; translationInWorldUnits = [sl(2), sl(1), Height]; translation = translationInWorldUnits*rotationMatrix; end %------------------------------------------------------------------ % Rotation around X-axis function R = rotX(a) a = deg2rad(a); R = [... 1 0 0; 0 cos(a) -sin(a); 0 sin(a) cos(a)]; end %------------------------------------------------------------------ % Rotation around Y-axis function R = rotY(a) a = deg2rad(a); R = [... cos(a) 0 sin(a); 0 1 0; -sin(a) 0 cos(a)]; end %------------------------------------------------------------------ % Rotation around Z-axis function R = rotZ(a) a = deg2rad(a); R = [... cos(a) -sin(a) 0; sin(a) cos(a) 0; 0 0 1]; end %------------------------------------------------------------------ % Given the Yaw, Pitch, and Roll, determine the appropriate Euler % angles and the sequence in which they are applied to % align the camera's coordinate system with the vehicle coordinate % system. The resulting matrix is a Rotation matrix that together % with the Translation vector defines the extrinsic parameters of the camera. function rotation = rotationMatrix(yaw, pitch, roll) rotation = (... rotY(180)*... % last rotation: point Z up rotZ(-90)*... % X-Y swap rotZ(yaw)*... % point the camera forward rotX(90-pitch)*... % "un-pitch" rotZ(roll)... % 1st rotation: "un-roll" ); end function intrinsicMat = intrinsicMatrix(FocalLength, Skew, PrincipalPoint) intrinsicMat = ... [FocalLength(1) , 0 , 0; ... Skew , FocalLength(2) , 0; ... PrincipalPoint(1), PrincipalPoint(2), 1]; end

ネットワ，ク用コ，ドと後処理コ，ドの生成

このネットワークは,左右の車線境界線の放物線方程式を記述するパラメーターa, b, cを計算します。

これらのパラメ，タ，から，車線の位置に対応するx座標とy座標を計算します。これらの座標を▪▪メ▪▪ジ座標にマッピングしなければなりません。関数detect_lane.mで，これらすべての計算を行います。“自由”ターゲットのGPUコード構成オブジェクトを作成することによってこの関数のCUDAコードを生成し,ターゲット言語をc++に設定します。関数编码器。DeepLearningConfigを使用してCuDNN深層学習構成オブジェクトを作成し，それをgpuコ，ド構成オブジェクトのDeepLearningConfigプロパティに割り当てます。codegenコマンドを実行します。

cfg = code . gpuconfig (“自由”）;cfg。DeepLearningConfig =编码器。DeepLearningConfig (“cudnn”）;cfg。GenerateReport = true;cfg。TargetLang =“c++”；codegenarg游戏{的(227227 3,“单”),则(1 6“双”)的(1 6双)}配置cfgdetect_lane

要查看报告，打开('codegen/lib/detect_lane/html/report.mldatx')。

生成されたコ，ドの説明

系列ネットワ，クは，23個の層クラスから成る配列を含むc++クラスとして生成されます。

类c_lanenet｛公众:int32_TbatchSize；int32_TnumLayers；real32_T* inputData；real32_T * outputData;MWCNNLayer*层[23]；公众:c_lanenet(无效);无效设置(空白)；无效预测(空白)；无效的清理(无效);~ c_lanenet(无效);}；

このクラスの设置()メソッドは，ハンドルを設定し，各層オブジェクトにメモリを割り当てます。预测()メソッドは，ネットワク内の23個の層それぞれにいて予測を呼び出します。

cnn_lanenet_conv * _wおよびcnn_lanenet_conv * _bファイルは,ネットワーク内の畳み込み層のバイナリ重みおよびバイアスファイルです。cnn_lanenet_fc * _wおよびcnn_lanenet_fc * _bファイルは,ネットワーク内の全結合層のバイナリ重みおよびバイアスファイルです。

Codegendir = fullfile(“codegen”，“自由”，“detect_lane”）;dir (codegendir)

．cnn_lanenet0_0_conv4_w.bin . .cnn_lanenet0_0_conv5_w.bin DeepLearningNetwork. bin .gitignorecu cnn_lanenet0_0_data_offset.bin DeepLearningNetwork.h cnn_lanenet0_0_data_scale.bin DeepLearningNetwork.ho cnn_lanenet0_0_fc6_b.bin MWCNNLayerImpl. ocu cnn_lanenet0_0_fc6_w.bin mwcnnnlayerimpl .hpp cnn_lanenet0_0_fcline1_b .bin MWCNNLayerImpl.hppMWCudaDimUtility. o cnn_lanenet0_0_fcLane1_w.bincnn_lanenet0_0_fcline2_w .bin mwcudadimultiity .hpp cnn_lanenet0_0_fcline2_w .bin MWCustomLayerForCuDNN.cpp cnn_lanenet0_0_responsennames .txt MWCustomLayerForCuDNN.hpp codeInfo. txt MWCustomLayerForCuDNN.hpp垫MWCustomLayerForCuDNN。o codedescriptor。dmr MWElementwiseAffineLayer.cpp compileInfo。hpp definitions .txt MWElementwiseAffineLayer. mat MWElementwiseAffineLayer.hpp definitions .txt MWElementwiseAffineLayer. txto detect_lane。MWElementwiseAffineLayerImpl。铜detect_lane。cu MWElementwiseAffineLayerImpl.hpp detect_lane.h MWElementwiseAffineLayerImpl。o detect_lane。o MWElementwiseAffineLayerImplKernel.cu detect_lane_data.cu MWElementwiseAffineLayerImplKernel.o detect_lane_data.h MWFusedConvReLULayer.cpp detect_lane_data.o MWFusedConvReLULayer.hpp detect_lane_initialize.cu MWFusedConvReLULayer.o detect_lane_initialize.h MWFusedConvReLULayerImpl.cu detect_lane_initialize.o MWFusedConvReLULayerImpl.hpp detect_lane_ref.rsp MWFusedConvReLULayerImpl.o detect_lane_rtw.mk MWKernelHeaders.hpp detect_lane_terminate.cu MWTargetNetworkImpl.cu detect_lane_terminate.h MWTargetNetworkImpl.hpp detect_lane_terminate.o MWTargetNetworkImpl.o detect_lane_types.h buildInfo.mat examples cnn_api.cpp gpu_codegen_info.mat cnn_api.hpp html cnn_api.o interface cnn_lanenet0_0_conv1_b.bin mean.bin cnn_lanenet0_0_conv1_w.bin predict.cu cnn_lanenet0_0_conv2_b.bin predict.h cnn_lanenet0_0_conv2_w.bin predict.o cnn_lanenet0_0_conv3_b.bin rtw_proj.tmw cnn_lanenet0_0_conv3_w.bin rtwtypes.h cnn_lanenet0_0_conv4_b.bin

出力の後処理用の追加ファ@ @ルの生成

実行中に使用する学習済みネットワ，クから平均値と標準偏差値をエクスポ，トします。

Codegendir = fullfile(pwd，“codegen”，“自由”，“detect_lane”）;Fid = fopen(fullfile(codegendir，“mean.bin”)，' w '）;A = [coeffMeans coeffStds];写入文件(fid,,“双”）;文件关闭(fid);

メ▪▪ンファ▪▪ル

メンファルを使用してネットワドをコンパルします。メ▪▪ンファ▪▪ルは，OpenCVのVideoCaptureメソッドを使用して，入力ビデオからフレ，ムを読み取ります。読み取るフレ，ムがなくなるまで，各フレ，ムが処理されて分類されます。各フレ，ムの出力を表示する前に，出力は，detect_lane.cuで生成される関数detect_laneを使用して後処理されます。

类型main_lanenet.cu

/*版权2016 The MathWorks, Inc. */ #include  #include  #include  #include  #include  #include  #include  #include  #include  #include  #include "detect_lane.h" using namespace cv;void readData(float *input, Mat& orig, Mat& im) {Size Size (227,227);调整(源自,im,大小,0,0,INTER_LINEAR);(int j = 0; < 227 * 227; j + +) {/ / BGR RGB输入[2 * 227 * 227 + j] =(浮动)(im.data [j * 3 + 0]);输入(1 * 227 * 227 + j] =(浮动)(im.data [j * 3 + 1]);输入[0 * 227 * 227 + j] =(浮动)(im.data [j * 3 + 2]);}} void addLane(float pts[28][2]， Mat & im, int numPts) {std::vector iArray;for (int k = 0;k < numPts;k + +) {iArray.push_back (Point2f (pts [k] [0], pts [k] [1])); } Mat curve(iArray, true); curve.convertTo(curve, CV_32S); //adapt type for polylines polylines(im, curve, false, CV_RGB(255,255,0), 2, LINE_AA); } void writeData(float *outputBuffer, Mat & im, int N, double means[6], double stds[6]) { // get lane coordinates boolean_T laneFound = 0; float ltPts[56]; float rtPts[56]; detect_lane(outputBuffer, means, stds, &laneFound, ltPts, rtPts); if (!laneFound) { return; } float ltPtsM[28][2]; float rtPtsM[28][2]; for(int k=0; k<28; k++) { ltPtsM[k][0] = ltPts[k]; ltPtsM[k][1] = ltPts[k+28]; rtPtsM[k][0] = rtPts[k]; rtPtsM[k][1] = rtPts[k+28]; } addLane(ltPtsM, im, 28); addLane(rtPtsM, im, 28); } void readMeanAndStds(const char* filename, double means[6], double stds[6]) { FILE* pFile = fopen(filename, "rb"); if (pFile==NULL) { fputs ("File error",stderr); return; } // obtain file size fseek (pFile , 0 , SEEK_END); long lSize = ftell(pFile); rewind(pFile); double* buffer = (double*)malloc(lSize); size_t result = fread(buffer,sizeof(double),lSize,pFile); if (result*sizeof(double) != lSize) { fputs ("Reading error",stderr); return; } for (int k = 0 ; k < 6; k++) { means[k] = buffer[k]; stds[k] = buffer[k+6]; } free(buffer); } // Main function int main(int argc, char* argv[]) { float *inputBuffer = (float*)calloc(sizeof(float),227*227*3); float *outputBuffer = (float*)calloc(sizeof(float),6); if ((inputBuffer == NULL) || (outputBuffer == NULL)) { printf("ERROR: Input/Output buffers could not be allocated!\n"); exit(-1); } // get ground truth mean and std double means[6]; double stds[6]; readMeanAndStds("mean.bin", means, stds); if (argc < 2) { printf("Pass in input video file name as argument\n"); return -1; } VideoCapture cap(argv[1]); if (!cap.isOpened()) { printf("Could not open the video capture device.\n"); return -1; } cudaEvent_t start, stop; float fps = 0; cudaEventCreate(&start); cudaEventCreate(&stop); Mat orig, im; namedWindow("Lane detection demo",WINDOW_NORMAL); while(true) { cudaEventRecord(start); cap >> orig; if (orig.empty()) break; readData(inputBuffer, orig, im); writeData(inputBuffer, orig, 6, means, stds); cudaEventRecord(stop); cudaEventSynchronize(stop); char strbuf[50]; float milliseconds = -1.0; cudaEventElapsedTime(&milliseconds, start, stop); fps = fps*.9+1000.0/milliseconds*.1; sprintf (strbuf, "%.2f FPS", fps); putText(orig, strbuf, Point(200,30), FONT_HERSHEY_DUPLEX, 1, CV_RGB(0,0,0), 2); imshow("Lane detection demo", orig); if( waitKey(50)%256 == 27 ) break; // stop capturing by pressing ESC */ } destroyWindow("Lane detection demo"); free(inputBuffer); free(outputBuffer); return 0; }

サンプルビデオのダウンロ，ド

如果~ (”。/ caltech_cordova1.avi '，“文件”) url =“//www.tatmou.com/金宝appsupportfiles/gpucoder/media/caltech_cordova1.avi”；websave (“caltech_cordova1.avi”url);结束

実行可能ファ@ @ルのビルド

如果ispc setenv (“MATLAB_ROOT”, matlabroot);vcvarsall = mex.getCompilerConfigurations(“c++”) .Details.CommandLineShell;setenv (“VCVARSALL”, vcvarsall);系统(“make_win_lane_detection.bat”）;cd (codegendir);系统(“lanenet.exe  ..\..\..\ caltech_cordova1.avi”）;其他的setenv (“MATLAB_ROOT”, matlabroot);系统(make -f Makefile_lane_detection.mk）;cd (codegendir);系统(”。/ lanenet  ../../../ caltech_cordova1.avi”）;结束