Lane Detection Optimized with GPU Coder

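This example shows how to generate CUDA® code from a deep learning network represented by a SeriesNetwork object. In this example, the series network is a convolutional neural network that detects lane marker boundaries in an image and outputs them.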

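Prerequisites

  • CUDA-enabled NVIDIA® GPU.

  • NVIDIA CUDA toolkit and driver.

  • NVIDIA cuDNN library.

  • OpenCV libraries for video read and image display operations.

  • Environment variables for the compilers and libraries. For information on the supported versions of the compilers and libraries, see Third-Party Hardware (GPU Coder). For setting up the environment variables, see Setting Up the Prerequisite Products (GPU Coder).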

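Verify GPU Environment

Use the coder.checkGpuInstall (GPU Coder) function to verify that the compilers and libraries needed to run this example are set up correctly.

envCfg = coder.gpuEnvConfig('host');
envCfg.DeepLibTarget = 'cudnn';
envCfg.DeepCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);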

Get Pretrained SeriesNetwork

[laneNet, coeffMeans, coeffStds] = getLaneDetectionNetworkGPU();

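The network takes an image as input and outputs two lane boundaries that correspond to the left and right lanes of the ego vehicle. Each lane boundary is represented by the parabolic equation $y = ax^2 + bx + c$, where y is the lateral offset and x is the longitudinal distance from the vehicle. The network outputs the three parameters a, b, and c for each lane. The network architecture is similar to AlexNet, except that the last few layers are replaced by a smaller fully connected layer and a regression output layer. To view the network architecture, use the analyzeNetwork function.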

analyzeNetwork(laneNet)
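As a rough illustration of the parabolic boundary model described above, the following C++ sketch evaluates y = ax^2 + bx + c at a range of longitudinal distances. The names LaneCoeffs and lateralOffsets are invented for this illustration and are not part of the generated code; the 3 m to 30 m range simply mirrors the vehicleXPoints used in detect_lane.m.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the three parabola coefficients of one lane.
struct LaneCoeffs {
    float a;  // quadratic term
    float b;  // linear term
    float c;  // lateral offset at x = 0
};

// Evaluate y = a*x^2 + b*x + c for x = xStart, xStart + 1, ..., xEnd (meters).
// Horner's form needs only two multiplications per point.
std::vector<float> lateralOffsets(const LaneCoeffs& lane, int xStart, int xEnd) {
    assert(xStart <= xEnd);
    std::vector<float> y;
    y.reserve(static_cast<std::size_t>(xEnd - xStart + 1));
    for (int x = xStart; x <= xEnd; ++x) {
        const float xf = static_cast<float>(x);
        y.push_back((lane.a * xf + lane.b) * xf + lane.c);
    }
    return y;
}
```

Calling lateralOffsets(lane, 3, 30) yields 28 lateral offsets, one per meter ahead of the sensor, which is the same number of points that detect_lane.m returns for each boundary.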

Examine Main Entry-Point Function

type detect_lane.m
function [laneFound, ltPts, rtPts] = detect_lane(frame, laneCoeffMeans, laneCoeffStds)
% From the network's output, compute left and right lane points in the image
% coordinates. The camera coordinates are described by the Caltech mono
% camera model.

% A persistent object mynet is used to load the series network object. At
% the first call to this function, the persistent object is constructed and
% set up. When the function is called subsequent times, the same object is
% reused to call predict on inputs, thus avoiding reconstructing and
% reloading the network object.
persistent lanenet;

if isempty(lanenet)
    lanenet = coder.loadDeepLearningNetwork('laneNet.mat', 'lanenet');
end

lanecoeffsNetworkOutput = lanenet.predict(permute(frame, [2 1 3]));

% Recover original coeffs by reversing the normalization steps
params = lanecoeffsNetworkOutput .* laneCoeffStds + laneCoeffMeans;

isRightLaneFound = abs(params(6)) > 0.5; %c should be more than 0.5 for it to be a right lane
isLeftLaneFound = abs(params(3)) > 0.5;

vehicleXPoints = 3:30; %meters, ahead of the sensor
ltPts = coder.nullcopy(zeros(28,2,'single'));
rtPts = coder.nullcopy(zeros(28,2,'single'));

if isRightLaneFound && isLeftLaneFound
    rtBoundary = params(4:6);
    rt_y = computeBoundaryModel(rtBoundary, vehicleXPoints);
    ltBoundary = params(1:3);
    lt_y = computeBoundaryModel(ltBoundary, vehicleXPoints);

    % Visualize lane boundaries of the ego vehicle
    tform = get_tformToImage;
    % map vehicle to image coordinates
    ltPts = tform.transformPointsInverse([vehicleXPoints', lt_y']);
    rtPts = tform.transformPointsInverse([vehicleXPoints', rt_y']);
    laneFound = true;
else
    laneFound = false;
end

end

function yWorld = computeBoundaryModel(model, xWorld)
yWorld = polyval(model, xWorld);
end

function tform = get_tformToImage
% Compute extrinsics based on camera setup
yaw = 0;
pitch = 14; % pitch of the camera in degrees
roll = 0;

translation = translationVector(yaw, pitch, roll);
rotation = rotationMatrix(yaw, pitch, roll);

% Construct a camera matrix
focalLength = [309.4362, 344.2161];
principalPoint = [318.9034, 257.5352];
Skew = 0;

camMatrix = [rotation; translation] * intrinsicMatrix(focalLength, ...
    Skew, principalPoint);

% Turn camMatrix into 2-D homography
tform2D = [camMatrix(1,:); camMatrix(2,:); camMatrix(4,:)]; % drop Z

tform = projective2d(tform2D);
tform = tform.invert();
end

function translation = translationVector(yaw, pitch, roll)
SensorLocation = [0 0];
Height = 2.1798; % mounting height in meters from the ground
rotationMatrix = (...
    rotZ(yaw)*... % last rotation
    rotX(90-pitch)*...
    rotZ(roll)... % first rotation
    );

% Adjust for the SensorLocation by adding a translation
sl = SensorLocation;

translationInWorldUnits = [sl(2), sl(1), Height];
translation = translationInWorldUnits*rotationMatrix;
end

%------------------------------------------------------------------
% Rotation around X-axis
function R = rotX(a)
a = deg2rad(a);
R = [...
    1   0        0;
    0   cos(a)  -sin(a);
    0   sin(a)   cos(a)];
end

%------------------------------------------------------------------
% Rotation around Y-axis
function R = rotY(a)
a = deg2rad(a);
R = [...
    cos(a)  0 sin(a);
    0       1 0;
    -sin(a) 0 cos(a)];
end

%------------------------------------------------------------------
% Rotation around Z-axis
function R = rotZ(a)
a = deg2rad(a);
R = [...
    cos(a) -sin(a) 0;
    sin(a)  cos(a) 0;
    0       0      1];
end

%------------------------------------------------------------------
% Given the Yaw, Pitch, and Roll, determine the appropriate Euler angles
% and the sequence in which they are applied to align the camera's
% coordinate system with the vehicle coordinate system. The resulting
% matrix is a Rotation matrix that together with the Translation vector
% defines the extrinsic parameters of the camera.
function rotation = rotationMatrix(yaw, pitch, roll)
rotation = (...
    rotY(180)*...      % last rotation: point Z up
    rotZ(-90)*...      % X-Y swap
    rotZ(yaw)*...      % point the camera forward
    rotX(90-pitch)*... % "un-pitch"
    rotZ(roll)...      % 1st rotation: "un-roll"
    );
end

function intrinsicMat = intrinsicMatrix(FocalLength, Skew, PrincipalPoint)
intrinsicMat = ...
    [FocalLength(1)   , 0                , 0; ...
     Skew             , FocalLength(2)   , 0; ...
     PrincipalPoint(1), PrincipalPoint(2), 1];
end
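The first post-processing steps in detect_lane.m, undoing the normalization and deciding whether both lanes were found, can be sketched in C++ as follows. This is only an illustrative rewrite under the assumption that the six outputs are ordered [a b c] for the left lane followed by [a b c] for the right lane, as the indexing in detect_lane.m suggests; the function names denormalize and bothLanesFound are hypothetical and do not appear in the generated code.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Undo the training-time normalization: param = output * std + mean.
std::array<float, 6> denormalize(const std::array<float, 6>& netOut,
                                 const std::array<double, 6>& means,
                                 const std::array<double, 6>& stds) {
    std::array<float, 6> params{};
    for (std::size_t i = 0; i < params.size(); ++i) {
        params[i] = static_cast<float>(netOut[i] * stds[i] + means[i]);
    }
    return params;
}

// Both lanes are reported only when |c| of each boundary exceeds the
// threshold (0.5 in detect_lane.m). Index 2 is the left c, index 5 the right c.
bool bothLanesFound(const std::array<float, 6>& params, float threshold = 0.5f) {
    assert(threshold >= 0.0f);
    const bool left = std::fabs(params[2]) > threshold;
    const bool right = std::fabs(params[5]) > threshold;
    return left && right;
}
```

The threshold on c acts as a simple sanity check: a boundary whose lateral offset at x = 0 is almost zero would run through the middle of the vehicle, so such a prediction is treated as "no lane".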

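Generate Code for Network and Post-Processing Code

The network computes the parameters a, b, and c that describe the parabolic equations for the left and right lane boundaries.

From these parameters, compute the x and y coordinates corresponding to the lane positions. The coordinates must be mapped to image coordinates. The function detect_lane.m performs all these computations. To generate CUDA code for this function, create a GPU code configuration object for a 'lib' target and set the target language to C++. Use the coder.DeepLearningConfig (GPU Coder) function to create a CuDNN deep learning configuration object and assign it to the DeepLearningConfig property of the GPU code configuration object. Then run the codegen command.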

cfg = coder.gpuConfig('lib');
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
cfg.GenerateReport = true;
cfg.TargetLang = 'C++';
inputs = {ones(227,227,3,'single'),ones(1,6,'double'),ones(1,6,'double')};
codegen -args inputs -config cfg detect_lane

Code generation successful: View report

Generated Code Description

The series network is generated as a C++ class containing an array of 23 layer classes.

class c_lanenet
{
 public:
  int32_T batchSize;
  int32_T numLayers;
  real32_T *inputData;
  real32_T *outputData;
  MWCNNLayer *layers[23];
 public:
  c_lanenet(void);
  void setup(void);
  void predict(void);
  void cleanup(void);
  ~c_lanenet(void);
};

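The setup() method of the class sets up handles and allocates memory for each layer object. The predict() method invokes prediction for each of the 23 layers in the network.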
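To make the setup-then-predict pattern concrete, here is a deliberately simplified mock in C++. It is not the generated c_lanenet or MWCNNLayer code: MockLayer, ScaleLayer, and MockNetwork are hypothetical names, and each "layer" just scales a single number. The sketch only shows the control flow in which every layer is prepared once and prediction then walks the layers in order.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a layer base class; NOT the generated MWCNNLayer.
class MockLayer {
public:
    virtual ~MockLayer() = default;
    virtual void setup() { ready_ = true; }    // real layers would allocate here
    virtual float predict(float x) const = 0;  // transform one value
    bool ready() const { return ready_; }
private:
    bool ready_ = false;
};

// Toy layer that multiplies its input by a constant.
class ScaleLayer : public MockLayer {
public:
    explicit ScaleLayer(float s) : s_(s) {}
    float predict(float x) const override { return s_ * x; }
private:
    float s_;
};

// Owns an ordered list of layers, mirroring the "array of layer classes" idea.
class MockNetwork {
public:
    void add(std::unique_ptr<MockLayer> layer) { layers_.push_back(std::move(layer)); }

    // Prepare every layer once, before the first prediction.
    void setup() {
        for (auto& layer : layers_) layer->setup();
    }

    // Feed the output of each layer into the next one.
    float predict(float x) const {
        for (const auto& layer : layers_) {
            assert(layer->ready());  // setup() must have been called
            x = layer->predict(x);
        }
        return x;
    }
private:
    std::vector<std::unique_ptr<MockLayer>> layers_;
};
```

Separating setup() from predict() means the expensive one-time work is not repeated per frame, which is the same reason detect_lane.m keeps the network in a persistent variable.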

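The cnn_lanenet_conv*_w and cnn_lanenet_conv*_b files are the binary weights and bias files for the convolution layers in the network. The cnn_lanenet_fc*_w and cnn_lanenet_fc*_b files are the binary weights and bias files for the fully connected layers in the network.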

codegendir = fullfile('codegen', 'lib', 'detect_lane');
dir(codegendir)
.                                           MWMaxPoolingLayer.o
..                                          MWNormLayer.cpp
.gitignore                                  MWNormLayer.hpp
DeepLearningNetwork.cu                      MWNormLayer.o
DeepLearningNetwork.h                       MWOutputLayer.cpp
DeepLearningNetwork.o                       MWOutputLayer.hpp
MWActivationFunctionType.hpp                MWOutputLayer.o
MWCNNLayer.cpp                              MWRNNParameterTypes.hpp
MWCNNLayer.hpp                              MWReLULayer.cpp
MWCNNLayer.o                                MWReLULayer.hpp
MWCNNLayerImplBase.hpp                      MWReLULayer.o
MWCUSOLVERUtils.cpp                         MWTargetNetworkImplBase.hpp
MWCUSOLVERUtils.hpp                         MWTargetTypes.hpp
MWCUSOLVERUtils.o                           MWTensor.hpp
MWCudaDimUtility.hpp                        MWTensorBase.cpp
MWCudaMemoryFunctions.hpp                   MWTensorBase.hpp
MWCudnnCNNLayerImpl.cu                      MWTensorBase.o
MWCudnnCNNLayerImpl.hpp                     _clang-format
MWCudnnCNNLayerImpl.o                       buildInfo.mat
MWCudnnCommonHeaders.hpp                    cnn_lanenet0_0_conv1_b.bin
MWCudnnCustomLayerBase.cu                   cnn_lanenet0_0_conv1_w.bin
MWCudnnCustomLayerBase.hpp                  cnn_lanenet0_0_conv2_b.bin
MWCudnnCustomLayerBase.o                    cnn_lanenet0_0_conv2_w.bin
MWCudnnElementwiseAffineLayerImpl.cu        cnn_lanenet0_0_conv3_b.bin
MWCudnnElementwiseAffineLayerImpl.hpp       cnn_lanenet0_0_conv3_w.bin
MWCudnnElementwiseAffineLayerImpl.o         cnn_lanenet0_0_conv4_b.bin
MWCudnnFCLayerImpl.cu                       cnn_lanenet0_0_conv4_w.bin
MWCudnnFCLayerImpl.hpp                      cnn_lanenet0_0_conv5_b.bin
MWCudnnFCLayerImpl.o                        cnn_lanenet0_0_conv5_w.bin
MWCudnnFusedConvActivationLayerImpl.cu      cnn_lanenet0_0_data_offset.bin
MWCudnnFusedConvActivationLayerImpl.hpp     cnn_lanenet0_0_data_scale.bin
MWCudnnFusedConvActivationLayerImpl.o       cnn_lanenet0_0_fc6_b.bin
MWCudnnInputLayerImpl.hpp                   cnn_lanenet0_0_fc6_w.bin
MWCudnnLayerImplFactory.cu                  cnn_lanenet0_0_fcLane1_b.bin
MWCudnnLayerImplFactory.hpp                 cnn_lanenet0_0_fcLane1_w.bin
MWCudnnLayerImplFactory.o                   cnn_lanenet0_0_fcLane2_b.bin
MWCudnnMaxPoolingLayerImpl.cu               cnn_lanenet0_0_fcLane2_w.bin
MWCudnnMaxPoolingLayerImpl.hpp              cnn_lanenet0_0_responseNames.txt
MWCudnnMaxPoolingLayerImpl.o                codeInfo.mat
MWCudnnNormLayerImpl.cu                     codedescriptor.dmr
MWCudnnNormLayerImpl.hpp                    compileInfo.mat
MWCudnnNormLayerImpl.o                      detect_lane.a
MWCudnnOutputLayerImpl.cu                   detect_lane.cu
MWCudnnOutputLayerImpl.hpp                  detect_lane.h
MWCudnnOutputLayerImpl.o                    detect_lane.o
MWCudnnReLULayerImpl.cu                     detect_lane_data.cu
MWCudnnReLULayerImpl.hpp                    detect_lane_data.h
MWCudnnReLULayerImpl.o                      detect_lane_data.o
MWCudnnTargetNetworkImpl.cu                 detect_lane_initialize.cu
MWCudnnTargetNetworkImpl.hpp                detect_lane_initialize.h
MWCudnnTargetNetworkImpl.o                  detect_lane_initialize.o
MWElementwiseAffineLayer.cpp                detect_lane_internal_types.h
MWElementwiseAffineLayer.hpp                detect_lane_rtw.mk
MWElementwiseAffineLayer.o                  detect_lane_terminate.cu
MWElementwiseAffineLayerImplKernel.cu       detect_lane_terminate.h
MWElementwiseAffineLayerImplKernel.o        detect_lane_terminate.o
MWFCLayer.cpp                               detect_lane_types.h
MWFCLayer.hpp                               examples
MWFCLayer.o                                 gpu_codegen_info.mat
MWFusedConvActivationLayer.cpp              html
MWFusedConvActivationLayer.hpp              interface
MWFusedConvActivationLayer.o                networkParamsInfo_lanenet0_0.bin
MWInputLayer.cpp                            predict.cu
MWInputLayer.hpp                            predict.h
MWInputLayer.o                              predict.o
MWKernelHeaders.hpp                         rtw_proj.tmw
MWLayerImplFactory.hpp                      rtwtypes.h
MWMaxPoolingLayer.cpp                       shared_layers_export_macros.hpp
MWMaxPoolingLayer.hpp

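Generate Additional Files for Post-Processing the Output

Export the mean and standard deviation values from the trained network for use during execution.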

codegendir = fullfile(pwd, 'codegen', 'lib', 'detect_lane');
fid = fopen(fullfile(codegendir, 'mean.bin'), 'w');
A = [coeffMeans coeffStds];
fwrite(fid, A, 'double');
fclose(fid);
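The resulting mean.bin is a headerless file of 12 raw doubles: the six means followed by the six standard deviations. The C++ sketch below writes and reads that layout so the format is easy to check in isolation. It assumes the file is produced and consumed on the same machine (same byte order and double format); writeStats and readStats are hypothetical helper names, not functions from the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Write 6 means followed by 6 stds as raw doubles in native byte order.
bool writeStats(const char* path, const double means[6], const double stds[6]) {
    assert(path != nullptr);
    std::FILE* f = std::fopen(path, "wb");
    if (f == nullptr) return false;
    const bool ok = std::fwrite(means, sizeof(double), 6, f) == 6 &&
                    std::fwrite(stds, sizeof(double), 6, f) == 6;
    return (std::fclose(f) == 0) && ok;
}

// Read the same layout back; returns false if the file is missing or short.
bool readStats(const char* path, double means[6], double stds[6]) {
    assert(path != nullptr);
    std::FILE* f = std::fopen(path, "rb");
    if (f == nullptr) return false;
    double buffer[12];
    const std::size_t n = std::fread(buffer, sizeof(double), 12, f);
    std::fclose(f);
    if (n != 12) return false;
    for (int k = 0; k < 6; ++k) {
        means[k] = buffer[k];      // first half: means
        stds[k] = buffer[k + 6];   // second half: standard deviations
    }
    return true;
}
```

Reading a fixed count of 12 values and checking the return value of fread avoids relying on the file size, so a truncated file is reported instead of silently producing garbage coefficients.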

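Main File

Compile the network code by using a main file. The main file uses the OpenCV VideoCapture method to read frames from the input video. Each frame is processed and classified until no more frames are read. Before the output for each frame is displayed, it is post-processed by the detect_lane function generated in detect_lane.cu.

type main_lanenet.cu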
/* Copyright 2016 The MathWorks, Inc. */

// The header names were lost in extraction; the set below covers what this file uses.
#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>
#include <opencv2/opencv.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/core/core.hpp>
#include <opencv2/core/types.hpp>
#include <opencv2/highgui.hpp>
#include <list>
#include <cmath>
#include "detect_lane.h"

using namespace cv;

void readData(float *input, Mat& orig, Mat& im)
{
    Size size(227,227);
    resize(orig, im, size, 0, 0, INTER_LINEAR);
    for (int j = 0; j < 227*227; j++)
    {
        // BGR to RGB
        input[2*227*227+j] = (float)(im.data[j*3+0]);
        input[1*227*227+j] = (float)(im.data[j*3+1]);
        input[0*227*227+j] = (float)(im.data[j*3+2]);
    }
}

void addLane(float pts[28][2], Mat & im, int numPts)
{
    std::vector<Point2f> iArray;
    for (int k = 0; k < numPts; k++)
    {
        iArray.push_back(Point2f(pts[k][0], pts[k][1]));
    }
    Mat curve(iArray, true);
    curve.convertTo(curve, CV_32S); //adapt type for polylines
    polylines(im, curve, false, CV_RGB(255,255,0), 2, LINE_AA);
}

void writeData(float *outputBuffer, Mat & im, int N, double means[6], double stds[6])
{
    // get lane coordinates
    boolean_T laneFound = 0;
    float ltPts[56];
    float rtPts[56];
    detect_lane(outputBuffer, means, stds, &laneFound, ltPts, rtPts);

    if (!laneFound)
    {
        return;
    }

    // the generated code returns 28-by-2 matrices in column-major order
    float ltPtsM[28][2];
    float rtPtsM[28][2];
    for (int k = 0; k < 28; k++)
    {
        ltPtsM[k][0] = ltPts[k];
        ltPtsM[k][1] = ltPts[k+28];
        rtPtsM[k][0] = rtPts[k];
        rtPtsM[k][1] = rtPts[k+28];
    }

    addLane(ltPtsM, im, 28);
    addLane(rtPtsM, im, 28);
}

void readMeanAndStds(const char* filename, double means[6], double stds[6])
{
    FILE* pFile = fopen(filename, "rb");
    if (pFile == NULL)
    {
        fputs("File error", stderr);
        return;
    }

    // obtain file size
    fseek(pFile, 0, SEEK_END);
    long lSize = ftell(pFile);
    rewind(pFile);

    // read the whole file as doubles (6 means followed by 6 stds)
    double* buffer = (double*)malloc(lSize);
    size_t count = lSize / sizeof(double);
    size_t result = fread(buffer, sizeof(double), count, pFile);
    fclose(pFile);
    if (result != count || count < 12)
    {
        fputs("Reading error", stderr);
        free(buffer);
        return;
    }

    for (int k = 0; k < 6; k++)
    {
        means[k] = buffer[k];
        stds[k] = buffer[k+6];
    }
    free(buffer);
}

// Main function
int main(int argc, char* argv[])
{
    float *inputBuffer = (float*)calloc(sizeof(float), 227*227*3);
    float *outputBuffer = (float*)calloc(sizeof(float), 6);

    if ((inputBuffer == NULL) || (outputBuffer == NULL))
    {
        printf("ERROR: Input/Output buffers could not be allocated!\n");
        exit(-1);
    }

    // get ground truth mean and std
    double means[6];
    double stds[6];
    readMeanAndStds("mean.bin", means, stds);

    if (argc < 2)
    {
        printf("Pass in input video file name as argument\n");
        return -1;
    }

    VideoCapture cap(argv[1]);
    if (!cap.isOpened())
    {
        printf("Could not open the video capture device.\n");
        return -1;
    }

    cudaEvent_t start, stop;
    float fps = 0;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    Mat orig, im;
    namedWindow("Lane detection demo", WINDOW_NORMAL);
    while (true)
    {
        cudaEventRecord(start);
        cap >> orig;
        if (orig.empty()) break;
        readData(inputBuffer, orig, im);

        writeData(inputBuffer, orig, 6, means, stds);

        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        char strbuf[50];
        float milliseconds = -1.0;
        cudaEventElapsedTime(&milliseconds, start, stop);
        fps = fps*.9 + 1000.0/milliseconds*.1; // exponentially smoothed frame rate
        sprintf(strbuf, "%.2f FPS", fps);
        putText(orig, strbuf, Point(200,30), FONT_HERSHEY_DUPLEX, 1, CV_RGB(0,0,0), 2);
        imshow("Lane detection demo", orig);
        if (waitKey(50)%256 == 27) break; // stop capturing by pressing ESC
    }
    destroyWindow("Lane detection demo");

    free(inputBuffer);
    free(outputBuffer);

    return 0;
}
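The main file performs two memory-layout conversions: readData turns interleaved BGR pixels into planar RGB floats, and writeData unpacks the column-major 28-by-2 point matrices returned by the generated code. The following C++ sketch isolates both conversions on plain arrays so they can be checked without OpenCV. The function names are hypothetical, and the sketch assumes tightly packed 8-bit pixels with no row padding.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Convert interleaved 8-bit BGR pixels (B0 G0 R0 B1 G1 R1 ...) into three
// consecutive float planes ordered R, G, B.
std::vector<float> bgrInterleavedToRgbPlanar(const unsigned char* bgr,
                                             std::size_t numPixels) {
    assert(bgr != nullptr || numPixels == 0);
    std::vector<float> planar(3 * numPixels);
    for (std::size_t j = 0; j < numPixels; ++j) {
        planar[0 * numPixels + j] = static_cast<float>(bgr[3 * j + 2]);  // R plane
        planar[1 * numPixels + j] = static_cast<float>(bgr[3 * j + 1]);  // G plane
        planar[2 * numPixels + j] = static_cast<float>(bgr[3 * j + 0]);  // B plane
    }
    return planar;
}

// A rows-by-2 matrix stored column by column holds all x values first and
// all y values second; regroup it into (x, y) pairs.
std::vector<std::array<float, 2>> unpackColumnMajorPoints(const float* colMajor,
                                                          std::size_t rows) {
    assert(colMajor != nullptr || rows == 0);
    std::vector<std::array<float, 2>> points(rows);
    for (std::size_t k = 0; k < rows; ++k) {
        points[k][0] = colMajor[k];         // column 1: x
        points[k][1] = colMajor[k + rows];  // column 2: y
    }
    return points;
}
```

With rows set to 28, unpackColumnMajorPoints performs the same regrouping as the ltPtsM and rtPtsM loops in writeData.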

Download Example Video

if ~exist('./caltech_cordova1.avi', 'file')
    url = 'https://www.mathworks.com/supportfiles/gpucoder/media/caltech_cordova1.avi';
    websave('caltech_cordova1.avi', url);
end

Build Executable

if ispc
    setenv('MATLAB_ROOT', matlabroot);
    vcvarsall = mex.getCompilerConfigurations('C++').Details.CommandLineShell;
    setenv('VCVARSALL', vcvarsall);
    system('make_win_lane_detection.bat');
    cd(codegendir);
    system('lanenet.exe ..\..\..\caltech_cordova1.avi');
else
    setenv('MATLAB_ROOT', matlabroot);
    system('make -f Makefile_lane_detection.mk');
    cd(codegendir);
    system('./lanenet ../../../caltech_cordova1.avi');
end

Input Screenshot

Output Screenshot

Related Topics