
Train a Deep Learning Vehicle Detector

This example shows how to train a vision-based vehicle detector using deep learning.

Overview

Vehicle detection using computer vision is an important component for tracking vehicles around the ego vehicle. The ability to detect and track vehicles is required for many autonomous driving applications, such as forward collision warning, adaptive cruise control, and automated lane keeping. Automated Driving Toolbox™ provides pretrained vehicle detectors (vehicleDetectorFasterRCNN and vehicleDetectorACF) to enable quick prototyping. However, the pretrained models might not suit every application, requiring you to train from scratch. This example shows how to train a vehicle detector from scratch using deep learning.

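Deep learning is a powerful machine learning technique that you can use to train robust object detectors. Several deep learning techniques for object detection exist, including Faster R-CNN and you only look once (YOLO) v2. This example trains a Faster R-CNN vehicle detector using the trainFasterRCNNObjectDetector function. For more information, see Object Detection.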

Download Pretrained Detector

Download a pretrained detector to avoid having to wait for training to complete. If you want to train the detector, set the doTrainingAndEval variable to true.

doTrainingAndEval = false;
if ~doTrainingAndEval && ~exist('fasterRCNNResNet50EndToEndVehicleExample.mat','file')
    disp('Downloading pretrained detector (118 MB)...');
    pretrainedURL = 'https://www.mathworks.com/supportfiles/vision/data/fasterRCNNResNet50EndToEndVehicleExample.mat';
    websave('fasterRCNNResNet50EndToEndVehicleExample.mat',pretrainedURL);
end

Load Dataset

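This example uses a small labeled dataset that contains 295 images. Many of these images come from the Caltech Cars 1999 and 2001 data sets, available at the Caltech Computational Vision website, created by Pietro Perona and used with permission. Each image contains one or two labeled instances of a vehicle. A small dataset is useful for exploring the Faster R-CNN training procedure, but in practice, more labeled images are needed to train a robust detector. Unzip the vehicle images and load the vehicle ground truth data.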

unzip vehicleDatasetImages.zip
data = load('vehicleDatasetGroundTruth.mat');
vehicleDataset = data.vehicleDataset;

The vehicle data is stored in a two-column table, where the first column contains the image file paths and the second column contains the vehicle bounding boxes.

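Split the data into a training set for training the detector and a test set for evaluating the detector. Select 60% of the data for training. Use the rest for evaluation.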

rng(0)
shuffledIdx = randperm(height(vehicleDataset));
idx = floor(0.6 * height(vehicleDataset));
trainingDataTbl = vehicleDataset(shuffledIdx(1:idx),:);
testDataTbl = vehicleDataset(shuffledIdx(idx+1:end),:);
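As a quick sanity check on the split arithmetic, the following sketch uses only base MATLAB to reproduce the 60/40 index split for 295 rows (the dataset size stated above) without loading any images. It confirms that the training and test index sets are disjoint and together cover every row. The names numRows, trainIdx, and testIdx are illustrative and are not used elsewhere in the example.

```matlab
% Reproduce the 60/40 split on row indices only (no images are needed).
rng(0)                                 % same seed as the example
numRows     = 295;                     % dataset size stated in the text
shuffledIdx = randperm(numRows);       % random permutation of row indices
idx         = floor(0.6 * numRows);    % number of training rows

trainIdx = shuffledIdx(1:idx);
testIdx  = shuffledIdx(idx+1:end);

fprintf('Training rows: %d, test rows: %d\n', numel(trainIdx), numel(testIdx));

% Every row lands in exactly one of the two sets.
assert(isempty(intersect(trainIdx, testIdx)));
assert(isequal(sort([trainIdx testIdx]), 1:numRows));
```

Because floor rounds down, any fractional row goes to the test set.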

Use imageDatastore and boxLabelDatastore to create datastores for loading the image and label data during training and evaluation.

imdsTrain = imageDatastore(trainingDataTbl{:,'imageFilename'});
bldsTrain = boxLabelDatastore(trainingDataTbl(:,'vehicle'));

imdsTest = imageDatastore(testDataTbl{:,'imageFilename'});
bldsTest = boxLabelDatastore(testDataTbl(:,'vehicle'));

Combine image and box label datastores.

trainingData = combine(imdsTrain,bldsTrain);
testData = combine(imdsTest,bldsTest);

Display one of the training images and box labels.

data = read(trainingData);
I = data{1};
bbox = data{2};
annotatedImage = insertShape(I,'rectangle',bbox);
annotatedImage = imresize(annotatedImage,2);
figure
imshow(annotatedImage)

Create Faster R-CNN Detection Network

A Faster R-CNN object detection network is composed of a feature extraction network followed by two subnetworks. The feature extraction network is typically a pretrained CNN, such as ResNet-50 or Inception v3. The first subnetwork following the feature extraction network is a region proposal network (RPN) trained to generate object proposals - areas in the image where objects are likely to exist. The second subnetwork is trained to predict the actual class of each object proposal.

The feature extraction network is typically a pretrained CNN (for details, see Pretrained Deep Neural Networks (Deep Learning Toolbox)). This example uses ResNet-50 for feature extraction. You can also use other pretrained networks such as MobileNet v2 or ResNet-18, depending on your application requirements.

Use fasterRCNNLayers to create a Faster R-CNN network automatically given a pretrained feature extraction network. fasterRCNNLayers requires you to specify several inputs that parameterize a Faster R-CNN network:

  • Network input size

  • Anchor boxes

  • Feature extraction network

First, specify the network input size. When choosing the network input size, consider the minimum size required to run the network itself, the size of the training images, and the computational cost incurred by processing data at the selected size. When feasible, choose a network input size that is close to the size of the training image and larger than the input size required for the network. To reduce the computational cost of running the example, specify a network input size of [224 224 3], which is the minimum size required to run the network.

inputSize = [224 224 3];

Note that the training images used in this example are bigger than 224-by-224 and vary in size, so you must resize the images in a preprocessing step prior to training.
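Resizing an image also means rescaling its bounding boxes, which is what preprocessData does with bboxresize. The following base-MATLAB sketch shows the proportional arithmetic for [x y w h] boxes. It assumes spatial coordinates, in which an H-by-W image spans [0.5, W+0.5] horizontally and [0.5, H+0.5] vertically; bboxresize may use a slightly different convention, so treat this as an illustration rather than its implementation. The 480-by-640 image size and the box values are hypothetical.

```matlab
% Proportional rescaling of [x y w h] boxes when an image is resized.
% Assumption: spatial coordinates, so a W-pixel-wide image spans [0.5, W+0.5].
origSize   = [480 640];           % hypothetical [rows cols] of a training image
targetSize = [224 224 3];         % network input size used in the example

scale = targetSize(1:2) ./ origSize;   % [rowScale colScale], as in preprocessData

% x and width use the column scale; y and height use the row scale.
scaleBox = @(b, s) [ (b(:,1) - 0.5) * s(2) + 0.5, ...
                     (b(:,2) - 0.5) * s(1) + 0.5, ...
                      b(:,3)        * s(2), ...
                      b(:,4)        * s(1) ];

boxes = [0.5 0.5 640 480;         % box covering the whole image
         101 51  200 100];        % an arbitrary vehicle-sized box
scaledBoxes = scaleBox(boxes, scale)
```

A box that covers the whole original image still covers the whole 224-by-224 image after scaling, which is a useful check that the offsets are consistent.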

Next, use estimateAnchorBoxes to estimate anchor boxes based on the size of objects in the training data. To account for the resizing of the images prior to training, resize the training data for estimating anchor boxes. Use transform to preprocess the training data, then define the number of anchor boxes and estimate the anchor boxes.

preprocessedTrainingData = transform(trainingData, @(data)preprocessData(data,inputSize));
numAnchors = 4;
anchorBoxes = estimateAnchorBoxes(preprocessedTrainingData,numAnchors)

anchorBoxes = 4×2

    96    91
    68    65
   150   125
    38    29

For more information on choosing anchor boxes, see Estimate Anchor Boxes From Training Data (Computer Vision Toolbox™) and Anchor Boxes for Object Detection.
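A common way to estimate anchor boxes is k-means clustering of the ground-truth box sizes using 1 - IoU as the distance (the approach popularized by YOLO v2). The following base-MATLAB sketch illustrates that idea on six hypothetical [width height] pairs; it is not the estimateAnchorBoxes implementation, and the deterministic initialization (boxes spread evenly by area) is an assumption chosen to make the result reproducible.

```matlab
% Simplified k-means on box sizes with an IoU-based distance (1 - IoU).
% Illustration only; not the estimateAnchorBoxes implementation.
wh = [10 10; 11 11; 10 12; 100 90; 98 92; 102 88];   % hypothetical [width height] pairs
k  = 2;                                              % number of anchors

% IoU between two sizes, treating both boxes as sharing the same center.
sizeIoU = @(a, b) (min(a(1), b(1)) * min(a(2), b(2))) / ...
                  (prod(a) + prod(b) - min(a(1), b(1)) * min(a(2), b(2)));

% Deterministic initialization: pick boxes spread evenly by area.
[~, order] = sort(prod(wh, 2));
centroids  = wh(order(round(linspace(1, size(wh, 1), k))), :);

for iter = 1:100
    % Assignment step: each box joins the centroid with the highest IoU,
    % which is the same as the lowest 1 - IoU distance.
    iou = zeros(size(wh, 1), k);
    for i = 1:size(wh, 1)
        for j = 1:k
            iou(i, j) = sizeIoU(wh(i, :), centroids(j, :));
        end
    end
    [~, assignment] = max(iou, [], 2);

    % Update step: move each centroid to the mean size of its members.
    newCentroids = centroids;
    for j = 1:k
        members = wh(assignment == j, :);
        if ~isempty(members)
            newCentroids(j, :) = mean(members, 1);
        end
    end

    if isequal(newCentroids, centroids)
        break;                         % assignments are stable
    end
    centroids = newCentroids;
end

anchorSizes = sortrows(centroids)      % one [width height] row per anchor
meanIoU     = mean(max(iou, [], 2))    % average best IoU over all boxes
```

Using IoU rather than Euclidean distance keeps large boxes from dominating the clustering, and the mean best IoU gives a rough quality score for a chosen number of anchors.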

Now, use resnet50 to load a pretrained ResNet-50 model.

featureExtractionNetwork = resnet50;

Select 'activation_40_relu' as the feature extraction layer. This feature extraction layer outputs feature maps that are downsampled by a factor of 16. This amount of downsampling is a good trade-off between spatial resolution and the strength of the extracted features, as features extracted further down the network encode stronger image features at the cost of spatial resolution. Choosing the optimal feature extraction layer requires empirical analysis. You can use analyzeNetwork to find the names of other potential feature extraction layers within a network.

featureLayer ='activation_40_relu';
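To see what a downsampling factor of 16 means in practice, the following base-MATLAB sketch computes the feature map size for the 224-by-224 input and the number of anchor positions the region proposal network evaluates with four anchor boxes. The convention that anchors are centered on each feature cell in input-pixel coordinates is a hypothetical simplification for illustration.

```matlab
% Back-of-the-envelope count of anchor positions for the chosen feature layer.
inputSize  = [224 224 3];   % network input size used in this example
stride     = 16;            % downsampling factor of the feature layer (from the text)
numAnchors = 4;             % number of anchor boxes used in the example

featureMapSize = floor(inputSize(1:2) / stride)    % [rows cols] of the feature map

% Hypothetical convention: one anchor center per feature cell,
% expressed in input-image pixel coordinates.
centerX = ((1:featureMapSize(2)) - 0.5) * stride;
centerY = ((1:featureMapSize(1)) - 0.5) * stride;

totalAnchors = prod(featureMapSize) * numAnchors    % anchors scored by the RPN
```

A shallower layer with a smaller stride would give a denser anchor grid and finer localization, at the cost of weaker features and more computation.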

Define the number of classes to detect.

numClasses = width(vehicleDataset)-1;

Create the Faster R-CNN object detection network.

lgraph = fasterRCNNLayers(inputSize,numClasses,anchorBoxes,featureExtractionNetwork,featureLayer);

You can visualize the network using analyzeNetwork or Deep Network Designer from Deep Learning Toolbox™.

If more control is required over the Faster R-CNN network architecture, use Deep Network Designer to design the Faster R-CNN detection network manually. For more information, see Getting Started with R-CNN, Fast R-CNN, and Faster R-CNN.

Data Augmentation

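Data augmentation is used to improve network accuracy by randomly transforming the original data during training. By using data augmentation, you can add more variety to the training data without increasing the number of labeled training samples.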

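Use transform to augment the training data by randomly flipping the image and associated box labels horizontally. Note that data augmentation is not applied to the test data. Ideally, test data is representative of the original data and is left unmodified for unbiased evaluation.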

augmentedTrainingData = transform(trainingData,@augmentData);
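The augmentData helper uses randomAffine2d with 'XReflection' and bboxwarp to mirror the image and its boxes. The following base-MATLAB sketch shows the same geometry on a tiny 3-by-4 stand-in image without toolbox functions. It assumes spatial coordinates, in which a W-pixel-wide image spans [0.5, W+0.5], so a box's new left edge is W + 1 - x - w; the sample image, boxes, and the flipBoxes name are hypothetical.

```matlab
% Horizontal flip of an image and its [x y w h] boxes (base MATLAB sketch).
% Assumption: spatial coordinates, so a W-pixel-wide image spans [0.5, W+0.5].
I     = reshape(1:12, 3, 4);       % tiny 3-by-4 stand-in "image"
boxes = [0.5 0.5 1 3;              % box covering column 1
         1.5 0.5 2 2];             % box covering columns 2-3, rows 1-2

W = size(I, 2);
flipBoxes = @(b) [W + 1 - b(:,1) - b(:,3), b(:,2:4)];   % mirror x about the image center

flippedI     = fliplr(I);          % reverse the column order of the image
flippedBoxes = flipBoxes(boxes)    % y, width, and height are unchanged

% During augmentation, apply the flip to roughly half of the samples.
if rand < 0.5
    I     = flippedI;
    boxes = flippedBoxes;
end
```

Flipping twice returns the original boxes, which is a cheap check that the reflection formula matches the image flip.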

Read the same image multiple times and display the augmented training data.

augmentedData = cell(4,1);
for k = 1:4
    data = read(augmentedTrainingData);
    augmentedData{k} = insertShape(data{1},'rectangle',data{2});
    reset(augmentedTrainingData);
end
figure
montage(augmentedData,'BorderSize',10)

Preprocess Training Data

Preprocess the augmented training data to prepare for training.

trainingData = transform(augmentedTrainingData,@(data)preprocessData(data,inputSize));

Read the preprocessed data.

data = read(trainingData);

Display the image and box bounding boxes.

I = data{1};
bbox = data{2};
annotatedImage = insertShape(I,'rectangle',bbox);
annotatedImage = imresize(annotatedImage,2);
figure
imshow(annotatedImage)

Train Faster R-CNN

Use trainingOptions to specify network training options. Set 'CheckpointPath' to a temporary location. This enables the saving of partially trained detectors during the training process. If training is interrupted, such as by a power outage or system failure, you can resume training from the saved checkpoint.

options = trainingOptions('sgdm',...
    'MaxEpochs',7,...
    'MiniBatchSize',1,...
    'InitialLearnRate',1e-3,...
    'CheckpointPath',tempdir);

Use trainFasterRCNNObjectDetector to train the Faster R-CNN object detector if doTrainingAndEval is true. Otherwise, load the pretrained network.

if doTrainingAndEval
    % Train the Faster R-CNN detector.
    % * Adjust NegativeOverlapRange and PositiveOverlapRange to ensure
    %   that training samples tightly overlap with ground truth.
    [detector, info] = trainFasterRCNNObjectDetector(trainingData,lgraph,options, ...
        'NegativeOverlapRange',[0 0.3], ...
        'PositiveOverlapRange',[0.6 1]);
else
    % Load pretrained detector for the example.
    pretrained = load('fasterRCNNResNet50EndToEndVehicleExample.mat');
    detector = pretrained.detector;
end
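NegativeOverlapRange and PositiveOverlapRange are expressed as intersection-over-union (IoU) ranges between training samples and ground-truth boxes. The following base-MATLAB sketch computes IoU for [x y w h] boxes and labels three hypothetical proposals with the same ranges used above. It assumes the right and bottom edges are x + w and y + h, treats both range ends as inclusive, and labels anything in neither range as ignored; how the toolbox handles range boundaries is not specified here.

```matlab
% IoU between [x y w h] boxes, and how overlap ranges could label samples.
% Assumption: right edge = x + w, bottom edge = y + h, inclusive range ends.
overlapLen = @(a0, a1, b0, b1) max(0, min(a1, b1) - max(a0, b0));
interArea  = @(a, b) overlapLen(a(1), a(1)+a(3), b(1), b(1)+b(3)) * ...
                     overlapLen(a(2), a(2)+a(4), b(2), b(2)+b(4));
boxIoU     = @(a, b) interArea(a, b) / (a(3)*a(4) + b(3)*b(4) - interArea(a, b));

negativeRange = [0 0.3];       % same values as NegativeOverlapRange above
positiveRange = [0.6 1];       % same values as PositiveOverlapRange above

groundTruth = [1 1 10 10];     % hypothetical ground-truth box
proposals   = [1 1 10 10;      % identical box
               6 1 10 10;      % box shifted by half its width
               50 50 10 10];   % far-away box

labels = cell(size(proposals, 1), 1);
for i = 1:size(proposals, 1)
    iou = boxIoU(proposals(i, :), groundTruth);
    if iou >= positiveRange(1) && iou <= positiveRange(2)
        labels{i} = 'positive';
    elseif iou >= negativeRange(1) && iou <= negativeRange(2)
        labels{i} = 'negative';
    else
        labels{i} = 'ignored';   % IoU falls between the two ranges
    end
    fprintf('IoU = %.3f -> %s\n', iou, labels{i});
end
```

The half-shifted box has an IoU of 1/3, so with these ranges it is neither a positive nor a negative sample. The gap between 0.3 and 0.6 keeps ambiguous boxes out of training.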

This example was verified on an Nvidia(TM) Titan X GPU with 12 GB of memory. Training the network took approximately 20 minutes. The training time varies depending on the hardware you use.

As a quick check, run the detector on one test image. Make sure you resize the image to the same size as the training images.

I = imread(testDataTbl.imageFilename{1});
I = imresize(I,inputSize(1:2));
[bboxes,scores] = detect(detector,I);

Display the results.

I = insertObjectAnnotation(I,'rectangle',bboxes,scores);
figure
imshow(I)

Evaluate Detector Using Test Set

Evaluate the trained object detector on a large set of images to measure the performance. Computer Vision Toolbox™ provides object detector evaluation functions to measure common metrics such as average precision (evaluateDetectionPrecision) and log-average miss rates (evaluateDetectionMissRate). For this example, use the average precision metric to evaluate performance. The average precision provides a single number that incorporates the ability of the detector to make correct classifications (precision) and the ability of the detector to find all relevant objects (recall).
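To make precision, recall, and average precision concrete, the following base-MATLAB sketch computes them for five hypothetical detections. It assumes the detections have already been matched to ground truth, so each is flagged as a true or false positive, and it uses the non-interpolated AP definition. evaluateDetectionPrecision may interpolate the curve differently, so its numbers need not match this sketch exactly. The names prec, rec, and apValue are illustrative.

```matlab
% Precision, recall, and average precision from ranked detections.
% Assumption: detections are already matched to ground truth, so each one
% is flagged as a true positive (1) or a false positive (0).
scores         = [0.95 0.90 0.80 0.70 0.60];   % hypothetical confidence scores
isTruePos      = [1    1    0    1    0   ];   % hypothetical match results
numGroundTruth = 4;                            % hypothetical number of vehicles

[~, order] = sort(scores, 'descend');          % rank by confidence
hits = isTruePos(order);
tp   = cumsum(hits);                           % true positives so far
fp   = cumsum(1 - hits);                       % false positives so far

prec = tp ./ (tp + fp)                         % precision at each rank
rec  = tp / numGroundTruth                     % recall at each rank

% Non-interpolated AP: sum the precision at each true positive and divide
% by the number of ground-truth objects (missed objects contribute 0).
apValue = sum(prec(hits == 1)) / numGroundTruth
```

Here one vehicle is never detected, so recall tops out at 0.75 and the AP is pulled down even though the highest-scoring detections are correct.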

Apply the same preprocessing transform to the test data as for the training data.

testData = transform(testData,@(data)preprocessData(data,inputSize));

Run the detector on all the test images.

if doTrainingAndEval
    detectionResults = detect(detector,testData,'MiniBatchSize',4);
else
    % Load pretrained detector for the example.
    pretrained = load('fasterRCNNResNet50EndToEndVehicleExample.mat');
    detectionResults = pretrained.detectionResults;
end

Evaluate the object detector using the average precision metric.

[ap, recall, precision] = evaluateDetectionPrecision(detectionResults,testData);

The precision/recall (PR) curve highlights how precise a detector is at varying levels of recall. The ideal precision is 1 at all recall levels. The use of more data can help improve the average precision but might require more training time. Plot the PR curve.

figure
plot(recall,precision)
xlabel('Recall')
ylabel('Precision')
grid on
title(sprintf('Average Precision = %.2f', ap))

Supporting Functions

function data = augmentData(data)
% Randomly flip images and bounding boxes horizontally.
tform = randomAffine2d('XReflection',true);
sz = size(data{1},[1 2]);
rout = affineOutputView(sz,tform);
data{1} = imwarp(data{1},tform,'OutputView',rout);

% Sanitize box data, if needed.
data{2} = helperSanitizeBoxes(data{2}, sz);

% Warp boxes.
data{2} = bboxwarp(data{2},tform,rout);
end

function data = preprocessData(data,targetSize)
% Resize image and bounding boxes to targetSize.
sz = size(data{1},[1 2]);
scale = targetSize(1:2)./sz;
data{1} = imresize(data{1},targetSize(1:2));

% Sanitize box data, if needed.
data{2} = helperSanitizeBoxes(data{2}, sz);

% Resize boxes.
data{2} = bboxresize(data{2},scale);
end

