
Train Residual Network for Image Classification

This example shows how to create a deep learning neural network with residual connections and train it on CIFAR-10 data. Residual connections are a popular element in convolutional neural network architectures. Using residual connections improves gradient flow through the network and enables training of deeper networks.

For many applications, using a network that consists of a simple sequence of layers is sufficient. However, some applications require networks with a more complex graph structure in which layers can have inputs from multiple layers and outputs to multiple layers. These types of networks are often called directed acyclic graph (DAG) networks. A residual network is a type of DAG network that has residual (or shortcut) connections that bypass the main network layers. Residual connections enable the parameter gradients to propagate more easily from the output layer to the earlier layers of the network, which makes it possible to train deeper networks. This increased network depth can result in higher accuracies on more difficult tasks.

To create and train a network with a graph structure, follow these steps.

  • Create a LayerGraph object using layerGraph. The layer graph specifies the network architecture. You can create an empty layer graph and then add layers to it. You can also create a layer graph directly from an array of network layers. In this case, layerGraph connects the layers in the array one after the other.

  • Add layers to the layer graph using addLayers, and remove layers from the graph using removeLayers.

  • Connect layers to other layers using connectLayers, and disconnect layers from other layers using disconnectLayers.

  • Plot the network architecture using plot.

  • Train the network using trainNetwork. The trained network is a DAGNetwork object.

  • Perform classification and prediction on new data using classify and predict.

You can also load pretrained networks for image classification. For more information, see Pretrained Deep Neural Networks.
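For example, a minimal sketch of this workflow looks like the following. The layer names and sizes here are illustrative and are not part of the CIFAR-10 network built later in this example.

% Minimal sketch of the layer graph workflow. Layer names are illustrative.
layers = [
    imageInputLayer([32 32 3],'Name','in')
    convolution2dLayer(3,16,'Padding','same','Name','conv')
    reluLayer('Name','relu')];
lgraph = layerGraph(layers);                               % connects the layers sequentially
lgraph = addLayers(lgraph,additionLayer(2,'Name','add'));  % add a new layer to the graph
lgraph = connectLayers(lgraph,'relu','add/in1');           % connect layers by name
lgraph = disconnectLayers(lgraph,'relu','add/in1');        % undo the connection
lgraph = removeLayers(lgraph,'add');                       % remove the layer again
figure
plot(lgraph)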

Prepare Data

Download the CIFAR-10 data set [1]. The data set contains 60,000 images. Each image is 32-by-32 in size and has three color channels (RGB). The size of the data set is 175 MB. Depending on your internet connection, the download process can take time.

datadir = tempdir; downloadCIFARData(datadir);

Load the CIFAR-10 training and test images as 4-D arrays. The training set contains 50,000 images and the test set contains 10,000 images. Use the CIFAR-10 test images for network validation.

[XTrain,YTrain,XValidation,YValidation] = loadCIFARData(datadir);

You can display a random sample of the training images using the following code.

figure
idx = randperm(size(XTrain,4),20);
im = imtile(XTrain(:,:,:,idx),'ThumbnailSize',[96,96]);
imshow(im)

Create an augmentedImageDatastore object to use for network training. During training, the datastore randomly flips the training images along the vertical axis and randomly translates them up to four pixels horizontally and vertically. Data augmentation helps prevent the network from overfitting and memorizing the exact details of the training images.

imageSize = [32 32 3];
pixelRange = [-4 4];
imageAugmenter = imageDataAugmenter( ...
    'RandXReflection',true, ...
    'RandXTranslation',pixelRange, ...
    'RandYTranslation',pixelRange);
augimdsTrain = augmentedImageDatastore(imageSize,XTrain,YTrain, ...
    'DataAugmentation',imageAugmenter, ...
    'OutputSizeMode','randcrop');

Define Network Architecture

The residual network architecture consists of these components:

  • A main branch with convolutional, batch normalization, and ReLU layers connected sequentially.

  • Residual connections that bypass the convolutional units of the main branch. The outputs of the residual connections and convolutional units are added element-wise. When the size of the activations changes, the residual connections must also contain 1-by-1 convolutional layers. Residual connections enable the parameter gradients to flow more easily from the output layer to the earlier layers of the network, which makes it possible to train deeper networks.

Create Main Branch

Start by creating the main branch of the network. The main branch contains five sections.

  • An initial section containing the image input layer and an initial convolution with activation.

  • Three stages of convolutional layers with different feature sizes (32-by-32, 16-by-16, and 8-by-8). Each stage contains N convolutional units. In this part of the example, N = 2. Each convolutional unit contains two 3-by-3 convolutional layers with activations. The netWidth parameter is the network width, defined as the number of filters in the convolutional layers in the first stage of the network. The first convolutional units in the second and third stages downsample the spatial dimensions by a factor of two. To keep the amount of computation required in each convolutional layer roughly the same, increase the number of filters by a factor of two each time you perform spatial downsampling.

  • A final section with global average pooling, fully connected, softmax, and classification layers.

Use convolutionalUnit(numF,stride,tag) to create a convolutional unit. numF is the number of convolutional filters in each layer, stride is the stride of the first convolutional layer of the unit, and tag is a character array to prepend to the layer names. The convolutionalUnit function is defined at the end of the example.

Give unique names to all the layers. The layers in the convolutional units have names starting with 'SjUk', where j is the stage index and k is the index of the convolutional unit within that stage. For example, 'S2U1' denotes stage 2, unit 1.

netWidth = 16;
layers = [
    imageInputLayer([32 32 3],'Name','input')
    convolution2dLayer(3,netWidth,'Padding','same','Name','convInp')
    batchNormalizationLayer('Name','BNInp')
    reluLayer('Name','reluInp')

    convolutionalUnit(netWidth,1,'S1U1')
    additionLayer(2,'Name','add11')
    reluLayer('Name','relu11')
    convolutionalUnit(netWidth,1,'S1U2')
    additionLayer(2,'Name','add12')
    reluLayer('Name','relu12')

    convolutionalUnit(2*netWidth,2,'S2U1')
    additionLayer(2,'Name','add21')
    reluLayer('Name','relu21')
    convolutionalUnit(2*netWidth,1,'S2U2')
    additionLayer(2,'Name','add22')
    reluLayer('Name','relu22')

    convolutionalUnit(4*netWidth,2,'S3U1')
    additionLayer(2,'Name','add31')
    reluLayer('Name','relu31')
    convolutionalUnit(4*netWidth,1,'S3U2')
    additionLayer(2,'Name','add32')
    reluLayer('Name','relu32')

    averagePooling2dLayer(8,'Name','globalPool')
    fullyConnectedLayer(10,'Name','fcFinal')
    softmaxLayer('Name','softmax')
    classificationLayer('Name','classoutput')
    ];

Create a layer graph from the layer array. layerGraph connects all the layers in layers sequentially. Plot the layer graph.

lgraph = layerGraph(layers);
figure('Units','normalized','Position',[0.2 0.2 0.6 0.6]);
plot(lgraph);

Create Residual Connections

Add residual connections around the convolutional units. Most residual connections perform no operations and simply add element-wise to the outputs of the convolutional units.

Create the residual connection from the 'reluInp' to the 'add11' layer. Because you specified the number of inputs to the addition layer to be two when you created the layer, the layer has two inputs with the names 'in1' and 'in2'. The final layer of the first convolutional unit is already connected to the 'in1' input. The addition layer then sums the outputs of the first convolutional unit and the 'reluInp' layer.

In the same way, connect the 'relu11' layer to the second input of the 'add12' layer. Check that you have connected the layers correctly by plotting the layer graph.

lgraph = connectLayers(lgraph,'reluInp','add11/in2');
lgraph = connectLayers(lgraph,'relu11','add12/in2');
figure('Units','normalized','Position',[0.2 0.2 0.6 0.6]);
plot(lgraph);

When the layer activations in the convolutional units change size (that is, when they are downsampled spatially and upsampled in the channel dimension), the activations in the residual connections must also change size. Change the activation sizes in the residual connections by using a 1-by-1 convolutional layer together with its batch normalization layer.

skip1 = [
    convolution2dLayer(1,2*netWidth,'Stride',2,'Name','skipConv1')
    batchNormalizationLayer('Name','skipBN1')];
lgraph = addLayers(lgraph,skip1);
lgraph = connectLayers(lgraph,'relu12','skipConv1');
lgraph = connectLayers(lgraph,'skipBN1','add21/in2');

Add the identity connection in the second stage of the network.

lgraph = connectLayers(lgraph,'relu21','add22/in2');

Change the activation size in the residual connection between the second and third stages by using another 1-by-1 convolutional layer together with its batch normalization layer.

skip2 = [
    convolution2dLayer(1,4*netWidth,'Stride',2,'Name','skipConv2')
    batchNormalizationLayer('Name','skipBN2')];
lgraph = addLayers(lgraph,skip2);
lgraph = connectLayers(lgraph,'relu22','skipConv2');
lgraph = connectLayers(lgraph,'skipBN2','add31/in2');

Add the last identity connection and plot the final layer graph.

lgraph = connectLayers(lgraph,'relu31','add32/in2');
figure('Units','normalized','Position',[0.2 0.2 0.6 0.6]);
plot(lgraph)

Create Deeper Network

To create a layer graph with residual connections for CIFAR-10 data of arbitrary depth and width, use the supporting function residualCIFARlgraph.

lgraph = residualCIFARlgraph(netWidth,numUnits,unitType) creates a layer graph for CIFAR-10 data with residual connections.

  • netWidth is the network width, defined as the number of filters in the first 3-by-3 convolutional layers of the network.

  • numUnits is the number of convolutional units in the main branch of the network. Because the network consists of three stages where each stage has the same number of convolutional units, numUnits must be an integer multiple of 3.

  • unitType is the type of convolutional unit, specified as "standard" or "bottleneck". A standard convolutional unit consists of two 3-by-3 convolutional layers. A bottleneck convolutional unit consists of three convolutional layers: a 1-by-1 layer that downsamples in the channel dimension, a 3-by-3 convolutional layer, and a 1-by-1 layer that upsamples in the channel dimension. Hence, a bottleneck unit has 50% more convolutional layers than a standard unit, but only half the number of spatial 3-by-3 convolutions. The two unit types have similar computational complexity, but the total number of features propagating in the residual connections is four times larger when using bottleneck units. The total depth, defined as the maximum number of sequential convolutional and fully connected layers, is 2*numUnits + 2 for networks with standard units and 3*numUnits + 2 for networks with bottleneck units.

Create a residual network with nine standard convolutional units (three units per stage) and a width of 16. The total network depth is 2*9+2 = 20.

numUnits = 9;
netWidth = 16;
lgraph = residualCIFARlgraph(netWidth,numUnits,"standard");
figure('Units','normalized','Position',[0.1 0.1 0.8 0.8]);
plot(lgraph)
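You could build a bottleneck network in the same way for comparison. The following sketch assumes the same supporting function residualCIFARlgraph is on the path; with nine bottleneck units the total depth is 3*9+2 = 29.

% Optional sketch: the same width with bottleneck units instead of standard units.
numUnits = 9;
netWidth = 16;
lgraphBottleneck = residualCIFARlgraph(netWidth,numUnits,"bottleneck");
figure('Units','normalized','Position',[0.1 0.1 0.8 0.8]);
plot(lgraphBottleneck)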

Train Network

Specify training options. Train the network for 80 epochs. Select a learning rate that is proportional to the mini-batch size and reduce the learning rate by a factor of 10 after 60 epochs. Validate the network once per epoch using the validation data.

miniBatchSize = 128;
learnRate = 0.1*miniBatchSize/128;
valFrequency = floor(size(XTrain,4)/miniBatchSize);
options = trainingOptions('sgdm', ...
    'InitialLearnRate',learnRate, ...
    'MaxEpochs',80, ...
    'MiniBatchSize',miniBatchSize, ...
    'VerboseFrequency',valFrequency, ...
    'Shuffle','every-epoch', ...
    'Plots','training-progress', ...
    'Verbose',false, ...
    'ValidationData',{XValidation,YValidation}, ...
    'ValidationFrequency',valFrequency, ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropFactor',0.1, ...
    'LearnRateDropPeriod',60);

To train the network using trainNetwork, set the doTraining flag to true. Otherwise, load a pretrained network. Training the network on a good GPU takes about two hours. If you do not have a GPU, then training takes much longer.

doTraining = false;
if doTraining
    trainedNet = trainNetwork(augimdsTrain,lgraph,options);
else
    load('CIFARNet-20-16.mat','trainedNet');
end

Evaluate Trained Network

Calculate the final accuracy of the network on the training set (without data augmentation) and validation set.

[YValPred,probs] = classify(trainedNet,XValidation);
validationError = mean(YValPred ~= YValidation);
YTrainPred = classify(trainedNet,XTrain);
trainError = mean(YTrainPred ~= YTrain);
disp("Training error: " + trainError*100 + "%")
Training error: 2.862%
disp("Validation error: " + validationError*100 + "%")
Validation error: 9.76%

Plot the confusion matrix. Display the precision and recall for each class by using column and row summaries. The network most commonly confuses cats with dogs.

figure('Units','normalized','Position',[0.2 0.2 0.4 0.4]);
cm = confusionchart(YValidation,YValPred);
cm.Title = 'Confusion Matrix for Validation Data';
cm.ColumnSummary = 'column-normalized';
cm.RowSummary = 'row-normalized';

You can display a random sample of nine test images together with their predicted classes and the probabilities of those classes using the following code.

figure
idx = randperm(size(XValidation,4),9);
for i = 1:numel(idx)
    subplot(3,3,i)
    imshow(XValidation(:,:,:,idx(i)));
    prob = num2str(100*max(probs(idx(i),:)),3);
    predClass = char(YValPred(idx(i)));
    title([predClass,', ',prob,'%'])
end

convolutionalUnit(numF,stride,tag) creates an array of layers with two convolutional layers and corresponding batch normalization and ReLU layers. numF is the number of convolutional filters, stride is the stride of the first convolutional layer, and tag is a tag that is prepended to all layer names.

function layers = convolutionalUnit(numF,stride,tag)
layers = [
    convolution2dLayer(3,numF,'Padding','same','Stride',stride,'Name',[tag,'conv1'])
    batchNormalizationLayer('Name',[tag,'BN1'])
    reluLayer('Name',[tag,'relu1'])
    convolution2dLayer(3,numF,'Padding','same','Name',[tag,'conv2'])
    batchNormalizationLayer('Name',[tag,'BN2'])];
end
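For comparison, a bottleneck convolutional unit as described in the Create Deeper Network section could be sketched roughly as follows. This bottleneckUnit helper is not part of the example; the channel reduction factor of four and the placement of the stride are illustrative assumptions, and the actual units created by residualCIFARlgraph can differ.

function layers = bottleneckUnit(numF,stride,tag)
% Illustrative sketch only: a 1-by-1 convolution that downsamples in the
% channel dimension (assumed factor of four), a 3-by-3 convolution, and a
% 1-by-1 convolution that upsamples back to numF channels. The final ReLU
% is omitted because it follows the addition layer, as in the standard unit.
layers = [
    convolution2dLayer(1,numF/4,'Stride',stride,'Name',[tag,'conv1'])
    batchNormalizationLayer('Name',[tag,'BN1'])
    reluLayer('Name',[tag,'relu1'])
    convolution2dLayer(3,numF/4,'Padding','same','Name',[tag,'conv2'])
    batchNormalizationLayer('Name',[tag,'BN2'])
    reluLayer('Name',[tag,'relu2'])
    convolution2dLayer(1,numF,'Name',[tag,'conv3'])
    batchNormalizationLayer('Name',[tag,'BN3'])];
end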

References

[1] Krizhevsky, Alex. "Learning multiple layers of features from tiny images." (2009). https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf

[2] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual learning for image recognition." InProceedings of the IEEE conference on computer vision and pattern recognition, pp. 770-778. 2016.
