Getting Started with Mask R-CNN for Instance Segmentation

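Instance segmentation is an enhanced type of object detection that generates a segmentation map for each detected instance of an object. Instance segmentation treats individual objects as distinct entities, regardless of their class. In contrast, semantic segmentation treats all objects of the same class as a single entity.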

Mask R-CNN is a popular deep learning instance segmentation technique that performs pixel-level segmentation on detected objects [1]. The Mask R-CNN algorithm can accommodate multiple classes and overlapping objects.

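You can create a pretrained Mask R-CNN network using the maskrcnn object. The network is trained on the MS-COCO data set and can detect objects of 80 different classes. To perform instance segmentation, pass the pretrained network to the segmentObjects function.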
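As a minimal sketch of this workflow, the following code loads the pretrained network and segments one image. It assumes that the Mask R-CNN support package for Computer Vision Toolbox is installed and that the sample image visionteam.jpg is on the path; the image choice and the 0.9 threshold are illustrative assumptions, not values from this page.

```matlab
% Load the network pretrained on MS-COCO.
% Assumption: the Mask R-CNN support package is installed.
net = maskrcnn("resnet50-coco");

% Read any RGB image. visionteam.jpg is an assumed sample image.
im = imread("visionteam.jpg");

% Segment the object instances. The threshold value is illustrative.
[masks,labels,scores,boxes] = segmentObjects(net,im,Threshold=0.9);

% masks is H-by-W-by-M logical, with one page per detected instance.
fprintf("Detected %d instances.\n",size(masks,3));
```

Each page of masks corresponds to one row of boxes and one element of labels and scores.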

If you want to modify the network to detect additional classes, or to adjust other parameters of the network, then you can perform transfer learning. For an example that shows how to train a Mask R-CNN, see Perform Instance Segmentation Using Mask R-CNN.

Mask R-CNN Network Architecture

The Mask R-CNN network consists of two stages. The first stage is a region proposal network (RPN), which predicts object proposal bounding boxes based on anchor boxes. The second stage is an R-CNN detector that refines these proposals, classifies them, and computes the pixel-level segmentation for these proposals.

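RPN as part of the feature extractor, followed by object classification, yielding bounding boxes and semantic segmentation masks for an input image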

The Mask R-CNN model builds on the Faster R-CNN model. Mask R-CNN replaces the ROI max pooling layer in Faster R-CNN with an roiAlignLayer that provides more accurate sub-pixel level ROI pooling. The Mask R-CNN network also adds a mask branch for pixel-level object segmentation. For more information about the Faster R-CNN network, see Getting Started with R-CNN, Fast R-CNN, and Faster R-CNN.

This diagram shows a modified Faster R-CNN network on the left and a mask branch on the right.

Faster R-CNN network connected to a mask branch using an ROI align layer

To configure a Mask R-CNN network for transfer learning, specify the class names and anchor boxes when you create a maskrcnn object. You can optionally specify additional network properties, including the network input size and the ROI pooling sizes.
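As a hedged sketch of this configuration step, the following code creates a maskrcnn object for two classes and sets the network input size. The class names and the input size are hypothetical values chosen for illustration. The sketch leaves the anchor boxes and ROI pooling sizes at their defaults, and it assumes that the Mask R-CNN support package is installed.

```matlab
% Hypothetical class names for the new task.
classNames = ["person" "vehicle"];

% Illustrative network input size, as [height width channels].
inputSize = [800 1200 3];

% Start from the pretrained COCO weights and reconfigure for the new classes.
net = maskrcnn("resnet50-coco",classNames,InputSize=inputSize);

% Inspect the configured properties.
disp(net.ClassNames)
disp(net.InputSize)
```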

Prepare Mask R-CNN Training Data

Load Data

To train a Mask R-CNN, you need the following data.

Data	Description
RGB image	RGB images that serve as input to the network, specified as an H-by-W-by-3 numeric array. For example, this sample RGB image is a modified image from the CamVid data set [2] that has been edited to remove personally identifiable information.
Ground-truth bounding boxes	Bounding boxes for objects in the RGB images, specified as a NumObjects-by-4 matrix, with rows in the format [x y w h]. For example, the `bboxes` variable shows the bounding boxes of six objects in the sample RGB image: bboxes = [394 442 36 101; 436 457 32 88; 619 293 209 281; 460 441 210 234; 862 375 190 314; 816 271 235 305]
Instance labels	Label of each instance, specified as a NumObjects-by-1 string vector or a NumObjects-by-1 cell array of character vectors. For example, the `labels` variable shows the labels of six objects in the sample RGB image as a 6×1 cell array: labels = {'person'; 'person'; 'vehicle'; 'vehicle'; 'vehicle'; 'vehicle'}
Instance masks	Masks for instances of objects. Mask data comes in two formats: (1) binary masks, specified as a logical array of size H-by-W-by-NumObjects, where each mask is the segmentation of one instance in the image; (2) polygon coordinates, specified as a NumObjects-by-2 cell array, where each row of the array contains the (x,y) coordinates of a polygon along the boundary of one instance in the image. The Mask R-CNN network requires binary masks, not polygon coordinates.

To convert polygon coordinates to binary masks, use the `poly2mask` function. The `poly2mask` function sets pixels inside the polygon to `1` and sets pixels outside the polygon to `0`. This code shows how to convert polygon coordinates in the `masks_polygon` variable to binary masks of size h-by-w-by-numObjects.

    denseMasks = false([h,w,numObjects]);
    for i = 1:numObjects
        denseMasks(:,:,i) = poly2mask(masks_polygon{i}(:,1),masks_polygon{i}(:,2),h,w);
    end

For example, this montage shows the binary masks of six objects in the sample RGB image.
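Because the bounding boxes, labels, and masks must all describe the same instances, a quick consistency check before training can catch mismatched data. The following sketch uses small synthetic data, not the sample image. The tight-box computation assumes that [x y w h] counts whole pixels starting at the top-left pixel index, which is an illustrative convention.

```matlab
% Small synthetic example: two instances in a 6-by-8 image.
h = 6;
w = 8;
bboxes = [2 2 3 3; 5 1 3 4];          % one [x y w h] row per instance
labels = {'person'; 'vehicle'};
masks = false(h,w,2);
masks(2:4,2:4,1) = true;              % instance 1 covers rows 2-4, columns 2-4
masks(1:4,5:7,2) = true;              % instance 2 covers rows 1-4, columns 5-7

% All three inputs must describe the same number of instances.
numObjects = size(bboxes,1);
assert(size(bboxes,2) == 4,"Boxes must be NumObjects-by-4.");
assert(numel(labels) == numObjects,"Expected one label per box.");
assert(islogical(masks) && size(masks,3) == numObjects, ...
    "Masks must be logical H-by-W-by-NumObjects.");

% Derive the tight pixel-index box around each mask for comparison.
tightBoxes = zeros(numObjects,4);
for i = 1:numObjects
    [rows,cols] = find(masks(:,:,i));
    tightBoxes(i,:) = [min(cols) min(rows) ...
        max(cols)-min(cols)+1 max(rows)-min(rows)+1];
end
disp(tightBoxes)
```

In this synthetic data the derived boxes match the supplied boxes. Real annotations typically differ by a few pixels.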

Create Datastore that Reads Data

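Use a datastore to read data. The datastore must return data as a 1-by-4 cell array in the format {RGB images, bounding boxes, labels, masks}. You can create a datastore in this format using these steps: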

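Create an imageDatastore that returns RGB image data
Create a boxLabelDatastore that returns bounding box data and instance labels as a two-column cell array
Create an imageDatastore and specify a custom read function that returns mask data as a binary matrix
Combine the three datastores using the combine function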
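The steps above can be sketched as follows. To stay self-contained, the code writes a tiny synthetic data set to a temporary folder. The folder layout, the file names, and storing the masks in a MAT-file variable named masks are all assumptions made for illustration. The boxLabelDatastore function requires Computer Vision Toolbox.

```matlab
% Write a tiny synthetic data set to a temporary folder (illustrative only).
dataDir = tempname;
imageDir = fullfile(dataDir,"images");
maskDir = fullfile(dataDir,"masks");
mkdir(imageDir);
mkdir(maskDir);

im = zeros(100,120,3,"uint8");
imwrite(im,fullfile(imageDir,"frame001.png"));

bboxes = [10 20 30 40; 60 50 25 35];            % one [x y w h] row per instance
labels = categorical(["person"; "vehicle"]);
masks = false(100,120,2);
masks(20:59,10:39,1) = true;
masks(50:84,60:84,2) = true;
save(fullfile(maskDir,"frame001.mat"),"masks");

% Step 1: datastore that returns the RGB images.
imds = imageDatastore(imageDir);

% Step 2: datastore that returns boxes and labels (one table row per image).
boxTable = table({bboxes},{labels},VariableNames=["Boxes","Labels"]);
blds = boxLabelDatastore(boxTable);

% Step 3: datastore with a custom read function that returns binary masks.
% Assumption: each MAT-file stores a logical array in a variable named masks.
readMask = @(file) getfield(load(file,"masks"),"masks");
mds = imageDatastore(maskDir,FileExtensions=".mat",ReadFcn=readMask);

% Step 4: combine into one datastore that returns a 1-by-4 cell array.
ds = combine(imds,blds,mds);
sample = read(ds);
disp(size(sample))
```

The image and mask files must sort in the same order so that each combined read pairs an image with its own masks.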

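The size of the images, bounding boxes, and masks must match the input size of the network. If you need to resize the data, then you can use the imresize function to resize the RGB images and masks, and the bboxresize function to resize the bounding boxes.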
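As a minimal sketch of this resizing step, the following code scales a synthetic image, its masks, and its bounding boxes to a common target size. The target size and the data are illustrative assumptions. Nearest-neighbor interpolation is one reasonable choice for masks because it keeps them binary.

```matlab
% Synthetic image, masks, and boxes (illustrative only).
im = zeros(100,120,3,"uint8");
masks = false(100,120,2);
masks(20:59,10:39,1) = true;
masks(50:84,60:84,2) = true;
bboxes = [10 20 30 40; 60 50 25 35];     % one [x y w h] row per instance

% Hypothetical target size, as [height width].
targetSize = [50 60];
scale = targetSize ./ size(im,[1 2]);    % [rowScale colScale]

% Resize the image and masks to the target size.
imResized = imresize(im,targetSize);
masksResized = imresize(masks,targetSize,"nearest");

% Scale the boxes by the same factors.
bboxesResized = bboxresize(bboxes,scale);
disp(bboxesResized)
```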

For more information, see Datastores for Deep Learning (Deep Learning Toolbox).

Visualize Training Data

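To display the instance masks over the image, use the insertObjectMask function. You can specify a colormap so that each instance appears in a different color. This sample code shows how to display the instance masks in the masks variable over the RGB image in the im variable using the lines colormap.

imOverlay = insertObjectMask(im,masks,Color=lines(numObjects));
imshow(imOverlay);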

Each pedestrian and vehicle has a unique falsecolor hue over the RGB image

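To show the bounding boxes with labels over the image, use the showShape function. This sample code shows how to display labeled rectangular shapes using the bounding box size and position data in the bboxes variable and the label data in the labels variable.

imshow(imOverlay)
showShape("rectangle",bboxes,Label=labels,Color="red");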

Red rectangles labeled 'Pedestrian' and 'Vehicle' surround instances of each object

Train Mask R-CNN Model

Train a Mask R-CNN network using the trainMaskRCNN function. For an example, see Perform Instance Segmentation Using Mask R-CNN.

References

[1] He, Kaiming, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. "Mask R-CNN." ArXiv:1703.06870 [Cs], January 24, 2018. https://arxiv.org/pdf/1703.06870.

[2] Brostow, Gabriel J., Julien Fauqueur, and Roberto Cipolla. "Semantic Object Classes in Video: A High-Definition Ground Truth Database." Pattern Recognition Letters 30, no. 2 (January 2009): 88–97. https://doi.org/10.1016/j.patrec.2008.04.005.