Label Data Using Semi-Supervised Learning Techniques

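This example shows how to use graph-based and self-training semi-supervised learning techniques to label data.

Semi-supervised learning combines aspects of supervised learning, where all the training data is labeled, and unsupervised learning, where the true labels are unknown. That is, some training observations are labeled, but the vast majority are unlabeled. Semi-supervised learning methods try to leverage the underlying structure of the data to fit labels to the unlabeled data.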

Statistics and Machine Learning Toolbox™ provides these semi-supervised learning functions for classification:

  • fitsemigraph constructs a similarity graph with labeled and unlabeled observations as nodes, and distributes label information from labeled observations to unlabeled observations.

  • fitsemiself iteratively trains a classifier on the data. First, the function trains a classifier on the labeled data alone, and then uses that classifier to make label predictions for the unlabeled data. fitsemiself provides scores for the predictions, and then treats the predictions as true labels for the next training cycle of the classifier if the scores are above a certain threshold. This process repeats until the label predictions converge.
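To make the self-training loop described above concrete, here is a minimal sketch in plain MATLAB. It is not the implementation inside fitsemiself: it assumes a simple nearest-centroid base classifier, softmax-style scores, a fixed score threshold of 0.9, and synthetic two-cluster data. Every variable name (XL, yL, XU, threshold, and so on) is hypothetical.

```matlab
% Illustrative self-training loop (assumption: nearest-centroid base learner).
rng(0)                                   % for reproducibility
XL = [randn(10,2); randn(10,2) + 4];     % labeled points from two clusters
yL = [ones(10,1); 2*ones(10,1)];         % labels for XL
XU = [randn(50,2); randn(50,2) + 4];     % unlabeled points

threshold = 0.9;                         % assumed score threshold
yU = zeros(size(XU,1),1);                % 0 means "no pseudo-label yet"
for iter = 1:20
    % Train on the labeled points plus the points that carry pseudo-labels.
    Xtrain = [XL; XU(yU > 0,:)];
    ytrain = [yL; yU(yU > 0)];
    c1 = mean(Xtrain(ytrain == 1,:),1);  % centroid of class 1
    c2 = mean(Xtrain(ytrain == 2,:),1);  % centroid of class 2

    % Score each unlabeled point with a softmax over negative squared distances.
    d = [sum((XU - c1).^2,2), sum((XU - c2).^2,2)];
    d = d - min(d,[],2);                 % shift for numerical stability
    scores = exp(-d) ./ sum(exp(-d),2);
    [maxScore,pred] = max(scores,[],2);

    % Keep only the predictions whose score reaches the threshold.
    yNew = zeros(size(yU));
    yNew(maxScore >= threshold) = pred(maxScore >= threshold);
    if isequal(yNew,yU)
        break                            % pseudo-labels stopped changing
    end
    yU = yNew;
end
fittedLabels = pred;                     % final prediction for every unlabeled point
```

The loop stops as soon as the set of accepted pseudo-labels no longer changes, which is one simple way to express "the label predictions converge"; other stopping rules are possible.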

Generate Data

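Generate data from two half-moon shapes. Determine which moon new points belong to by using graph-based and self-training semi-supervised techniques.

Create the custom function twomoons (shown at the end of this example). This function takes an input argument n and creates n points in each of two interlaced half-moons: a top moon that is concave down and a bottom moon that is concave up.

Generate a set of 40 labeled data points by using the twomoons function. Each point in X is in one of the two moons, with the corresponding moon label stored in the vector label.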

rng('default') % For reproducibility
[X,label] = twomoons(20);

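Visualize the points by using a scatter plot. Points in the same moon have the same color.

scatter(X(:,1),X(:,2),[],label,'filled')
title('Labeled Data')

Generate a set of 400 unlabeled data points by using the twomoons function. Each point in newX belongs to one of the two moons, but the corresponding moon label is unknown.

newX = twomoons(200);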

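Label Data Using Graph-Based Method

Label the unlabeled data in newX by using a semi-supervised graph-based method. By default, fitsemigraph constructs a similarity graph from the data in X and newX, and uses a label propagation technique to fit labels to newX.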

graphMdl = fitsemigraph(X,label,newX)

graphMdl = 
  SemiSupervisedGraphModel with properties:

             FittedLabels: [400x1 double]
              LabelScores: [400x2 double]
               ClassNames: [1 2]
             ResponseName: 'Y'
    CategoricalPredictors: []
                   Method: 'labelpropagation'

  Properties, Methods

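The function returns a SemiSupervisedGraphModel object whose FittedLabels property contains the fitted labels for the unlabeled data and whose LabelScores property contains the associated label scores.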
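The idea behind label propagation on a similarity graph can be sketched in a few lines of plain MATLAB. This is an illustration only and does not reproduce the fitsemigraph internals: it assumes a fully connected graph with Gaussian similarities, a kernel width sigma of 1, a fixed 200 iterations, and synthetic two-cluster data. All variable names are hypothetical.

```matlab
% Illustrative label propagation (assumptions: Gaussian similarity, sigma = 1).
rng(0)                                   % for reproducibility
XL = [0 0; 4 4];                         % one labeled point per class
yL = [1; 2];                             % labels for XL
XU = [0.5*randn(30,2); 0.5*randn(30,2) + 4];   % unlabeled points
X = [XL; XU];
nL = size(XL,1);
n = size(X,1);
sigma = 1;                               % assumed kernel width

% Pairwise squared distances and Gaussian similarity weights.
D2 = max(sum(X.^2,2) + sum(X.^2,2)' - 2*(X*X'),0);
W = exp(-D2/(2*sigma^2));
W(1:n+1:end) = 0;                        % remove self-loops
P = W ./ sum(W,2);                       % row-normalized transition matrix

% Start with one-hot scores for labeled nodes and zeros for unlabeled nodes.
F = zeros(n,2);
F(sub2ind(size(F),(1:nL)',yL)) = 1;
Y0 = F(1:nL,:);
for iter = 1:200
    F = P*F;                             % spread scores along graph edges
    F(1:nL,:) = Y0;                      % clamp the labeled nodes
end

% Normalize the scores of the unlabeled nodes and pick the best label.
scores = F(nL+1:end,:);
scores = scores ./ sum(scores,2);
[~,fittedLabels] = max(scores,[],2);
```

In this sketch, scores and fittedLabels play roughly the same roles as the LabelScores and FittedLabels properties: one row of class scores per unlabeled point, and the label with the highest score.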

Visualize the fitted label results by using a scatter plot. Use the fitted labels to set the color of the points, and use the maximum label scores to set the transparency of the points. Points with less transparency are labeled with greater confidence.

maxGraphScores = max(graphMdl.LabelScores,[],2);
rescaledGraphScores = rescale(maxGraphScores,0.05,0.95);
scatter(newX(:,1),newX(:,2),[],graphMdl.FittedLabels,'filled', ...
    'MarkerFaceAlpha','flat','AlphaData',rescaledGraphScores);
title(["Fitted Labels for Unlabeled Data","(Graph-Based)"])

This method seems to label the newX points accurately. The two moons are visually distinct, and the points labeled with the most uncertainty lie on the boundary between the two shapes.

Label Data Using Self-Training Method

Label the unlabeled data in newX by using a semi-supervised self-training method. By default, fitsemiself uses a support vector machine (SVM) model with a Gaussian kernel to label the data iteratively.

selfSVMMdl = fitsemiself(X,label,newX)

selfSVMMdl = 
  SemiSupervisedSelfTrainingModel with properties:

             FittedLabels: [400x1 double]
              LabelScores: [400x2 double]
               ClassNames: [1 2]
             ResponseName: 'Y'
    CategoricalPredictors: []
                  Learner: [1x1 classreg.learning.classif.CompactClassificationSVM]

  Properties, Methods

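The function returns a SemiSupervisedSelfTrainingModel object whose FittedLabels property contains the fitted labels for the unlabeled data and whose LabelScores property contains the associated label scores.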

Visualize the fitted label results by using a scatter plot. As before, use the fitted labels to set the color of the points, and use the maximum label scores to set the transparency of the points.

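maxSVMScores = max(selfSVMMdl.LabelScores,[],2);
rescaledSVMScores = rescale(maxSVMScores,0.05,0.95);
scatter(newX(:,1),newX(:,2),[],selfSVMMdl.FittedLabels,'filled', ...
    'MarkerFaceAlpha','flat','AlphaData',rescaledSVMScores);
title(["Fitted Labels for Unlabeled Data","(Self-Training: SVM)"])

This method, with an SVM learner, also seems to label the newX points accurately. The two moons are visually distinct, and the points labeled with the most uncertainty lie on the boundary between the two shapes.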

Some learners might not label the unlabeled data as effectively, however. For example, use a tree model instead of the default SVM model to label the data in newX.

selfTreeMdl = fitsemiself(X,label,newX,'Learner','tree');

Visualize the fitted label results.

maxTreeScores = max(selfTreeMdl.LabelScores,[],2);
rescaledTreeScores = rescale(maxTreeScores,0.05,0.95);
scatter(newX(:,1),newX(:,2),[],selfTreeMdl.FittedLabels,'filled', ...
    'MarkerFaceAlpha','flat','AlphaData',rescaledTreeScores);
title(["Fitted Labels for Unlabeled Data","(Self-Training: Tree)"])

This method, with a tree learner, mislabels many of the points in the top moon. When you use a semi-supervised self-training method, make sure to use an underlying learner that is appropriate for the structure of your data.
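The visual comparison of the three methods can also be checked numerically, because twomoons returns its points in a fixed order: the n bottom-moon points (label 1) followed by the n top-moon points (label 2). The sketch below assumes that newX was created by twomoons(200) as in this example, so the true labels can be rebuilt without calling the function again. The names nPerMoon, trueLabels, and labelAccuracy are hypothetical and are not part of the toolbox.

```matlab
% True moon labels for newX, rebuilt from the row order that twomoons uses.
nPerMoon = 200;
trueLabels = [ones(nPerMoon,1); 2*ones(nPerMoon,1)];

% Fraction of fitted labels that match the true labels for a fitted model.
% Assumes the model stores numeric labels in a FittedLabels column vector.
labelAccuracy = @(mdl) mean(mdl.FittedLabels == trueLabels);
```

Assuming the three fitted models are in the workspace as graphMdl, selfSVMMdl, and selfTreeMdl, the calls labelAccuracy(graphMdl), labelAccuracy(selfSVMMdl), and labelAccuracy(selfTreeMdl) return the fraction of correctly labeled points for each method.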

This code creates the function twomoons.

function [X,label] = twomoons(n)
% Generate two moons, with n points in each moon.

% Specify the radius and relevant angles for the two moons.
noise = (1/6).*randn(n,1);
radius = 1 + noise;
angle1 = pi + pi/10;
angle2 = pi/10;

% Create the bottom moon with a center at (1,0).
bottomTheta = linspace(-angle1,angle2,n)';
bottomX1 = radius.*cos(bottomTheta) + 1;
bottomX2 = radius.*sin(bottomTheta);

% Create the top moon with a center at (0,0).
topTheta = linspace(angle1,-angle2,n)';
topX1 = radius.*cos(topTheta);
topX2 = radius.*sin(topTheta);

% Return the moon points and their labels.
X = [bottomX1 bottomX2; topX1 topX2];
label = [ones(n,1); 2*ones(n,1)];
end
