
Partial Least Squares

Introduction to Partial Least Squares

Partial least-squares (PLS) regression is a technique used with data that contain correlated predictor variables. This technique constructs new predictor variables, known as components, as linear combinations of the original predictor variables. PLS constructs these components while considering the observed response values, leading to a parsimonious model with reliable predictive power.

The technique is something of a cross between multiple linear regression and principal component analysis:

  • Multiple linear regression finds a combination of the predictors that best fits the response.

  • Principal component analysis finds combinations of the predictors with large variance, reducing correlations. The technique makes no use of response values.

  • PLS finds combinations of the predictors that have a large covariance with the response values.

PLS therefore combines information about the variances of both the predictors and the responses, while also considering the correlations among them.
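To make the covariance criterion concrete, the following is a minimal Python/NumPy sketch, not the plsregress implementation: for a single centered response, the first PLS weight vector is proportional to X'y, so among all unit-norm linear combinations of the predictors, the first component has the largest covariance with the response. The data and function name here are made up for illustration.

```python
import numpy as np

def first_pls_component(X, y):
    """One PLS1 step: weight vector w proportional to X'y, which
    maximizes |cov(Xw, y)| over unit-norm w (X, y assumed centered)."""
    w = X.T @ y
    w /= np.linalg.norm(w)   # unit-norm weight vector
    t = X @ w                # first PLS component (scores)
    return w, t

# Toy data: the response is driven mainly by the first predictor
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
y = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(50)

# Center, as PLS routines do internally
Xc = X - X.mean(axis=0)
yc = y - y.mean()

w, t = first_pls_component(Xc, yc)

# No random unit direction should beat the PLS weights on |cov with y|
# (by Cauchy-Schwarz, |v'X'y| <= ||X'y||, attained at v = w).
cov_pls = abs(t @ yc)
for _ in range(200):
    v = rng.standard_normal(4)
    v /= np.linalg.norm(v)
    assert abs((Xc @ v) @ yc) <= cov_pls + 1e-9
print("first PLS weights:", np.round(w, 3))
```

Because the response here is essentially a noisy multiple of the first predictor, the weight vector concentrates on that predictor.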

PLS shares characteristics with other regression and feature transformation techniques. It is similar to ridge regression in that it is used in situations with correlated predictors. It is similar to stepwise regression (or more general feature selection techniques) in that it can be used to select a smaller set of model terms. PLS differs from these methods, however, by transforming the original predictor space into the new component space.

The function plsregress carries out PLS regression.

Perform Partial Least-Squares Regression

This example shows how to perform PLS regression and how to choose the number of components in a PLS model.

Consider the data on biochemical oxygen demand in moore.mat, padded with noisy versions of the predictors to introduce correlations.

load moore
y = moore(:,6);              % Response
X0 = moore(:,1:5);           % Original predictors
X1 = X0+10*randn(size(X0));  % Correlated predictors
X = [X0,X1];

Use plsregress to perform PLS regression with the same number of components as predictors, then plot the percentage variance explained in the response as a function of the number of components.

[XL,yl,XS,YS,beta,PCTVAR] = plsregress(X,y,10);
plot(1:10,cumsum(100*PCTVAR(2,:)),'-o')
xlabel('Number of PLS components')
ylabel('Percent Variance Explained in y')

Figure contains an axes object. The axes object contains an object of type line.

Choosing the number of components in a PLS model is a critical step. The plot gives a rough indication, showing nearly 80% of the variance in y explained by the first component, with as many as five components making significant contributions.

The following computes the six-component model.

[XL,yl,XS,YS,beta,PCTVAR,MSE,stats] = plsregress(X,y,6);
yfit = [ones(size(X,1),1) X]*beta;
plot(y,yfit,'o')

Figure contains an axes object. The axes object contains an object of type line.

The scatter shows a reasonable correlation between fitted and observed responses, and this is confirmed by the R² statistic.

TSS = sum((y-mean(y)).^2);
RSS = sum((y-yfit).^2);
Rsquared = 1 - RSS/TSS

Rsquared = 0.8240
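The R² formula used above is generic: one minus the residual sum of squares over the total sum of squares. As a quick check of the same computation in Python/NumPy, with made-up observed and fitted values rather than the moore data:

```python
import numpy as np

# Hypothetical observed and fitted responses (illustration only)
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
yfit = np.array([3.2, 4.8, 7.1, 8.7, 11.2])

TSS = np.sum((y - y.mean())**2)   # total sum of squares
RSS = np.sum((y - yfit)**2)       # residual sum of squares
Rsquared = 1 - RSS/TSS
print(round(Rsquared, 4))          # prints 0.9945
```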

A plot of the weights of the ten predictors in each of the six components shows that two of the components (the last two computed) explain the majority of the variance in X.

figure
plot(1:10,stats.W,'o-')
legend({'c1','c2','c3','c4','c5','c6'},'Location','best')
xlabel('Predictor')
ylabel('Weight')

Figure contains an axes object. The axes object contains 6 objects of type line. These objects represent c1, c2, c3, c4, c5, c6.

A plot of the mean-squared errors suggests that as few as two components may provide an adequate model.

figure
yyaxis left
plot(0:6,MSE(1,:),'-o')
yyaxis right
plot(0:6,MSE(2,:),'-o')
legend('MSE Predictors','MSE Response')
xlabel('Number of Components')

Figure contains an axes object. The axes object contains 2 objects of type line. These objects represent MSE Predictors, MSE Response.

The calculation of mean-squared errors by plsregress is controlled by optional name-value arguments that specify the type of cross-validation and the number of Monte Carlo repetitions.
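The idea behind such cross-validated error estimates can be sketched in Python/NumPy: fit a PLS model on random training subsets, measure squared error on the held-out rows, and average over repetitions for each candidate number of components. This is an illustrative analogue, not the plsregress implementation; the function names and data are made up for the sketch.

```python
import numpy as np

def pls1_fit(X, y, ncomp):
    """Minimal NIPALS-style PLS1 fit for a single response.
    Returns intercept b0 and coefficient vector B."""
    xmean, ymean = X.mean(axis=0), y.mean()
    Xk, yk = X - xmean, y - ymean
    W, P, Q = [], [], []
    for _ in range(ncomp):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)      # unit-norm weights
        t = Xk @ w                  # component scores
        tt = t @ t
        p = Xk.T @ t / tt           # X loadings
        q = (yk @ t) / tt           # y loading
        Xk = Xk - np.outer(t, p)    # deflate X
        yk = yk - q * t             # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    B = W @ np.linalg.solve(P.T @ W, Q)  # coefficients in original X space
    b0 = ymean - xmean @ B
    return b0, B

def mc_cv_mse(X, y, ncomp, reps=20, test_frac=0.3, seed=0):
    """Monte Carlo cross-validation: mean test MSE over random splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    ntest = int(n * test_frac)
    mses = []
    for _ in range(reps):
        idx = rng.permutation(n)
        test, train = idx[:ntest], idx[ntest:]
        b0, B = pls1_fit(X[train], y[train], ncomp)
        pred = b0 + X[test] @ B
        mses.append(np.mean((y[test] - pred) ** 2))
    return np.mean(mses)

# Toy data with deliberately correlated predictor columns
rng = np.random.default_rng(1)
X0 = rng.standard_normal((80, 3))
X = np.hstack([X0, X0 + 0.5 * rng.standard_normal((80, 3))])
y = X0 @ np.array([1.0, -2.0, 0.5]) + 0.2 * rng.standard_normal(80)

for k in range(1, 5):
    print(k, round(mc_cv_mse(X, y, k), 3))
```

Because the toy response depends on three underlying directions, the estimated test MSE drops sharply up to about three components and then levels off, which is the kind of curve used to pick the component count.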

See Also

Related Topics