
predict

Class: RegressionLinear

Predict response of linear regression model

Description


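YHat = predict(Mdl,X) returns predicted responses for each observation in the predictor data X based on the trained linear regression model Mdl. YHat contains responses for each regularization strength in Mdl.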


YHat = predict(Mdl,X,'ObservationsIn',dimension) specifies the predictor data observation dimension, either 'rows' (default) or 'columns'. For example, specify 'ObservationsIn','columns' to indicate that columns in the predictor data correspond to observations.
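The following is a minimal sketch, assuming the Statistics and Machine Learning Toolbox is installed. The simulated data, sizes, and variable names (yHatRows, yHatCols, maxDiff) are illustrative and not part of this reference page. It trains one model and checks that the two syntaxes agree when the only change is the orientation of the predictor data.

```matlab
% Illustrative simulated data (sizes chosen arbitrarily)
rng(0)
n = 500;
d = 20;
X = randn(n,d);
Y = X(:,1) - 3*X(:,2) + 0.1*randn(n,1);

% Train on row-oriented data (observations in rows, the default)
Mdl = fitrlinear(X,Y,'Learner','leastsquares');

% Predict with observations in rows (default) ...
yHatRows = predict(Mdl,X);

% ... and with the transposed data, declaring observations in columns
yHatCols = predict(Mdl,X','ObservationsIn','columns');

% Both calls should return the same n-by-1 vector, up to rounding
maxDiff = max(abs(yHatRows - yHatCols))
```

Because Mdl has a single regularization strength here, both outputs are n-by-1 vectors.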

Input Arguments


Mdl: Linear regression model, specified as a RegressionLinear model object. You can create a RegressionLinear model object using fitrlinear.

X: Predictor data used to generate responses, specified as a full or sparse numeric matrix or a table.

By default, each row of X corresponds to one observation, and each column corresponds to one variable.

  • For a numeric matrix:

    • The variables in the columns of X must have the same order as the predictor variables that trained Mdl.

    • If you trained Mdl using a table (for example, Tbl) and Tbl contains only numeric predictor variables, then X can be a numeric matrix. To treat numeric predictors in Tbl as categorical during training, identify categorical predictors by using the CategoricalPredictors name-value pair argument of fitrlinear. If Tbl contains heterogeneous predictor variables (for example, numeric and categorical data types) and X is a numeric matrix, then predict throws an error.

  • For a table:

    • predict does not support multicolumn variables or cell arrays other than cell arrays of character vectors.

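    • If you trained Mdl using a table (for example, Tbl), then all predictor variables in X must have the same variable names and data types as the variables that trained Mdl (stored in Mdl.PredictorNames). However, the column order of X does not need to correspond to the column order of Tbl. Also, Tbl and X can contain additional variables (response variables, observation weights, and so on), but predict ignores them.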

    • If you trained Mdl using a numeric matrix, then the predictor names in Mdl.PredictorNames must be the same as the corresponding predictor variable names in X. To specify predictor names during training, use the PredictorNames name-value pair argument of fitrlinear. All predictor variables in X must be numeric vectors. X can contain additional variables (response variables, observation weights, and so on), but predict ignores them.
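A sketch of the table rules above, assuming a release in which fitrlinear accepts a table of predictors plus a response variable name, as the preceding bullets imply. The variables x1, x2, and ID and the table TblNew are hypothetical. The new table lists its predictors in a different order and carries an extra ID variable; predict should match the predictors by name and ignore ID.

```matlab
% Hypothetical training table: two numeric predictors and a response
rng(0)
n = 200;
x1 = randn(n,1);
x2 = randn(n,1);
Y = 2*x1 - x2 + 0.1*randn(n,1);
Tbl = table(x1,x2,Y);

% Train from the table, naming 'Y' as the response variable
Mdl = fitrlinear(Tbl,'Y','Learner','leastsquares');

% New table: predictors in reverse order plus an extra ID variable
TblNew = table(x2(1:5),x1(1:5),(1:5)','VariableNames',{'x2','x1','ID'});
yHatTbl = predict(Mdl,TblNew);

% Because Tbl has only numeric predictors, a numeric matrix whose columns
% follow the training order (x1, then x2) is also accepted
yHatMat = predict(Mdl,[x1(1:5) x2(1:5)]);
maxDiff = max(abs(yHatTbl - yHatMat))
```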

Note

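If you orient your predictor matrix so that observations correspond to columns and specify 'ObservationsIn','columns', then you might experience a significant reduction in execution time. You cannot specify 'ObservationsIn','columns' for predictor data in a table.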

Data Types: double | single | table

dimension: Predictor data observation dimension, specified as 'columns' or 'rows'.

Note

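If you orient your predictor matrix so that observations correspond to columns and specify 'ObservationsIn','columns', then you might experience a significant reduction in execution time. You cannot specify 'ObservationsIn','columns' for predictor data in a table.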

Output Arguments


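YHat: Predicted responses, returned as an n-by-L numeric matrix. n is the number of observations in X and L is the number of regularization strengths in Mdl.Lambda. YHat(i,j) is the response for observation i using the linear regression model that has regularization strength Mdl.Lambda(j).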

The predicted response using the model with regularization strength $j$ is $\hat{y}_j = x\beta_j + b_j$, where:

  • $x$ is an observation from the predictor data matrix X, and is a row vector.

  • $\beta_j$ is the estimated column vector of coefficients. The software stores this vector in Mdl.Beta(:,j).

  • $b_j$ is the estimated, scalar bias, which the software stores in Mdl.Bias(j).
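To make the formula concrete, here is a minimal sketch that reproduces the output of predict by hand for a model with three regularization strengths. The simulated data, sizes, and names (YManual, maxDiff) are illustrative assumptions. It relies on implicit expansion, available in MATLAB R2016b and later, to add the 1-by-L bias row to every row of the n-by-L product.

```matlab
% Illustrative simulated data and three regularization strengths
rng(0)
n = 300;
d = 10;
X = randn(n,d);
Y = X(:,1) + 0.5*X(:,3) + 0.1*randn(n,1);
Lambda = [1e-4 1e-3 1e-2];
Mdl = fitrlinear(X,Y,'Learner','leastsquares','Lambda',Lambda);

% predict returns an n-by-L matrix, one column per regularization strength
YHat = predict(Mdl,X);

% Apply yhat_j = x*beta_j + b_j to every row and every j at once:
% Mdl.Beta is d-by-L and Mdl.Bias is 1-by-L (added to each row by implicit expansion)
YManual = X*Mdl.Beta + Mdl.Bias;

maxDiff = max(abs(YHat(:) - YManual(:)))
```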

Examples

Predict Test-Sample Responses

Simulate 10000 observations from this model:

$y = x_{100} + 2x_{200} + e$

  • $X = \{x_1,\ldots,x_{1000}\}$ is a 10000-by-1000 sparse matrix with 10% nonzero standard normal elements.

  • $e$ is random normal error with mean 0 and standard deviation 0.3.

rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);

Train a linear regression model. Reserve 30% of the observations as a holdout sample.

CVMdl = fitrlinear(X,Y,'Holdout',0.3);
Mdl = CVMdl.Trained{1}
Mdl = 
  RegressionLinear
         ResponseName: 'Y'
    ResponseTransform: 'none'
                 Beta: [1000x1 double]
                 Bias: -0.0066
               Lambda: 1.4286e-04
              Learner: 'svm'

CVMdl is a RegressionPartitionedLinear model. It contains the property Trained, which is a 1-by-1 cell array holding a RegressionLinear model that the software trained using the training set.

Extract the training and test data from the partition definition.

trainIdx = training(CVMdl.Partition);
testIdx = test(CVMdl.Partition);

Predict the training- and test-sample responses.

yHatTrain = predict(Mdl,X(trainIdx,:));
yHatTest = predict(Mdl,X(testIdx,:));

Because there is one regularization strength in Mdl, yHatTrain and yHatTest are numeric vectors.
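As a sketch of one way to use these predictions (this step is not part of the original example), you can compute the test-sample mean squared error directly from yHatTest and compare it with loss, whose default loss function for RegressionLinear models is the mean squared error. The names mseByHand and mseLoss are hypothetical; the setup is repeated so the block runs on its own.

```matlab
rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);
CVMdl = fitrlinear(X,Y,'Holdout',0.3);
Mdl = CVMdl.Trained{1};
testIdx = test(CVMdl.Partition);

% Test-sample MSE computed by hand from the predicted responses
yHatTest = predict(Mdl,X(testIdx,:));
mseByHand = mean((Y(testIdx) - yHatTest).^2)

% loss defaults to the mean squared error with equal weights, so the two should agree
mseLoss = loss(Mdl,X(testIdx,:),Y(testIdx))
```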

Predict responses from the best-performing linear regression model that uses a lasso penalty and least squares.

Simulate 10000 observations as in Predict Test-Sample Responses.

rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);

Create a set of 15 logarithmically spaced regularization strengths from $10^{-5}$ through $10^{-1}$.

Lambda = logspace(-5,-1,15);
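A quick check of what logspace produces here: logspace(a,b,k) returns the k values 10.^linspace(a,b,k), so consecutive regularization strengths differ by the constant factor 10^((b-a)/(k-1)), which is 10^(4/14), approximately 1.9307. This matches the spacing of the Lambda values displayed for Mdl1 in this example. The names ratios and ratio are hypothetical.

```matlab
Lambda = logspace(-5,-1,15);

% Ratio between each strength and the previous one
ratios = Lambda(2:end)./Lambda(1:end-1);

% Expected constant factor for 15 points spanning four decades
ratio = 10^(4/14)   % approximately 1.9307
```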

Cross-validate the models. To increase execution speed, transpose the predictor data and specify that the observations are in columns. Optimize the objective function using SpaRSA.

X = X';
CVMdl = fitrlinear(X,Y,'ObservationsIn','columns','KFold',5,'Lambda',Lambda,...
    'Learner','leastsquares','Solver','sparsa','Regularization','lasso');
numCLModels = numel(CVMdl.Trained)
numCLModels = 5

CVMdl is a RegressionPartitionedLinear model. Because fitrlinear implements 5-fold cross-validation, CVMdl contains 5 RegressionLinear models that the software trains on each fold.

Display the first trained linear regression model.

Mdl1 = CVMdl.Trained{1}
Mdl1 = 
  RegressionLinear
         ResponseName: 'Y'
    ResponseTransform: 'none'
                 Beta: [1000x15 double]
                 Bias: [-0.0049 -0.0049 -0.0049 -0.0049 -0.0049 -0.0048 ... ]
               Lambda: [1.0000e-05 1.9307e-05 3.7276e-05 7.1969e-05 ... ]
              Learner: 'leastsquares'

Mdl1 is a RegressionLinear model object. fitrlinear constructed Mdl1 by training on four of the five folds. Because Lambda is a sequence of regularization strengths, you can think of Mdl1 as 15 models, one for each regularization strength in Lambda.
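Because Mdl1 stores 15 models, predict returns one column per regularization strength. The sketch below repeats the setup above so that it runs on its own; holdoutIdx and YHat1 are hypothetical names. It predicts the observations of the first fold, assuming the usual cross-validation convention that CVMdl.Trained{k} was trained without fold k.

```matlab
rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);
Lambda = logspace(-5,-1,15);
X = X'; % observations in columns
CVMdl = fitrlinear(X,Y,'ObservationsIn','columns','KFold',5,'Lambda',Lambda,...
    'Learner','leastsquares','Solver','sparsa','Regularization','lasso');
Mdl1 = CVMdl.Trained{1};

% Observations in fold 1 (assumed held out from Mdl1)
holdoutIdx = test(CVMdl.Partition,1);

% One column of predictions per regularization strength
YHat1 = predict(Mdl1,X(:,holdoutIdx),'ObservationsIn','columns');
size(YHat1) % number of fold-1 observations by 15
```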

Estimate the cross-validated MSE.

mse = kfoldLoss(CVMdl);
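kfoldLoss returns one cross-validated MSE per regularization strength. If you preferred the strength with the smallest cross-validated MSE rather than a sparsity trade-off, a sketch like the following finds it. It repeats the setup so it runs on its own; minMSE, idxMin, and bestLambda are hypothetical names.

```matlab
rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);
Lambda = logspace(-5,-1,15);
X = X';
CVMdl = fitrlinear(X,Y,'ObservationsIn','columns','KFold',5,'Lambda',Lambda,...
    'Learner','leastsquares','Solver','sparsa','Regularization','lasso');

% One cross-validated MSE per value in Lambda
mse = kfoldLoss(CVMdl);

% Regularization strength with the smallest cross-validated MSE
[minMSE,idxMin] = min(mse);
bestLambda = Lambda(idxMin)
```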

Higher values of Lambda lead to predictor variable sparsity, which is a good quality of a regression model. For each regularization strength, train a linear regression model using the entire data set and the same options as when you cross-validated the models. Determine the number of nonzero coefficients per model.

Mdl = fitrlinear(X,Y,'ObservationsIn','columns','Lambda',Lambda,...
    'Learner','leastsquares','Solver','sparsa','Regularization','lasso');
numNZCoeff = sum(Mdl.Beta~=0);

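In the same figure, plot the cross-validated MSE and frequency of nonzero coefficients for each regularization strength. Plot all variables on the log scale.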

figure
[h,hL1,hL2] = plotyy(log10(Lambda),log10(mse),...
    log10(Lambda),log10(numNZCoeff));
hL1.Marker = 'o';
hL2.Marker = 'o';
ylabel(h(1),'log_{10} MSE')
ylabel(h(2),'log_{10} nonzero-coefficient frequency')
xlabel('log_{10} Lambda')
hold off

[Figure: log10 cross-validated MSE (left axis) and log10 nonzero-coefficient frequency (right axis) versus log10 Lambda]

Choose the index of the regularization strength that balances predictor variable sparsity and low MSE (for example, Lambda(10)).

idxFinal = 10;

Extract the model corresponding to the chosen regularization strength.

MdlFinal = selectModels(Mdl,idxFinal)
MdlFinal = 
  RegressionLinear
         ResponseName: 'Y'
    ResponseTransform: 'none'
                 Beta: [1000x1 double]
                 Bias: -0.0050
               Lambda: 0.0037
              Learner: 'leastsquares'
idxNZCoeff = find(MdlFinal.Beta~=0)
idxNZCoeff = 2×1

   100
   200
EstCoeff = MdlFinal.Beta(idxNZCoeff)

MdlFinal is a RegressionLinear model with one regularization strength. The nonzero coefficients EstCoeff are close to the coefficients that simulated the data.
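selectModels keeps only the requested regularization strength, so MdlFinal should predict what column idxFinal of the full model predicts. The following sketch checks this on new simulated observations; it repeats the setup above so it runs on its own, and YHatAll, YHatFinal, and maxDiff are hypothetical names.

```matlab
rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);
Lambda = logspace(-5,-1,15);
X = X';
Mdl = fitrlinear(X,Y,'ObservationsIn','columns','Lambda',Lambda,...
    'Learner','leastsquares','Solver','sparsa','Regularization','lasso');
idxFinal = 10;
MdlFinal = selectModels(Mdl,idxFinal);

% 10 new observations, stored in columns
XNew = sprandn(d,10,nz);
YHatAll = predict(Mdl,XNew,'ObservationsIn','columns');        % 10-by-15
YHatFinal = predict(MdlFinal,XNew,'ObservationsIn','columns');  % 10-by-1

% Column idxFinal of the full prediction should match the selected model
maxDiff = max(abs(YHatAll(:,idxFinal) - YHatFinal))
```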

Simulate 10 new observations, and predict corresponding responses using the best-performing model.

XNew = sprandn(d,10,nz);
YHat = predict(MdlFinal,XNew,'ObservationsIn','columns');


Introduced in R2016a