
predict

Class: RegressionLinear

Predict response of linear regression model

Description


YHat = predict(Mdl,X) returns predicted responses for each observation in the predictor data X based on the trained linear regression model Mdl. YHat contains responses for each regularization strength in Mdl.


YHat = predict(Mdl,X,'ObservationsIn',dimension) specifies the predictor data observation dimension, either 'rows' (default) or 'columns'. For example, specify 'ObservationsIn','columns' to indicate that columns in the predictor data correspond to observations.
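As a quick illustration, the following minimal sketch on simulated data (all variable names here are placeholders) shows that both syntaxes return the same predictions:

```matlab
% Minimal sketch of both predict syntaxes on simulated data.
rng(1)                                  % for reproducibility
X = randn(100,5);
Y = X(:,1) + 0.1*randn(100,1);
Mdl = fitrlinear(X,Y);                  % train a linear regression model
YHat  = predict(Mdl,X);                 % observations in rows (default)
YHatT = predict(Mdl,X','ObservationsIn','columns');  % observations in columns
```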

Input Arguments


Linear regression model, specified as a RegressionLinear model object. You can create a RegressionLinear model object using fitrlinear.

Predictor data used to generate responses, specified as a full or sparse numeric matrix or a table.

By default, each row of X corresponds to one observation, and each column corresponds to one variable.

  • For a numeric matrix:

    • The variables in the columns of X must have the same order as the predictor variables that trained Mdl.

    • If you train Mdl using a table (for example, Tbl) and Tbl contains only numeric predictor variables, then X can be a numeric matrix. To treat numeric predictors in Tbl as categorical during training, identify categorical predictors by using the CategoricalPredictors name-value pair argument of fitrlinear. If Tbl contains heterogeneous predictor variables (for example, numeric and categorical data types) and X is a numeric matrix, then predict throws an error.

  • For a table:

    • predict does not support multicolumn variables or cell arrays other than cell arrays of character vectors.

    • If you train Mdl using a table (for example, Tbl), then all predictor variables in X must have the same variable names and data types as the variables that trained Mdl (stored in Mdl.PredictorNames). However, the column order of X does not need to correspond to the column order of Tbl. Also, Tbl and X can contain additional variables (response variables, observation weights, and so on), but predict ignores them.

    • If you train Mdl using a numeric matrix, then the predictor names in Mdl.PredictorNames must be the same as the corresponding predictor variable names in X. To specify predictor names during training, use the PredictorNames name-value pair argument of fitrlinear. All predictor variables in X must be numeric vectors. X can contain additional variables (response variables, observation weights, and so on), but predict ignores them.

Note

If you orient your predictor matrix so that observations correspond to columns and specify 'ObservationsIn','columns', then you might experience a significant reduction in optimization execution time. You cannot specify 'ObservationsIn','columns' for predictor data in a table.

Data Types: double | single | table
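For example, a model trained on a table must be given a table with matching predictor variable names and types at prediction time. A minimal sketch using the carsmall sample data set shipped with MATLAB (assuming fitrlinear's table interface):

```matlab
% Sketch: train on a table, then predict from a table with the same
% predictor variable names and types. predict ignores the response column.
load carsmall                                 % sample data set
Tbl = rmmissing(table(Weight,Horsepower,MPG));
Mdl = fitrlinear(Tbl,'MPG');                  % 'MPG' is the response variable
yHat = predict(Mdl,Tbl);                      % extra MPG column is ignored
```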

Predictor data observation dimension, specified as 'columns' or 'rows'.

Note

If you orient your predictor matrix so that observations correspond to columns and specify 'ObservationsIn','columns', then you might experience a significant reduction in optimization execution time. You cannot specify 'ObservationsIn','columns' for predictor data in a table.

Output Arguments


Predicted responses, returned as an n-by-L numeric matrix. n is the number of observations in X, and L is the number of regularization strengths in Mdl.Lambda. YHat(i,j) is the response for observation i using the linear regression model that has regularization strength Mdl.Lambda(j).

The predicted response using the model with regularization strength j is ŷ_j = xβ_j + b_j, where:

  • x is an observation from the predictor data matrix X, expressed as a row vector.

  • β_j is the estimated column vector of coefficients. The software stores this vector in Mdl.Beta(:,j).

  • b_j is the estimated, scalar bias, which the software stores in Mdl.Bias(j).
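To make the formula concrete, this sketch on simulated data reproduces predict by applying ŷ_j = xβ_j + b_j directly (assuming a model with one regularization strength and the default ResponseTransform of 'none'):

```matlab
% Sketch: reproduce predict manually for a model with one regularization
% strength and the default ResponseTransform 'none'.
rng(0)                                  % for reproducibility
X = randn(50,10);
Y = X(:,1) - 2*X(:,3) + 0.1*randn(50,1);
Mdl = fitrlinear(X,Y);
yHat    = predict(Mdl,X);
yManual = X*Mdl.Beta + Mdl.Bias;        % each row x gives x*beta + b
```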

Examples


Predict Test-Sample Responses

Simulate 10000 observations from this model:

y = x_100 + 2x_200 + e

  • X = {x_1,...,x_1000} is a 10000-by-1000 sparse matrix with 10% nonzero standard normal elements.

  • e is random normal error with mean 0 and standard deviation 0.3.

rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);

Train a linear regression model. Reserve 30% of the observations as a holdout sample.

CVMdl = fitrlinear(X,Y,'Holdout',0.3);
Mdl = CVMdl.Trained{1}
Mdl =
  RegressionLinear
         ResponseName: 'Y'
    ResponseTransform: 'none'
                 Beta: [1000x1 double]
                 Bias: -0.0066
               Lambda: 1.4286e-04
              Learner: 'svm'

  Properties, Methods

CVMdl is a RegressionPartitionedLinear model. It contains the property Trained, which is a 1-by-1 cell array holding a RegressionLinear model that the software trained using the training set.

Extract the training and test data from the partition definition.

trainIdx = training(CVMdl.Partition);
testIdx = test(CVMdl.Partition);

Predict the training- and test-sample responses.

yHatTrain = predict(Mdl,X(trainIdx,:));
yHatTest = predict(Mdl,X(testIdx,:));

Because there is one regularization strength in Mdl, yHatTrain and yHatTest are numeric vectors.

Predict responses from the best-performing linear regression model that uses a lasso penalty and least squares.

Simulate 10000 observations as in Predict Test-Sample Responses.

rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);

Create a set of 15 logarithmically spaced regularization strengths from 10^-5 through 10^-1.

Lambda = logspace(-5,-1,15);

Cross-validate the models. To increase execution speed, transpose the predictor data and specify that the observations are in columns. Optimize the objective function using SpaRSA.

X = X';
CVMdl = fitrlinear(X,Y,'ObservationsIn','columns','KFold',5,'Lambda',Lambda,...
    'Learner','leastsquares','Solver','sparsa','Regularization','lasso');
numCLModels = numel(CVMdl.Trained)
numCLModels = 5

CVMdl is a RegressionPartitionedLinear model. Because fitrlinear implements 5-fold cross-validation, CVMdl contains 5 RegressionLinear models that the software trains on each fold.

Display the first trained linear regression model.

Mdl1 = CVMdl.Trained{1}
Mdl1 =
  RegressionLinear
         ResponseName: 'Y'
    ResponseTransform: 'none'
                 Beta: [1000x15 double]
                 Bias: [-0.0049 -0.0049 -0.0049 -0.0049 -0.0049 -0.0048 ... ]
               Lambda: [1.0000e-05 1.9307e-05 3.7276e-05 7.1969e-05 ... ]
              Learner: 'leastsquares'

  Properties, Methods

Mdl1 is a RegressionLinear model object. fitrlinear constructed Mdl1 by training on the first four folds. Because Lambda is a sequence of regularization strengths, you can think of Mdl1 as 15 models, one for each regularization strength in Lambda.

Estimate the cross-validated MSE.

mse = kfoldLoss(CVMdl);

Higher values of Lambda lead to predictor variable sparsity, which is a good quality of a regression model. For each regularization strength, train a linear regression model using the entire data set and the same options as when you cross-validated the models. Determine the number of nonzero coefficients per model.

Mdl = fitrlinear(X,Y,'ObservationsIn','columns','Lambda',Lambda,...
    'Learner','leastsquares','Solver','sparsa','Regularization','lasso');
numNZCoeff = sum(Mdl.Beta~=0);

In the same figure, plot the cross-validated MSE and frequency of nonzero coefficients for each regularization strength. Plot all variables on the log scale.

figure
[h,hL1,hL2] = plotyy(log10(Lambda),log10(mse),...
    log10(Lambda),log10(numNZCoeff));
hL1.Marker = 'o';
hL2.Marker = 'o';
ylabel(h(1),'log_{10} MSE')
ylabel(h(2),'log_{10} nonzero-coefficient frequency')
xlabel('log_{10} Lambda')
hold off

Figure contains 2 axes objects. Axes object 1 contains an object of type line. Axes object 2 contains an object of type line.

Choose the index of the regularization strength that balances predictor variable sparsity and low MSE (for example, Lambda(10)).

idxFinal = 10;

Extract the model corresponding to the chosen regularization strength.

MdlFinal = selectModels(Mdl,idxFinal)
MdlFinal =
  RegressionLinear
         ResponseName: 'Y'
    ResponseTransform: 'none'
                 Beta: [1000x1 double]
                 Bias: -0.0050
               Lambda: 0.0037
              Learner: 'leastsquares'

  Properties, Methods

idxNZCoeff = find(MdlFinal.Beta~=0)
idxNZCoeff = 2×1

   100
   200

EstCoeff = MdlFinal.Beta(idxNZCoeff)
EstCoeff = 2×1

    1.0051
    1.9965

MdlFinal is a RegressionLinear model with one regularization strength. The nonzero coefficients EstCoeff are close to the coefficients that simulated the data.

Simulate 10 new observations, and predict corresponding responses using the best-performing model.

XNew = sprandn(d,10,nz);
YHat = predict(MdlFinal,XNew,'ObservationsIn','columns');

Extended Capabilities

Version History

Introduced in R2016a