
predict

Class: RegressionLinear

Predict response of linear regression model

Description


YHat = predict(Mdl,X) returns predicted responses for each observation in the predictor data X based on the trained linear regression model Mdl. YHat contains responses for each regularization strength in Mdl.


YHat = predict(Mdl,X,'ObservationsIn',dimension) specifies the predictor data observation dimension, either 'rows' (default) or 'columns'. For example, specify 'ObservationsIn','columns' to indicate that columns in the predictor data correspond to observations.
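As a quick illustration, the following minimal sketch on simulated data (all variable names here are placeholders) shows that both syntaxes return the same predictions:

```matlab
% Minimal sketch of both predict syntaxes on simulated data.
rng(1)                                  % for reproducibility
X = randn(100,5);
Y = X(:,1) + 0.1*randn(100,1);
Mdl = fitrlinear(X,Y);                  % train a linear regression model
YHat  = predict(Mdl,X);                 % observations in rows (default)
YHatT = predict(Mdl,X','ObservationsIn','columns');  % observations in columns
```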

Input Arguments


Linear regression model, specified as a RegressionLinear model object. You can create a RegressionLinear model object using fitrlinear.

Predictor data used to generate responses, specified as a full or sparse numeric matrix or a table.

By default, each row of X corresponds to one observation, and each column corresponds to one variable.

  • For a numeric matrix:

    • The variables in the columns of X must have the same order as the predictor variables that trained Mdl.

    • If you train Mdl using a table (for example, Tbl) and Tbl contains only numeric predictor variables, then X can be a numeric matrix. To treat numeric predictors in Tbl as categorical during training, identify categorical predictors by using the CategoricalPredictors name-value pair argument of fitrlinear. If Tbl contains heterogeneous predictor variables (for example, numeric and categorical data types) and X is a numeric matrix, then predict throws an error.

  • For a table:

    • predict does not support multicolumn variables or cell arrays other than cell arrays of character vectors.

    • If you train Mdl using a table (for example, Tbl), then all predictor variables in X must have the same variable names and data types as the variables that trained Mdl (stored in Mdl.PredictorNames). However, the column order of X does not need to correspond to the column order of Tbl. Also, Tbl and X can contain additional variables (response variables, observation weights, and so on), but predict ignores them.

    • If you train Mdl using a numeric matrix, then the predictor names in Mdl.PredictorNames must be the same as the corresponding predictor variable names in X. To specify predictor names during training, use the PredictorNames name-value pair argument of fitrlinear. All predictor variables in X must be numeric vectors. X can contain additional variables (response variables, observation weights, and so on), but predict ignores them.

Note

If you orient your predictor matrix so that observations correspond to columns and specify 'ObservationsIn','columns', then you might experience a significant reduction in optimization execution time. You cannot specify 'ObservationsIn','columns' for predictor data in a table.

Data Types: double | single | table
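For example, a model trained on a table must be given a table with matching predictor variable names and types at prediction time. A minimal sketch using the carsmall sample data set shipped with MATLAB (assuming fitrlinear's table interface):

```matlab
% Sketch: train on a table, then predict from a table with the same
% predictor variable names and types. predict ignores the response column.
load carsmall                                 % sample data set
Tbl = rmmissing(table(Weight,Horsepower,MPG));
Mdl = fitrlinear(Tbl,'MPG');                  % 'MPG' is the response variable
yHat = predict(Mdl,Tbl);                      % extra MPG column is ignored
```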

Predictor data observation dimension, specified as 'columns' or 'rows'.

Note

If you orient your predictor matrix so that observations correspond to columns and specify 'ObservationsIn','columns', then you might experience a significant reduction in optimization execution time. You cannot specify 'ObservationsIn','columns' for predictor data in a table.

Output Arguments


Predicted responses, returned as an n-by-L numeric matrix. n is the number of observations in X, and L is the number of regularization strengths in Mdl.Lambda. YHat(i,j) is the response for observation i using the linear regression model that has regularization strength Mdl.Lambda(j).

The predicted response using the model with regularization strength j is ŷ_j = xβ_j + b_j, where:

  • x is an observation from the predictor data matrix X, expressed as a row vector.

  • β_j is the estimated column vector of coefficients. The software stores this vector in Mdl.Beta(:,j).

  • b_j is the estimated, scalar bias, which the software stores in Mdl.Bias(j).
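To make the formula concrete, this sketch on simulated data reproduces predict by applying ŷ_j = xβ_j + b_j directly (assuming a model with one regularization strength and the default ResponseTransform of 'none'):

```matlab
% Sketch: reproduce predict manually for a model with one regularization
% strength and the default ResponseTransform 'none'.
rng(0)                                  % for reproducibility
X = randn(50,10);
Y = X(:,1) - 2*X(:,3) + 0.1*randn(50,1);
Mdl = fitrlinear(X,Y);
yHat    = predict(Mdl,X);
yManual = X*Mdl.Beta + Mdl.Bias;        % each row x gives x*beta + b
```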

Examples


Predict Test-Sample Responses

Simulate 10000 observations from this model:

y = x_100 + 2x_200 + e

  • X = {x_1,...,x_1000} is a 10000-by-1000 sparse matrix with 10% nonzero standard normal elements.

  • e is random normal error with mean 0 and standard deviation 0.3.

rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);

Train a linear regression model. Reserve 30% of the observations as a holdout sample.

CVMdl = fitrlinear(X,Y,'Holdout',0.3);
Mdl = CVMdl.Trained{1}
Mdl =
  RegressionLinear
         ResponseName: 'Y'
    ResponseTransform: 'none'
                 Beta: [1000x1 double]
                 Bias: -0.0066
               Lambda: 1.4286e-04
              Learner: 'svm'

  Properties, Methods

CVMdl is a RegressionPartitionedLinear model. It contains the property Trained, which is a 1-by-1 cell array holding a RegressionLinear model that the software trained using the training set.

Extract the training and test data from the partition definition.

trainIdx = training(CVMdl.Partition);
testIdx = test(CVMdl.Partition);

Predict the training- and test-sample responses.

yHatTrain = predict(Mdl,X(trainIdx,:));
yHatTest = predict(Mdl,X(testIdx,:));

Because there is one regularization strength in Mdl, yHatTrain and yHatTest are numeric vectors.

Predict responses from the best-performing linear regression model that uses a lasso penalty and least squares.

Simulate 10000 observations as in Predict Test-Sample Responses.

rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);

Create a set of 15 logarithmically spaced regularization strengths from 10^-5 through 10^-1.

Lambda = logspace(-5,-1,15);

Cross-validate the models. To increase execution speed, transpose the predictor data and specify that the observations are in columns. Optimize the objective function using SpaRSA.

X = X';
CVMdl = fitrlinear(X,Y,'ObservationsIn','columns','KFold',5,'Lambda',Lambda,...
    'Learner','leastsquares','Solver','sparsa','Regularization','lasso');
numCLModels = numel(CVMdl.Trained)
numCLModels = 5

CVMdl is a RegressionPartitionedLinear model. Because fitrlinear implements 5-fold cross-validation, CVMdl contains 5 RegressionLinear models that the software trains on each fold.

Display the first trained linear regression model.

Mdl1 = CVMdl.Trained{1}
Mdl1 =
  RegressionLinear
         ResponseName: 'Y'
    ResponseTransform: 'none'
                 Beta: [1000x15 double]
                 Bias: [-0.0049 -0.0049 -0.0049 -0.0049 -0.0049 -0.0048 ... ]
               Lambda: [1.0000e-05 1.9307e-05 3.7276e-05 7.1969e-05 ... ]
              Learner: 'leastsquares'

  Properties, Methods

Mdl1 is a RegressionLinear model object. fitrlinear constructed Mdl1 by training on the first four folds. Because Lambda is a sequence of regularization strengths, you can think of Mdl1 as 15 models, one for each regularization strength in Lambda.

Estimate the cross-validated MSE.

mse = kfoldLoss(CVMdl);

Higher values of Lambda lead to predictor variable sparsity, which is a good quality of a regression model. For each regularization strength, train a linear regression model using the entire data set and the same options as when you cross-validated the models. Determine the number of nonzero coefficients per model.

Mdl = fitrlinear(X,Y,'ObservationsIn','columns','Lambda',Lambda,...
    'Learner','leastsquares','Solver','sparsa','Regularization','lasso');
numNZCoeff = sum(Mdl.Beta~=0);

In the same figure, plot the cross-validated MSE and frequency of nonzero coefficients for each regularization strength. Plot all variables on the log scale.

figure
[h,hL1,hL2] = plotyy(log10(Lambda),log10(mse),...
    log10(Lambda),log10(numNZCoeff));
hL1.Marker = 'o';
hL2.Marker = 'o';
ylabel(h(1),'log_{10} MSE')
ylabel(h(2),'log_{10} nonzero-coefficient frequency')
xlabel('log_{10} Lambda')
hold off

Figure contains 2 axes objects. Axes object 1 contains an object of type line. Axes object 2 contains an object of type line.

Choose the index of the regularization strength that balances predictor variable sparsity and low MSE (for example, Lambda(10)).

idxFinal = 10;

Extract the model corresponding to the chosen regularization strength.

MdlFinal = selectModels(Mdl,idxFinal)
MdlFinal =
  RegressionLinear
         ResponseName: 'Y'
    ResponseTransform: 'none'
                 Beta: [1000x1 double]
                 Bias: -0.0050
               Lambda: 0.0037
              Learner: 'leastsquares'

  Properties, Methods

idxNZCoeff = find(MdlFinal.Beta~=0)
idxNZCoeff = 2×1

   100
   200

EstCoeff = MdlFinal.Beta(idxNZCoeff)
EstCoeff = 2×1

    1.0051
    1.9965

MdlFinal is a RegressionLinear model with one regularization strength. The nonzero coefficients EstCoeff are close to the coefficients that simulated the data.

Simulate 10 new observations, and predict corresponding responses using the best-performing model.

XNew = sprandn(d,10,nz);
YHat = predict(MdlFinal,XNew,'ObservationsIn','columns');

Extended Capabilities

Version History

Introduced in R2016a