Cross-validate machine learning model
CVMdl = crossval(Mdl,Name,Value) sets an additional cross-validation option. You can specify only one name-value argument. For example, you can specify the number of folds or a holdout sample proportion.
Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g').
load ionosphere
rng(1); % For reproducibility
Train a support vector machine (SVM) classifier. Standardize the predictor data and specify the order of the classes.
SVMModel = fitcsvm(X,Y,'Standardize',true,'ClassNames',{'b','g'});
SVMModel is a trained ClassificationSVM classifier. 'b' is the negative class and 'g' is the positive class.
Cross-validate the classifier using 10-fold cross-validation.
CVSVMModel = crossval(SVMModel)
CVSVMModel = 
  ClassificationPartitionedModel
    CrossValidatedModel: 'SVM'
         PredictorNames: {1x34 cell}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 10
              Partition: [1x1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'
CVSVMModel is a ClassificationPartitionedModel cross-validated classifier. During cross-validation, the software completes these steps:
1. Randomly partition the data into 10 sets of equal size.
2. Train an SVM classifier on nine of the sets.
3. Repeat steps 1 and 2 k = 10 times. The software leaves out one partition each time and trains on the other nine partitions.
4. Combine generalization statistics for each fold.
Display the first model in CVSVMModel.Trained.
FirstModel = CVSVMModel.Trained{1}
FirstModel = 
  CompactClassificationSVM
           ResponseName: 'Y'
  CategoricalPredictors: []
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'
                  Alpha: [78x1 double]
                   Bias: -0.2208
       KernelParameters: [1x1 struct]
                     Mu: [0.8888 0 0.6320 0.0406 0.5931 0.1205 0.5361 ...]
                  Sigma: [0.3149 0 0.5033 0.4441 0.5255 0.4663 0.4987 ...]
         SupportVectors: [78x34 double]
    SupportVectorLabels: [78x1 double]
FirstModel is the first of the 10 trained classifiers. It is a CompactClassificationSVM classifier.
You can estimate the generalization error by passing CVSVMModel to kfoldLoss.
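For example, assuming the CVSVMModel object created above, this brief sketch returns the 10-fold misclassification rate (the exact value depends on the random partition seeded earlier):
classLoss = kfoldLoss(CVSVMModel) % average misclassification rate over the 10 validation folds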
Specify a holdout sample proportion for cross-validation. By default, crossval uses 10-fold cross-validation to cross-validate a naive Bayes classifier. However, you have several other options for cross-validation. For example, you can specify a different number of folds or a holdout sample proportion.
Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g').
load ionosphere
Remove the first two predictors for stability.
X = X(:,3:end);
rng('default'); % For reproducibility
Train a naive Bayes classifier using the predictors X and class labels Y. A recommended practice is to specify the class names. 'b' is the negative class and 'g' is the positive class. fitcnb assumes that each predictor is conditionally and normally distributed.
Mdl = fitcnb(X,Y,'ClassNames',{'b','g'});
Mdl is a trained ClassificationNaiveBayes classifier.
Cross-validate the classifier by specifying a 30% holdout sample.
CVMdl = crossval(Mdl,'Holdout',0.3)
CVMdl = 
  ClassificationPartitionedModel
    CrossValidatedModel: 'NaiveBayes'
         PredictorNames: {1x32 cell}
           ResponseName: 'Y'
        NumObservations: 351
                  KFold: 1
              Partition: [1x1 cvpartition]
             ClassNames: {'b'  'g'}
         ScoreTransform: 'none'
CVMdl is a ClassificationPartitionedModel cross-validated, naive Bayes classifier.
Display the properties of the classifier trained using 70% of the data.
TrainedModel = CVMdl.Trained{1}
TrainedModel = 
  CompactClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'b'  'g'}
            ScoreTransform: 'none'
         DistributionNames: {1x32 cell}
    DistributionParameters: {2x32 cell}
TrainedModel is a CompactClassificationNaiveBayes classifier.
Estimate the generalization error by passing CVMdl to kfoldLoss.
kfoldLoss(CVMdl)
ans = 0.2095
The out-of-sample misclassification error is approximately 21%.
Reduce the generalization error by choosing the five most important predictors.
idx = fscmrmr(X,Y);
Xnew = X(:,idx(1:5));
Train a naive Bayes classifier using the new predictors.
Mdlnew = fitcnb(Xnew,Y,'ClassNames',{'b','g'});
Cross-validate the new classifier by specifying a 30% holdout sample, and estimate the generalization error.
CVMdlnew = crossval(Mdlnew,'Holdout',0.3);
kfoldLoss(CVMdlnew)
ans = 0.1429
The out-of-sample misclassification error is reduced from approximately 21% to approximately 14%.
Train a regression generalized additive model (GAM) by using fitrgam, and create a cross-validated GAM by using crossval and the holdout option. Then, use kfoldPredict to predict responses for validation-fold observations using a model trained on training-fold observations.
Load the patients data set.
load patients
Create a table that contains the predictor variables (Age, Diastolic, Smoker, Weight, Gender, SelfAssessedHealthStatus) and the response variable (Systolic).
tbl = table(Age,Diastolic,Smoker,Weight,Gender,SelfAssessedHealthStatus,Systolic);
Train a GAM that contains linear terms for predictors.
Mdl = fitrgam(tbl,'Systolic');
Mdl is a RegressionGAM model object.
Cross-validate the model by specifying a 30% holdout sample.
rng('default') % For reproducibility
CVMdl = crossval(Mdl,'Holdout',0.3)
CVMdl = 
  RegressionPartitionedGAM
       CrossValidatedModel: 'GAM'
            PredictorNames: {1x6 cell}
     CategoricalPredictors: [3 5 6]
              ResponseName: 'Systolic'
           NumObservations: 100
                     KFold: 1
                 Partition: [1x1 cvpartition]
         NumTrainedPerFold: [1x1 struct]
         ResponseTransform: 'none'
    IsStandardDeviationFit: 0
The crossval function creates a RegressionPartitionedGAM model object CVMdl with the holdout option. During cross-validation, the software completes these steps:
1. Randomly select and reserve 30% of the data as validation data, and train the model using the rest of the data.
2. Store the compact, trained model in the Trained property of the cross-validated RegressionPartitionedGAM model object.
You can choose a different cross-validation setting by using the 'CrossVal', 'CVPartition', 'KFold', or 'Leaveout' name-value argument.
Predict responses for the validation-fold observations by using kfoldPredict. The function predicts responses for the validation-fold observations by using the model trained on the training-fold observations. The function assigns NaN to the training-fold observations.
yFit = kfoldPredict(CVMdl);
Find the validation-fold observation indexes, and create a table containing the observation index, observed response values, and predicted response values. Display the first eight rows of the table.
idx = find(~isnan(yFit));
t = table(idx,tbl.Systolic(idx),yFit(idx), ...
    'VariableNames',{'Observation Index','Observed Value','Predicted Value'});
head(t)
ans = 8×3 table
    Observation Index    Observed Value    Predicted Value
    _________________    ______________    _______________
            1                 124               130.22
            6                 121               124.38
            7                 130               125.26
           12                 115               117.05
           20                 125               121.82
           22                 123               116.99
           23                 114                  107
           24                 128               122.52
Compute the regression error (mean squared error) for the validation-fold observations.
L = kfoldLoss(CVMdl)
L = 43.8715
Mdl — Machine learning model
Machine learning model, specified as a full regression or classification model object, as given in the following tables of supported models.
Regression Model Object
Model | Full Regression Model Object
---|---
Gaussian process regression (GPR) model | RegressionGP (If you supply a custom 'ActiveSet' in the call to fitrgp, then you cannot cross-validate the GPR model.)
Generalized additive model (GAM) | RegressionGAM
Neural network model | RegressionNeuralNetwork
Classification Model Object
Model | Full Classification Model Object
---|---
Generalized additive model | ClassificationGAM
k-nearest neighbor model | ClassificationKNN
Naive Bayes model | ClassificationNaiveBayes
Neural network model | ClassificationNeuralNetwork
Support vector machine for one-class and binary classification | ClassificationSVM
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: crossval(Mdl,'KFold',3) specifies using three folds in a cross-validated model.
CVPartition — Cross-validation partition
[] (default) | cvpartition partition object
Cross-validation partition, specified as a cvpartition partition object created by cvpartition. The partition object specifies the type of cross-validation and the indexing for the training and validation sets.
You can specify only one of these four name-value arguments: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.
Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,'KFold',5). Then, you can specify the cross-validated model by using 'CVPartition',cvp.
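For a concrete sketch, assume the ionosphere data and an SVM classifier as in the first example; partitioning on the observation count (rather than on the class labels) is nonstratified:
load ionosphere
rng(1) % For reproducibility
Mdl = fitcsvm(X,Y,'ClassNames',{'b','g'});
cvp = cvpartition(numel(Y),'KFold',5); % nonstratified 5-fold partition
CVMdl = crossval(Mdl,'CVPartition',cvp); % cross-validate using the custom partition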
Holdout — Fraction of data for holdout validation
scalar value in the range (0,1)
Fraction of the data used for holdout validation, specified as a scalar value in the range (0,1). If you specify 'Holdout',p, then the software completes these steps:
1. Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.
2. Store the compact, trained model in the Trained property of the cross-validated model. If Mdl does not have a corresponding compact object, then Trained contains a full object.
You can specify only one of these four name-value arguments: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.
Example: 'Holdout',0.1
Data Types: double | single
KFold — Number of folds
10 (default) | positive integer value greater than 1
Number of folds to use in a cross-validated model, specified as a positive integer value greater than 1. If you specify 'KFold',k, then the software completes these steps:
1. Randomly partition the data into k sets.
2. For each set, reserve the set as validation data, and train the model using the other k – 1 sets.
3. Store the k compact, trained models in a k-by-1 cell vector in the Trained property of the cross-validated model. If Mdl does not have a corresponding compact object, then Trained contains a full object.
You can specify only one of these four name-value arguments: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.
Example: 'KFold',5
Data Types: single | double
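As a brief sketch (assuming the ionosphere data), you can confirm that the per-fold models are stored in the Trained property:
load ionosphere
rng(1) % For reproducibility
Mdl = fitcknn(X,Y,'ClassNames',{'b','g'});
CVMdl = crossval(Mdl,'KFold',5);
size(CVMdl.Trained) % 5-by-1 cell vector of compact, trained models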
Leaveout — Leave-one-out cross-validation flag
'off' (default) | 'on'
Leave-one-out cross-validation flag, specified as 'on' or 'off'. If you specify 'Leaveout','on', then for each of the n observations (where n is the number of observations, excluding missing observations, specified in the NumObservations property of the model), the software completes these steps:
1. Reserve the one observation as validation data, and train the model using the other n – 1 observations.
2. Store the n compact, trained models in an n-by-1 cell vector in the Trained property of the cross-validated model. If Mdl does not have a corresponding compact object, then Trained contains a full object.
You can specify only one of these four name-value arguments: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.
Example: 'Leaveout','on'
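A minimal sketch, assuming the small fisheriris data set (leave-one-out trains one model per observation, so it can be slow for large data sets):
load fisheriris
Mdl = fitcknn(meas,species);
CVMdl = crossval(Mdl,'Leaveout','on'); % trains 150 models, one per observation
err = kfoldLoss(CVMdl) % leave-one-out misclassification rate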
CVMdl — Cross-validated machine learning model
Cross-validated machine learning model, returned as one of the cross-validated (partitioned) model objects in the following tables, depending on the input model Mdl.
Regression Model Object
Model | Regression Model (Mdl) | Cross-Validated Model (CVMdl)
---|---|---
Gaussian process regression model | RegressionGP | RegressionPartitionedModel
Generalized additive model | RegressionGAM | RegressionPartitionedGAM
Neural network model | RegressionNeuralNetwork | RegressionPartitionedModel
Classification Model Object
Model | Classification Model (Mdl) | Cross-Validated Model (CVMdl)
---|---|---
Generalized additive model | ClassificationGAM | ClassificationPartitionedGAM
k-nearest neighbor model | ClassificationKNN | ClassificationPartitionedModel
Naive Bayes model | ClassificationNaiveBayes | ClassificationPartitionedModel
Neural network model | ClassificationNeuralNetwork | ClassificationPartitionedModel
Support vector machine for one-class and binary classification | ClassificationSVM | ClassificationPartitionedModel
Assess the predictive performance of Mdl on cross-validated data by using the kfold functions and properties of CVMdl, such as kfoldPredict, kfoldLoss, kfoldMargin, and kfoldEdge for classification, and kfoldPredict and kfoldLoss for regression.
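For example, a short sketch assuming the cross-validated SVM classifier CVSVMModel from the first example:
m = kfoldMargin(CVSVMModel); % per-observation classification margins
e = kfoldEdge(CVSVMModel) % mean margin (edge) over the validation folds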
Return a partitioned classifier with stratified partitioning by using the name-value argument 'KFold' or 'Holdout'.
Create a cvpartition object cvp using cvp = cvpartition(n,'KFold',k). Return a partitioned classifier with nonstratified partitioning by using the name-value argument 'CVPartition',cvp.
Instead of training a model and then cross-validating it, you can create a cross-validated model directly by using a fitting function and specifying one of these name-value arguments: 'CrossVal', 'CVPartition', 'Holdout', 'Leaveout', or 'KFold'.
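For example, a sketch that cross-validates at fitting time, assuming the ionosphere data:
load ionosphere
rng(1) % For reproducibility
CVMdl = fitcsvm(X,Y,'ClassNames',{'b','g'},'KFold',5); % returns a ClassificationPartitionedModel directly
loss = kfoldLoss(CVMdl)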
Usage notes and limitations:
This function fully supports GPU arrays for a trained classification model specified as a ClassificationKNN or ClassificationSVM object.
For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).