Main Content

crossval

Cross-validate machine learning model

    Description


    CVMdl = crossval(Mdl) returns a cross-validated (partitioned) machine learning model (CVMdl) from a trained model (Mdl). By default, crossval uses 10-fold cross-validation on the training data.

    CVMdl = crossval(Mdl,Name,Value) sets an additional cross-validation option. You can specify only one name-value argument. For example, you can specify the number of folds or a holdout sample proportion.

    Examples


    Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g').

    load ionosphere
    rng(1); % For reproducibility

    Train a support vector machine (SVM) classifier. Standardize the predictor data and specify the order of the classes.

    SVMModel = fitcsvm(X,Y,'Standardize',true,'ClassNames',{'b','g'});

    SVMModel is a trained ClassificationSVM classifier. 'b' is the negative class and 'g' is the positive class.

    Cross-validate the classifier using 10-fold cross-validation.

    CVSVMModel = crossval(SVMModel)
    CVSVMModel = 
      ClassificationPartitionedModel
        CrossValidatedModel: 'SVM'
             PredictorNames: {1x34 cell}
               ResponseName: 'Y'
            NumObservations: 351
                      KFold: 10
                  Partition: [1x1 cvpartition]
                 ClassNames: {'b'  'g'}
             ScoreTransform: 'none'

    CVSVMModel is a ClassificationPartitionedModel cross-validated classifier. During cross-validation, the software completes these steps:

    1. Randomly partition the data into 10 sets of equal size.

    2. Train an SVM classifier on nine of the sets.

    3. Repeat steps 1 and 2 k = 10 times. The software leaves out one partition each time and trains on the other nine partitions.

    4. Combine generalization statistics for each fold.
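    The steps above can be sketched manually with cvpartition. This is a minimal illustrative sketch, not what crossval literally executes internally; crossval stores the trained fold models and defers loss computation to the kfold functions, whereas this sketch computes a fold-wise misclassification rate directly.

    ```matlab
    % Manual sketch of 10-fold cross-validation for the SVM example.
    load ionosphere
    rng(1)                               % for reproducibility
    cvp = cvpartition(Y,'KFold',10);     % stratified 10-fold partition
    losses = zeros(cvp.NumTestSets,1);
    for i = 1:cvp.NumTestSets
        trIdx = training(cvp,i);         % logical index of the training folds
        teIdx = test(cvp,i);             % logical index of the validation fold
        fold = fitcsvm(X(trIdx,:),Y(trIdx),'Standardize',true, ...
            'ClassNames',{'b','g'});
        labels = predict(fold,X(teIdx,:));
        losses(i) = mean(~strcmp(labels,Y(teIdx)));  % misclassification rate
    end
    meanLoss = mean(losses)              % aggregate generalization statistic
    ```
    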

    Display the first model in CVSVMModel.Trained.

    FirstModel = CVSVMModel.Trained{1}
    FirstModel = 
      CompactClassificationSVM
                 ResponseName: 'Y'
        CategoricalPredictors: []
                   ClassNames: {'b'  'g'}
               ScoreTransform: 'none'
                        Alpha: [78x1 double]
                         Bias: -0.2209
             KernelParameters: [1x1 struct]
                           Mu: [0.8888 0 0.6320 0.0406 0.5931 0.1205 0.5361 ... ]
                        Sigma: [0.3149 0 0.5033 0.4441 0.5255 0.4663 0.4987 ... ]
               SupportVectors: [78x34 double]
          SupportVectorLabels: [78x1 double]
      Properties, Methods

    FirstModel is the first of the 10 trained classifiers. It is a CompactClassificationSVM classifier.

    You can estimate the generalization error by passing CVSVMModel to kfoldLoss.
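    For instance, a minimal sketch of that call (the numeric result depends on the rng seed used when partitioning):

    ```matlab
    % Estimate the out-of-sample misclassification rate of the 10-fold
    % cross-validated SVM classifier CVSVMModel from the example above.
    classLoss = kfoldLoss(CVSVMModel)
    ```
    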

    Specify a holdout sample proportion for cross-validation. By default, crossval uses 10-fold cross-validation to cross-validate a naive Bayes classifier. However, you have several other options for cross-validation. For example, you can specify a different number of folds or a holdout sample proportion.

    Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g').

    load ionosphere

    Remove the first two predictors for stability.

    X = X(:,3:end);
    rng('default'); % For reproducibility

    Train a naive Bayes classifier using the predictors X and class labels Y. A recommended practice is to specify the class names. 'b' is the negative class and 'g' is the positive class. fitcnb assumes that each predictor is conditionally and normally distributed.

    Mdl = fitcnb(X,Y,'ClassNames',{'b','g'});

    Mdl is a trained ClassificationNaiveBayes classifier.

    Cross-validate the classifier by specifying a 30% holdout sample.

    CVMdl = crossval(Mdl,'Holdout',0.3)
    CVMdl = 
      ClassificationPartitionedModel
        CrossValidatedModel: 'NaiveBayes'
             PredictorNames: {1x32 cell}
               ResponseName: 'Y'
            NumObservations: 351
                      KFold: 1
                  Partition: [1x1 cvpartition]
                 ClassNames: {'b'  'g'}
             ScoreTransform: 'none'

    CVMdl is a ClassificationPartitionedModel cross-validated, naive Bayes classifier.

    Display the properties of the classifier trained using 70% of the data.

    TrainedModel = CVMdl.Trained{1}
    TrainedModel = 
      CompactClassificationNaiveBayes
                  ResponseName: 'Y'
         CategoricalPredictors: []
                    ClassNames: {'b'  'g'}
                ScoreTransform: 'none'
             DistributionNames: {1x32 cell}
        DistributionParameters: {2x32 cell}
      Properties, Methods

    TrainedModel is a CompactClassificationNaiveBayes classifier.

    Estimate the generalization error by passing CVMdl to kfoldLoss.

    kfoldLoss(CVMdl)
    ans = 0.2095

    The out-of-sample misclassification error is approximately 21%.

    Reduce the generalization error by choosing the five most important predictors.

    idx = fscmrmr(X,Y); Xnew = X(:,idx(1:5));

    Train a naive Bayes classifier using the new predictors.

    Mdlnew = fitcnb(Xnew,Y,'ClassNames',{'b','g'});

    Cross-validate the new classifier by specifying a 30% holdout sample, and estimate the generalization error.

    CVMdlnew = crossval(Mdlnew,'Holdout',0.3);
    kfoldLoss(CVMdlnew)
    ans = 0.1429

    The out-of-sample misclassification error is reduced from approximately 21% to approximately 14%.

    Train a regression generalized additive model (GAM) by using fitrgam, and create a cross-validated GAM by using crossval and the holdout option. Then, use kfoldPredict to predict responses for validation-fold observations using a model trained on training-fold observations.

    Load the patients data set.

    load patients

    Create a table that contains the predictor variables (Age, Diastolic, Smoker, Weight, Gender, SelfAssessedHealthStatus) and the response variable (Systolic).

    tbl = table(Age,Diastolic,Smoker,Weight,Gender,SelfAssessedHealthStatus,Systolic);

    Train a GAM that contains linear terms for predictors.

    Mdl = fitrgam(tbl,'Systolic');

    Mdl is a RegressionGAM model object.

    Cross-validate the model by specifying a 30% holdout sample.

    rng('default') % For reproducibility
    CVMdl = crossval(Mdl,'Holdout',0.3)
    CVMdl = 
      RegressionPartitionedGAM
           CrossValidatedModel: 'GAM'
                PredictorNames: {1x6 cell}
         CategoricalPredictors: [3 5 6]
                  ResponseName: 'Systolic'
               NumObservations: 100
                         KFold: 1
                     Partition: [1x1 cvpartition]
             NumTrainedPerFold: [1x1 struct]
             ResponseTransform: 'none'
        IsStandardDeviationFit: 0
      Properties, Methods

    The crossval function creates a RegressionPartitionedGAM model object CVMdl with the holdout option. During cross-validation, the software completes these steps:

    1. Randomly select and reserve 30% of the data as validation data, and train the model using the rest of the data.

    2. Store the compact, trained model in the Trained property of the cross-validated model object RegressionPartitionedGAM.

    You can choose a different cross-validation setting by using the 'CrossVal', 'CVPartition', 'KFold', or 'Leaveout' name-value argument.
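    For instance, these calls select other settings for the GAM trained above. This is a minimal sketch; the variable names CVMdl5, cvp, and CVMdlP are illustrative, not from the example.

    ```matlab
    % Alternative cross-validation settings for the same trained model Mdl.
    rng('default')                             % for reproducibility
    CVMdl5 = crossval(Mdl,'KFold',5);          % 5-fold cross-validation
    cvp = cvpartition(height(tbl),'Holdout',0.25);  % custom 25% holdout partition
    CVMdlP = crossval(Mdl,'CVPartition',cvp);  % use the partition object
    ```
    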

    Predict responses for the validation-fold observations by using kfoldPredict. The function predicts responses for the validation-fold observations by using the model trained on the training-fold observations. The function assigns NaN to the training-fold observations.

    yFit = kfoldPredict(CVMdl);

    Find the validation-fold observation indexes, and create a table containing the observation index, observed response values, and predicted response values. Display the first eight rows of the table.

    idx = find(~isnan(yFit));
    t = table(idx,tbl.Systolic(idx),yFit(idx), ...
        'VariableNames',{'Observation Index','Observed Value','Predicted Value'});
    head(t)
    ans=8×3 table
        Observation Index    Observed Value    Predicted Value
        _________________    ______________    _______________
                1                 124               130.22
                6                 121               124.38
                7                 130               125.26
               12                 115               117.05
               20                 125               121.82
               22                 123               116.99
               23                 114                  107
               24                 128               122.52

    Compute the regression error (mean squared error) for the validation-fold observations.

    L = kfoldLoss(CVMdl)
    L = 43.8715

    Input Arguments


    Machine learning model, specified as a full regression or classification model object, as given in the following tables of supported models.

    Regression Model Objects

    Model Full Regression Model Object
    Gaussian process regression (GPR) model RegressionGP (If you supply a custom 'ActiveSet' in the call to fitrgp, then you cannot cross-validate the GPR model.)
    Generalized additive model (GAM) RegressionGAM
    Neural network model RegressionNeuralNetwork

    Classification Model Objects

    Model Full Classification Model Object
    Generalized additive model ClassificationGAM
    k-nearest neighbor model ClassificationKNN
    Naive Bayes model ClassificationNaiveBayes
    Neural network model ClassificationNeuralNetwork
    Support vector machine (SVM) for one-class and binary classification ClassificationSVM

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

    Example:crossval(Mdl,'KFold',3)specifies using three folds in a cross-validated model.

    Cross-validation partition, specified as a cvpartition partition object created by cvpartition. The partition object specifies the type of cross-validation and the indexing for the training and validation sets.

    You can specify only one of these four name-value arguments: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.

    Example: Suppose you create a random partition for 5-fold cross-validation on 500 observations by using cvp = cvpartition(500,'KFold',5). Then, you can specify the cross-validation partition by using 'CVPartition',cvp.

    Fraction of the data used for holdout validation, specified as a scalar value in the range (0,1). If you specify 'Holdout',p, then the software completes these steps:

    1. Randomly select and reserve p*100% of the data as validation data, and train the model using the rest of the data.

    2. Store the compact, trained model in the Trained property of the cross-validated model. If Mdl does not have a corresponding compact object, then Trained contains a full object.

    You can specify only one of these four name-value arguments: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.

    Example: 'Holdout',0.1

    Data Types: double | single

    Number of folds to use in a cross-validated model, specified as a positive integer value greater than 1. If you specify 'KFold',k, then the software completes these steps:

    1. Randomly partition the data into k sets.

    2. For each set, reserve the set as validation data, and train the model using the other k – 1 sets.

    3. Store the k compact, trained models in a k-by-1 cell vector in the Trained property of the cross-validated model. If Mdl does not have a corresponding compact object, then Trained contains a full object.

    You can specify only one of these four name-value arguments: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.

    Example: 'KFold',5

    Data Types: single | double

    Leave-one-out cross-validation flag, specified as 'on' or 'off'. If you specify 'Leaveout','on', then for each of the n observations (where n is the number of observations, excluding missing observations, specified in the NumObservations property of the model), the software completes these steps:

    1. Reserve one observation as validation data, and train the model using the other n – 1 observations.

    2. Store the n compact, trained models in an n-by-1 cell vector in the Trained property of the cross-validated model. If Mdl does not have a corresponding compact object, then Trained contains a full object.

    You can specify only one of these four name-value arguments: 'CVPartition', 'Holdout', 'KFold', or 'Leaveout'.

    Example: 'Leaveout','on'

    Output Arguments


    Cross-validated machine learning model, returned as one of the cross-validated (partitioned) model objects in the following tables, depending on the input model Mdl.

    Regression Model Objects

    Model Regression Model (Mdl) Cross-Validated Model (CVMdl)
    Gaussian process regression model RegressionGP RegressionPartitionedModel
    Generalized additive model RegressionGAM RegressionPartitionedGAM
    Neural network model RegressionNeuralNetwork RegressionPartitionedModel

    Classification Model Objects

    Model Classification Model (Mdl) Cross-Validated Model (CVMdl)
    Generalized additive model ClassificationGAM ClassificationPartitionedGAM
    k-nearest neighbor model ClassificationKNN ClassificationPartitionedModel
    Naive Bayes model ClassificationNaiveBayes ClassificationPartitionedModel
    Neural network model ClassificationNeuralNetwork ClassificationPartitionedModel
    Support vector machine (SVM) for one-class and binary classification ClassificationSVM ClassificationPartitionedModel

    Tips

    • Assess the predictive performance of Mdl on cross-validated data by using the kfold functions and properties of CVMdl, such as kfoldPredict, kfoldLoss, kfoldMargin, and kfoldEdge for classification, and kfoldPredict and kfoldLoss for regression.

    • Return a partitioned classifier with stratified partitioning by using the name-value argument 'KFold' or 'Holdout'.

    • Create a cvpartition object cvp using cvp = cvpartition(n,'KFold',k). Return a partitioned classifier with nonstratified partitioning by using the name-value argument 'CVPartition',cvp.
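    The stratified-versus-nonstratified distinction in the tips above can be sketched as follows. The variable names are illustrative; passing the class labels to cvpartition stratifies the folds, while passing only a count does not.

    ```matlab
    % Stratified versus nonstratified 5-fold partitioning of a classifier.
    load ionosphere
    SVMModel = fitcsvm(X,Y,'ClassNames',{'b','g'});
    CVStrat = crossval(SVMModel,'KFold',5);         % stratified by class label
    cvp = cvpartition(numel(Y),'KFold',5);          % ignores class labels
    CVNonStrat = crossval(SVMModel,'CVPartition',cvp);  % nonstratified folds
    ```
    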

    Alternative Functionality

    Instead of training a model and then cross-validating it, you can create a cross-validated model directly by using a fitting function and specifying one of these name-value arguments: 'CrossVal', 'CVPartition', 'Holdout', 'Leaveout', or 'KFold'.
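    A minimal sketch of this alternative, using fitcsvm on the ionosphere data from the earlier examples:

    ```matlab
    % Create the cross-validated classifier directly at fitting time,
    % instead of calling crossval on a trained model afterward.
    load ionosphere
    rng(1)                                    % for reproducibility
    CVMdl = fitcsvm(X,Y,'ClassNames',{'b','g'},'KFold',10);
    % CVMdl is a ClassificationPartitionedModel, as if returned by crossval.
    ```
    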

    Extended Capabilities

    Version History

    Introduced in R2012a