
crossval

Loss estimate using cross validation

Syntax

vals = crossval(fun,X)
vals = crossval(fun,X,Y,...)
mse = crossval('mse',X,y,'Predfun',predfun)
mcr = crossval('mcr',X,y,'Predfun',predfun)
val = crossval(criterion,X1,X2,...,y,'Predfun',predfun)
vals = crossval(...,'name',value)

Description

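vals = crossval(fun,X) performs 10-fold cross validation for the function fun, applied to the data in X.

fun is a function handle to a function with two inputs, the training subset of X, XTRAIN, and the test subset of X, XTEST, as follows:

testval = fun(XTRAIN,XTEST)

Each time it is called, fun should use XTRAIN to fit a model, then return some criterion testval computed on XTEST using that fitted model.

X can be a column vector or matrix. Rows of X correspond to observations; columns correspond to variables or features. Each row of vals contains the result of applying fun to one test set. If testval is a non-scalar value, crossval converts it to a row vector using linear indexing and stores it in one row of vals.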
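As a minimal sketch of this calling pattern, the following passes a two-input function handle to crossval and inspects the shape of vals. The random data and both criteria are illustrative assumptions, not part of the original reference.

```matlab
rng(1);                          % fix the seed so the folds are repeatable
X = randn(100,1);                % illustrative data: 100 observations, 1 variable

% Criterion: mean absolute deviation of the test set from the training mean.
fun = @(XTRAIN,XTEST) mean(abs(XTEST - mean(XTRAIN)));
vals = crossval(fun,X);          % default 10-fold: one row per test set

% A non-scalar criterion is stored as one row per test set.
fun2 = @(XTRAIN,XTEST) [mean(XTEST), std(XTEST)];
vals2 = crossval(fun2,X);        % one row per test set, two columns
```

Averaging vals over its rows, for example with mean(vals), gives a single cross-validated summary of the criterion.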

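vals = crossval(fun,X,Y,...) is used when data are stored in separate variables X, Y, ... . All variables (column vectors, matrices, or arrays) must have the same number of rows. fun is called with the training subsets of X, Y, ..., followed by the test subsets of X, Y, ..., as follows:

testvals = fun(XTRAIN,YTRAIN,...,XTEST,YTEST,...)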
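The sketch below illustrates the multi-variable form, assuming a simple least-squares fit as the model. The synthetic data and the choice of criterion (test-fold mean-squared error) are assumptions made for illustration.

```matlab
rng(2);                                    % repeatable folds and data
n = 80;
X = [ones(n,1), randn(n,2)];               % intercept column plus two predictors
y = X*[1; 2; -1] + 0.1*randn(n,1);         % synthetic linear response

% Training subsets come first, then the test subsets, in the same variable order.
fun = @(XTRAIN,YTRAIN,XTEST,YTEST) ...
    mean((YTEST - XTEST*(XTRAIN\YTRAIN)).^2);

vals = crossval(fun,X,y);                  % one test-fold MSE per row
```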

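mse = crossval('mse',X,y,'Predfun',predfun) returns mse, a scalar containing a 10-fold cross validation estimate of mean-squared error for the function predfun. X can be a column vector, matrix, or array of predictors. y is a column vector of response values. X and y must have the same number of rows.

predfun is a function handle called with the training subset of X, the training subset of y, and the test subset of X as follows:

yfit = predfun(XTRAIN,ytrain,XTEST)

Each time it is called, predfun should use XTRAIN and ytrain to fit a regression model and then return fitted values in a column vector yfit. Each row of yfit contains the predicted values for the corresponding row of XTEST. crossval computes the squared errors between yfit and the corresponding response test set, and returns the overall mean across all test sets.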
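To make the 'mse' computation concrete, the sketch below repeats it by hand on a fixed partition and compares the result with crossval. The data, the least-squares predfun, and the choice of 5 equal-sized folds are assumptions for illustration; the loop is a reading of the description above, not the function's actual source.

```matlab
rng(0);
n = 60;                                    % divisible by 5, so all folds have equal size
X = [ones(n,1), randn(n,2)];
y = X*[1; 2; -1] + 0.1*randn(n,1);

predfun = @(XTRAIN,ytrain,XTEST) XTEST*(XTRAIN\ytrain);   % least-squares fit

cp = cvpartition(n,'KFold',5);             % reuse one partition for both computations
mseCV = crossval('mse',X,y,'Predfun',predfun,'partition',cp);

% Manual version: squared error of every held-out observation, then the mean.
sqErr = zeros(n,1);
for i = 1:cp.NumTestSets
    tr = training(cp,i);                   % logical index of training rows
    te = test(cp,i);                       % logical index of test rows
    sqErr(te) = (y(te) - predfun(X(tr,:),y(tr),X(te,:))).^2;
end
mseManual = mean(sqErr);
```

With equal fold sizes, the mean over all held-out observations and the mean of the per-fold means coincide, so the comparison does not depend on which of the two the function uses internally.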

mcr = crossval('mcr',X,y,'Predfun',predfun) returns mcr, a scalar containing a 10-fold cross validation estimate of misclassification rate (the proportion of misclassified samples) for the function predfun. The matrix X contains predictor values and the vector y contains class labels. predfun should use XTRAIN and ytrain to fit a classification model and return yfit as the predicted class labels for XTEST. crossval computes the number of misclassifications between yfit and the corresponding response test set, and returns the overall misclassification rate across all test sets.

val = crossval(criterion,X1,X2,...,y,'Predfun',predfun), where criterion is 'mse' or 'mcr', returns a cross validation estimate of mean-squared error (for a regression model) or misclassification rate (for a classification model) with predictor values in X1, X2, ... and, respectively, response values or class labels in y. X1, X2, ... and y must have the same number of rows. predfun is a function handle called with the training subsets of X1, X2, ..., the training subset of y, and the test subsets of X1, X2, ..., as follows:

yfit = predfun(X1TRAIN,X2TRAIN,...,ytrain,X1TEST,X2TEST,...)

yfit should be a column vector containing the fitted values.
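A minimal sketch of the multiple-predictor form follows, assuming two predictor variables stored separately and a least-squares model. The data and the helper design are hypothetical names introduced only for this illustration.

```matlab
rng(3);
n = 90;
X1 = randn(n,1);                           % first predictor variable
X2 = randn(n,2);                           % second predictor variable (two columns)
y  = 1 + 2*X1 + X2*[0.5; -1] + 0.1*randn(n,1);

% Hypothetical helper: build a design matrix with an intercept column.
design = @(A,B) [ones(size(A,1),1), A, B];

% Argument order: all training predictors, then ytrain, then all test predictors.
predfun = @(X1TRAIN,X2TRAIN,ytrain,X1TEST,X2TEST) ...
    design(X1TEST,X2TEST) * (design(X1TRAIN,X2TRAIN) \ ytrain);

val = crossval('mse',X1,X2,y,'Predfun',predfun);   % scalar MSE estimate
```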

vals = crossval(...,'name',value) specifies one or more optional parameter name/value pairs from the following table. Specify name inside single quotes.

Name Value
holdout

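A scalar specifying the ratio or the number of observations p for holdout cross validation. When 0 < p < 1, approximately p*n observations are randomly selected for the test set. When p is an integer, p observations are randomly selected for the test set.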

kfold

A positive integer greater than 1 specifying the number of folds k for k-fold cross validation.

leaveout

Specifies leave-one-out cross validation. The value must be 1.

mcreps

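A positive integer specifying the number of Monte-Carlo repetitions for validation. If the first input of crossval is 'mse' or 'mcr', crossval returns the mean of mean-squared error or misclassification rate across all of the Monte-Carlo repetitions. Otherwise, crossval concatenates the values vals from all of the Monte-Carlo repetitions along the first dimension.

partition

An object c of the cvpartition class, specifying the cross validation type and partition.

stratify

A column vector group specifying groups for stratification. Both training and test sets have roughly the same class proportions as in group. NaNs or empty character vectors in group are treated as missing values, and the corresponding rows of the data are ignored.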

options

A structure that specifies whether to run in parallel, and specifies the random stream or streams. Create the options structure with statset. Option fields:

  • UseParallel: Set to true to compute in parallel. Default is false.

  • UseSubstreams: Set to true to compute in parallel in a reproducible fashion. Default is false. To compute reproducibly, set Streams to a type allowing substreams: 'mlfg6331_64' or 'mrg32k3a'.

  • Streams: A RandStream object or cell array consisting of one such object. If you do not specify Streams, crossval uses the default stream.
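The option fields above might be combined as in the following sketch. The data and predfun are illustrative assumptions; running in parallel is assumed to require Parallel Computing Toolbox, and without it the call is expected to run serially.

```matlab
rng(5);
n = 100;
X = [ones(n,1), randn(n,2)];
y = X*[1; 2; -1] + 0.1*randn(n,1);
predfun = @(XTRAIN,ytrain,XTEST) XTEST*(XTRAIN\ytrain);

% A stream type that supports substreams, as the UseSubstreams field requires.
s = RandStream('mlfg6331_64','Seed',1);

opts = statset('UseParallel',true, ...     % request parallel computation
               'UseSubstreams',true, ...   % request reproducible parallel runs
               'Streams',s);

mseVal = crossval('mse',X,y,'Predfun',predfun,'options',opts);
```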

Only one of kfold, holdout, leaveout, or partition can be specified, and partition cannot be specified with stratify. If both partition and mcreps are specified, the first Monte-Carlo repetition uses the partition information in the cvpartition object, and the repartition method is called to generate new partitions for each of the remaining repetitions. If no cross validation type is specified, the default is 10-fold cross validation.

    Note: When using cross validation with classification algorithms, stratification is preferred. Otherwise, some test sets may not include observations from all classes.
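The following sketch shows how the choice of cross validation type changes the number of rows in vals when the first input is a function handle. The data and the criterion are illustrative assumptions.

```matlab
rng(4);
n = 40;
X = randn(n,1);

% Criterion: difference between test-set mean and training-set mean.
fun = @(XTRAIN,XTEST) mean(XTEST) - mean(XTRAIN);

valsK = crossval(fun,X,'kfold',5);              % one row per fold
valsH = crossval(fun,X,'holdout',0.25);         % a single test set
valsL = crossval(fun,X,'leaveout',1);           % one row per observation
valsM = crossval(fun,X,'kfold',5,'mcreps',3);   % folds stacked over repetitions
```

Because only one type can be given per call, comparing types means calling crossval once for each, as above.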
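As a brief sketch of requesting stratification directly, the class labels can be passed as the stratify value. This assumes the fisheriris sample data and the classify function used in the examples on this page.

```matlab
load('fisheriris');                        % provides meas (predictors) and species (labels)

% Discriminant analysis classifier: fit on the training fold, predict the test fold.
classf = @(XTRAIN,ytrain,XTEST) classify(XTEST,XTRAIN,ytrain);

% Stratify the default 10 folds by class label so each fold has similar class proportions.
cvMCR = crossval('mcr',meas,species,'Predfun',classf,'stratify',species);
```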

Examples

Example 1

Compute mean-squared error for regression using 10-fold cross validation:

load('fisheriris');
y = meas(:,1);
X = [ones(size(y,1),1),meas(:,2:4)];
regf = @(XTRAIN,ytrain,XTEST)(XTEST*regress(ytrain,XTRAIN));
cvMse = crossval('mse',X,y,'predfun',regf)
cvMse =
    0.1015

Example 2

Compute misclassification rate using stratified 10-fold cross validation:

load('fisheriris');
y = species;
X = meas;
cp = cvpartition(y,'k',10); % Stratified cross-validation
classf = @(XTRAIN,ytrain,XTEST)(classify(XTEST,XTRAIN,...
ytrain));
cvMCR = crossval('mcr',X,y,'predfun',classf,'partition',cp)
cvMCR =
    0.0200

Example 3

Compute the confusion matrix using stratified 10-fold cross validation:

load('fisheriris');
y = species;
X = meas;
order = unique(y); % Order of the group labels
cp = cvpartition(y,'k',10); % Stratified cross-validation
f = @(xtr,ytr,xte,yte)confusionmat(yte,...
classify(xte,xtr,ytr),'order',order);
cfMat = crossval(f,X,y,'partition',cp);
cfMat = reshape(sum(cfMat),3,3)
cfMat =
    50     0     0
     0    48     2
     0     1    49

cfMat is the summation of 10 confusion matrices from 10 test sets.



Introduced in R2008a
