crossval
Loss estimate using cross-validation
Syntax
vals = crossval(fun,X)
vals = crossval(fun,X,Y,...)
mse = crossval('mse',X,y,'Predfun',predfun)
mcr = crossval('mcr',X,y,'Predfun',predfun)
val = crossval(criterion,X1,X2,...,y,'Predfun',predfun)
vals = crossval(...,'name',value)
Description
vals = crossval(fun,X) performs 10-fold cross-validation for the function fun, applied to the data in X.
fun is a function handle to a function with two inputs, the training subset of X, XTRAIN, and the test subset of X, XTEST, as follows:
testval = fun(XTRAIN,XTEST)
Each time it is called, fun should use XTRAIN to fit a model, then return some criterion testval computed on XTEST using that fitted model.
X can be a column vector or a matrix. Rows of X correspond to observations; columns correspond to variables or features. Each row of vals contains the result of applying fun to one test set. If testval is a non-scalar value, crossval converts it to a row vector using linear indexing and stores it in one row of vals.
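The two-input form of fun can be sketched as follows. This is a minimal illustration rather than part of the original reference: the synthetic data, the variable names, and the call to rng (used only to make the random partition repeatable) are assumptions.

```matlab
% Minimal sketch: cross-validate a "model" that is just the training mean.
rng(0);                 % assumed here only to make the partition repeatable
X = randn(100,1);       % synthetic data: 100 observations, 1 variable

% fun "fits" on XTRAIN (takes its mean) and scores on XTEST
% (mean squared deviation of the test points from the training mean).
fun = @(XTRAIN,XTEST) mean((XTEST - mean(XTRAIN)).^2);
vals = crossval(fun,X);     % default 10-fold: vals is 10-by-1

% A non-scalar testval is stored as one row of vals per test set.
fun2 = @(XTRAIN,XTEST) [mean(XTEST) std(XTEST)];
vals2 = crossval(fun2,X);   % vals2 is 10-by-2
```

Each row of vals and vals2 corresponds to one of the 10 test sets, so averaging down the rows gives a single cross-validated summary.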
vals = crossval(fun,X,Y,...) is used when data are stored in separate variables X, Y, ... . All variables (column vectors, matrices, or arrays) must have the same number of rows. fun is called with the training subsets of X, Y, ..., followed by the test subsets of X, Y, ..., as follows:
testvals = fun(XTRAIN,YTRAIN,...,XTEST,YTEST,...)
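A four-input fun for two variables might look like the sketch below. The synthetic regression data and the least-squares fit with the backslash operator are assumptions chosen for illustration; any fitting routine could take their place.

```matlab
% Minimal sketch: per-fold test MSE of a least-squares fit.
rng(0);                               % assumed, for a repeatable partition
X = [ones(50,1) randn(50,1)];         % design matrix with an intercept column
Y = X*[1; 2] + 0.1*randn(50,1);       % synthetic linear response

% Training subsets come first, test subsets second, in the same variable order.
fun = @(XTRAIN,YTRAIN,XTEST,YTEST) ...
    mean((YTEST - XTEST*(XTRAIN\YTRAIN)).^2);

vals = crossval(fun,X,Y);             % 10-by-1: one test MSE per fold
avgMse = mean(vals);                  % average of the per-fold values
```

Because fun receives the test responses as well as the test predictors, it can return any criterion, not only a prediction.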
mse = crossval('mse',X,y,'Predfun',predfun) returns mse, a scalar containing a 10-fold cross-validation estimate of mean-squared error for the function predfun. X can be a column vector, matrix, or array of predictors. y is a column vector of response values. X and y must have the same number of rows.
predfun is a function handle called with the training subset of X, the training subset of y, and the test subset of X as follows:
yfit = predfun(XTRAIN,ytrain,XTEST)
Each time it is called, predfun should use XTRAIN and ytrain to fit a regression model and then return fitted values in a column vector yfit. Each row of yfit contains the predicted values for the corresponding row of XTEST. crossval computes the squared errors between yfit and the corresponding response test set, and returns the overall mean across all test sets.
mcr = crossval('mcr',X,y,'Predfun',predfun) returns mcr, a scalar containing a 10-fold cross-validation estimate of misclassification rate (the proportion of misclassified samples) for the function predfun. The matrix X contains predictor values and the vector y contains class labels. predfun should use XTRAIN and ytrain to fit a classification model and return yfit as the predicted class labels for XTEST. crossval computes the number of misclassifications between yfit and the corresponding response test set, and returns the overall misclassification rate across all test sets.
val = crossval(criterion,X1,X2,...,y,'Predfun',predfun), where criterion is 'mse' or 'mcr', returns a cross-validation estimate of mean-squared error (for a regression model) or misclassification rate (for a classification model) with predictor values in X1, X2, ... and, respectively, response values or class labels in y. X1, X2, ... and y must have the same number of rows. predfun is a function handle called with the training subsets of X1, X2, ..., the training subset of y, and the test subsets of X1, X2, ..., as follows:
yfit = predfun(X1TRAIN,X2TRAIN,...,ytrain,X1TEST,X2TEST,...)
yfit should be a column vector containing the fitted values.
vals = crossval(...,'name',value) specifies one or more optional parameter name/value pairs from the following table. Specify name inside single quotes.
Name | Value |
---|---|
holdout | A scalar specifying the ratio or the number of observations |
kfold | A positive integer greater than 1 specifying the number of folds |
leaveout | Specifies leave-one-out cross validation. The value must be |
mcreps | A positive integer specifying the number of Monte-Carlo repetitions for validation. If the first input |
partition | An object |
stratify | A column vector |
options | A structure that specifies whether to run in parallel, and specifies the random stream or streams. Create the |
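Only one of kfold, holdout, leaveout, or partition can be specified, and partition cannot be specified with stratify. If both partition and mcreps are specified, the first Monte-Carlo repetition uses the partition information in the cvpartition object, and the repartition method is called to generate new partitions for each of the remaining repetitions. If no cross-validation type is specified, the default is 10-fold cross-validation.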
Note: When using cross-validation with classification algorithms, stratification is preferred. Otherwise, some test sets may not include observations from all classes.
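The name/value pairs can be combined with any of the calling forms above. The sketch below reuses the fisheriris regression setup from Example 1 and is an illustration only: the particular values (5 folds, a 0.3 holdout ratio, 10 repetitions) are arbitrary assumptions, and the results depend on the random partition.

```matlab
% Minimal sketch: selecting the cross-validation type with name/value pairs.
load('fisheriris');
y = meas(:,1);
X = [ones(size(y,1),1), meas(:,2:4)];
regf = @(XTRAIN,ytrain,XTEST) XTEST*regress(ytrain,XTRAIN);

% 5-fold instead of the default 10-fold cross-validation
mse5 = crossval('mse',X,y,'Predfun',regf,'kfold',5);

% Hold out a ratio of 0.3 of the observations as a single test set
mseHold = crossval('mse',X,y,'Predfun',regf,'holdout',0.3);

% 5-fold cross-validation with 10 Monte-Carlo repetitions
mseRep = crossval('mse',X,y,'Predfun',regf,'kfold',5,'mcreps',10);
```

Only one of kfold, holdout, leaveout, or partition appears in each call, in line with the restriction stated above.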
Examples
Example 1
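Compute mean-squared error for regression using 10-fold cross-validation:
load('fisheriris');
y = meas(:,1);
X = [ones(size(y,1),1),meas(:,2:4)];
% Fit a linear regression on the training set and predict the test set
regf = @(XTRAIN,ytrain,XTEST)(XTEST*regress(ytrain,XTRAIN));
cvMse = crossval('mse',X,y,'predfun',regf)
cvMse =
    0.1015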
Example 2
Compute misclassification rate using stratified 10-fold cross-validation:
load('fisheriris');
y = species;
X = meas;
cp = cvpartition(y,'k',10); % Stratified cross-validation
classf = @(XTRAIN,ytrain,XTEST)(classify(XTEST,XTRAIN,...
ytrain));
cvMCR = crossval('mcr',X,y,'predfun',classf,'partition',cp)
cvMCR =
    0.0200
Example 3
Compute the confusion matrix using stratified 10-fold cross validation:
load('fisheriris');
y = species;
X = meas;
order = unique(y); % Order of the group labels
cp = cvpartition(y,'k',10); % Stratified cross-validation
f = @(xtr,ytr,xte,yte)confusionmat(yte,...
classify(xte,xtr,ytr),'order',order);
cfMat = crossval(f,X,y,'partition',cp);
cfMat = reshape(sum(cfMat),3,3)
cfMat =
    50     0     0
     0    48     2
     0     1    49
cfMat is the summation of 10 confusion matrices from 10 test sets.
References
[1] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. New York: Springer, 2001.