
Automated Classifier Selection with Bayesian Optimization

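This example shows how to use fitcauto to automatically try a selection of classification model types with different hyperparameter values, given training predictor and response data. The function uses Bayesian optimization to select models and their hyperparameter values, and computes the cross-validation classification error for each model. After the optimization is complete, fitcauto returns the model, trained on the entire data set, that is expected to best classify new data. The example then checks the model's performance on test data.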

Load Sample Data

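This example uses the 1994 census data stored in census1994.mat. The data set consists of demographic information from the US Census Bureau that you can use to predict whether an individual makes more than $50,000 per year.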

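Load the sample data census1994, which contains the training data adultdata and the test data adulttest. Preview the first few rows of the training data set.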

load census1994
head(adultdata)
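ans=8×15 table
    age    workClass               fnlwgt    education    education_num    marital_status           occupation           relationship     race     sex       capital_gain    capital_loss    hours_per_week    native_country    salary
    ___    ________________    __________    _________    _____________    _____________________    _________________    _____________    _____    ______    ____________    ____________    ______________    ______________    ______

     39    State-gov                77516    Bachelors               13    Never-married            Adm-clerical         Not-in-family    White    Male              2174               0                40    United-States     <=50K
     50    Self-emp-not-inc         83311    Bachelors               13    Married-civ-spouse       Exec-managerial      Husband          White    Male                 0               0                13    United-States     <=50K
     38    Private             2.1565e+05    HS-grad                  9    Divorced                 Handlers-cleaners    Not-in-family    White    Male                 0               0                40    United-States     <=50K
     53    Private             2.3472e+05    11th                     7    Married-civ-spouse       Handlers-cleaners    Husband          Black    Male                 0               0                40    United-States     <=50K
     28    Private             3.3841e+05    Bachelors               13    Married-civ-spouse       Prof-specialty       Wife             Black    Female               0               0                40    Cuba              <=50K
     37    Private             2.8458e+05    Masters                 14    Married-civ-spouse       Exec-managerial      Wife             White    Female               0               0                40    United-States     <=50K
     49    Private             1.6019e+05    9th                      5    Married-spouse-absent    Other-service        Not-in-family    Black    Female               0               0                16    Jamaica           <=50K
     52    Self-emp-not-inc    2.0964e+05    HS-grad                  9    Married-civ-spouse       Exec-managerial      Husband          White    Male                 0               0                45    United-States     >50K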

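Each row contains the demographic information for one adult. The last column, salary, shows whether a person earns at most $50,000 per year or more than $50,000 per year.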

Use Automated Model Selection

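Use fitcauto to automatically find an appropriate classifier for the data in adultdata. Use the fnlwgt variable as the observation weights, and run the Bayesian optimization in parallel, which requires Parallel Computing Toolbox™. Parallel timing is not reproducible, so parallel Bayesian optimization does not necessarily yield reproducible results.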
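Observation weights control how much each row contributes to the loss. The base-MATLAB sketch below contrasts a plain misclassification rate with a weighted one. It uses hypothetical toy labels and weights, not census1994. It illustrates the idea only. The toolbox may normalize weights differently (for example, relative to the class prior probabilities), so the sketch is not expected to reproduce the losses that fitcauto reports.

```matlab
% Hypothetical toy data (not from census1994): true and predicted labels
% coded as 1 = '<=50K' and 2 = '>50K', plus one sampling weight per row.
yTrue = [1; 1; 2; 2; 1];
yPred = [1; 2; 2; 1; 1];
w     = [1; 3; 1; 1; 2];   % plays the role of fnlwgt (illustrative values)

isWrong = (yTrue ~= yPred);

% Unweighted error: every observation counts once.
errUnweighted = mean(isWrong);

% Simple weighted error: each mistake counts in proportion to its weight.
% (Assumption: plain normalization by the total weight.)
errWeighted = sum(w .* isWrong) / sum(w);

fprintf('Unweighted error: %.4f\n', errUnweighted);
fprintf('Weighted error:   %.4f\n', errWeighted);
```

Here the two misclassified rows carry 4 of the 8 total weight units, so they count for more than their share of the row count.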

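Because the optimization is complex, this process can take some time, especially for larger data sets. By default, fitcauto shows an iterative display and a plot of the optimization results. For more information on how to interpret these results, see Verbose Display.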
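One way to read the iterative display: the Observed min column behaves like a running minimum of the validation losses. An evaluation is marked Best when it improves on the previous minimum. The sketch below assumes that evaluations complete in the displayed order. It replays iterations 61 to 70 of the display that follows, seeded with the minimum reached before iteration 61, and reproduces their Observed min values and Best flags. In a parallel run the display order can differ from the completion order (compare iterations 2 and 3), so this reading does not hold for every row.

```matlab
% Validation losses of iterations 61-70, copied from the iterative display.
losses = [0.14113 0.14178 0.14157 0.15886 0.14529 ...
          0.23856 0.14702 0.14058 0.14168 0.14072];
prevMin = 0.14135;   % observed minimum reported before iteration 61

% Running minimum, seeded with the minimum reached before iteration 61.
running = cummin([prevMin, losses]);
observedMin = running(2:end);        % corresponds to "Observed min validation loss"

% An evaluation is flagged "Best" when it is strictly below the previous minimum.
isBest = losses < running(1:end-1);

disp(observedMin)
disp(find(isBest) + 60)              % iteration numbers flagged "Best"
```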

options = struct('UseParallel',true);
[Mdl,Results] = fitcauto(adultdata,'salary','Weights','fnlwgt', ...
    'HyperparameterOptimizationOptions',options);
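Warning: It is recommended that you first standardize all numeric predictors when optimizing the Naive Bayes 'Width' parameter. Ignore this warning if you have done that.
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 6).
Copying objective function to workers...
Done copying objective function to workers.
Learner types to explore: ensemble, nb, tree
Total iterations (MaxObjectiveEvaluations): 90
Total time (MaxTime): Inf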
|==================================================================================================================================|
| Iter | Active  | Eval   | Validation | Time for training  | Observed min    | Estimated min   | Learner  | Hyperparameter: Value |
|      | workers | result | loss       | & validation (sec) | validation loss | validation loss |          |                       |
|==================================================================================================================================|
|    1 |       6 | Best   |    0.16287 |             4.3468 |         0.16287 |         0.16287 | nb       | DistributionNames: normal, Width: NaN |
|    2 |       5 | Accept |    0.14389 |             6.1049 |         0.14162 |         0.14287 | tree     | MinLeafSize: 21 |
|    3 |       5 | Best   |    0.14162 |             5.6195 |         0.14162 |         0.14287 | tree     | MinLeafSize: 50 |
|    4 |       6 | Accept |    0.15626 |             74.156 |         0.14162 |         0.14287 | ensemble | Method: LogitBoost, NumLearningCycles: 283, MinLeafSize: 7330 |
|    5 |       6 | Accept |    0.15603 |             77.293 |         0.14162 |         0.14287 | ensemble | Method: LogitBoost, NumLearningCycles: 295, MinLeafSize: 3 |
|    6 |       6 | Accept |    0.16027 |             5.6224 |         0.14162 |         0.14842 | tree     | MinLeafSize: 5 |
|    7 |       6 | Accept |    0.17343 |             8.6209 |         0.14162 |         0.15576 | tree     | MinLeafSize: 2 |
|    8 |       6 | Accept |    0.15103 |             4.8867 |         0.14162 |         0.15392 | tree     | MinLeafSize: 8 |
|    9 |       6 | Accept |    0.17642 |             1.1808 |         0.14162 |         0.15449 | tree     | MinLeafSize: 1663 |
|   10 |       6 | Accept |    0.15927 |             5.0734 |         0.14162 |         0.15343 | tree     | MinLeafSize: 6 |
|==================================================================================================================================|
| Iter | Active  | Eval   | Validation | Time for training  | Observed min    | Estimated min   | Learner  | Hyperparameter: Value |
|      | workers | result | loss       | & validation (sec) | validation loss | validation loss |          |                       |
|==================================================================================================================================|
|   11 |       6 | Accept |    0.17009 |             1.6504 |         0.14162 |         0.15533 | tree     | MinLeafSize: 1272 |
|   12 |       6 | Accept |    0.17869 |             1.0308 |         0.14162 |           0.154 | tree     | MinLeafSize: 2744 |
|   13 |       6 | Accept |    0.17961 |             116.64 |         0.14162 |           0.154 | nb       | DistributionNames: kernel, Width: 274.23 |
|   14 |       5 | Accept |    0.15128 |             118.36 |         0.14162 |         0.15383 | ensemble | Method: Bag, NumLearningCycles: 241, MinLeafSize: 23 |
|   15 |       5 | Accept |    0.15177 |             115.42 |         0.14162 |         0.15383 | ensemble | Method: Bag, NumLearningCycles: 235, MinLeafSize: 40 |
|   16 |       5 | Accept |    0.15116 |             115.49 |         0.14162 |         0.15326 | ensemble | Method: Bag, NumLearningCycles: 235, MinLeafSize: 40 |
|   17 |       6 | Accept |    0.14887 |             63.412 |         0.14162 |         0.15326 | nb       | DistributionNames: kernel, Width: 0.56014 |
|   18 |       6 | Accept |    0.17869 |            0.89318 |         0.14162 |         0.15219 | tree     | MinLeafSize: 2712 |
|   19 |       6 | Accept |    0.17676 |             59.781 |         0.14162 |         0.15219 | ensemble | Method: Bag, NumLearningCycles: 208, MinLeafSize: 4208 |
|   20 |       6 | Accept |    0.15086 |              81.42 |         0.14162 |         0.15219 | nb       | DistributionNames: kernel, Width: 2.4778 |
|==================================================================================================================================|
| Iter | Active  | Eval   | Validation | Time for training  | Observed min    | Estimated min   | Learner  | Hyperparameter: Value |
|      | workers | result | loss       | & validation (sec) | validation loss | validation loss |          |                       |
|==================================================================================================================================|
|   21 |       6 | Accept |    0.16287 |            0.64656 |         0.14162 |         0.15219 | nb       | DistributionNames: normal, Width: NaN |
|   22 |       6 | Accept |    0.14943 |             75.578 |         0.14162 |         0.15219 | nb       | DistributionNames: kernel, Width: 1.6195 |
|   23 |       6 | Accept |    0.16287 |            0.49489 |         0.14162 |         0.15219 | nb       | DistributionNames: normal, Width: NaN |
|   24 |       6 | Accept |    0.14926 |             68.642 |         0.14162 |         0.15219 | nb       | DistributionNames: kernel, Width: 1.2371 |
|   25 |       6 | Accept |    0.16287 |             0.5124 |         0.14162 |         0.15219 | nb       | DistributionNames: normal, Width: NaN |
|   26 |       6 | Accept |    0.15609 |             58.267 |         0.14162 |         0.15219 | ensemble | Method: LogitBoost, NumLearningCycles: 247, MinLeafSize: 1 |
|   27 |       6 | Accept |    0.16287 |            0.93385 |         0.14162 |         0.15219 | nb       | DistributionNames: normal, Width: NaN |
|   28 |       6 | Accept |    0.15554 |             4.3668 |         0.14162 |         0.15067 | tree     | MinLeafSize: 7 |
|   29 |       6 | Accept |    0.15087 |             127.01 |         0.14162 |         0.15067 | ensemble | Method: Bag, NumLearningCycles: 289, MinLeafSize: 9 |
|   30 |       6 | Accept |    0.15142 |             127.39 |         0.14162 |         0.15067 | ensemble | Method: Bag, NumLearningCycles: 289, MinLeafSize: 9 |
|==================================================================================================================================|
| Iter | Active  | Eval   | Validation | Time for training  | Observed min    | Estimated min   | Learner  | Hyperparameter: Value |
|      | workers | result | loss       | & validation (sec) | validation loss | validation loss |          |                       |
|==================================================================================================================================|
|   31 |       6 | Accept |    0.14177 |             2.6306 |         0.14162 |         0.14707 | tree     | MinLeafSize: 116 |
|   32 |       6 | Accept |    0.16287 |             1.1225 |         0.14162 |         0.14707 | nb       | DistributionNames: normal, Width: NaN |
|   33 |       6 | Accept |    0.15737 |             56.258 |         0.14162 |         0.14707 | ensemble | Method: LogitBoost, NumLearningCycles: 233, MinLeafSize: 5308 |
|   34 |       6 | Accept |    0.15158 |             97.559 |         0.14162 |         0.14707 | ensemble | Method: Bag, NumLearningCycles: 214, MinLeafSize: 133 |
|   35 |       6 | Accept |     0.1719 |             96.392 |         0.14162 |         0.14707 | ensemble | Method: Bag, NumLearningCycles: 223, MinLeafSize: 1526 |
|   36 |       6 | Accept |    0.16287 |            0.42054 |         0.14162 |         0.14707 | nb       | DistributionNames: normal, Width: NaN |
|   37 |       6 | Accept |    0.14441 |             3.5932 |         0.14162 |         0.14598 | tree     | MinLeafSize: 18 |
|   38 |       6 | Accept |    0.16287 |            0.34693 |         0.14162 |         0.14598 | nb       | DistributionNames: normal, Width: NaN |
|   39 |       6 | Accept |    0.14432 |             3.4661 |         0.14162 |           0.145 | tree     | MinLeafSize: 19 |
|   40 |       6 | Accept |    0.14291 |             2.3121 |         0.14162 |         0.14321 | tree     | MinLeafSize: 231 |
|==================================================================================================================================|
| Iter | Active  | Eval   | Validation | Time for training  | Observed min    | Estimated min   | Learner  | Hyperparameter: Value |
|      | workers | result | loss       | & validation (sec) | validation loss | validation loss |          |                       |
|==================================================================================================================================|
|   41 |       6 | Accept |    0.15278 |             96.086 |         0.14162 |         0.14321 | nb       | DistributionNames: kernel, Width: 3.5668 |
|   42 |       6 | Accept |    0.15068 |             1.9847 |         0.14162 |         0.14348 | tree     | MinLeafSize: 412 |
|   43 |       6 | Accept |    0.14705 |             2.1122 |         0.14162 |         0.14343 | tree     | MinLeafSize: 305 |
|   44 |       6 | Accept |    0.14186 |             2.3835 |         0.14162 |         0.14309 | tree     | MinLeafSize: 168 |
|   45 |       6 | Accept |    0.16209 |             1.9821 |         0.14162 |         0.14302 | tree     | MinLeafSize: 573 |
|   46 |       5 | Accept |    0.15783 |             53.627 |         0.14135 |         0.14271 | ensemble | Method: LogitBoost, NumLearningCycles: 211, MinLeafSize: 125 |
|   47 |       5 | Best   |    0.14135 |             3.1329 |         0.14135 |         0.14271 | tree     | MinLeafSize: 63 |
|   48 |       4 | Accept |    0.15637 |             63.578 |         0.14135 |         0.14236 | ensemble | Method: LogitBoost, NumLearningCycles: 252, MinLeafSize: 485 |
|   49 |       4 | Accept |     0.1448 |             2.1012 |         0.14135 |         0.14236 | tree     | MinLeafSize: 263 |
|   50 |       3 | Accept |     0.1513 |             114.35 |         0.14135 |         0.14224 | ensemble | Method: Bag, NumLearningCycles: 253, MinLeafSize: 13 |
|==================================================================================================================================|
| Iter | Active  | Eval   | Validation | Time for training  | Observed min    | Estimated min   | Learner  | Hyperparameter: Value |
|      | workers | result | loss       | & validation (sec) | validation loss | validation loss |          |                       |
|==================================================================================================================================|
|   51 |       3 | Accept |    0.14271 |             2.2737 |         0.14135 |         0.14224 | tree     | MinLeafSize: 133 |
|   52 |       6 | Accept |    0.14349 |             1.9707 |         0.14135 |         0.14224 | tree     | MinLeafSize: 199 |
|   53 |       3 | Accept |    0.15337 |             1.6887 |         0.14135 |         0.14235 | tree     | MinLeafSize: 441 |
|   54 |       3 | Accept |    0.17869 |              1.049 |         0.14135 |         0.14235 | tree     | MinLeafSize: 1821 |
|   55 |       3 | Accept |     0.1785 |             0.9639 |         0.14135 |         0.14235 | tree     | MinLeafSize: 3523 |
|   56 |       3 | Accept |    0.18062 |            0.63917 |         0.14135 |         0.14235 | tree     | MinLeafSize: 4359 |
|   57 |       6 | Accept |    0.14673 |             3.2067 |         0.14135 |         0.14207 | tree     | MinLeafSize: 12 |
|   58 |       6 | Accept |    0.14238 |             2.3081 |         0.14135 |         0.14215 | tree     | MinLeafSize: 177 |
|   59 |       5 | Accept |    0.16352 |             125.94 |         0.14135 |          0.1419 | ensemble | Method: Bag, NumLearningCycles: 297, MinLeafSize: 823 |
|   60 |       5 | Accept |    0.14162 |              2.849 |         0.14135 |          0.1419 | tree     | MinLeafSize: 50 |
|==================================================================================================================================|
| Iter | Active  | Eval   | Validation | Time for training  | Observed min    | Estimated min   | Learner  | Hyperparameter: Value |
|      | workers | result | loss       | & validation (sec) | validation loss | validation loss |          |                       |
|==================================================================================================================================|
|   61 |       5 | Best   |    0.14113 |             2.6499 |         0.14113 |         0.14173 | tree     | MinLeafSize: 83 |
|   62 |       5 | Accept |    0.14178 |             2.9853 |         0.14113 |         0.14153 | tree     | MinLeafSize: 40 |
|   63 |       5 | Accept |    0.14157 |             2.8701 |         0.14113 |         0.14153 | tree     | MinLeafSize: 42 |
|   64 |       5 | Accept |    0.15886 |             1.7188 |         0.14113 |         0.14161 | tree     | MinLeafSize: 532 |
|   65 |       5 | Accept |    0.14529 |             3.6593 |         0.14113 |         0.14151 | tree     | MinLeafSize: 14 |
|   66 |       4 | Accept |    0.23856 |             41.472 |         0.14113 |         0.14151 | ensemble | Method: Bag, NumLearningCycles: 209, MinLeafSize: 8676 |
|   67 |       4 | Accept |    0.14702 |             4.0559 |         0.14113 |         0.14151 | tree     | MinLeafSize: 10 |
|   68 |       4 | Best   |    0.14058 |             2.8472 |         0.14058 |         0.14148 | tree     | MinLeafSize: 30 |
|   69 |       4 | Accept |    0.14168 |             2.1868 |         0.14058 |         0.14143 | tree     | MinLeafSize: 112 |
|   70 |       4 | Accept |    0.14072 |             2.9698 |         0.14058 |         0.14144 | tree     | MinLeafSize: 28 |
|==================================================================================================================================|
| Iter | Active  | Eval   | Validation | Time for training  | Observed min    | Estimated min   | Learner  | Hyperparameter: Value |
|      | workers | result | loss       | & validation (sec) | validation loss | validation loss |          |                       |
|==================================================================================================================================|
|   71 |       4 | Accept |    0.14117 |             2.8824 |         0.14058 |         0.14114 | tree     | MinLeafSize: 29 |
|   72 |       4 | Best   |    0.14046 |             2.8853 |         0.14046 |         0.14112 | tree     | MinLeafSize: 25 |
|   73 |       4 | Accept |    0.14184 |             2.8532 |         0.14046 |         0.14103 | tree     | MinLeafSize: 24 |
|   74 |       4 | Accept |    0.14112 |             2.7998 |         0.14046 |         0.14102 | tree     | MinLeafSize: 33 |
|   75 |       4 | Accept |    0.14331 |             3.0835 |         0.14046 |           0.141 | tree     | MinLeafSize: 23 |
|   76 |       4 | Accept |    0.14089 |             2.9637 |         0.14046 |         0.14086 | tree     | MinLeafSize: 31 |
|   77 |       4 | Accept |    0.14046 |             3.0017 |         0.14046 |         0.14083 | tree     | MinLeafSize: 25 |
|   78 |       3 | Accept |    0.15093 |             91.952 |         0.14046 |         0.14085 | ensemble | Method: Bag, NumLearningCycles: 222, MinLeafSize: 27 |
|   79 |       3 | Accept |    0.14046 |             2.9993 |         0.14046 |         0.14085 | tree     | MinLeafSize: 25 |
|   80 |       6 | Accept |    0.14046 |             2.7739 |         0.14046 |         0.14073 | tree     | MinLeafSize: 25 |
|==================================================================================================================================|
| Iter | Active  | Eval   | Validation | Time for training  | Observed min    | Estimated min   | Learner  | Hyperparameter: Value |
|      | workers | result | loss       | & validation (sec) | validation loss | validation loss |          |                       |
|==================================================================================================================================|
|   81 |       2 | Accept |    0.18178 |             101.13 |         0.14046 |         0.14068 | nb       | DistributionNames: kernel, Width: 868.86 |
|   82 |       2 | Accept |    0.14184 |             3.2218 |         0.14046 |         0.14068 | tree     | MinLeafSize: 24 |
|   83 |       2 | Accept |    0.17807 |            0.82685 |         0.14046 |         0.14068 | tree     | MinLeafSize: 3874 |
|   84 |       2 | Accept |    0.15989 |             1.8729 |         0.14046 |         0.14068 | tree     | MinLeafSize: 540 |
|   85 |       2 | Accept |    0.15103 |             3.8835 |         0.14046 |         0.14068 | tree     | MinLeafSize: 8 |
|   86 |       6 | Accept |    0.14046 |             2.5909 |         0.14046 |         0.14067 | tree     | MinLeafSize: 25 |
|   87 |       6 | Accept |    0.14331 |             3.5433 |         0.14046 |         0.14067 | tree     | MinLeafSize: 23 |
|   88 |       6 | Accept |    0.23856 |             47.904 |         0.14046 |         0.14067 | ensemble | Method: Bag, NumLearningCycles: 258, MinLeafSize: 12543 |
|   89 |       6 | Accept |    0.14914 |             59.665 |         0.14046 |         0.14067 | nb       | DistributionNames: kernel, Width: 0.37688 |
|   90 |       6 | Accept |    0.15604 |             68.731 |         0.14046 |         0.14067 | ensemble | Method: LogitBoost, NumLearningCycles: 262, MinLeafSize: 2 |

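__________________________________________________________
Optimization completed.
Total iterations: 90
Total elapsed time: 577.1419 seconds
Total time for training and validation: 2558.1542 seconds

Best observed learner is a tree model with:
    MinLeafSize: 25
Observed validation loss: 0.14046
Time for training and validation: 2.8853 seconds

Best estimated learner (returned model) is a tree model with:
    MinLeafSize: 25
Estimated validation loss: 0.14067
Estimated time for training and validation: 2.8824 seconds

Documentation for fitcauto display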
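The best observed learner is simply the evaluated point with the lowest observed validation loss. The sketch below recovers it from the tree rows of iterations 71 to 80 (the ensemble evaluated at iteration 78 is left out). The best estimated learner, by contrast, comes from the surrogate model that Bayesian optimization builds, so it cannot be recomputed from the table alone. Assuming Results is a BayesianOptimization object, its MinObjective and XAtMinObjective properties should hold the observed optimum directly.

```matlab
% Tree evaluations from iterations 71-80 of the display (iteration 78,
% an ensemble, is excluded): validation loss and MinLeafSize per row.
losses      = [0.14117 0.14046 0.14184 0.14112 0.14331 ...
               0.14089 0.14046 0.14046 0.14046];
minLeafSize = [29 25 24 33 23 31 25 25 25];

% min returns the first index when several entries tie for the minimum.
[bestLoss, idx] = min(losses);

fprintf('Best observed: MinLeafSize = %d, loss = %.5f\n', ...
        minLeafSize(idx), bestLoss);
```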

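The final model returned by fitcauto corresponds to the best estimated learner. Before returning the model, the function retrains it on the entire training data (adultdata), using the listed learner (or model) type and the displayed hyperparameter values.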

Evaluate Test Set Performance

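Evaluate the performance of the returned model Mdl on the test set adulttest by using a confusion matrix and a receiver operating characteristic (ROC) curve.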

Find the predicted labels and score values for the test set.

[label,score] = predict(Mdl,adulttest);

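Create a confusion matrix from the test set results. The diagonal elements indicate the number of correctly classified instances of a given class. The off-diagonal elements are instances of misclassified observations.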
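The counts behind such a chart can be sketched in base MATLAB with accumarray. The labels below are hypothetical, coded as 1 for '<=50K' and 2 for '>50K'. Rows index the true class and columns index the predicted class, which is the orientation confusionchart uses.

```matlab
% Hypothetical true and predicted labels (not the real test results),
% coded as 1 = '<=50K' and 2 = '>50K'.
yTrue = [1 1 1 2 2 1 2 1];
yPred = [1 1 2 2 1 1 2 1];

% Each (true, predicted) pair adds 1 to cell C(true, predicted).
C = accumarray([yTrue(:) yPred(:)], 1, [2 2]);
disp(C)
```

The diagonal of C holds the correctly classified counts and the off-diagonal cells hold the misclassifications, matching the description above.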

confusionchart(adulttest.salary,label)

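Compute the classification accuracy for the test set. The accuracy is the percentage of test set observations that are classified correctly.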
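Given a confusion matrix, the accuracy is the diagonal total divided by the overall total. For unweighted counts, the classification error is its complement, which is why the example computes the accuracy as one minus the loss. A minimal sketch on a hypothetical 2-by-2 matrix:

```matlab
% Hypothetical confusion matrix (rows = true class, columns = predicted class).
C = [4 1; 1 2];

% Correct predictions sit on the diagonal.
nCorrect = trace(C);
nTotal   = sum(C(:));

accuracy     = 100 * nCorrect / nTotal;   % percentage classified correctly
misclassRate = 1 - nCorrect / nTotal;     % unweighted classification error

fprintf('Accuracy: %.2f%%\n', accuracy);
fprintf('Accuracy from error: %.2f%%\n', 100 * (1 - misclassRate));
```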

testAccuracy = (1-loss(Mdl,adulttest,'salary'))*100
testAccuracy = 85.1513

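To plot the ROC curve for the label '<=50K', find the column of score that corresponds to this label. The columns of score are in the same order as the classes in the trained model.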

Mdl.ClassNames
ans = 2×1 categorical
     <=50K 
     >50K 

Because '<=50K' is listed first, the first column of score corresponds to this label.
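Looking up the column by name, rather than hard-coding its position, keeps the code correct if the class order ever changes. In the sketch below, a hypothetical cell array stands in for Mdl.ClassNames, and the scores are also hypothetical. With the real categorical Mdl.ClassNames, a comparison such as Mdl.ClassNames == '<=50K' should give the same index.

```matlab
% Hypothetical stand-ins for Mdl.ClassNames and the score output of predict.
classNames = {'<=50K'; '>50K'};
score = [0.9 0.1; 0.3 0.7; 0.6 0.4];   % one row per observation

% Find the column whose class name matches the label of interest.
col = find(strcmp(classNames, '<=50K'));
posScore = score(:, col);               % scores for the '<=50K' class

disp(posScore')
```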

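Plot the ROC curve, and compute the area under the curve (AUC). The ROC curve shows the true positive rate versus the false positive rate for different thresholds on the classifier output. A perfect classifier has a true positive rate of 1 regardless of the threshold, so its AUC is 1. A binary classifier that assigns observations to classes at random has an AUC of 0.5. A large AUC value (close to 1) indicates good classifier performance.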
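perfcurve handles threshold selection, tied scores, and optional confidence bounds. The base-MATLAB sketch below shows only the core computation and assumes hypothetical scores with no ties. It sorts by decreasing score, accumulates the true and false positive rates as each observation is classified as positive, and integrates with the trapezoidal rule.

```matlab
% Hypothetical positive-class scores and true labels (1 = positive class).
scores = [0.9 0.8 0.7 0.6 0.55 0.4];
isPos  = logical([1 1 0 1 0 0]);

% Lowering the threshold step by step = visiting scores in descending order.
[~, order] = sort(scores, 'descend');
pos = double(isPos(order));   % 1 for positives, 0 for negatives
neg = 1 - pos;

% Rates after each additional observation is classified positive,
% starting from the point (0, 0) where nothing is classified positive.
TPR = [0, cumsum(pos) / sum(pos)];
FPR = [0, cumsum(neg) / sum(neg)];

% Area under the ROC curve by the trapezoidal rule.
AUC = trapz(FPR, TPR);
fprintf('AUC = %.4f\n', AUC);
```

In this toy case 8 of the 9 positive-negative pairs are ranked correctly, which is what the AUC measures for untied scores.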

[X,Y,~,AUC] = perfcurve(adulttest.salary,score(:,1),'<=50K');
plot(X,Y)
title('ROC Curve')
xlabel('False Positive Rate')
ylabel('True Positive Rate')

AUC
AUC = 0.8947

Based on the accuracy and AUC values, the classifier performs well on the test data.

See Also


Related Topics