
Optimize a Boosted Regression Ensemble

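This example shows how to optimize the hyperparameters of a boosted regression ensemble. The optimization minimizes the cross-validation loss of the model.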
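Each step of the search scores one combination of hyperparameters. The sketch below approximates what a single evaluation computes. It assumes the default 5-fold cross-validation of HyperparameterOptimizationOptions. The values numCycles = 50, learnRate = 0.1 and maxSplits = 10 are hypothetical and serve only as an illustration. The optimizer records log(1 + loss), where loss is the cross-validated mean squared error.

```matlab
% Load the data used throughout this example
load carsmall
X = [Acceleration Displacement Horsepower Weight];
Y = MPG;

% One illustrative point in the search space (hypothetical values)
numCycles = 50;
learnRate = 0.1;
maxSplits = 10;

rng('default')  % fix the random fold assignment
% 5-fold cross-validated LSBoost ensemble evaluated at that point
cvMdl = fitrensemble(X, Y, 'Method', 'LSBoost', ...
    'Learners', templateTree('Surrogate', 'on', 'MaxNumSplits', maxSplits), ...
    'NumLearningCycles', numCycles, 'LearnRate', learnRate, 'KFold', 5);

% Value the optimizer would record: log(1 + cross-validation MSE)
objective = log(1 + kfoldLoss(cvMdl))
```

The Bayesian optimization repeats this kind of evaluation at the points it chooses and keeps the one with the smallest objective.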

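The problem is to model the fuel efficiency of a car, in miles per gallon, from its acceleration, engine displacement, horsepower, and weight. Load the carsmall data, which contains these and other predictors.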

load carsmall
X = [Acceleration Displacement Horsepower Weight];
Y = MPG;
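Surrogate splits mainly matter when predictors have missing values, so it helps to see where NaN values occur in carsmall. Here is a minimal check of NaN counts per column:

```matlab
load carsmall
X = [Acceleration Displacement Horsepower Weight];
Y = MPG;

% Number of NaN entries in each predictor column
missingPerPredictor = sum(isnan(X), 1)
% Number of cars with no MPG value recorded
missingResponse = sum(isnan(Y))
```

Observations with a missing response cannot be used for training. That is presumably why the fitted model later reports NumObservations: 94 instead of all 100 cars.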

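Fit a regression ensemble to the data using the LSBoost algorithm and surrogate splits. Optimize the resulting model by varying the number of learning cycles, the maximum number of splits per tree, and the learning rate. In addition, allow the optimization to repartition the cross-validation data between iterations.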
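You can also select these three hyperparameters through an optimizableVariable array returned by hyperparameters. The array also stores the default search range for each candidate. The sketch assumes the 'Tree' learner type for fitrensemble.

```matlab
load carsmall
X = [Acceleration Displacement Horsepower Weight];
Y = MPG;

% Default search space for fitrensemble with tree learners
params = hyperparameters('fitrensemble', X, Y, 'Tree');

% Turn on only the three hyperparameters used in this example
wanted = {'NumLearningCycles', 'MaxNumSplits', 'LearnRate'};
for k = 1:numel(params)
    params(k).Optimize = ismember(params(k).Name, wanted);
    fprintf('%-20s optimize = %d\n', params(k).Name, params(k).Optimize);
end
```

Passing 'OptimizeHyperparameters',params to fitrensemble instead of a cell array of names should search the same three variables. To narrow a search, edit params(k).Range.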

For reproducibility, set the random seed and use the 'expected-improvement-plus' acquisition function.
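rng('default') resets the global random number generator to its startup settings. Random choices made after the call, such as the cross-validation partitions and the initial points the optimizer tries, then repeat from run to run. A minimal illustration with rand:

```matlab
% Seeding the global generator makes random draws repeatable
rng('default')
first = rand(1, 3);

rng('default')          % reset to the same seed and generator
second = rand(1, 3);

isequal(first, second)  % true: both draws start from the same generator state
```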

rng('default')
Mdl = fitrensemble(X,Y, ...
    'Method','LSBoost', ...
    'Learners',templateTree('Surrogate','on'), ...
    'OptimizeHyperparameters',{'NumLearningCycles','MaxNumSplits','LearnRate'}, ...
    'HyperparameterOptimizationOptions',struct('Repartition',true, ...
    'AcquisitionFunctionName','expected-improvement-plus'))
|====================================================================================================================|
| Iter | Eval   | Objective:  | Objective   | BestSoFar   | BestSoFar   | NumLearning- | LearnRate    | MaxNumSplits |
|      | result | log(1+loss) | runtime     | (observed)  | (estim.)    | Cycles       |              |              |
|====================================================================================================================|
|    1 | Best   |      3.5219 |      20.079 |      3.5219 |      3.5219 |          383 |      0.51519 |            4 |
|    2 | Best   |      3.4752 |     0.83429 |      3.4752 |      3.4777 |           16 |      0.66503 |            7 |
|    3 | Best   |      3.1575 |      1.5792 |      3.1575 |      3.1575 |           33 |       0.2556 |           92 |
|    4 | Accept |      6.3076 |     0.87828 |      3.1575 |      3.1579 |           13 |    0.0053227 |            5 |
|    5 | Accept |      3.4449 |      11.803 |      3.1575 |      3.1579 |          277 |      0.45891 |           99 |
|    6 | Accept |      3.9806 |     0.62494 |      3.1575 |      3.1584 |           10 |      0.13017 |           33 |
|    7 | Best   |       3.059 |     0.45984 |       3.059 |        3.06 |           10 |      0.30126 |            3 |
|    8 | Accept |      3.1707 |     0.98218 |       3.059 |      3.1144 |           10 |      0.28991 |           15 |
|    9 | Accept |      3.0937 |     0.78937 |       3.059 |      3.1046 |           10 |      0.31488 |           13 |
|   10 | Accept |       3.196 |     0.56298 |       3.059 |      3.1233 |           10 |      0.32005 |           11 |
|   11 | Best   |      3.0495 |     0.48452 |      3.0495 |      3.1083 |              |      0.27882 |           85 |
|   12 | Best   |       2.946 |      1.2095 |       2.946 |      3.0774 |           10 |      0.27157 |            7 |
|   13 | Accept |      3.2026 |     0.54187 |       2.946 |      3.0995 |           10 |      0.25734 |           20 |
|   14 | Accept |      5.7151 |      14.348 |       2.946 |      3.0996 |          376 |     0.001001 |           43 |
|   15 | Accept |       3.207 |      19.037 |       2.946 |      3.0937 |          499 |     0.027394 |           18 |
|   16 | Accept |      3.8606 |      1.9099 |       2.946 |      3.0937 |           36 |     0.041427 |           12 |
|   17 | Accept |      3.2026 |      18.422 |       2.946 |       3.095 |          443 |     0.019836 |           76 |
|   18 | Accept |      3.4832 |      7.4091 |       2.946 |      3.0956 |          205 |      0.99989 |            8 |
|   19 | Accept |      5.6285 |      9.0913 |       2.946 |      3.0942 |          192 |    0.0022197 |            2 |
|   20 | Accept |      3.0896 |         8.1 |       2.946 |      3.0938 |          188 |     0.023227 |           93 |
|   21 | Accept |      3.1408 |        6.89 |       2.946 |      3.0935 |          156 |      0.02324 |            5 |
|   22 | Accept |       4.691 |     0.63904 |       2.946 |      3.0941 |           12 |     0.076435 |            2 |
|   23 | Accept |      5.4686 |      2.0784 |       2.946 |      3.0935 |           50 |       0.0101 |           58 |
|   24 | Accept |      6.3759 |      1.0794 |       2.946 |      3.0893 |           23 |    0.0014716 |           22 |
|   25 | Accept |      6.1278 |      1.9941 |       2.946 |       3.094 |           47 |    0.0034406 |            2 |
|   26 | Accept |      5.9134 |     0.61233 |       2.946 |      3.0969 |           11 |     0.024712 |           12 |
|   27 | Accept |       3.401 |      5.7575 |       2.946 |      3.0995 |          151 |     0.067779 |            7 |
|   28 | Accept |      3.2757 |      8.5287 |       2.946 |      3.1009 |          198 |     0.032311 |            8 |
|   29 | Accept |      3.2296 |     0.88442 |       2.946 |      3.1023 |           17 |      0.30283 |           19 |
|   30 | Accept |      3.2385 |      3.1546 |       2.946 |      3.1027 |           83 |      0.21601 |           76 |
|====================================================================================================================|

[Figure: Min objective vs. Number of function evaluations, plotting the minimum observed objective and the estimated minimum objective.]

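__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 181.9637 seconds
Total objective function evaluation time: 150.7641
Best observed feasible point:
    NumLearningCycles    LearnRate    MaxNumSplits
    _________________    _________    ____________
           10             0.27157           7
Observed objective function value = 2.946
Estimated objective function value = 3.1219
Function evaluation time = 1.2095
Best estimated feasible point (according to models):
    NumLearningCycles    LearnRate    MaxNumSplits
    _________________    _________    ____________
           10             0.30126           3
Estimated objective function value = 3.1027
Estimated function evaluation time = 0.65461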
Mdl = 
  RegressionEnsemble
                         ResponseName: 'Y'
                CategoricalPredictors: []
                    ResponseTransform: 'none'
                      NumObservations: 94
    HyperparameterOptimizationResults: [1x1 BayesianOptimization]
                           NumTrained: 10
                               Method: 'LSBoost'
                         LearnerNames: {'Tree'}
                 ReasonForTermination: 'Terminated normally after completing the requested number of training cycles.'
                              FitInfo: [10x1 double]
                   FitInfoDescription: {2x1 cell}
                       Regularization: []
  Properties, Methods
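The Objective column of the iteration log holds log(1 + loss), not the loss itself. Inverting the transform for the best observed value gives the cross-validation mean squared error on its original scale:

```matlab
% The optimizer's objective is log(1 + loss), where loss is the
% cross-validation mean squared error
bestObjective = 2.946;                % best observed value in the log above
bestCVLoss = exp(bestObjective) - 1   % about 18.03 on the MSE scale
```

If Mdl is in the workspace, exp(Mdl.HyperparameterOptimizationResults.MinObjective) - 1 reads the same quantity from the stored BayesianOptimization object. This value need not match the 10-fold loss computed next, because that computation draws a fresh random partition.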

Compare the loss to that of a boosted, unoptimized model and to that of the default ensemble.

loss = kfoldLoss(crossval(Mdl,'kfold',10))
loss = 19.2667
Mdl2 = fitrensemble(X,Y, ...
    'Method','LSBoost', ...
    'Learners',templateTree('Surrogate','on'));
loss2 = kfoldLoss(crossval(Mdl2,'kfold',10))
loss2 = 30.4083
Mdl3 = fitrensemble(X,Y);
loss3 = kfoldLoss(crossval(Mdl3,'kfold',10))
loss3 = 29.0495
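Each crossval call above draws its own random partition, so part of the gap between loss, loss2 and loss3 may come from the partitions rather than the models. The sketch below scores all three configurations on one shared cvpartition. The tuned configuration reuses the best estimated feasible point from the optimization summary (NumLearningCycles 10, LearnRate 0.30126, MaxNumSplits 3). That choice is an assumption and may not match the exact model stored in Mdl. Rows with a missing MPG are removed first so the partition size matches the rows used for training.

```matlab
load carsmall
X = [Acceleration Displacement Horsepower Weight];
Y = MPG;

% Drop rows with a missing response so the partition matches the training rows
keep = ~isnan(Y);
X = X(keep, :);
Y = Y(keep);

rng('default')                               % reproducible partition
cvp = cvpartition(numel(Y), 'KFold', 10);    % one partition shared by all models

% Hyperparameter values copied from the best estimated feasible point above
tree = templateTree('Surrogate', 'on', 'MaxNumSplits', 3);
cvTuned = fitrensemble(X, Y, 'Method', 'LSBoost', 'Learners', tree, ...
    'NumLearningCycles', 10, 'LearnRate', 0.30126, 'CVPartition', cvp);

% Boosted ensemble with default hyperparameters
cvBoost = fitrensemble(X, Y, 'Method', 'LSBoost', ...
    'Learners', templateTree('Surrogate', 'on'), 'CVPartition', cvp);

% Default ensemble
cvDefault = fitrensemble(X, Y, 'CVPartition', cvp);

losses = [kfoldLoss(cvTuned) kfoldLoss(cvBoost) kfoldLoss(cvDefault)]
```

Because every model sees the same folds, differences between the entries of losses reflect the configurations and not the luck of the split. The numbers will differ from those printed above.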

For a different way to optimize this ensemble, see Optimize Regression Ensemble Using Cross-Validation.