TreeBagger class

Bag of decision trees

Description

TreeBagger bags an ensemble of decision trees for either classification or regression. Bagging stands for bootstrap aggregation. Every tree in the ensemble is grown on an independently drawn bootstrap replica of the input data. Observations not included in this replica are "out of bag" for this tree.

TreeBagger relies on the ClassificationTree and RegressionTree functionality for growing individual trees. In particular, ClassificationTree and RegressionTree accept the number of features selected at random for each decision split as an optional input argument. That is, TreeBagger implements the random forest algorithm [1].

For regression problems, TreeBagger supports mean and quantile regression (that is, quantile regression forest [2]).

  • To predict mean responses or estimate the mean-squared error given data, pass a TreeBagger model and the data to predict or error, respectively. To perform similar operations for out-of-bag observations, use oobPredict or oobError.

  • To estimate quantiles of the response distribution or the quantile error given data, pass a TreeBagger model and the data to quantilePredict or quantileError, respectively. To perform similar operations for out-of-bag observations, use oobQuantilePredict or oobQuantileError. A minimal sketch of these calls appears after this list.
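
The following is a minimal sketch of these calls (not part of the original reference text), assuming the carsmall sample data and a small regression ensemble:

load carsmall                                   % sample data shipped with the toolbox
ok = ~isnan(MPG);                               % keep observations with an observed response
Mdl = TreeBagger(50,Weight(ok),MPG(ok),'Method','regression','OOBPrediction','on');

meanYhat = predict(Mdl,Weight(ok));             % conditional mean predictions
mse = error(Mdl,Weight(ok),MPG(ok));            % mean-squared error versus number of trees
medianYhat = quantilePredict(Mdl,Weight(ok),'Quantile',0.5);   % conditional median

oobYhat = oobPredict(Mdl);                      % out-of-bag mean predictions
oobMse = oobError(Mdl);                         % out-of-bag mean-squared error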

Construction

TreeBagger Create bag of decision trees

Methods

append Append new trees to ensemble
compact Compact ensemble of decision trees
error Error (misclassification probability or MSE)
fillprox Proximity matrix for training data
growTrees Train additional trees and add to ensemble
margin Classification margin
mdsprox Multidimensional scaling of proximity matrix
meanMargin Mean classification margin
oobError Out-of-bag error
oobMargin Out-of-bag margins
oobMeanMargin Out-of-bag mean margins
oobPredict Ensemble predictions for out-of-bag observations
oobQuantileError Out-of-bag quantile loss of bag of regression trees
oobQuantilePredict Quantile predictions for out-of-bag observations from bag of regression trees
predict Predict responses using ensemble of bagged decision trees
quantileError Quantile loss using bag of regression trees
quantilePredict Predict response quantile using bag of regression trees

Properties

ClassNames

A cell array containing the class names for the response variable Y. This property is empty for regression trees.

ComputeOOBPrediction

A logical flag specifying whether out-of-bag predictions for training observations should be computed. The default is false.

If this flag is true, the following properties are available:

  • OOBIndices

  • OOBInstanceWeight

If this flag is true, the following methods can be called:

  • oobError

  • oobMargin

  • oobMeanMargin

ComputeOOBPredictorImportance

A logical flag specifying whether out-of-bag estimates of variable importance should be computed. The default is false. If this flag is true, then ComputeOOBPrediction is true as well.

If this flag is true, the following properties are available:

  • OOBPermutedPredictorDeltaError

  • OOBPermutedPredictorDeltaMeanMargin

  • OOBPermutedPredictorCountRaiseMargin

Cost

Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i (that is, rows correspond to the true class and columns correspond to the predicted class). The order of the rows and columns of Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response.

This property is:

  • read-only

  • empty ([]) for ensembles of regression trees

DefaultYfit

Default value returned by predict and oobPredict. The DefaultYfit property controls what predicted value is returned when no prediction is possible, for example, when oobPredict needs to predict for an observation that is in-bag for all trees in the ensemble.

  • For classification, you can set this property to either '' or 'MostPopular'. If you choose 'MostPopular' (the default), the property value becomes the name of the most probable class in the training data. If you choose '', in-bag observations are excluded from computation of the out-of-bag error and margin.

  • For regression, you can set this property to any numeric scalar. The default value is the mean of the response for the training data. If you set this property to NaN, in-bag observations are excluded from computation of the out-of-bag error and margin. A hedged sketch of both settings follows this list.
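
The following sketch (not part of the original reference text) shows both settings, assuming the fisheriris and carsmall sample data:

load fisheriris
Mdl = TreeBagger(50,meas,species,'OOBPrediction','on');
Mdl.DefaultYfit = '';        % classification: skip observations that are in bag for all trees
oobErr = oobError(Mdl);

load carsmall
MdlReg = TreeBagger(50,Displacement,MPG,'Method','regression','OOBPrediction','on');
MdlReg.DefaultYfit = NaN;    % regression: skip observations that are in bag for all trees
oobMse = oobError(MdlReg);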

DeltaCriterionDecisionSplit

A numeric array of size 1-by-Nvars of changes in the split criterion summed over splits on each variable, averaged across the entire ensemble of grown trees.

InBagFraction

Fraction of observations that are randomly selected with replacement for each bootstrap replica. The size of each replica is Nobs×InBagFraction, where Nobs is the number of observations in the training set. The default value is 1.

MergeLeaves

A logical flag specifying whether decision tree leaves with the same parent are merged for splits that do not decrease the total risk. The default value is false.

Method

Method used by the trees. The possible values are 'classification' for classification ensembles and 'regression' for regression ensembles.

MinLeafSize

Minimum number of observations per tree leaf. By default, MinLeafSize is 1 for classification and 5 for regression. For decision tree training, the MinParent value is set equal to 2*MinLeafSize.

NumTrees

A scalar value equal to the number of decision trees in the ensemble.

NumPredictorSplit

A numeric array of size 1-by-Nvars, where every element gives the number of splits on this predictor summed over all trees.

NumPredictorsToSample

Number of predictor or feature variables to select at random for each decision split. By default, NumPredictorsToSample is equal to the square root of the total number of variables for classification, and one third of the total number of variables for regression.
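
A brief sketch (not from the original text) of overriding this default at training time with the 'NumPredictorsToSample' name-value pair, assuming the fisheriris sample data:

load fisheriris
Mdl = TreeBagger(50,meas,species,'NumPredictorsToSample',3);
Mdl.NumPredictorsToSample   % number of predictors sampled for each decision split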

OOBIndices

A logical array of size Nobs-by-NumTrees, where Nobs is the number of observations in the training data and NumTrees is the number of trees in the ensemble. A true value for the (i,j) element indicates that observation i is out of bag for tree j. In other words, observation i was not selected for the training data used to grow tree j.
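
As an illustrative sketch (assuming a model trained with 'OOBPrediction','on' on the fisheriris sample data), you can use OOBIndices to count how many trees treat each observation as out of bag:

load fisheriris
Mdl = TreeBagger(50,meas,species,'OOBPrediction','on');
oobCount = sum(Mdl.OOBIndices,2);   % trees for which each observation is out of bag
fracOOB = mean(Mdl.OOBIndices(:));  % overall out-of-bag fraction, roughly 0.37 when InBagFraction is 1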

OOBInstanceWeight

A numeric array of size Nobs-by-1 containing the number of trees used for computing the out-of-bag response for each observation. Nobs is the number of observations in the training data used to create the ensemble.

OOBPermutedPredictorCountRaiseMargin

一个数字1-by-的数字数组Nvarscontaining a measure of variable importance for each predictor variable (feature). For any variable, the measure is the difference between the number of raised margins and the number of lowered margins if the values of that variable are permuted across the out-of-bag observations. This measure is computed for every tree, then averaged over the entire ensemble and divided by the standard deviation over the entire ensemble. This property is empty for regression trees.

OOBPermutedPredictorDeltaError

A numeric array of size 1-by-Nvars containing a measure of importance for each predictor variable (feature). For any variable, the measure is the increase in prediction error if the values of that variable are permuted across the out-of-bag observations. This measure is computed for every tree, then averaged over the entire ensemble and divided by the standard deviation over the entire ensemble.

OOBPermutedPredictorDeltaMeanMargin

A numeric array of size 1-by-Nvars containing a measure of importance for each predictor variable (feature). For any variable, the measure is the decrease in the classification margin if the values of that variable are permuted across the out-of-bag observations. This measure is computed for every tree, then averaged over the entire ensemble and divided by the standard deviation over the entire ensemble. This property is empty for regression trees.

OutlierMeasure

A numeric array of size Nobs-by-1, where Nobs is the number of observations in the training data, containing outlier measures for each observation.

Prior

A numeric vector of prior probabilities for each class. The order of the elements of Prior corresponds to the order of the classes in ClassNames.

This property is:

  • read-only

  • empty ([]) for ensembles of regression trees

Proximity

A numeric matrix of size Nobs-by-Nobs, where Nobs is the number of observations in the training data, containing measures of the proximity between observations. For any two observations, their proximity is defined as the fraction of trees for which these observations land on the same leaf. This is a symmetric matrix with 1s on the diagonal and off-diagonal elements ranging from 0 to 1.
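
A hedged sketch of filling this property with the fillprox method (listed under Methods above), assuming the fisheriris sample data and the no-option form of the call:

load fisheriris
Mdl = TreeBagger(50,meas,species);
Mdl = fillprox(Mdl);      % compute pairwise proximities from the trained trees
P = Mdl.Proximity;        % Nobs-by-Nobs symmetric matrix with 1s on the diagonal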

Prune

The Prune property is true if decision trees are pruned and false if they are not. Pruning decision trees is not recommended for ensembles. The default value is false.

SampleWithReplacement

A logical flag specifying whether data are sampled for each decision tree with replacement. This property is true if TreeBagger samples data with replacement and false otherwise. The default value is true.

TreeArguments

A cell array of arguments for fitctree or fitrtree. TreeBagger uses these arguments when growing new trees for the ensemble.

Trees

A cell array of size NumTrees-by-1 containing the trees in the ensemble.

SurrogateAssociation

A matrix of size Nvars-by-Nvars with predictive measures of variable association, averaged across the entire ensemble of grown trees. If you grew the ensemble setting 'Surrogate' to 'on', this matrix for each tree is filled with predictive measures of association averaged over the surrogate splits. If you grew the ensemble setting 'Surrogate' to 'off' (default), SurrogateAssociation is diagonal.

PredictorNames

A cell array containing the names of the predictor variables (features). TreeBagger takes these names from the optional 'names' argument. The default names are 'x1', 'x2', and so on.

W

Numeric vector of weights of length Nobs, where Nobs is the number of observations (rows) in the training data. TreeBagger uses these weights for growing every decision tree in the ensemble. The default W is ones(Nobs,1).

X

A table or numeric matrix of size Nobs-by-Nvars, where Nobs is the number of observations (rows) and Nvars is the number of variables (columns) in the training data. If you train the ensemble using a table of predictor values, then X is a table. If you train the ensemble using a matrix of predictor values, then X is a matrix. This property contains the predictor (or feature) values.

Y

A size Nobs array of response data. Elements of Y correspond to the rows of X. For classification, Y is the set of true class labels. Labels can be any grouping variable, that is, a numeric or logical vector, character matrix, string array, cell array of character vectors, or categorical vector. TreeBagger converts labels to a cell array of character vectors for classification. For regression, Y is a numeric vector.

Examples

Load Fisher's iris data set.

load fisheriris

Train an ensemble of bagged classification trees using the entire data set. Specify 50 weak learners. Store which observations are out of bag for each tree.

rng(1); % For reproducibility
Mdl = TreeBagger(50,meas,species,'OOBPrediction','on',...
    'Method','classification')
Mdl = 
  TreeBagger
Ensemble with 50 bagged decision trees:
                    Training X:             [150x4]
                    Training Y:             [150x1]
                        Method:      classification
                 NumPredictors:                   4
         NumPredictorsToSample:                   2
                   MinLeafSize:                   1
                 InBagFraction:                   1
         SampleWithReplacement:                   1
          ComputeOOBPrediction:                   1
 ComputeOOBPredictorImportance:                   0
                     Proximity:                  []
                    ClassNames:        'setosa'    'versicolor'     'virginica'

  Properties, Methods

Mdl is a TreeBagger ensemble.

Mdl.Trees stores a 50-by-1 cell vector of the trained classification trees (CompactClassificationTree model objects) that compose the ensemble.

Plot a graph of the first trained classification tree.

view(Mdl.Trees{1},'Mode','graph')

By default, TreeBagger grows deep trees.

Mdl.OOBIndices stores the out-of-bag indices as a matrix of logical values.

Plot the out-of-bag error over the number of grown classification trees.

figure;
oobErrorBaggedEnsemble = oobError(Mdl);
plot(oobErrorBaggedEnsemble)
xlabel 'Number of grown trees';
ylabel 'Out-of-bag classification error';

The out-of-bag error decreases with the number of grown trees.

To label out-of-bag observations, pass Mdl to oobPredict.
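
As a possible continuation (not part of the original example), the out-of-bag labels can be compared against the true classes:

oobLabels = oobPredict(Mdl);                     % cell array of out-of-bag class predictions
oobConfusion = confusionmat(species,oobLabels)   % confusion matrix versus the true labels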

Load the carsmall data set. Consider a model that predicts the fuel economy of a car given its engine displacement.

load carsmall

Train an ensemble of bagged regression trees using the entire data set. Specify 100 weak learners.

rng(1); % For reproducibility
Mdl = TreeBagger(100,Displacement,MPG,'Method','regression');

Mdl is a TreeBagger ensemble.

Using a trained bag of regression trees, you can estimate conditional mean responses or perform quantile regression to predict conditional quantiles.

For ten equally spaced engine displacements between the minimum and maximum in-sample displacement, predict the conditional mean response and conditional quartiles.

predX = linspace(min(Displacement),max(Displacement),10)';
mpgMean = predict(Mdl,predX);
mpgQuartiles = quantilePredict(Mdl,predX,'Quantile',[0.25,0.5,0.75]);

Plot the observations, and estimated mean responses and quartiles in the same figure.

figure;
plot(Displacement,MPG,'o');
hold on
plot(predX,mpgMean);
plot(predX,mpgQuartiles);
ylabel('Fuel economy');
xlabel('Engine displacement');
legend('Data','Mean Response','First quartile','Median','Third quartile');

Load the carsmall data set. Consider a model that predicts the mean fuel economy of a car given its acceleration, number of cylinders, engine displacement, horsepower, manufacturer, model year, and weight. Consider Cylinders, Mfg, and Model_Year as categorical variables.

load carsmall
Cylinders = categorical(Cylinders);
Mfg = categorical(cellstr(Mfg));
Model_Year = categorical(Model_Year);
X = table(Acceleration,Cylinders,Displacement,Horsepower,Mfg,...
    Model_Year,Weight,MPG);
rng('default'); % For reproducibility

Display the number of categories represented in the categorical variables.

numCylinders = numel(categories(Cylinders))
numCylinders = 3
numMfg = numel(categories(Mfg))
numMfg = 28
numModelYear = numel(categories(Model_Year))
numModelYear = 3

Because there are only 3 categories in Cylinders and Model_Year, the standard CART, predictor-splitting algorithm prefers splitting a continuous predictor over these two variables.

Train a random forest of 200 regression trees using the entire data set. To grow unbiased trees, specify usage of the curvature test for splitting predictors. Because there are missing values in the data, specify usage of surrogate splits. Store the out-of-bag information for predictor importance estimation.

Mdl = TreeBagger(200,X,'MPG','Method','regression','Surrogate','on',...
    'PredictorSelection','curvature','OOBPredictorImportance','on');

TreeBagger stores predictor importance estimates in the property OOBPermutedPredictorDeltaError. Compare the estimates using a bar graph.

imp = Mdl.OOBPermutedPredictorDeltaError;
figure;
bar(imp);
title('Curvature Test');
ylabel('Predictor importance estimates');
xlabel('Predictors');
h = gca;
h.XTickLabel = Mdl.PredictorNames;
h.XTickLabelRotation = 45;
h.TickLabelInterpreter = 'none';

In this case, Model_Year is the most important predictor, followed by Weight.

Compare imp to predictor importance estimates computed from a random forest that grows trees using standard CART.

MdlCART = TreeBagger(200,X,'MPG','Method','regression','Surrogate','on',...
    'OOBPredictorImportance','on');
impCART = MdlCART.OOBPermutedPredictorDeltaError;
figure;
bar(impCART);
title('Standard CART');
ylabel('Predictor importance estimates');
xlabel('Predictors');
h = gca;
h.XTickLabel = Mdl.PredictorNames;
h.XTickLabelRotation = 45;
h.TickLabelInterpreter = 'none';

In this case, Weight, a continuous predictor, is the most important. The next two most important predictors are Model_Year, followed closely by Horsepower, which is a continuous predictor.

Copy Semantics

Value. To learn how this affects your use of the class, see Comparing Handle and Value Classes (MATLAB) in the MATLAB® Object-Oriented Programming documentation.

Tips

For a TreeBagger model object B, the Trees property stores a cell vector of B.NumTrees CompactClassificationTree or CompactRegressionTree model objects. For a textual or graphical display of tree t in the cell vector, enter

view(B.Trees{t})

Alternative Functionality

Statistics and Machine Learning Toolbox™ offers three objects for bagging and random forests: TreeBagger, ClassificationBaggedEnsemble, and RegressionBaggedEnsemble.

For details about the differences between TreeBagger and bagged ensembles (ClassificationBaggedEnsemble and RegressionBaggedEnsemble), see Comparison of TreeBagger and Bagged Ensembles.

References

[1] Breiman, L. "Random Forests." Machine Learning 45, pp. 5–32, 2001.

[2] Meinshausen, N. "Quantile Regression Forests." Journal of Machine Learning Research, Vol. 7, 2006, pp. 983–999.