CompactTreeBagger class

Compact ensemble of decision trees grown by bootstrap aggregation

Description

The CompactTreeBagger class is a lightweight class that contains the trees grown using TreeBagger. CompactTreeBagger does not preserve any information about how TreeBagger grew the decision trees. It does not contain the input data used for growing trees, nor does it contain training parameters such as the minimal leaf size or the number of variables sampled at random for each decision split. You can use CompactTreeBagger only for predicting the response of the trained ensemble given new data X, and for other related functions.

CompactTreeBagger lets you save the trained ensemble to disk, or use it in any other way, while discarding the training data and the parameters of the training configuration that are irrelevant for predicting the response of the fully grown ensemble. This reduces storage and memory requirements, especially for ensembles trained on large data sets.
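As a minimal sketch of the save-and-reuse workflow described above, the following code trains a small ensemble on the ionosphere sample data, compacts it, writes it to a MAT-file, and reloads it for prediction. The file name compactBag.mat, the tree count of 50, and the seed are illustrative assumptions, not values prescribed by the class.

```matlab
% Sketch: persist a compact ensemble and reload it for prediction.
load ionosphere                      % provides predictors X and labels Y
rng(1)                               % fix the seed so the bootstrap samples repeat

Mdl  = TreeBagger(50,X,Y,'Method','classification');
CMdl = compact(Mdl);                 % drops training data and training options

% Hypothetical file location, chosen only for this illustration.
modelFile = fullfile(tempdir,'compactBag.mat');
save(modelFile,'CMdl');              % only the compact object is written

S = load(modelFile);                 % S.CMdl is the restored ensemble
labels = predict(S.CMdl,X(1:5,:));   % predicted class names for five rows
disp(labels)
```

Because only CMdl is saved, the MAT-file does not carry a copy of X or Y.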

Construction

CompactTreeBagger Create CompactTreeBagger object

CMdl = compact(Mdl) creates a compact version of Mdl, a TreeBagger model object. You can predict responses using CMdl exactly as you can using Mdl. However, because CMdl does not contain training data, you cannot perform some actions, such as making out-of-bag predictions using oobPredict.
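The relationship between the two objects can be checked with a short sketch like the one below. It assumes the ionosphere sample data and an arbitrary ensemble size of 30. The comparison of predictions is a sanity check one would expect to pass, since both objects hold the same trees. ismethod is used only to probe whether each object exposes oobPredict.

```matlab
% Sketch: compare a full ensemble with its compact counterpart.
load ionosphere
rng(1)                                      % reproducible bootstrap samples

Mdl  = TreeBagger(30,X,Y,'Method','classification');
CMdl = compact(Mdl);

% Both objects hold the same trees, so their predictions should agree.
sameLabels = isequal(predict(Mdl,X),predict(CMdl,X));

% Out-of-bag prediction needs the training data, which CMdl no longer has.
fullHasOOB    = ismethod(Mdl,'oobPredict');
compactHasOOB = ismethod(CMdl,'oobPredict');

disp([sameLabels fullHasOOB compactHasOOB])
```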

Object Functions

combine Combine two ensembles
error Error (misclassification probability or MSE)
margin Classification margin
mdsprox Multidimensional scaling of proximity matrix
meanMargin Mean classification margin
outlierMeasure Outlier measure for data
partialDependence Compute partial dependence
plotPartialDependence Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
predict Predict responses using ensemble of bagged decision trees
proximity Proximity matrix for data
setDefaultYfit Set default value for predict
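Several of the functions listed above take new data, so a held-out set is a natural way to try them. The following sketch assumes the ionosphere sample data, a random split with 100 test rows, and 50 trees. All of these choices are illustrative.

```matlab
% Sketch: evaluate a compact ensemble on held-out data.
load ionosphere
rng(1)                                   % reproducible split and bootstrap

n        = size(X,1);
idx      = randperm(n);
testIdx  = idx(1:100);                   % assumed hold-out size
trainIdx = idx(101:end);

Mdl  = TreeBagger(50,X(trainIdx,:),Y(trainIdx),'Method','classification');
CMdl = compact(Mdl);

Xtest = X(testIdx,:);
Ytest = Y(testIdx);

err  = error(CMdl,Xtest,Ytest);          % misclassification probability, one value per ensemble size
mm   = meanMargin(CMdl,Xtest,Ytest);     % mean classification margin, one value per ensemble size
prox = proximity(CMdl,Xtest);            % pairwise proximities between test rows

fprintf('Test error using all trees: %.3f\n',err(end));
```

With the default cumulative mode, the last element of err corresponds to the full ensemble, which is why the sketch reports err(end).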

Properties

ClassNames

The ClassNames property is a cell array containing the class names for the response variable Y supplied to TreeBagger. This property is empty for regression trees.

DefaultYfit

The DefaultYfit property controls what predicted value CompactTreeBagger returns when no prediction is possible, for example, when the predict method needs to predict for an observation that has only false values in the matrix supplied through the 'useifort' argument.

For classification, you can set this property to either '' or 'MostPopular'. If you choose 'MostPopular' (default), the property value becomes the name of the most probable class in the training data.

For regression, you can set this property to any numeric scalar. The default is the mean of the response for the training data.
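A minimal sketch of changing the default prediction for a classification ensemble follows. It assumes the ionosphere sample data and an arbitrary ensemble size of 20. Because CompactTreeBagger is a value class, the result of setDefaultYfit must be assigned back to the variable.

```matlab
% Sketch: inspect and change the default prediction of a compact ensemble.
load ionosphere
rng(1)

Mdl  = TreeBagger(20,X,Y,'Method','classification');
CMdl = compact(Mdl);

disp(CMdl.DefaultYfit)               % value derived from the default 'MostPopular' setting

% Return an empty label instead whenever no prediction is possible.
CMdl = setDefaultYfit(CMdl,'');      % reassign: the object is not modified in place
disp(isempty(CMdl.DefaultYfit))
```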

DeltaCriterionDecisionSplit

The DeltaCriterionDecisionSplit property is a numeric array of size 1-by-Nvars of changes in the split criterion summed over splits on each variable, averaged across the entire ensemble of grown trees.

Method

The Method property is 'classification' for classification ensembles and 'regression' for regression ensembles.

NumPredictorSplit

The NumPredictorSplit property is a numeric array of size 1-by-Nvars, where each element gives the number of splits on the corresponding predictor, summed over all trees.
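DeltaCriterionDecisionSplit and NumPredictorSplit can be read together as a rough indication of which predictors the ensemble relies on. The sketch below assumes the ionosphere sample data and 50 trees, and ranks predictors by the split-criterion change. The choice to show the top five is arbitrary.

```matlab
% Sketch: rank predictors by their contribution to the split criterion.
load ionosphere
rng(1)

Mdl  = TreeBagger(50,X,Y,'Method','classification');
CMdl = compact(Mdl);

% Reshape everything to columns so the values line up in a table.
names  = CMdl.PredictorNames(:);
delta  = CMdl.DeltaCriterionDecisionSplit(:);
nSplit = CMdl.NumPredictorSplit(:);

[~,order] = sort(delta,'descend');
top = order(1:5);                        % five predictors with the largest criterion change

T = table(names(top),delta(top),nSplit(top), ...
    'VariableNames',{'Predictor','DeltaCriterion','NumSplits'});
disp(T)
```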

NumTrees

The NumTrees property is a scalar equal to the number of decision trees in the ensemble.

PredictorNames

The PredictorNames property is a cell array containing the names of the predictor variables (features). These names are taken from the optional 'names' parameter supplied to TreeBagger. The default names are 'x1', 'x2', etc.

SurrogateAssociation

The SurrogateAssociation property is a matrix of size Nvars-by-Nvars with predictive measures of variable association, averaged across the entire ensemble of grown trees. If you grew the ensemble setting 'surrogate' to 'on', this matrix for each tree is filled with predictive measures of association averaged over the surrogate splits. If you grew the ensemble setting 'surrogate' to 'off' (default), SurrogateAssociation is diagonal.

Trees

The Trees property is a cell array of size NumTrees-by-1 containing the trees in the ensemble.

Examples

Create a compact bag of trees for efficiently making predictions on new data.

Load the ionosphere data set.

load ionosphere

Train a bag of 100 classification trees using all measurements.

Mdl = TreeBagger(100,X,Y,'Method','classification')
Mdl = 
  TreeBagger
Ensemble with 100 bagged decision trees:
                    Training X:             [351x34]
                    Training Y:              [351x1]
                        Method:       classification
                 NumPredictors:                   34
         NumPredictorsToSample:                    6
                   MinLeafSize:                    1
                 InBagFraction:                    1
         SampleWithReplacement:                    1
          ComputeOOBPrediction:                    0
 ComputeOOBPredictorImportance:                    0
                     Proximity:                   []
                    ClassNames:        'b'        'g'

  Properties, Methods

Mdl is a TreeBagger model object that contains the training data, among other things.

Create a compact version of Mdl.

CMdl = compact(Mdl)
CMdl = 
  CompactTreeBagger
Ensemble with 100 bagged decision trees:
              Method:       classification
       NumPredictors:                   34
          ClassNames:        'b'        'g'

  Properties, Methods

CMdl is a CompactTreeBagger model object. CMdl is almost the same as Mdl. One exception is that it does not store the training data.

Compare the amounts of space consumed by Mdl and CMdl.

mdlInfo = whos('Mdl');
cMdlInfo = whos('CMdl');
[mdlInfo.bytes cMdlInfo.bytes]
ans = 1×2

     1115742      976936

Mdl consumes more space than CMdl.

CMdl.Trees stores the trained classification trees (CompactClassificationTree model objects) that compose Mdl.

Display a graph of the first tree in the compact model.

view(CMdl.Trees{1},'Mode','graph');

By default, TreeBagger grows deep trees.

Predict the label of the mean of X using the compact ensemble.

predMeanX = predict(CMdl,mean(X))
predMeanX = 1x1 cell array
    {'g'}

Copy Semantics

Value. To learn how this affects your use of the class, see Comparing Handle and Value Classes in the MATLAB® Object-Oriented Programming documentation.

Tips

The Trees property of CMdl stores a cell vector of CMdl.NumTrees CompactClassificationTree or CompactRegressionTree model objects. For a textual or graphical display of tree t in the cell vector, enter

view(CMdl.Trees{t})
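Because bagged trees are typically deep, it can help to pick a small tree before displaying it. The sketch below assumes the ionosphere sample data and 25 trees. It uses the NumNodes property of the individual tree objects to find the smallest tree, then prints its textual description. Selecting by node count is an illustrative choice, not part of the tip above.

```matlab
% Sketch: find the smallest tree in a compact ensemble and display it as text.
load ionosphere
rng(1)

Mdl  = TreeBagger(25,X,Y,'Method','classification');
CMdl = compact(Mdl);

% Number of nodes in each tree of the ensemble.
numNodes = cellfun(@(tree) tree.NumNodes,CMdl.Trees);

[~,t] = min(numNodes);               % index of the tree with the fewest nodes
fprintf('Tree %d has %d nodes.\n',t,numNodes(t));

view(CMdl.Trees{t})                  % textual display in the Command Window
```

Adding 'Mode','graph' to the view call opens the graphical display instead.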