CompactTreeBagger class

Compact ensemble of decision trees grown by bootstrap aggregation

Description

The CompactTreeBagger class is a lightweight class that contains the trees grown using TreeBagger. CompactTreeBagger does not preserve any information about how TreeBagger grew the decision trees. It does not contain the input data used for growing the trees, nor does it contain training parameters such as the minimal leaf size or the number of variables sampled at random for each decision split. You can use CompactTreeBagger only for predicting the response of the trained ensemble given new data X, and for other related functions.

CompactTreeBagger lets you save the trained ensemble to disk, or use it in any other way, while discarding the training data and the parameters of the training configuration that are irrelevant for predicting the response of the fully grown ensemble. This reduces storage and memory requirements, especially for ensembles trained on large data sets.
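The save-to-disk workflow described above can be sketched as follows. This is a minimal illustration, assuming the ionosphere sample data set is available; the file name compactBag.mat and the ensemble size of 50 trees are hypothetical choices made for the example.

```matlab
% Sketch: train an ensemble, compact it, save it, and reload it for prediction.
load ionosphere                        % provides X (351-by-34) and Y (351-by-1 cell)
Mdl  = TreeBagger(50,X,Y,'Method','classification');
CMdl = compact(Mdl);                   % discards training data and training options

% Hypothetical file name in the temporary folder.
fileName = fullfile(tempdir,'compactBag.mat');
save(fileName,'CMdl');

% Later, or in another session: restore the compact ensemble and predict.
S = load(fileName);                    % S.CMdl is the restored CompactTreeBagger
labels = predict(S.CMdl,X(1:5,:));     % cell array of predicted class names
```

Only the compact object is written to the file, so the saved ensemble does not carry a copy of X and Y.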

Construction

CompactTreeBagger

Create CompactTreeBagger object

CMdl = compact(Mdl) creates a compact version of Mdl, a TreeBagger model object. You can predict responses using CMdl exactly as you can using Mdl. However, because CMdl does not contain the training data, you cannot perform some actions, such as making out-of-bag predictions using oobPredict.
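A short sketch of this difference, assuming the ionosphere sample data set and an arbitrary ensemble size of 30 trees: out-of-bag predictions are computed on the full model before compacting, and the compact object is then checked for the method.

```matlab
% Sketch: out-of-bag predictions need the full TreeBagger model.
load ionosphere
Mdl = TreeBagger(30,X,Y,'Method','classification','OOBPrediction','on');

oobLabels = oobPredict(Mdl);           % works: Mdl stores the training data

CMdl = compact(Mdl);                   % training data and OOB information are dropped
hasOOB = ismethod(CMdl,'oobPredict');  % expected to be false for the compact object

% Ordinary prediction on supplied data is still available.
labels = predict(CMdl,X(1:3,:));
```

If out-of-bag estimates are needed, compute and store them before calling compact.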

Object Functions

`combine`	Combine two ensembles
`error`	Error (misclassification probability or MSE)
`margin`	Classification margin
`mdsprox`	Multidimensional scaling of proximity matrix
`meanMargin`	Mean classification margin
`outlierMeasure`	Outlier measure for data
`partialDependence`	Compute partial dependence
`plotPartialDependence`	Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
`predict`	Predict responses using ensemble of bagged decision trees
`proximity`	Proximity matrix for data
`setDefaultYfit`	Set default value for `predict`
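As an illustration of a few of these functions, the following sketch evaluates a compact ensemble on held-out rows. The 250/101 split, the random seed, and the ensemble size are arbitrary assumptions for the example, not recommendations.

```matlab
% Sketch: predict and estimate the error on held-out data with a compact ensemble.
load ionosphere
rng(1)                                 % for reproducibility of the split and the bag
idx      = randperm(size(X,1));
trainIdx = idx(1:250);                 % arbitrary split chosen for illustration
testIdx  = idx(251:end);

Mdl  = TreeBagger(50,X(trainIdx,:),Y(trainIdx),'Method','classification');
CMdl = compact(Mdl);

% Predicted labels and per-class scores for the held-out observations.
[labels,scores] = predict(CMdl,X(testIdx,:));

% Misclassification probability of the whole ensemble on the held-out data.
testErr = error(CMdl,X(testIdx,:),Y(testIdx),'Mode','ensemble');
```

With 'Mode' set to 'ensemble', error returns a single value for the full ensemble instead of one value per cumulative number of trees.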

Properties

`ClassNames`	The `ClassNames` property is a cell array containing the class names for the response variable `Y` supplied to `TreeBagger`. This property is empty for regression trees.
`DefaultYfit`	The `DefaultYfit` property controls what predicted value `CompactTreeBagger` returns when no prediction is possible, for example when the `predict` method needs to predict for an observation that has only false values in the matrix supplied through the `'useifort'` argument. For classification, you can set this property to either `''` or `'MostPopular'`. If you choose `'MostPopular'` (default), the property value becomes the name of the most probable class in the training data. For regression, you can set this property to any numeric scalar. The default is the mean of the response for the training data.
`DeltaCriterionDecisionSplit`	The `DeltaCriterionDecisionSplit` property is a numeric array of size 1-by-`Nvars` of changes in the split criterion summed over splits on each variable, averaged across the entire ensemble of grown trees.
`Method`	The `Method` property is `'classification'` for classification ensembles and `'regression'` for regression ensembles.
`NumPredictorSplit`	The `NumPredictorSplit` property is a numeric array of size 1-by-`Nvars`, where each element gives the number of splits on that predictor summed over all trees.
`NumTrees`	The `NumTrees` property is a scalar equal to the number of decision trees in the ensemble.
`PredictorNames`	The `PredictorNames` property is a cell array containing the names of the predictor variables (features). These names are taken from the optional `'names'` parameter supplied to `TreeBagger`. The default names are `'x1'`, `'x2'`, etc.
`SurrogateAssociation`	The `SurrogateAssociation` property is a matrix of size `Nvars`-by-`Nvars` with predictive measures of variable association, averaged across the entire ensemble of grown trees. If you grew the ensemble setting `'surrogate'` to `'on'`, this matrix for each tree is filled with predictive measures of association averaged over the surrogate splits. If you grew the ensemble setting `'surrogate'` to `'off'` (default), `SurrogateAssociation` is diagonal.
`Trees`	The `Trees` property is a cell array of size `NumTrees`-by-1 containing the trees in the ensemble.
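The properties listed above can be read with dot notation. The following sketch assumes the ionosphere sample data set (34 predictors, two classes) and a small ensemble of 20 trees chosen only to keep the example fast.

```matlab
% Sketch: inspect the properties of a compact ensemble.
load ionosphere
CMdl = compact(TreeBagger(20,X,Y,'Method','classification'));

nTrees     = CMdl.NumTrees;            % number of trees in the ensemble
method     = CMdl.Method;              % 'classification' for this ensemble
classNames = CMdl.ClassNames;          % class names taken from Y
predNames  = CMdl.PredictorNames;      % default names 'x1','x2',... here
nSplits    = CMdl.NumPredictorSplit;   % 1-by-Nvars counts of splits per predictor
firstTree  = CMdl.Trees{1};            % one compact tree from the cell array

% Predictor used most often for splitting, by number of splits.
[~,mostUsed] = max(nSplits);
fprintf('Most frequently split predictor: %s\n',predNames{mostUsed});
```

Because no predictor names were passed to TreeBagger in this sketch, the printed name is one of the default names.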

Examples

Reduce Size of Bag of Trees

Create a compact bag of trees for efficiently making predictions on new data.

Load the ionosphere data set.

load ionosphere

Train a bag of 100 classification trees using all measurements.

Mdl = TreeBagger(100,X,Y,'Method','classification')

Mdl =
  TreeBagger
Ensemble with 100 bagged decision trees:
                    Training X: [351x34]
                    Training Y: [351x1]
                        Method: classification
                 NumPredictors: 34
         NumPredictorsToSample: 6
                   MinLeafSize: 1
                 InBagFraction: 1
         SampleWithReplacement: 1
          ComputeOOBPrediction: 0
 ComputeOOBPredictorImportance: 0
                     Proximity: []
                    ClassNames: 'b' 'g'

  Properties, Methods

Mdl is a TreeBagger model object that contains the training data, among other things.

Create a compact version of Mdl.

CMdl = compact(Mdl)

CMdl =
  CompactTreeBagger
Ensemble with 100 bagged decision trees:
            Method: classification
     NumPredictors: 34
        ClassNames: 'b' 'g'

  Properties, Methods

CMdl is a CompactTreeBagger model object. CMdl is almost the same as Mdl. One exception is that it does not store the training data.

Compare the amounts of space consumed by Mdl and CMdl.

mdlInfo = whos('Mdl');
cMdlInfo = whos('CMdl');
[mdlInfo.bytes cMdlInfo.bytes]

ans = 1×2

     1115742      976936

Mdl consumes more space than CMdl.

CMdl.Trees stores the trained classification trees (CompactClassificationTree model objects) that compose Mdl.

Display a graph of the first tree in the compact model.

view(CMdl.Trees{1},'Mode','graph');

By default, TreeBagger grows deep trees.

Predict the label of the mean of X using the compact ensemble.

predMeanX = predict(CMdl,mean(X))

predMeanX = 1x1 cell array
    {'g'}

Copy Semantics

Value. To learn how this affects your use of the class, see Comparing Handle and Value Classes in the MATLAB® Object-Oriented Programming documentation.
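One practical consequence of value semantics is that functions which modify the ensemble return a modified copy, so the result must be assigned. The sketch below uses setDefaultYfit to show this; the variable name CMdlNoDefault and the ensemble size are hypothetical choices for the example, and the ionosphere data set is assumed to be available.

```matlab
% Sketch: CompactTreeBagger is a value class, so modifications return a copy.
load ionosphere
CMdl = compact(TreeBagger(20,X,Y,'Method','classification'));

% Request an empty default prediction; the result goes to a new variable.
CMdlNoDefault = setDefaultYfit(CMdl,'');

originalDefault = CMdl.DefaultYfit;          % unchanged in the original object
modifiedDefault = CMdlNoDefault.DefaultYfit; % '' in the modified copy
```

Calling setDefaultYfit(CMdl,'') without assigning the output leaves CMdl as it was.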

Tips

The Trees property of CMdl stores a cell vector of CMdl.NumTrees CompactClassificationTree or CompactRegressionTree model objects. For a textual or graphical display of tree t in the cell vector, enter