predictorImportance

Estimates of predictor importance for regression tree

Syntax

imp = predictorImportance(tree)

Description

imp = predictorImportance(tree) computes estimates of predictor importance for tree by summing changes in the mean squared error due to splits on every predictor and dividing the sum by the number of branch nodes.

Input Arguments

tree

A regression tree created by fitrtree, or by the compact method.

Output Arguments

imp

A row vector with the same number of elements as the number of predictors (columns) in tree.X. The entries are the estimates of predictor importance, with 0 representing the smallest possible importance.

Examples

Estimate Predictor Importance

Estimate the predictor importance for all predictor variables in the data.

Load the carsmall data set.

load carsmall

Grow a regression tree for MPG using Acceleration, Cylinders, Displacement, Horsepower, Model_Year, and Weight as predictors.

X = [Acceleration Cylinders Displacement Horsepower Model_Year Weight];
tree = fitrtree(X,MPG);

Estimate the predictor importance for all predictor variables.

imp = predictorImportance(tree)

imp = 1×6

    0.0647    0.1068    0.1155    0.1411    0.3348    2.6565

Weight, the last predictor, has the most impact on mileage. The predictor with the least impact on predictions is the first variable, Acceleration.
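
To read the estimates as a ranking rather than a raw vector, one option is to sort them and pair each value with a predictor label. The following is a minimal sketch that repeats the workflow above; the cell array names and the table ranking are illustrative additions, not part of the original example.

```matlab
% Illustrative sketch: rank the predictors by estimated importance.
load carsmall
X = [Acceleration Cylinders Displacement Horsepower Model_Year Weight];
tree = fitrtree(X,MPG);
imp = predictorImportance(tree);

% Hypothetical labels, listed in the same column order as X.
names = {'Acceleration','Cylinders','Displacement', ...
         'Horsepower','Model_Year','Weight'};

% Sort from most to least important and keep the permutation.
[sortedImp,idx] = sort(imp,'descend');

% Pair each sorted estimate with its label.
ranking = table(names(idx)',sortedImp', ...
    'VariableNames',{'Predictor','Importance'})
```

Keeping the permutation idx from sort is what ties each sorted value back to its column in X.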

Predictor Importance and Surrogate Splits

Estimate the predictor importance for all variables in the data when the regression tree contains surrogate splits.

Load the carsmall data set.

load carsmall

Grow a regression tree for MPG using Acceleration, Cylinders, Displacement, Horsepower, Model_Year, and Weight as predictors. Specify that the tree identifies surrogate splits.

X = [Acceleration Cylinders Displacement Horsepower Model_Year Weight];
tree = fitrtree(X,MPG,'Surrogate','on');

Estimate the predictor importance for all predictor variables.

imp = predictorImportance(tree)

imp = 1×6

    1.0449    2.4560    2.5570    2.5788    2.0832    2.8938

Comparing imp to the results in Estimate Predictor Importance, Weight still has the most impact on mileage, but Cylinders is the fourth most important predictor.
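
The comparison is easier to follow with both sets of estimates side by side. The sketch below trains the two trees from the examples above and collects their estimates in one table; the variable names treeBest, treeSurr, names, and comparison are hypothetical and chosen only for illustration.

```matlab
% Illustrative sketch: compare importance with and without surrogate splits.
load carsmall
X = [Acceleration Cylinders Displacement Horsepower Model_Year Weight];

% Hypothetical labels, listed in the same column order as X.
names = {'Acceleration','Cylinders','Displacement', ...
         'Horsepower','Model_Year','Weight'};

treeBest = fitrtree(X,MPG);                   % best splits only
treeSurr = fitrtree(X,MPG,'Surrogate','on');  % best and surrogate splits

impBest = predictorImportance(treeBest);
impSurr = predictorImportance(treeSurr);

% One row per predictor, one column per tree.
comparison = table(impBest',impSurr','RowNames',names, ...
    'VariableNames',{'NoSurrogates','WithSurrogates'})
```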

Unbiased Predictor Importance Estimates

Load the carsmall data set. Consider a model that predicts the mean fuel economy of a car given its acceleration, number of cylinders, engine displacement, horsepower, manufacturer, model year, and weight. Consider Cylinders, Mfg, and Model_Year as categorical variables.

load carsmall
Cylinders = categorical(Cylinders);
Mfg = categorical(cellstr(Mfg));
Model_Year = categorical(Model_Year);
X = table(Acceleration,Cylinders,Displacement,Horsepower,Mfg,...
    Model_Year,Weight,MPG);

Display the number of categories represented in the categorical variables.

numCylinders = numel(categories(Cylinders))

numCylinders = 3

numMfg = numel(categories(Mfg))

numMfg = 28

numModelYear = numel(categories(Model_Year))

numModelYear = 3

Because Cylinders and Model_Year each have only 3 categories, the standard CART predictor-splitting algorithm prefers splitting a continuous predictor over these two variables.

Train a regression tree using the entire data set. To grow unbiased trees, specify usage of the curvature test for splitting predictors. Because there are missing values in the data, specify usage of surrogate splits.

Mdl = fitrtree(X,'MPG','PredictorSelection','curvature','Surrogate','on');

Estimate predictor importance values by summing changes in the risk due to splits on every predictor and dividing the sum by the number of branch nodes. Compare the estimates using a bar graph.

imp = predictorImportance(Mdl);
figure;
bar(imp);
title('Predictor Importance Estimates');
ylabel('Estimates');
xlabel('Predictors');
h = gca;
h.XTickLabel = Mdl.PredictorNames;
h.XTickLabelRotation = 45;
h.TickLabelInterpreter = 'none';

[Figure: bar chart titled "Predictor Importance Estimates", with predictors on the x-axis and estimates on the y-axis.]

In this case, Displacement is the most important predictor, followed by Horsepower.

More About

Predictor Importance

predictorImportance computes importance measures of the predictors in a tree by summing changes in the node risk due to splits on every predictor, and then dividing the sum by the total number of branch nodes. The change in the node risk is the difference between the risk for the parent node and the total risk for the two children. For example, if a tree splits a parent node (for example, node 1) into two child nodes (for example, nodes 2 and 3), then predictorImportance increases the importance of the split predictor by

(R_1 - R_2 - R_3)/N_branch,

where R_i is the node risk of node i, and N_branch is the total number of branch nodes. A node risk is defined as a node error weighted by the node probability:

R_i = P_i E_i,

where P_i is the node probability of node i, and E_i is the mean squared error of node i.
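
The definition above can be applied by hand to a tree grown without surrogate splits. The sketch below is one possible reading of it, not the function's actual implementation: it uses the tree properties NodeRisk, Children, IsBranchNode, CutPredictor, and PredictorNames, and the variable names impManual, deltaRisk, and maxAbsDiff are hypothetical. If the definition applies exactly as stated, the printed difference should be at rounding level, but that is an expectation rather than a verified result.

```matlab
% Illustrative sketch: recompute the importance estimates from node risks.
load carsmall
X = [Acceleration Cylinders Displacement Horsepower Model_Year Weight];
tree = fitrtree(X,MPG);          % no surrogate splits

risk     = tree.NodeRisk;        % R_i = P_i*E_i for every node
children = tree.Children;        % one row per node; zeros mark leaf nodes
isBranch = tree.IsBranchNode;    % logical flag per node
nBranch  = sum(isBranch);        % N_branch

impManual = zeros(1,numel(tree.PredictorNames));
for node = find(isBranch(:))'
    % Column index of the predictor used for the split at this node.
    j = find(strcmp(tree.PredictorNames,tree.CutPredictor{node}));
    % Change in risk: parent risk minus the total risk of the two children.
    deltaRisk = risk(node) - sum(risk(children(node,:)));
    impManual(j) = impManual(j) + deltaRisk/nBranch;
end

% Largest absolute gap between the manual and built-in estimates.
maxAbsDiff = max(abs(impManual - predictorImportance(tree)))
```

The loop visits only branch nodes because leaf nodes have no children and therefore contribute no change in risk.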

The estimates of predictor importance depend on whether you use surrogate splits for training.

If you use surrogate splits, predictorImportance sums the changes in the node risk over all splits at each branch node, including surrogate splits. If you do not use surrogate splits, then the function takes the sum over the best splits found at each branch node.
Estimates of predictor importance do not depend on the order of predictors if you use surrogate splits, but do depend on the order if you do not use surrogate splits.

If you use surrogate splits, predictorImportance computes estimates before the tree is reduced by pruning (or merging leaves). If you do not use surrogate splits, predictorImportance computes estimates after the tree is reduced by pruning. Therefore, pruning affects the predictor importance for a tree grown without surrogate splits, and does not affect the predictor importance for a tree grown with surrogate splits.
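
One way to observe the effect of pruning is to prune both kinds of tree by one level and measure how much the estimates move. This is a sketch under the assumption that both trees have at least one pruning level available; the names prunedBest, prunedSurr, changeBest, and changeSurr are hypothetical. Going by the description above, changeSurr would be expected to be zero and changeBest generally nonzero; the sketch prints both rather than assuming either.

```matlab
% Illustrative sketch: effect of pruning on the importance estimates.
load carsmall
X = [Acceleration Cylinders Displacement Horsepower Model_Year Weight];

treeBest = fitrtree(X,MPG);                   % no surrogate splits
treeSurr = fitrtree(X,MPG,'Surrogate','on');  % with surrogate splits

% Prune each tree by one level (assumes a pruning level of 1 exists).
prunedBest = prune(treeBest,'Level',1);
prunedSurr = prune(treeSurr,'Level',1);

% Largest absolute change in any estimate caused by pruning.
changeBest = max(abs(predictorImportance(treeBest) - ...
                     predictorImportance(prunedBest)))
changeSurr = max(abs(predictorImportance(treeSurr) - ...
                     predictorImportance(prunedSurr)))
```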

Extended Capabilities

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
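
A GPU workflow might look like the sketch below. It assumes that Parallel Computing Toolbox is installed, that a supported GPU is present, and that fitrtree accepts gpuArray inputs in your release; none of these is confirmed by this page, so the code checks for a usable GPU first. The names treeGPU and impGPU are hypothetical.

```matlab
% Illustrative sketch: importance estimates for a tree trained on GPU data.
load carsmall
X = [Acceleration Cylinders Displacement Horsepower Model_Year Weight];

if canUseGPU
    % Assumption: fitrtree accepts gpuArray data in your release.
    treeGPU = fitrtree(gpuArray(X),gpuArray(MPG));
    impGPU = predictorImportance(treeGPU);
    imp = gather(impGPU)   % copy the estimates back to host memory
else
    disp('No supported GPU available; skipping the GPU example.')
end
```

Calling gather at the end returns an ordinary array that can be plotted or saved without a GPU.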
