Use Parallel Processing for Regression TreeBagger Workflow

This example shows you how to:

  • Use an ensemble of bagged regression trees to estimate feature importance.

  • Improve computation speed by using parallel computing.

The sample data is a database of 1985 car imports with 205 observations, 25 predictors, and 1 response, which is insurance risk rating, or "symboling." The first 15 variables are numeric and the last 10 are categorical. The symboling index takes integer values from -3 to 3.

Load the sample data and separate it into predictor and response arrays.

load imports-85
Y = X(:,1);
X = X(:,2:end);
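As a quick sanity check, the description of the data set can be compared against the loaded arrays. This is a minimal sketch that assumes the `imports-85` file shipped with Statistics and Machine Learning Toolbox is on the path; it reloads the data so that it runs on its own.

```matlab
% Reload and split the data exactly as in the step above.
load imports-85
Y = X(:,1);        % response: insurance risk rating ("symboling")
X = X(:,2:end);    % predictors

% Report the dimensions described in the text.
fprintf('%d observations, %d predictors\n', size(X,1), size(X,2));

% List the distinct symboling values that actually occur in the data.
disp(unique(Y)')

% Count missing predictor entries (stored as NaN).
fprintf('Missing predictor entries: %d\n', nnz(isnan(X)));
```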

Set up the parallel environment to use the default number of workers. The computer that created this example has six cores.

mypool = parpool

Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 6).

mypool = 

 ProcessPool with properties: 

            Connected: true
           NumWorkers: 6
              Cluster: local
        AttachedFiles: {}
    AutoAddClientPath: true
          IdleTimeout: 30 minutes (30 minutes remaining)
          SpmdEnabled: true
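Calling `parpool` while a pool is already open raises an error, so when rerunning the example it can help to reuse the current pool. The following is a minimal sketch of that pattern using `gcp('nocreate')`; it assumes Parallel Computing Toolbox is installed and is not part of the original example.

```matlab
% Reuse an existing pool if there is one; otherwise start the default pool.
mypool = gcp('nocreate');    % returns an empty array when no pool is running
if isempty(mypool)
    mypool = parpool;        % default profile and default number of workers
end

% Report how many workers are available for the parallel TreeBagger run.
fprintf('Parallel pool has %d workers.\n', mypool.NumWorkers);
```

When the pool is no longer needed, `delete(mypool)` shuts it down.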

Set the options to use parallel processing.

paroptions = statset('UseParallel',true);

Estimate feature importance using a minimum leaf size of 1 and 5000 trees in parallel. Time the function for comparison purposes.

tic
b = TreeBagger(5000,X,Y,'Method','r','OOBVarImp','on', ...
    'cat',16:25,'MinLeafSize',1,'Options',paroptions);
toc
Elapsed time is 9.873065 seconds.
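Once the ensemble is trained with out-of-bag importance turned on, the importance estimates can be read from the model. The sketch below is illustrative only: it trains a small serial ensemble (50 trees, an arbitrary choice to keep it fast) under the hypothetical name `bSmall`, and it assumes a release in which the estimates are stored in the `OOBPermutedPredictorDeltaError` property (older releases used a different property name).

```matlab
% Reload the data so that this sketch runs on its own.
load imports-85
Y = X(:,1);
X = X(:,2:end);

rng(1);   % fix the random seed so this serial run is repeatable

% Small ensemble for illustration; same options as the example otherwise.
bSmall = TreeBagger(50,X,Y,'Method','r','OOBVarImp','on', ...
    'cat',16:25,'MinLeafSize',1);

% One score per predictor: the increase in out-of-bag error when that
% predictor's values are permuted across the out-of-bag observations.
imp = bSmall.OOBPermutedPredictorDeltaError;

% Rank predictors from most to least important and show the top five.
[~,order] = sort(imp,'descend');
disp(order(1:5))
```

Calling `bar(imp)` gives a quick visual comparison of the scores across the 25 predictors.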

Perform the same computation in serial for timing comparison.

tic
b = TreeBagger(5000,X,Y,'Method','r','OOBVarImp','on', ...
    'cat',16:25,'MinLeafSize',1);
toc
Elapsed time is 28.092654 seconds.

In this run, the parallel computation took about 9.9 seconds and the serial computation about 28.1 seconds, so computing in parallel took roughly a third of the serial time. Elapsed times vary with your hardware, operating system, and number of workers.
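The two timings can be turned into a speedup factor and a parallel efficiency. This short sketch uses the elapsed times and worker count reported in this example; the variable names are illustrative, and your own numbers will differ.

```matlab
% Timings reported by tic/toc in this example, in seconds.
tParallel = 9.873065;
tSerial   = 28.092654;
nWorkers  = 6;                      % pool size reported by parpool

speedup    = tSerial / tParallel;   % how many times faster the parallel run was
efficiency = speedup / nWorkers;    % fraction of ideal linear scaling achieved

fprintf('Speedup: %.2fx, parallel efficiency: %.0f%%\n', ...
    speedup, 100*efficiency);
```

With these numbers the speedup is about 2.85 on six workers, which is below linear scaling; some shortfall of this kind is expected because of the overhead of distributing work and data to the workers.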
