Generate Code to Classify Data in Table
This example shows how to generate code for classifying numeric and categorical data in a table using a binary decision tree model. The trained model in this example identifies categorical predictors in the CategoricalPredictors property; therefore, the software handles categorical predictors automatically. You do not need to create dummy variables manually for categorical predictors to generate code.
In the general code generation workflow, you can train a classification or regression model on data in a table. You pass arrays (instead of a table) to your entry-point function for prediction, create a table inside the entry-point function, and then pass the table to predict. For more information on table support in code generation, see Code Generation for Tables (MATLAB Coder) and Table Limitations for Code Generation (MATLAB Coder).
Train Classification Model
Load the patients data set. Create a table that contains numeric predictors of type single and double, categorical predictors of type categorical, and the response variable Smoker of type logical. Each row of the table corresponds to a different patient.
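load patients
% Store some numeric predictors as single; Diastolic and Systolic stay double
Age = single(Age);
Weight = single(Weight);
% Convert the cell arrays of character vectors to categorical arrays
Gender = categorical(Gender);
SelfAssessedHealthStatus = categorical(SelfAssessedHealthStatus);
% Six predictors followed by the logical response Smoker
Tbl = table(Age,Diastolic,Systolic,Weight,Gender,SelfAssessedHealthStatus,Smoker);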
Train a classification tree using the data in Tbl.
Mdl = fitctree(Tbl,'Smoker')
Mdl = 
  ClassificationTree
           PredictorNames: {1x6 cell}
             ResponseName: 'Smoker'
    CategoricalPredictors: [5 6]
               ClassNames: [0 1]
           ScoreTransform: 'none'
          NumObservations: 100
The CategoricalPredictors property value is [5 6], which indicates that Mdl identifies the 5th and 6th predictors ('Gender' and 'SelfAssessedHealthStatus') as categorical predictors. fitctree detects them automatically because they are categorical arrays in Tbl. To identify any other predictors as categorical predictors, specify them by using the 'CategoricalPredictors' name-value argument.
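As a sketch of the 'CategoricalPredictors' name-value argument, the following code names the categorical predictors explicitly instead of relying on automatic detection. The variable name MdlExplicit is illustrative only. Because the names given here are the same two predictors that fitctree detects on its own, the resulting model is expected to report the same indices.

```matlab
% Rebuild the training table (same steps as above)
load patients
Age = single(Age);
Weight = single(Weight);
Gender = categorical(Gender);
SelfAssessedHealthStatus = categorical(SelfAssessedHealthStatus);
Tbl = table(Age,Diastolic,Systolic,Weight,Gender,SelfAssessedHealthStatus,Smoker);

% Name the categorical predictors explicitly (MdlExplicit is a hypothetical name)
MdlExplicit = fitctree(Tbl,'Smoker', ...
    'CategoricalPredictors',{'Gender','SelfAssessedHealthStatus'});

% Indices of the predictors that the model treats as categorical
disp(MdlExplicit.CategoricalPredictors)
```

To mark a numeric predictor as categorical as well, add its name to the cell array.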
Display the predictor names and their order in Mdl.
Mdl.PredictorNames
ans =
  1x6 cell
  Columns 1 through 5
    {'Age'}    {'Diastolic'}    {'Systolic'}    {'Weight'}    {'Gender'}
  Column 6
    {'SelfAssessedHe...'}
Save Model
Save the tree classifier to a file using saveLearnerForCoder.
saveLearnerForCoder(Mdl,'TreeModel');
saveLearnerForCoder saves the classifier to the MATLAB® binary file TreeModel.mat as a structure array in the current folder.
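The following sketch inspects what saveLearnerForCoder wrote and checks the round trip before you generate code. It assumes that loadLearnerForCoder can also be called at the MATLAB command line (as happens when predictSmoker runs in MATLAB during verification) and that predicting on a table that contains only the predictor variables is sufficient. The names info, mdlLoaded, and sameLabels are illustrative.

```matlab
% Rebuild the training table and model (same steps as above)
load patients
Age = single(Age);
Weight = single(Weight);
Gender = categorical(Gender);
SelfAssessedHealthStatus = categorical(SelfAssessedHealthStatus);
Tbl = table(Age,Diastolic,Systolic,Weight,Gender,SelfAssessedHealthStatus,Smoker);
Mdl = fitctree(Tbl,'Smoker');

% Save the model, then list the variables stored in TreeModel.mat
saveLearnerForCoder(Mdl,'TreeModel');
info = whos('-file','TreeModel.mat');
disp({info.name; info.class})   % expected to include a struct

% Reload the model and compare predictions on the predictor columns only
X = Tbl(:,Mdl.PredictorNames);
mdlLoaded = loadLearnerForCoder('TreeModel');
sameLabels = isequal(predict(Mdl,X),predict(mdlLoaded,X))
```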
Define Entry-Point Function
Define the entry-point function predictSmoker, which takes predictor variables as input arguments. Within the function, load the tree classifier by using loadLearnerForCoder, create a table from the input arguments, and then pass the classifier and table to predict.
function [labels,scores] = predictSmoker(age,diastolic,systolic,weight,gender,selfAssessedHealthStatus) %#codegen
%PREDICTSMOKER Label new observations using a trained tree model
%   predictSmoker predicts whether patients are smokers (1) or nonsmokers
%   (0) based on their age, diastolic blood pressure, systolic blood
%   pressure, weight, gender, and self-assessed health status. The function
%   also provides classification scores indicating the likelihood that a
%   predicted label comes from a particular class (smoker or nonsmoker).

% Reconstruct the tree from TreeModel.mat
mdl = loadLearnerForCoder('TreeModel');
% Reuse the training predictor names so the table matches the model
varnames = mdl.PredictorNames;
tbl = table(age,diastolic,systolic,weight,gender,selfAssessedHealthStatus, ...
    'VariableNames',varnames);
[labels,scores] = predict(mdl,tbl);
end
When you create a table inside an entry-point function, you must specify the variable names (for example, by using the 'VariableNames' name-value argument of table). If your table contains only predictor variables, and the predictors are in the same order as in the table used to train the model, then you can find the predictor variable names in mdl.PredictorNames.
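Without 'VariableNames', table takes the names from the input variables. In predictSmoker, that would produce lowercase names such as 'age', which do not match the training names such as 'Age'. The following sketch uses small hypothetical arrays, not patient data, to show that the names passed through 'VariableNames' are the ones the table ends up with.

```matlab
% Hypothetical predictor names and values, for illustration only
varnames = {'Age','Weight'};
age = single([30; 45]);
weight = single([150; 180]);

% The table uses varnames, not the lowercase input variable names
tbl = table(age,weight,'VariableNames',varnames);
disp(tbl.Properties.VariableNames)
```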
Generate Code
Generate code for predictSmoker by using codegen. Specify the data type and dimensions of the predictor variable input arguments using coder.typeof.
The first input argument of coder.typeof specifies the data type of the predictor. The second input argument specifies the upper bound on the number of rows (Inf) and columns (1) in the predictor. The third input argument ([1 0]) specifies that the number of rows in the predictor can change at run time but the number of columns is fixed.
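To see how the three arguments map onto the resulting type object, the following sketch builds a single-precision type and reads back its properties. It uses a scalar single value as the example instead of Age, on the assumption that the explicit size arguments replace the example's own size, so only its class matters here. Running it requires MATLAB Coder.

```matlab
% Example value sets the class; [Inf 1] is the size bound; [1 0] marks
% rows as variable-size and columns as fixed
t = coder.typeof(single(0),[Inf 1],[1 0]);

disp(t.ClassName)      % class of the input
disp(t.SizeVector)     % upper bounds on rows and columns
disp(t.VariableDims)   % which dimensions can change at run time
```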
% One input type per predictor argument of predictSmoker (six in total)
ARGS = cell(6,1);
ARGS{1} = coder.typeof(Age,[Inf 1],[1 0]);
ARGS{2} = coder.typeof(Diastolic,[Inf 1],[1 0]);
ARGS{3} = coder.typeof(Systolic,[Inf 1],[1 0]);
ARGS{4} = coder.typeof(Weight,[Inf 1],[1 0]);
ARGS{5} = coder.typeof(Gender,[Inf 1],[1 0]);
ARGS{6} = coder.typeof(SelfAssessedHealthStatus,[Inf 1],[1 0]);
codegen predictSmoker -args ARGS
Code generation successful.
codegen generates the MEX function predictSmoker_mex with a platform-dependent extension in your current folder.
Verify Generated Code
Verify that predict, predictSmoker, and the MEX file return the same results for a random sample of 20 patients.
rng('default') % For reproducibility
% Sample 20 rows of Tbl; idx holds the sampled row indices
[newTbl,idx] = datasample(Tbl,20);
% Predict with the trained model, the entry-point function, and the MEX function
[labels1,scores1] = predict(Mdl,newTbl);
[labels2,scores2] = predictSmoker(Age(idx),Diastolic(idx),Systolic(idx),Weight(idx),Gender(idx),SelfAssessedHealthStatus(idx));
[labels3,scores3] = predictSmoker_mex(Age(idx),Diastolic(idx),Systolic(idx),Weight(idx),Gender(idx),SelfAssessedHealthStatus(idx));
verifyMEXlabels = isequal(labels1,labels2,labels3)
verifyMEXlabels =
  logical
   1
verifyMEXscores = isequal(scores1,scores2,scores3)
verifyMEXscores =
  logical
   1
See Also
codegen (MATLAB Coder) | coder.typeof (MATLAB Coder) | loadLearnerForCoder | saveLearnerForCoder