
Generate Code to Classify Data in Table

This example shows how to generate code for classifying numeric and categorical data in a table using a binary decision tree model. The trained model in this example identifies categorical predictors in the CategoricalPredictors property; therefore, the software handles categorical predictors automatically. You do not need to create dummy variables manually for categorical predictors to generate code.

In the general code generation workflow, you can train a classification or regression model on data in a table. You pass arrays (instead of a table) to your entry-point function for prediction, create a table inside the entry-point function, and then pass the table to predict. For more information on table support in code generation, see Code Generation for Tables (MATLAB Coder) and Table Limitations for Code Generation (MATLAB Coder).

Train Classification Model

Load the patients data set. Create a table that contains numeric predictors of type single and double, categorical predictors of type categorical, and the response variable Smoker of type logical. Each row of the table corresponds to a different patient.

load patients
Age = single(Age);
Weight = single(Weight);
Gender = categorical(Gender);
SelfAssessedHealthStatus = categorical(SelfAssessedHealthStatus);
Tbl = table(Age,Diastolic,Systolic,Weight,Gender,SelfAssessedHealthStatus,Smoker);

Train a classification tree using the data in Tbl.

Mdl = fitctree(Tbl,'Smoker')
Mdl = 
  ClassificationTree
           PredictorNames: {1x6 cell}
             ResponseName: 'Smoker'
    CategoricalPredictors: [5 6]
               ClassNames: [0 1]
           ScoreTransform: 'none'
          NumObservations: 100

The CategoricalPredictors property value is [5 6], which indicates that Mdl identifies the 5th and 6th predictors ('Gender' and 'SelfAssessedHealthStatus') as categorical predictors. To treat any other predictors as categorical, specify them by using the 'CategoricalPredictors' name-value argument.
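As a minimal sketch, the call below names the categorical predictors explicitly through the 'CategoricalPredictors' name-value argument instead of relying on automatic detection. The model name MdlExplicit is hypothetical, and the training table is rebuilt so the snippet runs on its own. Because the same two variables are listed, the resulting model should report the same indices as Mdl; to treat an additional variable as categorical, you would add its name to the list.

```matlab
% Rebuild the training table from the patients data set, as in the example.
load patients
Age = single(Age);
Weight = single(Weight);
Gender = categorical(Gender);
SelfAssessedHealthStatus = categorical(SelfAssessedHealthStatus);
Tbl = table(Age,Diastolic,Systolic,Weight,Gender,SelfAssessedHealthStatus,Smoker);

% Name the categorical predictors explicitly (MdlExplicit is a hypothetical name).
MdlExplicit = fitctree(Tbl,'Smoker', ...
    'CategoricalPredictors',{'Gender','SelfAssessedHealthStatus'});

% Indices of the predictors that the model treats as categorical
MdlExplicit.CategoricalPredictors
```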

Display the predictor names and their order in Mdl.

Mdl.PredictorNames
ans = 1x6 cell
  Columns 1 through 5
    {'Age'}    {'Diastolic'}    {'Systolic'}    {'Weight'}    {'Gender'}
  Column 6
    {'SelfAssessedHe...'}

Save Model

Save the tree classifier to a file using saveLearnerForCoder.

saveLearnerForCoder(Mdl,'TreeModel');

saveLearnerForCoder saves the classifier to the MATLAB® binary file TreeModel.mat as a structure array in the current folder.
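An optional sanity check, sketched here as an assumption rather than a required step: reload the saved structure with loadLearnerForCoder in MATLAB (outside generated code) and confirm that the reconstructed model predicts the same labels as Mdl. The names mdlLoaded and sameLabels are hypothetical, and the data and model are rebuilt so the snippet runs standalone.

```matlab
% Rebuild the training data and model so this sketch runs on its own.
load patients
Age = single(Age);
Weight = single(Weight);
Gender = categorical(Gender);
SelfAssessedHealthStatus = categorical(SelfAssessedHealthStatus);
Tbl = table(Age,Diastolic,Systolic,Weight,Gender,SelfAssessedHealthStatus,Smoker);
Mdl = fitctree(Tbl,'Smoker');

% Save the model, then reconstruct it from TreeModel.mat.
saveLearnerForCoder(Mdl,'TreeModel');
mdlLoaded = loadLearnerForCoder('TreeModel');

% Compare labels from the original and reloaded models on the training table.
sameLabels = isequal(predict(Mdl,Tbl),predict(mdlLoaded,Tbl))
```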

Define Entry-Point Function

Define the entry-point function predictSmoker, which takes predictor variables as input arguments. Within the function, load the tree classifier by using loadLearnerForCoder, create a table from the input arguments, and then pass the classifier and table to predict.

function [labels,scores] = predictSmoker(age,diastolic,systolic,weight,gender,selfAssessedHealthStatus) %#codegen
%PREDICTSMOKER Label new observations using a trained tree model
%   predictSmoker predicts whether patients are smokers (1) or nonsmokers
%   (0) based on their age, diastolic blood pressure, systolic blood
%   pressure, weight, gender, and self-assessed health status. The function
%   also provides classification scores indicating the likelihood that a
%   predicted label comes from a particular class (smoker or nonsmoker).
mdl = loadLearnerForCoder('TreeModel');
varnames = mdl.PredictorNames;
tbl = table(age,diastolic,systolic,weight,gender,selfAssessedHealthStatus, ...
    'VariableNames',varnames);
[labels,scores] = predict(mdl,tbl);
end

When you create a table inside an entry-point function, you must specify the variable names (for example, by using the 'VariableNames' name-value pair argument of table). If your table contains only predictor variables, and the predictors are in the same order as in the table used to train the model, then you can find the predictor variable names in mdl.PredictorNames.
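The same 'VariableNames' pattern can be tried in plain MATLAB on toy data. In this sketch, the arrays age and gender and their values are hypothetical stand-ins for two predictors, and varnames is written out by hand where predictSmoker reads it from mdl.PredictorNames. The resulting table variables take the supplied names rather than the names of the input arrays.

```matlab
% Hypothetical input arrays standing in for two predictors.
age = single([38; 43; 27]);
gender = categorical({'Male'; 'Female'; 'Female'});

% Names in training order; inside predictSmoker these come from mdl.PredictorNames.
varnames = {'Age','Gender'};
tbl = table(age,gender,'VariableNames',varnames);

% The table variables now carry the supplied names, not the input array names.
tbl.Properties.VariableNames
```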

Generate Code

Generate code for predictSmoker by using codegen. Specify the data type and dimensions of the predictor variable input arguments using coder.typeof.

  • The first input argument of coder.typeof specifies the data type of the predictor.

  • The second input argument specifies the upper bound on the number of rows (Inf) and columns (1) in the predictor.

  • The third input argument specifies that the number of rows in the predictor can change at run time but the number of columns is fixed.
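To see how these three arguments map onto the resulting type object, the sketch below builds one type and inspects its properties. It assumes MATLAB Coder is installed and uses a scalar single(0) as the example value, since the size arguments override the example value's own size; the variable name t is hypothetical.

```matlab
% Type for a variable-length column of single values, like Age in the example.
t = coder.typeof(single(0),[Inf 1],[1 0]);

t.ClassName     % data type of the input
t.SizeVector    % upper bounds: unbounded rows, one column
t.VariableDims  % which dimensions may change at run time
```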

ARGS = cell(6,1);
ARGS{1} = coder.typeof(Age,[Inf 1],[1 0]);
ARGS{2} = coder.typeof(Diastolic,[Inf 1],[1 0]);
ARGS{3} = coder.typeof(Systolic,[Inf 1],[1 0]);
ARGS{4} = coder.typeof(Weight,[Inf 1],[1 0]);
ARGS{5} = coder.typeof(Gender,[Inf 1],[1 0]);
ARGS{6} = coder.typeof(SelfAssessedHealthStatus,[Inf 1],[1 0]);
codegen predictSmoker -args ARGS
Code generation successful.

codegen generates the MEX function predictSmoker_mex with a platform-dependent extension in your current folder.

Verify Generated Code

Verify that predict, predictSmoker, and the MEX file return the same results for a random sample of 20 patients.

rng('default') % For reproducibility
[newTbl,idx] = datasample(Tbl,20);
[labels1,scores1] = predict(Mdl,newTbl);
[labels2,scores2] = predictSmoker(Age(idx),Diastolic(idx),Systolic(idx),Weight(idx),Gender(idx),SelfAssessedHealthStatus(idx));
[labels3,scores3] = predictSmoker_mex(Age(idx),Diastolic(idx),Systolic(idx),Weight(idx),Gender(idx),SelfAssessedHealthStatus(idx));
verifyMEXlabels = isequal(labels1,labels2,labels3)
verifyMEXlabels = logical
   1

verifyMEXscores = isequal(scores1,scores2,scores3)
verifyMEXscores = logical
   1
