Main Content

Feature Selection and Feature Transformation Using Classification Learner App

Investigate Features in the Scatter Plot

In Classification Learner, try to identify predictors that separate classes well by plotting different pairs of predictors on the scatter plot. The plot can help you investigate features to include or exclude. You can visualize training data and misclassified points on the scatter plot.

Before you train a classifier, the scatter plot shows the data. If you have trained a classifier, the scatter plot shows model prediction results. Switch to plotting only the data by selecting Data in the Plot controls.

  • Choose features to plot using the X and Y lists under Predictors.

  • Look for predictors that separate classes well. For example, plotting the fisheriris data, you can see that sepal length and sepal width separate one of the classes (setosa) well. You need to plot other predictors to see whether you can separate the other two classes. (For a command-line version of this plot, see the sketch after this list.)

    Scatter plot of the Fisher iris data

  • Show or hide specific classes using the check boxes under Show.

  • Change the stacking order of the plotted classes by selecting a class under Classes and then clicking Move to Front.

  • Investigate finer details by zooming in and out and panning across the plot. To enable zooming or panning, hover the mouse over the scatter plot and click the corresponding button on the toolbar that appears above the top right of the plot.

  • If you identify predictors that are not useful for separating out classes, then try using Feature Selection to remove them and train classifiers that include only the most useful predictors. See Select Features to Include.
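
If you want to reproduce this kind of scatter plot at the command line, the following is a minimal sketch using the fisheriris sample data and the gscatter function; the axis labels are illustrative (the iris measurements are in centimeters):

    load fisheriris                           % loads meas (150x4) and species
    gscatter(meas(:,1), meas(:,2), species)   % sepal length vs. sepal width, by class
    xlabel('Sepal length (cm)')
    ylabel('Sepal width (cm)')
    title('Fisher iris data')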

After you train a classifier, the scatter plot shows model prediction results. You can show or hide correct or incorrect results and visualize the results by class. See Plot Classifier Results.

You can export the scatter plots you create in the app to figures. See Export Plots in Classification Learner App.

Select Features to Include

In Classification Learner, you can specify different features (or predictors) to include in the model. See if you can improve models by removing features with low predictive power. If data collection is expensive or difficult, you might prefer a model that performs satisfactorily without some predictors.

You can determine which important predictors to include by using different feature ranking algorithms. After you select a feature ranking algorithm, the app displays a plot of the sorted feature importance scores, where larger scores (including Inf values) indicate greater feature importance. The app also displays the ranked features and their scores in a table.

To use feature ranking algorithms in Classification Learner, click Feature Selection in the Options section of the Classification Learner tab. The app opens a Default Feature Selection tab, where you can choose between these algorithms:

MRMR (categorical and continuous features)

Rank features sequentially using the Minimum Redundancy Maximum Relevance (MRMR) Algorithm.

For more information, see fscmrmr.
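
As a command-line counterpart, here is a minimal sketch of MRMR ranking on the fisheriris sample data; meas and species are the predictor matrix and class labels from that data set:

    load fisheriris                          % meas (150x4), species (150x1)
    [idx, scores] = fscmrmr(meas, species);  % idx lists predictor columns, best first
    bar(scores(idx))                         % importance scores in ranked order
    xlabel('Predictor rank')
    ylabel('MRMR score')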

Chi2 (categorical and continuous features)

Examine whether each predictor variable is independent of the response variable by using individual chi-square tests, and then rank features using the p-values of the chi-square test statistics. Scores correspond to –log(p).

For more information, see fscchi2.
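
A minimal fscchi2 sketch on the same data follows. Because the returned scores are –log(p), you can recover the p-values with exp(-scores); an Inf score means the p-value underflowed to zero:

    load fisheriris
    [idx, scores] = fscchi2(meas, species);  % scores are -log(p)
    pvals = exp(-scores);                    % recover the chi-square p-values
    disp([idx; scores(idx)])                 % ranked predictors and their scores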

ReliefF (either all categorical or all continuous features)

Rank features using the ReliefF algorithm. This algorithm works best for estimating feature importance for distance-based supervised models that use pairwise distances between observations to predict the response.

For more information, see relieff.
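
A minimal relieff sketch; the third argument is the number of nearest neighbors, and 10 here is an illustrative choice, not an app default:

    load fisheriris
    [ranked, weights] = relieff(meas, species, 10);  % 10 nearest neighbors
    disp(ranked)                  % predictor columns, most important first
    disp(weights(ranked))         % corresponding importance weights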

ANOVA (categorical and continuous features)

Perform one-way analysis of variance for each predictor variable, grouped by class, and then rank features using the p-values. For each predictor variable, the app tests the hypothesis that the predictor values grouped by the response classes are drawn from populations with the same mean against the alternative hypothesis that the population means are not all the same. Scores correspond to –log(p).

For more information, see anova1.
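
Because anova1 tests one variable at a time, a minimal sketch of this ranking loops over the predictors; the 'off' argument suppresses the ANOVA table and box plot for each call:

    load fisheriris
    p = zeros(1, size(meas,2));
    for j = 1:size(meas,2)
        p(j) = anova1(meas(:,j), species, 'off');  % one-way ANOVA per predictor
    end
    scores = -log(p);                % the app's score convention
    [~, ranked] = sort(scores, 'descend');
    disp(ranked)                     % predictor columns, best first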

Kruskal Wallis (categorical and continuous features)

Rank features using the p-values returned by the Kruskal-Wallis Test. For each predictor variable, the app tests the hypothesis that the predictor values grouped by the response classes are drawn from populations with the same median against the alternative hypothesis that the population medians are not all the same. Scores correspond to –log(p).

For more information, see kruskalwallis.
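
This ranking follows the same per-predictor pattern as the ANOVA sketch above, swapping in the rank-based kruskalwallis test, which does not assume normally distributed data:

    load fisheriris
    p = zeros(1, size(meas,2));
    for j = 1:size(meas,2)
        p(j) = kruskalwallis(meas(:,j), species, 'off');  % 'off' suppresses figures
    end
    [~, ranked] = sort(-log(p), 'descend');
    disp(ranked)                     % predictor columns, best first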

Choose between selecting the highest ranked features and selecting individual features.

  • Choose Select highest ranked features to avoid bias in validation metrics. For example, if you use a cross-validation scheme, then for each training fold, the app performs feature selection before training a model. Different folds can choose different predictors as the highest ranked features. (The sketch after this list illustrates this per-fold ranking.)

  • Choose Select individual features to include specific features in model training. If you use a cross-validation scheme, then the app uses the same features across all training folds.
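
To see why per-fold selection avoids optimistic bias, here is a minimal sketch of the idea; the 5-fold scheme, the MRMR ranker, the two kept features, and the kNN learner are illustrative choices, not the app's fixed internals:

    load fisheriris
    c = cvpartition(species, 'KFold', 5);
    nKeep = 2;                                   % keep the 2 highest ranked features
    foldLoss = zeros(1, c.NumTestSets);
    for i = 1:c.NumTestSets
        tr = training(c, i); te = test(c, i);
        idx = fscmrmr(meas(tr,:), species(tr));  % rank on this fold's training data only
        keep = idx(1:nKeep);                     % folds can keep different predictors
        mdl = fitcknn(meas(tr,keep), species(tr));
        foldLoss(i) = loss(mdl, meas(te,keep), species(te));
    end
    mean(foldLoss)                               % validation estimate without selection bias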

When you are done making your feature selections, click Save and Apply. Your selections affect all draft models in the Models pane and are applied to new draft models that you create using the gallery in the Models section of the Classification Learner tab.

To select features for a single draft model, open and edit the model summary. Click the model in the Models pane, and then click the model Summary tab (if necessary). The Summary tab includes an editable Feature Selection section.

After you train a model, the Feature Selection section of the model Summary tab lists the features used to train the full model (that is, the model trained using both training and validation data). To learn more about how Classification Learner applies feature selection to your data, generate code for your trained classifier.

For an example using feature selection, see Train Decision Trees Using Classification Learner App.

Transform Features with PCA in Classification Learner

Use principal component analysis (PCA) to reduce the dimensionality of the predictor space. Reducing the dimensionality can create classification models in Classification Learner that help prevent overfitting. PCA linearly transforms predictors in order to remove redundant dimensions, and generates a new set of variables called principal components.

  1. On the Classification Learner tab, in the Options section, select PCA.

  2. In the Default PCA Options dialog box, select the Enable PCA check box, and then click Save and Apply.

    The app applies the changes to all existing draft models in the Models pane and to new draft models that you create using the gallery in the Models section of the Classification Learner tab.

  3. When you next train a model using the Train All button, the pca function transforms your selected features before training the classifier. (For the underlying computation, see the sketch after these steps.)

  4. By default, PCA keeps only the components that explain 95% of the variance. In the Default PCA Options dialog box, you can change the percentage of variance to explain by selecting the Explained variance value. A higher value risks overfitting, while a lower value risks removing useful dimensions.

  5. If you want to limit the number of PCA components manually, select Specify number of components in the Component reduction criterion list. Select the Number of numeric components value. The number of components cannot be larger than the number of numeric predictors. PCA is not applied to categorical predictors.
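
A minimal sketch of the underlying computation, calling pca directly on the fisheriris measurements and keeping enough components to explain 95% of the variance:

    load fisheriris
    [coeff, score, ~, ~, explained] = pca(meas);  % explained is in percent
    numComp = find(cumsum(explained) >= 95, 1);   % fewest components reaching 95%
    Xreduced = score(:, 1:numComp);               % transformed predictors for training
    fprintf('Kept %d of %d components (%.1f%% of variance)\n', ...
        numComp, size(meas,2), sum(explained(1:numComp)))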

You can check PCA options for trained models in the PCA section of the Summary tab. Click a trained model in the Models pane, and then click the model Summary tab (if necessary). For example:

    PCA is keeping enough components to explain 95% variance. After training, 2 components were kept. Explained variance per component (in order): 92.5%, 5.3%, 1.7%, 0.5%

Check the explained variance percentages to decide whether to change the number of components.

To learn more about how Classification Learner applies PCA to your data, generate code for your trained classifier. For more information on PCA, see the pca function.

Investigate Features in the Parallel Coordinates Plot

To investigate features to include or exclude, use the parallel coordinates plot. You can visualize high-dimensional data on a single plot to see 2-D patterns. The plot can help you understand relationships between features and identify useful predictors for separating classes. You can visualize training data and misclassified points on the parallel coordinates plot. When you plot classifier results, misclassified points have dashed lines.

  1. On the Classification Learner tab, in the Plots section, click the arrow to open the gallery, and then click Parallel Coordinates in the Validation Results group.

  2. On the plot, drag the X tick labels to reorder the predictors. Changing the order can help you identify predictors that separate classes well.

  3. To specify which predictors to plot, use the Predictors check boxes. A good practice is to plot a few predictors at a time. If your data has many predictors, the plot shows the first 10 predictors by default.

  4. If the predictors have significantly different scales, scale the data for easier visualization. Try different options in the Scaling list:

    • None displays raw data along coordinate rulers that have the same minimum and maximum limits.

    • Range displays raw data along coordinate rulers that have independent minimum and maximum limits.

    • Z-Score displays z-scores (with a mean of 0 and a standard deviation of 1) along each coordinate ruler.

    • Zero Mean displays data centered to have a mean of 0 along each coordinate ruler.

    • Unit Variance displays values scaled by standard deviation along each coordinate ruler.

    • L2 Norm displays 2-norm values along each coordinate ruler.

  5. If you identify predictors that are not useful for separating out classes, use Feature Selection to remove them and train classifiers that include only the most useful predictors. See Select Features to Include.

The plot of the fisheriris data shows that the petal length and petal width features separate the classes best.

Parallel coordinates plot displaying classifier results for the Fisher iris data

For more information, see parallelplot.
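
A minimal command-line counterpart using parallelplot, with z-score normalization comparable to the app's Z-Score scaling option (the table variable names here are illustrative):

    load fisheriris
    tbl = array2table(meas, 'VariableNames', ...
        {'SepalLength','SepalWidth','PetalLength','PetalWidth'});
    tbl.Species = species;
    p = parallelplot(tbl, 'GroupVariable', 'Species');
    p.DataNormalization = 'zscore';   % like the app's Z-Score option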

You can export the parallel coordinates plots you create in the app to figures. See Export Plots in Classification Learner App.

Related Topics