
FeatureSelectionNCARegression class

Feature selection for regression using neighborhood component analysis (NCA)

Description

FeatureSelectionNCARegression contains the data, fitting information, feature weights, and other model parameters of a neighborhood component analysis (NCA) model. fsrnca learns the feature weights using a diagonal adaptation of NCA and returns a FeatureSelectionNCARegression object. The function achieves feature selection by regularizing the feature weights.

Construction

Create a FeatureSelectionNCARegression object using fsrnca.
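
A minimal construction sketch, where X (an n-by-p predictor matrix) and y (an n-by-1 response vector) are placeholder variables rather than data from this page:

% Fit a diagonal-adaptation NCA regression model; mdl is a
% FeatureSelectionNCARegression object.
mdl = fsrnca(X,y);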

Properties


Number of observations in the training data (X and Y) after removing NaN or Inf values, stored as a scalar.

Data Types: double

Model parameters used for training the model, stored as a structure.

You can access the fields of ModelParameters using dot notation.

For example, for a FeatureSelectionNCARegression object named mdl, you can access the LossFunction value using mdl.ModelParameters.LossFunction.

Data Types: struct
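
For instance, a minimal sketch assuming a fitted model named mdl:

% Read the loss function that fsrnca used during fitting from the
% stored model parameters.
lossFun = mdl.ModelParameters.LossFunction;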

Regularization parameter used for training this model, stored as a scalar. For n observations, the best Lambda value that minimizes the generalization error of the NCA model is expected to be a multiple of 1/n.

Data Types: double
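
A hedged sketch of supplying a regularization value on the order of 1/n, with X and y as placeholder training data:

n = size(X,1);                  % number of observations
% Try a Lambda value that is a multiple of 1/n, as suggested above.
mdl = fsrnca(X,y,'Lambda',1/n);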

Name of the fitting method used to fit this model, stored as one of the following:

  • 'exact' — Perform fitting using all of the data.

  • 'none' — No fitting. Use this option to evaluate the generalization error of the NCA model using the initial feature weights supplied in the call to fsrnca.

  • 'average' — The software divides the data into partitions (subsets), fits each partition using the exact method, and returns the average of the feature weights. You can specify the number of partitions using the NumPartitions name-value pair argument (see the sketch after this list).
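
A minimal sketch of the 'average' option, with placeholder X and y; the number of partitions is only illustrative:

% Fit each of 5 partitions with the exact method and average the
% resulting feature weights.
mdl = fsrnca(X,y,'FitMethod','average','NumPartitions',5);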

Name of the solver used to fit this model, stored as one of the following:

  • 'lbfgs' — Limited memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithm

  • 'sgd' — Stochastic gradient descent (SGD) algorithm

  • 'minibatch-lbfgs' — Stochastic gradient descent with LBFGS algorithm applied to mini-batches (see the sketch after this list)
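
A minimal sketch of selecting a solver explicitly, with placeholder X and y:

% Request the stochastic gradient descent solver.
mdl = fsrnca(X,y,'Solver','sgd');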

Relative convergence tolerance on the gradient norm for the 'lbfgs' and 'minibatch-lbfgs' solvers, stored as a positive scalar value.

Data Types: double

Maximum number of iterations for optimization, stored as a positive integer value.

Data Types: double

Maximum number of passes for the 'sgd' and 'minibatch-lbfgs' solvers. Every pass processes all of the observations in the data.

Data Types: double

Initial learning rate for the 'sgd' and 'minibatch-lbfgs' solvers. The learning rate decays over iterations, starting at the value specified for InitialLearningRate.

Use the NumTuningIterations and TuningSubsetSize name-value pair arguments to control the automatic tuning of the initial learning rate in the call to fsrnca.

Data Types: double
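
A hedged sketch of letting fsrnca tune the initial learning rate for the SGD solver, with placeholder X and y and illustrative tuning settings:

% Omit InitialLearningRate so that fsrnca tunes it automatically, using
% 20 tuning iterations on a random subset of 500 observations.
mdl = fsrnca(X,y,'Solver','sgd', ...
    'NumTuningIterations',20,'TuningSubsetSize',500);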

Verbosity level indicator, stored as a nonnegative integer. Possible values are:

  • 0 — No convergence summary

  • 1 — Convergence summary, including norm of gradient and objective function value

  • >1 — More convergence information, depending on the fitting algorithm. When you use the 'minibatch-lbfgs' solver and verbosity level > 1, the convergence information includes the iteration log from intermediate mini-batch LBFGS fits.

Data Types: double

Initial feature weights, stored as a p-by-1 vector of positive real scalars, where p is the number of predictors in X.

Data Types: double

Feature weights, stored as a p-by-1 vector of real scalar values, where p is the number of predictors in X.

For 'FitMethod' equal to 'average', FeatureWeights is a p-by-m matrix, where m is the number of partitions specified via the 'NumPartitions' name-value pair argument in the call to fsrnca.

The absolute value of FeatureWeights(k) is a measure of the importance of predictor k. If FeatureWeights(k) is close to 0, then predictor k does not influence the response in Y.

Data Types: double
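
For example, a minimal sketch that keeps only the predictors whose learned weights exceed a tolerance, assuming FitMethod is not 'average'; the threshold value is an illustrative assumption, not a recommendation:

tol = 0.02;                                    % illustrative threshold
selectedIdx = find(mdl.FeatureWeights > tol)   % indices of relevant predictors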

Fit information, stored as a structure with the following fields.

Field Name               Meaning
Iteration                Iteration index
Objective                Regularized objective function for minimization
UnregularizedObjective   Unregularized objective function for minimization
Gradient                 Gradient of the regularized objective function for minimization

  • For classification, UnregularizedObjective represents the negative of the leave-one-out accuracy of the NCA classifier on the training data.

  • For regression, UnregularizedObjective represents the leave-one-out loss between the true response and the predicted response when using the NCA regression model.

  • For the 'lbfgs' solver, Gradient is the final gradient. For the 'sgd' and 'minibatch-lbfgs' solvers, Gradient is the final mini-batch gradient.

  • If FitMethod is 'average', then FitInfo is an m-by-1 structure array, where m is the number of partitions specified via the 'NumPartitions' name-value pair argument.

You can access the fields of FitInfo using dot notation. For example, for a FeatureSelectionNCARegression object named mdl, you can access the Objective field using mdl.FitInfo.Objective.

Data Types: struct
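
A minimal sketch, assuming FitMethod is not 'average' so that FitInfo is a scalar structure:

% Read the final value of the regularized objective recorded during fitting.
finalObjective = mdl.FitInfo.Objective(end)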

Predictor means, stored as a p-by-1 vector for standardized training data. In this case, the predict method centers predictor matrix X by subtracting the respective element of Mu from every column.

If the data is not standardized during training, then Mu is empty.

Data Types: double

Predictor standard deviations, stored as a p-by-1 vector for standardized training data. In this case, the predict method scales predictor matrix X by dividing every column by the respective element of Sigma after centering the data using Mu.

If the data is not standardized during training, then Sigma is empty.

Data Types: double
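
A hedged sketch of the centering and scaling described above, assuming mdl was trained with standardization and Xnew is a placeholder matrix of new observations:

% Center each column by Mu and scale it by Sigma (reshaped to row vectors
% so that implicit expansion works for an n-by-p Xnew).
XnewStd = (Xnew - mdl.Mu(:)') ./ mdl.Sigma(:)';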

Predictor values used to train this model, stored as an n-by-p matrix, where n is the number of observations and p is the number of predictor variables in the training data.

Data Types: double

Response values used to train this model, stored as a numeric vector of size n, where n is the number of observations.

Data Types: double

Observation weights used to train this model, stored as a numeric vector of size n. The sum of the observation weights is n.

Data Types: double

Methods

loss Evaluate accuracy of learned feature weights on test data
predict Predict responses using neighborhood component analysis (NCA) regression model
refit Refit neighborhood component analysis (NCA) model for regression
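
A minimal sketch of the predict and loss methods, with placeholder test data Xtest and Ytest:

Ypred = predict(mdl,Xtest);   % predicted responses for the test predictors
L = loss(mdl,Xtest,Ytest)     % regression loss of the learned feature weights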

Examples


Load the sample data.

load imports-85

The first 15 columns contain continuous predictor variables, whereas the 16th column contains the response variable, which is the price of a car. Define the variables for the neighborhood component analysis model.

Predictors = X(:,1:15);
Y = X(:,16);

Fit a neighborhood component analysis (NCA) model for regression to detect the relevant features.

mdl = fsrnca(Predictors,Y);

The returned NCA model, mdl, is a FeatureSelectionNCARegression object. This object stores information about the training data, model, and optimization. You can access the object properties, such as the feature weights, using dot notation.

Plot the feature weights.

figure()
plot(mdl.FeatureWeights,'ro')
xlabel('Feature Index')
ylabel('Feature Weight')
grid on

[Figure: Feature weights plotted against feature index]

The weights of the irrelevant features are zero. The 'Verbose',1 option in the call to fsrnca displays the optimization information on the command line. You can also visualize the optimization process by plotting the objective function versus the iteration number.

figure()
plot(mdl.FitInfo.Iteration,mdl.FitInfo.Objective,'ro-')
grid on
xlabel('Iteration Number')
ylabel('Objective')

[Figure: Objective function value plotted against iteration number]

The ModelParameters property is a struct that contains more information about the model. You can access the fields of this property using dot notation. For example, check whether the data was standardized.

mdl.ModelParameters.Standardize
ans = logical
   0

0 means that the data was not standardized before fitting the NCA model. You can standardize the predictors when they are on very different scales using the 'Standardize',1 name-value pair argument in the call to fsrnca.
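
A minimal sketch of refitting with standardization, reusing the Predictors and Y variables from this example:

% Standardize each predictor before fitting the NCA regression model.
mdlStd = fsrnca(Predictors,Y,'Standardize',1);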

Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects.

Version History

Introduced in R2016b