FeatureSelectionNCARegression class
Feature selection for regression using neighborhood component analysis (NCA)
Description
FeatureSelectionNCARegression contains the data, fitting information, feature weights, and other model parameters of a neighborhood component analysis (NCA) model. fsrnca learns the feature weights using a diagonal adaptation of NCA and returns a FeatureSelectionNCARegression object. The function achieves feature selection by regularizing the feature weights.
Construction
Create a FeatureSelectionNCARegression object using fsrnca.
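As a minimal sketch of construction, the following fits a model to synthetic data. The data, the random seed, and the variable names are assumptions made for illustration; only fsrnca and the class name come from this page.

```matlab
% Synthetic data (an assumption for illustration): 100 observations and
% 5 predictors, with the response depending only on predictors 1 and 3.
rng(0);                                   % fix the seed for reproducibility
X = rand(100,5);
y = 2*X(:,1) - 3*X(:,3) + 0.05*randn(100,1);

% fsrnca returns a FeatureSelectionNCARegression object.
mdl = fsrnca(X,y);
disp(class(mdl))
```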
Properties
NumObservations
—Number of observations in the training data
scalar
Number of observations in the training data (X and Y) after removing NaN or Inf values, stored as a scalar.
Data Types: double
ModelParameters
—Model parameters
structure
Model parameters used for training the model, stored as a structure.
You can access the fields of ModelParameters using dot notation.
For example, for a FeatureSelectionNCARegression object named mdl, you can access the LossFunction value using mdl.ModelParameters.LossFunction.
Data Types: struct
Lambda
—Regularization parameter
scalar
Regularization parameter used for training this model, stored as a scalar. For n observations, the best Lambda value that minimizes the generalization error of the NCA model is expected to be a multiple of 1/n.
Data Types: double
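The 1/n scaling can be sketched as follows. The synthetic data and the particular multipliers are arbitrary choices for illustration, not recommended values.

```matlab
% Synthetic data (an assumption for illustration).
rng(0);
n = 100;
X = rand(n,5);
y = 2*X(:,1) - 3*X(:,3) + 0.05*randn(n,1);

% Candidate regularization values expressed as multiples of 1/n.
lambdaVals = [0.5 1 2 5]/n;
mdls = cell(size(lambdaVals));
for k = 1:numel(lambdaVals)
    mdls{k} = fsrnca(X,y,'Lambda',lambdaVals(k));
end

% Each fitted object records the value it was trained with.
storedLambda = cellfun(@(m) m.Lambda, mdls)
```

In practice, you would compare the candidates by their loss on held-out or cross-validated data rather than fit them on the full data set as done here.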
FitMethod
—Name of the fitting method used to fit this model
'exact' | 'none' | 'average'
Name of the fitting method used to fit this model, stored as one of the following:
'exact': Perform fitting using all of the data.
'none': No fitting. Use this option to evaluate the generalization error of the NCA model using the initial feature weights supplied in the call to fsrnca.
'average': The software divides the data into partitions (subsets), fits each partition using the 'exact' method, and returns the average of the feature weights. You can specify the number of partitions using the NumPartitions name-value pair argument.
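A short sketch of selecting the fitting method in the call to fsrnca. The data and the choice of three partitions are assumptions for illustration.

```matlab
% Synthetic data (an assumption for illustration).
rng(0);
X = rand(120,4);
y = X(:,2) + 0.05*randn(120,1);

% Divide the data into 3 partitions and fit each one with the 'exact' method.
mdlAvg = fsrnca(X,y,'FitMethod','average','NumPartitions',3);
mdlAvg.FitMethod

% Skip fitting entirely and keep the supplied initial weights.
mdlNone = fsrnca(X,y,'FitMethod','none','InitialFeatureWeights',ones(4,1));
mdlNone.FitMethod
```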
Solver
—Name of the solver used to fit this model
'lbfgs' | 'sgd' | 'minibatch-lbfgs'
Name of the solver used to fit this model, stored as one of the following:
'lbfgs': Limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithm
'sgd': Stochastic gradient descent (SGD) algorithm
'minibatch-lbfgs': Stochastic gradient descent with the LBFGS algorithm applied to mini-batches
GradientTolerance
—Relative convergence tolerance on gradient norm
positive scalar
Relative convergence tolerance on the gradient norm for the 'lbfgs' and 'minibatch-lbfgs' solvers, stored as a positive scalar value.
Data Types: double
IterationLimit
—Maximum number of iterations for optimization
positive integer
Maximum number of iterations for optimization, stored as a positive integer value.
Data Types: double
PassLimit
—Maximum number of passes
positive integer
Maximum number of passes for the 'sgd' and 'minibatch-lbfgs' solvers. Every pass processes all of the observations in the data.
Data Types: double
InitialLearningRate
—Initial learning rate
positive real scalar
Initial learning rate for the 'sgd' and 'minibatch-lbfgs' solvers. The learning rate decays over iterations starting at the value specified for InitialLearningRate.
Use the NumTuningIterations and TuningSubsetSize name-value pair arguments to control the automatic tuning of the initial learning rate in the call to fsrnca.
Data Types: double
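The solver-related settings above might be combined as in the following sketch. The data and all numeric values are assumptions chosen only to keep the example small.

```matlab
% Synthetic data (an assumption for illustration).
rng(0);
X = rand(200,5);
y = 2*X(:,1) - 3*X(:,3) + 0.05*randn(200,1);

% Use the SGD solver, limit the number of passes over the data, and
% control the automatic tuning of the initial learning rate.
mdlSgd = fsrnca(X,y,'Solver','sgd', ...
    'PassLimit',5, ...
    'NumTuningIterations',10, ...
    'TuningSubsetSize',50);

mdlSgd.Solver
mdlSgd.PassLimit
mdlSgd.InitialLearningRate
```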
Verbose
—Verbosity level indicator
nonnegative integer
Verbosity level indicator, stored as a nonnegative integer. Possible values are:
0: No convergence summary
1: Convergence summary, including norm of gradient and objective function value
>1: More convergence information, depending on the fitting algorithm. When you use the 'minibatch-lbfgs' solver and a verbosity level greater than 1, the convergence information includes the iteration log from intermediate mini-batch LBFGS fits.
Data Types: double
InitialFeatureWeights
—Initial feature weights
p-by-1 vector of positive real scalars
Initial feature weights, stored as a p-by-1 vector of positive real scalars, where p is the number of predictors in X.
Data Types: double
FeatureWeights
—Feature weights
p-by-1 vector of real scalar values
Feature weights, stored as a p-by-1 vector of real scalar values, where p is the number of predictors in X.
For 'FitMethod' equal to 'average', FeatureWeights is a p-by-m matrix, where m is the number of partitions specified via the 'NumPartitions' name-value pair argument in the call to fsrnca.
The absolute value of FeatureWeights(k) is a measure of the importance of predictor k. If FeatureWeights(k) is close to 0, then predictor k does not influence the response in Y.
Data Types: double
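One way to turn the weights into a feature subset is to apply a relative cutoff, as sketched below. The 2% tolerance and the synthetic data are assumptions for illustration, not values prescribed by fsrnca.

```matlab
% Synthetic data (an assumption for illustration): only predictors 1 and 3
% influence the response.
rng(0);
X = rand(100,5);
y = 2*X(:,1) - 3*X(:,3) + 0.05*randn(100,1);
mdl = fsrnca(X,y);

% Keep predictors whose absolute weight exceeds a fraction of the largest
% absolute weight. The tolerance is a hypothetical choice.
w = abs(mdl.FeatureWeights);
tol = 0.02;
selected = find(w > tol*max(w))
```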
FitInfo
—Fit information
structure
Fit information, stored as a structure with the following fields.
Field Name | Meaning |
---|---|
Iteration | Iteration index |
Objective | Regularized objective function for minimization |
UnregularizedObjective | Unregularized objective function for minimization |
Gradient | Gradient of regularized objective function for minimization |
For classification, UnregularizedObjective represents the negative of the leave-one-out accuracy of the NCA classifier on the training data.
For regression, UnregularizedObjective represents the leave-one-out loss between the true response and the predicted response when using the NCA regression model.
For the 'lbfgs' solver, Gradient is the final gradient. For the 'sgd' and 'minibatch-lbfgs' solvers, Gradient is the final mini-batch gradient.
If FitMethod is 'average', then FitInfo is an m-by-1 structure array, where m is the number of partitions specified via the 'NumPartitions' name-value pair argument.
You can access the fields of FitInfo using dot notation. For example, for a FeatureSelectionNCARegression object named mdl, you can access the Objective field using mdl.FitInfo.Objective.
Data Types: struct
Mu
—Predictor means
p-by-1 vector | []
Predictor means, stored as a p-by-1 vector for standardized training data. In this case, the predict method centers the predictor matrix X by subtracting the respective element of Mu from each column.
If the data is not standardized during training, then Mu is empty.
Data Types: double
Sigma
—Predictor standard deviations
p-by-1 vector | []
Predictor standard deviations, stored as a p-by-1 vector for standardized training data. In this case, the predict method scales the predictor matrix X by dividing each column by the respective element of Sigma after centering the data using Mu.
If the data is not standardized during training, then Sigma is empty.
Data Types: double
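The centering and scaling described for Mu and Sigma can be reproduced by hand, as in this sketch. The data is synthetic and the variable names are assumptions; the arithmetic relies on implicit expansion, which is available in R2016b and later.

```matlab
% Synthetic predictors on very different scales (an assumption).
rng(0);
X = [rand(100,1), 1000*rand(100,1), 50 + 10*randn(100,1)];
y = X(:,1) + 0.001*X(:,2) + 0.05*randn(100,1);

% Standardize the predictors during training.
mdlStd = fsrnca(X,y,'Standardize',true);

% Mu and Sigma are p-by-1, so transpose them to line up with the columns
% of an n-by-p predictor matrix.
Xnew = X(1:5,:);
Xscaled = (Xnew - mdlStd.Mu') ./ mdlStd.Sigma';
```

You do not need to apply this transformation before calling predict, because predict performs it internally for a model trained on standardized data.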
X
—Predictor values
n-by-p matrix
Predictor values used to train this model, stored as an n-by-p matrix. n is the number of observations and p is the number of predictor variables in the training data.
Data Types: double
Y
—Response values
numeric vector of size n
Response values used to train this model, stored as a numeric vector of size n, where n is the number of observations.
Data Types: double
W
—Observation weights
numeric vector of size n
Observation weights used to train this model, stored as a numeric vector of size n. The sum of observation weights is n.
Data Types: double
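The statement that the weights sum to n corresponds to the rescaling sketched below. The raw weights and data are assumptions for illustration, and the rescaling shown is a manual illustration of the arithmetic, not a description of the internal implementation.

```matlab
% Synthetic data and raw weights (assumptions for illustration).
rng(0);
n = 100;
X = rand(n,5);
y = 2*X(:,1) - 3*X(:,3) + 0.05*randn(n,1);
wRaw = [2*ones(n/2,1); ones(n/2,1)];   % first half counts twice as much

% Rescale so that the weights sum to the number of observations.
wNorm = wRaw * n / sum(wRaw);

% Supply the raw weights through the 'Weights' name-value pair argument
% and inspect the sum of the stored weights.
mdlW = fsrnca(X,y,'Weights',wRaw);
sum(mdlW.W)
```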
Methods
loss | Evaluate accuracy of learned feature weights on test data |
predict | Predict responses using neighborhood component analysis (NCA) regression model |
refit | Refit neighborhood component analysis (NCA) model for regression |
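The three methods might be used together as in the following sketch. The synthetic data, the train/test split, and the new Lambda value are assumptions for illustration.

```matlab
% Synthetic data split into training and test sets (an assumption).
rng(0);
X = rand(150,5);
y = 2*X(:,1) - 3*X(:,3) + 0.05*randn(150,1);
Xtrain = X(1:100,:);    ytrain = y(1:100);
Xtest  = X(101:end,:);  ytest  = y(101:end);

mdl = fsrnca(Xtrain,ytrain);

ypred = predict(mdl,Xtest);          % predicted responses for the test rows
L = loss(mdl,Xtest,ytest);           % loss of the model on the test set
mdl2 = refit(mdl,'Lambda',2/100);    % refit with a different Lambda value
```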
Examples
Explore FeatureSelectionNCARegression Object
Load the sample data.
load imports-85
The first 15 columns contain the continuous predictor variables, whereas the 16th column contains the response variable, which is the price of a car. Define the variables for the neighborhood component analysis model.
Predictors = X(:,1:15); Y = X(:,16);
Fit a neighborhood component analysis (NCA) model for regression to detect the relevant features.
mdl = fsrnca(Predictors,Y);
The returned NCA model, mdl, is a FeatureSelectionNCARegression object. This object stores information about the training data, model, and optimization. You can access the object properties, such as the feature weights, using dot notation.
Plot the feature weights.
figure()
plot(mdl.FeatureWeights,'ro')
xlabel('Feature Index')
ylabel('Feature Weight')
grid on
The weights of the irrelevant features are zero. If you specify the 'Verbose',1 option in the call to fsrnca, the software displays the optimization information on the command line. You can also visualize the optimization process by plotting the objective function versus the iteration number.
figure()
plot(mdl.FitInfo.Iteration,mdl.FitInfo.Objective,'ro-')
grid on
xlabel('Iteration Number')
ylabel('Objective')
The ModelParameters property is a struct that contains more information about the model. You can access the fields of this property using dot notation. For example, check whether the data was standardized.
mdl.ModelParameters.Standardize
ans = logical
   0
0 means that the data was not standardized before fitting the NCA model. You can standardize the predictors when they are on very different scales using the 'Standardize',1 name-value pair argument in the call to fsrnca.
Copy Semantics
Value. To learn how value classes affect copy operations, see Copying Objects.