
loss

Loss of linear incremental learning model on batch of data

Description

loss returns the regression or classification loss of a configured incremental learning model for linear regression (incrementalRegressionLinear object) or linear binary classification (incrementalClassificationLinear object).

To measure model performance on a data stream and store the results in the output model, call updateMetrics or updateMetricsAndFit.


L = loss(Mdl,X,Y) returns the loss for the incremental learning model Mdl using the batch of predictor data X and corresponding responses Y.


L = loss(Mdl,X,Y,Name,Value) uses additional options specified by one or more name-value pair arguments. For example, you can specify that the columns of the predictor data matrix correspond to observations, or specify the classification loss function.
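For instance, the following call combines both options. This is a minimal sketch; Mdl, Xchunk (one observation per column), and Ychunk are hypothetical workspace variables:

% Hinge loss on a chunk whose observations are oriented along the columns.
L = loss(Mdl,Xchunk,Ychunk,'ObservationsIn','columns','LossFun','hinge');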

Examples


The performance of an incremental model on streaming data is measured in three ways:

  1. Cumulative metrics measure the performance since the start of incremental learning.

  2. Window metrics measure the performance on a specified window of observations. The metrics are updated every time the model processes the specified window.

  3. The loss function measures the performance on a specified batch of data only.

Load the human activity data set. Randomly shuffle the data.

load humanactivity
n = numel(actid);
rng(1) % For reproducibility
idx = randsample(n,n);
X = feat(idx,:);
Y = actid(idx);

For details on the data set, enter Description at the command line.

Responses can be one of five classes: Sitting, Standing, Walking, Running, or Dancing. Dichotomize the response by identifying whether the subject is moving (actid > 2).

Y = Y > 2;

Create an incremental linear SVM model for binary classification. Configure the model for loss by specifying the class names, prior class distribution (uniform), and arbitrary coefficient and bias values. Specify a metrics window size of 1000 observations.

p = size(X,2);
Beta = randn(p,1);
Bias = randn(1);
Mdl = incrementalClassificationLinear('Beta',Beta,'Bias',Bias,...
    'ClassNames',unique(Y),'Prior','uniform','MetricsWindowSize',1000);

Mdl is an incrementalClassificationLinear model. All its properties are read-only. Instead of specifying arbitrary values, you can take either of these actions to configure the model:

  • Train an SVM model using fitcsvm or fitclinear on a subset of the data (if available), and then convert the model to an incremental learner by using incrementalLearner.

  • Incrementally fit Mdl to data by using fit.

Simulate a data stream, and perform the following actions on each incoming chunk of 50 observations:

  1. Call updateMetrics to measure the cumulative performance and the performance within a window of observations. Overwrite the previous incremental model with a new one to track performance metrics.

  2. Call loss to measure the model performance on the incoming chunk.

  3. Call fit to fit the model to the incoming chunk. Overwrite the previous incremental model with a new one fitted to the incoming observations.

  4. Store all performance metrics to see how they evolve during incremental learning.

% Preallocation
numObsPerChunk = 50;
nchunk = floor(n/numObsPerChunk);
ce = array2table(zeros(nchunk,3),'VariableNames',["Cumulative" "Window" "Loss"]);

% Incremental learning
for j = 1:nchunk
    ibegin = min(n,numObsPerChunk*(j-1) + 1);
    iend   = min(n,numObsPerChunk*j);
    idx = ibegin:iend;
    Mdl = updateMetrics(Mdl,X(idx,:),Y(idx));
    ce{j,["Cumulative" "Window"]} = Mdl.Metrics{"ClassificationError",:};
    ce{j,"Loss"} = loss(Mdl,X(idx,:),Y(idx));
    Mdl = fit(Mdl,X(idx,:),Y(idx));
end

Mdl is an incrementalClassificationLinear model object trained on all the data in the stream. During incremental learning and after the model is warmed up, updateMetrics checks the performance of the model on the incoming observations, and then the fit function fits the model to those observations. loss is agnostic of the metrics warm-up period, so it measures the classification error for all iterations.

To see how the performance metrics evolve during training, plot them.

figure
plot(ce.Variables)
xlim([0 nchunk])
ylim([0 0.05])
ylabel('Classification Error')
xline(Mdl.MetricsWarmupPeriod/numObsPerChunk,'r-.')
legend(ce.Properties.VariableNames)
xlabel('Iteration')

Figure: classification error versus iteration for the Cumulative, Window, and Loss metrics.

The yellow line represents the classification error on each incoming chunk of data. After the metrics warm-up period, Mdl tracks the cumulative and window metrics. The cumulative and batch losses converge as the fit function fits the incremental model to the incoming data.

Fit an incremental learning model for regression to streaming data, and compute the mean absolute deviation (MAD) on the incoming data batches.

Load the robot arm data set. Obtain the sample size n and the number of predictor variables p.

load robotarm
n = numel(ytrain);
p = size(Xtrain,2);

For details on the data set, enter Description at the command line.

Create an incremental linear model for regression. Configure the model as follows:

  • Specify a metrics warm-up period of 1000 observations.

  • Specify a metrics window size of 500 observations.

  • Track the mean absolute deviation (MAD) to measure the performance of the model. Create an anonymous function that measures the absolute error of each new observation. Create a structure array containing the name MeanAbsoluteError and its corresponding function.

  • Configure the model to predict responses by specifying that all regression coefficients and the bias are 0.

maefcn = @(z,zfit,w)(abs(z - zfit));
maemetric = struct("MeanAbsoluteError",maefcn);

Mdl = incrementalRegressionLinear('MetricsWarmupPeriod',1000,'MetricsWindowSize',500,...
    'Metrics',maemetric,'Beta',zeros(p,1),'Bias',0,'EstimationPeriod',0)

Mdl = 
  incrementalRegressionLinear

               IsWarm: 0
              Metrics: [2x2 table]
    ResponseTransform: 'none'
                 Beta: [32x1 double]
                 Bias: 0
              Learner: 'svm'

  Properties, Methods

Mdl is an incrementalRegressionLinear model object configured for incremental learning.

Perform incremental learning. At each iteration:

  • Simulate a data stream by processing a chunk of 50 observations.

  • Call updateMetrics to compute the cumulative and window metrics on the incoming chunk of data. Overwrite the previous incremental model with a new one to track performance metrics.

  • Call loss to compute the MAD on the incoming chunk of data. Whereas the cumulative and window metrics require that custom losses return the loss for each observation, loss requires the loss on the entire chunk. Compute the mean of the absolute deviation.

  • Call fit to fit the incremental model to the incoming chunk of data.

  • Store the cumulative, window, and chunk metrics to see how they evolve during incremental learning.

% Preallocation
numObsPerChunk = 50;
nchunk = floor(n/numObsPerChunk);
mae = array2table(zeros(nchunk,3),'VariableNames',["Cumulative" "Window" "Chunk"]);

% Incremental fitting
for j = 1:nchunk
    ibegin = min(n,numObsPerChunk*(j-1) + 1);
    iend   = min(n,numObsPerChunk*j);
    idx = ibegin:iend;
    Mdl = updateMetrics(Mdl,Xtrain(idx,:),ytrain(idx));
    mae{j,1:2} = Mdl.Metrics{"MeanAbsoluteError",:};
    mae{j,3} = loss(Mdl,Xtrain(idx,:),ytrain(idx),'LossFun',@(x,y,w)mean(maefcn(x,y,w)));
    Mdl = fit(Mdl,Xtrain(idx,:),ytrain(idx));
end

Mdl is an incrementalRegressionLinear model object trained on all the data in the stream. During incremental learning and after the model is warmed up, updateMetrics checks the performance of the model on the incoming observations, and the fit function fits the model to those observations.

Plot the performance metrics to see how they evolved during incremental learning.

figure
h = plot(mae.Variables);
xlim([0 nchunk])
ylabel('Mean Absolute Deviation')
xline(Mdl.MetricsWarmupPeriod/numObsPerChunk,'r-.')
xlabel('Iteration')
legend(h,mae.Properties.VariableNames)

Figure: mean absolute deviation versus iteration for the Cumulative, Window, and Chunk metrics.

The plot suggests the following:

  • updateMetrics computes the performance metrics after the metrics warm-up period only.

  • updateMetrics computes the cumulative metrics during each iteration.

  • updateMetrics computes the window metrics after processing 500 observations.

  • Because Mdl was configured to predict observations from the beginning of incremental learning, loss can compute the MAD on each incoming chunk of data.

Input Arguments


Incremental learning model, specified as an incrementalClassificationLinear or incrementalRegressionLinear model object. You can create Mdl directly or by converting a supported, traditionally trained machine learning model using the incrementalLearner function. For more details, see the corresponding reference page.

You must configure Mdl to compute its loss on a batch of observations.

  • If Mdl is a converted, traditionally trained model, you can compute its loss without any modifications.

  • Otherwise, Mdl must satisfy the following conditions, which you can specify directly or by fitting Mdl to data using fit or updateMetricsAndFit (see the sketch after this list).

    • If Mdl is an incrementalRegressionLinear model, its model coefficients Mdl.Beta and bias Mdl.Bias must be nonempty arrays.

    • If Mdl is an incrementalClassificationLinear model, its model coefficients Mdl.Beta and bias Mdl.Bias must be nonempty arrays, the class names Mdl.ClassNames must contain two classes, and the prior class distribution Mdl.Prior must contain known values.

    • Regardless of object type, if you configure the model so that functions standardize the predictor data, the predictor means Mdl.Mu and standard deviations Mdl.Sigma must be nonempty arrays.
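For example, one way to satisfy these conditions is the conversion route described above. A minimal sketch, assuming X0 and Y0 (hypothetical names) hold a small labeled subset of the stream:

TTMdl = fitclinear(X0,Y0);          % traditionally trained linear classification model
IncMdl = incrementalLearner(TTMdl); % converted incremental model
L = loss(IncMdl,X0,Y0);             % configured, so loss runs without further fitting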

Batch of predictor data with which to compute the loss, specified as a floating-point matrix of n observations and Mdl.NumPredictors predictor variables. The value of the ObservationsIn name-value argument determines the orientation of the variables and observations. The default ObservationsIn value is "rows", which indicates that observations in the predictor data are oriented along the rows of X.

The length of the observation labels Y and the number of observations in X must be equal; Y(j) is the label of observation j (row or column) in X.

Note

loss supports only floating-point input predictor data. If your input data includes categorical data, you must prepare an encoded version of the categorical data. Use dummyvar to convert each categorical variable to a numeric matrix of dummy variables. Then, concatenate all dummy variable matrices and any other numeric predictors. For more details, see Dummy Variables.

Data Types: single | double
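A minimal sketch of this encoding, assuming a hypothetical categorical predictor colorCat and a numeric predictor matrix Xnumeric:

D = dummyvar(colorCat);   % one dummy column per category
Xencoded = [Xnumeric D];  % all-numeric predictor matrix for loss
L = loss(Mdl,Xencoded,Y); % Mdl.NumPredictors must equal size(Xencoded,2)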

Batch of responses (labels) with which to compute the loss, specified as a categorical, character, or string array, logical or floating-point vector, or cell array of character vectors for classification problems; or a floating-point vector for regression problems.

The length of the observation labels Y and the number of observations in X must be equal; Y(j) is the label of observation j (row or column) in X.

For classification problems:

  • loss supports binary classification only.

  • If Y contains a label that is not a member of Mdl.ClassNames, loss issues an error.

  • The data type of Y and Mdl.ClassNames must be the same.

Data Types: char | string | cell | categorical | logical | single | double

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'ObservationsIn','columns','Weights',W specifies that the columns of the predictor matrix correspond to observations, and the vector W contains observation weights to apply.

Loss function, specified as the comma-separated pair consisting of 'LossFun' and a built-in loss function name or function handle.

  • Classification problems: The following table lists the available loss functions when Mdl is an incrementalClassificationLinear model. Specify one using its corresponding character vector or string scalar.

    Name                       Description
    "binodeviance"             Binomial deviance
    "classiferror" (default)   Misclassification rate in decimal
    "exponential"              Exponential loss
    "hinge"                    Hinge loss
    "logit"                    Logistic loss
    "quadratic"                Quadratic loss

    For more details, see Classification Loss.

    Logistic regression learners return posterior probabilities as classification scores, but SVM learners do not (see predict).

    To specify a custom loss function, use function handle notation. The function must have this form:

    lossval = lossfcn(C,S,W)

    • The output argument lossval is an n-by-1 floating-point vector, where lossval(j) is the classification loss of observation j.

    • You specify the function name (lossfcn).

    • C is an n-by-2 logical matrix with rows indicating the class to which the corresponding observation belongs. The column order corresponds to the class order in the ClassNames property. Create C by setting C(p,q) = 1, if observation p is in class q, for each observation in the specified data. Set the other element in row p to 0.

    • S is an n-by-2 numeric matrix of predicted classification scores. S is similar to the score output of predict, where rows correspond to observations in the data and the column order corresponds to the class order in the ClassNames property. S(p,q) is the classification score of observation p being classified in class q.

    • W is an n-by-1 numeric vector of observation weights.

  • Regression problems: The following table lists the available loss functions when Mdl is an incrementalRegressionLinear model. Specify one using its corresponding character vector or string scalar.

    Name                   Description                   Learners Supporting Metric
    "epsiloninsensitive"   Epsilon insensitive loss      'svm'
    "mse" (default)        Weighted mean squared error   'svm' and 'leastsquares'

    For more details, see Regression Loss.

    To specify a custom loss function, use function handle notation; a brief sketch follows this argument description. The function must have this form:

    lossval = lossfcn(Y,YFit,W)

    • The output argument lossval is a floating-point scalar.

    • You specify the function name (lossfcn).

    • Y is a length-n numeric vector of observed responses.

    • YFit is a length-n numeric vector of corresponding predicted responses.

    • W is an n-by-1 numeric vector of observation weights.

Example:'LossFun',"mse"

Example:'LossFun',@lossfcn

Data Types:char|string|function_handle
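As a sketch of the regression form above, a weighted mean absolute error handle that returns a scalar could look like this; wmae is a hypothetical name, and Mdl is assumed to be an incrementalRegressionLinear model:

wmae = @(Y,YFit,W) sum(W.*abs(Y - YFit))/sum(W); % scalar weighted MAE
L = loss(Mdl,X,Y,'LossFun',wmae);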

Predictor data observation dimension, specified as the comma-separated pair consisting of 'ObservationsIn' and 'columns' or 'rows'.

Data Types: char | string

Batch of observation weights, specified as the comma-separated pair consisting of 'Weights' and a floating-point vector of positive values. loss weighs the observations in the input data with the corresponding values in Weights. The size of Weights must equal n, which is the number of observations in the input data.

By default, Weights is ones(n,1).

For more details, see Observation Weights.

Data Types: double | single

Output Arguments


Classification or regression loss, returned as a numeric scalar. The interpretation of L depends on Weights and LossFun.

More About


Classification Loss

Classification loss functions measure the predictive inaccuracy of classification models. When you compare the same type of loss among many models, a lower loss indicates a better predictive model.

Consider the following scenario.

  • L is the weighted average classification loss.

  • n is the sample size.

  • $y_j$ is the observed class label. The software codes it as –1 or 1, indicating the negative or positive class (or the first or second class in the ClassNames property), respectively.

  • $f(X_j)$ is the positive-class classification score for observation (row) j of the predictor data X.

  • $m_j = y_j f(X_j)$ is the classification score for classifying observation j into the class corresponding to $y_j$. Positive values of $m_j$ indicate correct classification and do not contribute much to the average loss. Negative values of $m_j$ indicate incorrect classification and contribute significantly to the average loss.

  • The weight for observation j is $w_j$.

Given this scenario, the following table describes the supported loss functions that you can specify by using the LossFun name-value argument.

Loss Function                       Value of LossFun   Equation
Binomial deviance                   "binodeviance"     $L = \sum_{j=1}^{n} w_j \log\{1 + \exp[-2m_j]\}$
Exponential loss                    "exponential"      $L = \sum_{j=1}^{n} w_j \exp(-m_j)$
Misclassification rate in decimal   "classiferror"     $L = \sum_{j=1}^{n} w_j I\{\hat{y}_j \ne y_j\}$
Hinge loss                          "hinge"            $L = \sum_{j=1}^{n} w_j \max\{0, 1 - m_j\}$
Logit loss                          "logit"            $L = \sum_{j=1}^{n} w_j \log(1 + \exp(-m_j))$
Quadratic loss                      "quadratic"        $L = \sum_{j=1}^{n} w_j (1 - m_j)^2$

In the misclassification rate, $\hat{y}_j$ is the class label corresponding to the class with the maximal score, and $I\{\cdot\}$ is the indicator function.
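As an illustrative sketch (not from the reference text), the hinge row of the table maps directly to code. Assume f is a column vector of positive-class scores, yPM holds the labels coded as -1 or 1, and w holds the observation weights:

m = yPM.*f;                     % margins m_j = y_j*f(X_j)
Lhinge = sum(w.*max(0,1 - m));  % weighted hinge loss from the table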

The loss function does not omit an observation with a NaN score when computing the weighted average loss. Therefore, loss can return NaN when the predictor data X contains missing values, and the name-value argument LossFun is not specified as "classiferror". In most cases, if the data set does not contain missing predictors, the loss function does not return NaN.

This figure compares the loss functions over the score m for one observation. Some functions are normalized to pass through the point (0,1).

Comparison of classification losses for different loss functions

Regression Loss

Regression loss functions measure the predictive inaccuracy of regression models. When you compare the same type of loss among many models, a lower loss indicates a better predictive model.

Consider the following scenario.

  • L is the weighted average regression loss.

  • n is the sample size.

  • $y_j$ is the observed response of observation j.

  • $f(X_j)$ is the predicted value of observation j of the predictor data X.

  • The weight for observation j is $w_j$.

Given this scenario, the following table describes the supported loss functions that you can specify by using the LossFun name-value argument.

Loss Function              Value of LossFun       Equation
Epsilon insensitive loss   "epsiloninsensitive"   $L = \max[0, |y - f(x)| - \varepsilon]$
Mean squared error         "mse"                  $L = [y - f(x)]^2$
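As a brief sketch for a single observation (hypothetical scalar variables y, yfit, and epsilon), the table rows reduce to one line of code each:

Leps = max(0,abs(y - yfit) - epsilon);  % epsilon insensitive loss
Lmse = (y - yfit)^2;                    % squared error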

The loss function does not omit an observation with a NaN prediction when computing the weighted average loss. Therefore, loss can return NaN when the predictor data X contains missing values. In most cases, if the data set does not contain missing predictors, the loss function does not return NaN.

Algorithms


Observation Weights

For classification problems, if the prior class probability distribution is known (in other words, the prior distribution is not empirical), loss normalizes observation weights to sum to the prior class probabilities in the respective classes. This action implies that observation weights are the respective prior class probabilities by default.

For regression problems or if the prior class probability distribution is empirical, the software normalizes the specified observation weights to sum to 1 each time you call loss.
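The classification case can be illustrated with a small numeric sketch (hypothetical values; this mirrors the normalization described above, not internal code): with a known prior over two classes, the weights within each class are rescaled so that they sum to that class's prior probability.

prior = [0.5 0.5];              % known (non-empirical) prior for classes 1 and 2
w = [1 3 2 2]'; y = [1 1 2 2]'; % raw observation weights and class labels
wn = w;
for k = 1:2
    in = (y == k);                       % observations in class k
    wn(in) = w(in)/sum(w(in))*prior(k);  % class-k weights now sum to prior(k)
end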


Version History

Introduced in R2020b
