Prediction Using Discriminant Analysis Models

predict uses three quantities to classify observations: posterior probability, prior probability, and cost.

predict classifies so as to minimize the expected classification cost:

$$\hat{y} = \underset{y=1,\dots,K}{\arg\min} \; \sum_{k=1}^{K} \hat{P}(k \mid x)\, C(y \mid k),$$

where

  • $\hat{y}$ is the predicted classification.

  • $K$ is the number of classes.

  • $\hat{P}(k \mid x)$ is the posterior probability of class k for observation x.

  • $C(y \mid k)$ is the cost of classifying an observation as y when its true class is k.

The space of X values divides into regions where a classification Y is a particular value. The regions are separated by straight lines for linear discriminant analysis, and by conic sections (ellipses, hyperbolas, or parabolas) for quadratic discriminant analysis. For a visualization of these regions, see Create and Visualize Discriminant Analysis Classifier.
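As a minimal sketch, the minimum-expected-cost rule can be applied directly to a matrix of posteriors. The names here are hypothetical and not part of the toolbox: P is an Nobs-by-K matrix of posterior probabilities (one row per observation), and Cost is a K-by-K matrix in which Cost(k,y) is the cost of predicting class y when the true class is k.

% Expected cost of assigning each observation to each class:
% expCost(n,y) = sum over k of P(n,k)*Cost(k,y)
expCost = P * Cost;

% Predicted class index = the column that minimizes the expected cost
[~, yhat] = min(expCost, [], 2);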

Posterior Probability

The posterior probability that a point x belongs to class k is the product of the prior probability and the multivariate normal density. The density function of the multivariate normal with 1-by-d mean μk and d-by-d covariance Σk at a 1-by-d point x is

$$P(x \mid k) = \frac{1}{\left( (2\pi)^d \, |\Sigma_k| \right)^{1/2}} \exp\!\left( -\frac{1}{2} (x - \mu_k)\, \Sigma_k^{-1} (x - \mu_k)^T \right),$$

where $|\Sigma_k|$ is the determinant of Σk, and $\Sigma_k^{-1}$ is the inverse matrix.

Let P(k) represent the prior probability of class k. Then the posterior probability that an observation x is of class k is

$$\hat{P}(k \mid x) = \frac{P(x \mid k)\, P(k)}{P(x)},$$

where P(x) is a normalization constant, namely, the sum over k of P(x|k)P(k).
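As an illustration, the following sketch recomputes the posterior for one observation from a fitted classifier and compares it with the posterior returned by predict. It assumes the fisheriris sample data set and a linear classifier, for which obj.Sigma is the single pooled d-by-d covariance used for every class (so Σk is the same for all k); the two results should agree up to rounding.

load fisheriris
obj = fitcdiscr(meas, species);        % linear discriminant analysis
x = meas(1,:);                         % one 1-by-d observation
K = numel(obj.ClassNames);

numer = zeros(1, K);
for k = 1:K
    % P(x|k)*P(k): multivariate normal density times the prior
    numer(k) = mvnpdf(x, obj.Mu(k,:), obj.Sigma) * obj.Prior(k);
end
post = numer / sum(numer);             % divide by P(x) = sum_k P(x|k)*P(k)

[~, score] = predict(obj, x);          % score holds the posterior from predict
% post and score should match up to rounding error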

Prior Probability

The prior probability is one of three choices:

  • 'uniform' — The prior probability of class k is 1 over the total number of classes.

  • 'empirical' — The prior probability of class k is the number of training samples of class k divided by the total number of training samples.

  • A numeric vector — The prior probability of class k is the kth element of the Prior vector. See fitcdiscr and the sketch after this list.
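For example, a short sketch of specifying the prior at training time, assuming the fisheriris sample data set (three classes, so the numeric vector has one element per class in the order of the class names):

load fisheriris
% Equal priors for all classes
objU = fitcdiscr(meas, species, 'Prior', 'uniform');

% Numeric vector of priors, one element per class
objV = fitcdiscr(meas, species, 'Prior', [0.5 0.25 0.25]);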

After creating a classifier obj, you can set the prior using dot notation:

obj.Prior = v;

where v is a vector of positive elements representing the frequency with which each element occurs. You do not need to retrain the classifier when you set a new prior.
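For instance, under the same fisheriris assumption as above, you can change the prior of a trained classifier and predict again without refitting:

load fisheriris
obj = fitcdiscr(meas, species);          % trained with the default 'empirical' prior
labelBefore = predict(obj, meas(1:5,:));

obj.Prior = [0.2 0.4 0.4];               % new prior, one element per class
labelAfter = predict(obj, meas(1:5,:));  % no retraining required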

Cost

There are two costs associated with discriminant analysis classification: the true misclassification cost per class, and the expected misclassification cost per observation.

True Misclassification Cost per Class

Cost(i,j) is the cost of classifying an observation into class j if its true class is i. By default, Cost(i,j) = 1 if i ~= j, and Cost(i,j) = 0 if i = j. In other words, the cost is 0 for correct classification and 1 for incorrect classification.

You can set any cost matrix you like when creating a classifier. Pass the cost matrix in the Cost name-value pair in fitcdiscr.

After you create a classifier obj, you can set a custom cost using dot notation:

obj.Cost = B;

B is a square matrix of size K-by-K when there are K classes. You do not need to retrain the classifier when you set a new cost.
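As a sketch, assuming the three-class fisheriris data set, you could penalize one type of error more heavily at training time and later restore the default 0/1 cost through dot notation:

load fisheriris
% Make misclassifying a true class-1 observation into class 3 expensive
C = [0 1 5; 1 0 1; 1 1 0];
obj = fitcdiscr(meas, species, 'Cost', C);

% Change the cost later without retraining (back to the default 0/1 cost)
obj.Cost = ones(3) - eye(3);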

Expected Misclassification Cost per Observation

Suppose you have Nobs observations that you want to classify with a trained discriminant analysis classifier obj, and suppose you have K classes. You place the observations into a matrix Xnew with one observation per row. The command

[label,score,cost] = predict(obj,Xnew)

returns, among other outputs, a cost matrix of size Nobs-by-K. Each row of the cost matrix contains the expected (average) cost of classifying the observation into each of the K classes. cost(n,k) is

$$\sum_{i=1}^{K} \hat{P}\big(i \mid Xnew(n)\big)\, C(k \mid i),$$

where

  • $K$ is the number of classes.

  • $\hat{P}\big(i \mid Xnew(n)\big)$ is the posterior probability of class i for observation Xnew(n).

  • $C(k \mid i)$ is the cost of classifying an observation as k when its true class is i.
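The following sketch reproduces this computation, again assuming the fisheriris sample data set; the recomputed costs should match the cost output of predict up to rounding.

load fisheriris
obj = fitcdiscr(meas, species);
Xnew = meas(1:10,:);                        % classify 10 observations
[label, score, cost] = predict(obj, Xnew);  % score holds the posteriors

% cost(n,k) = sum over i of P_hat(i|Xnew(n)) * C(k|i),
% where obj.Cost(i,k) is the cost of predicting k when the true class is i
costCheck = score * obj.Cost;
maxDiff = max(max(abs(cost - costCheck)))   % expected to be close to 0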
