A softmax layer applies a softmax function to the input.
For classification problems, a softmax layer and then a classification layer usually follow the final fully connected layer.
The output unit activation function is the softmax function:

$$y_r(x) = \frac{\exp(a_r(x))}{\sum_{j=1}^{k} \exp(a_j(x))}$$

where $0 \le y_r \le 1$ and $\sum_{j=1}^{k} y_j = 1$.
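As a minimal sketch of the definition above, the following Python function computes the softmax of a vector of activations, subtracting the maximum activation first for numerical stability (a standard trick that leaves the result unchanged):

```python
import math

def softmax(a):
    # Subtract the max before exponentiating for numerical stability;
    # exp(a_r - m) / sum_j exp(a_j - m) equals exp(a_r) / sum_j exp(a_j).
    m = max(a)
    exps = [math.exp(x - m) for x in a]
    s = sum(exps)
    return [e / s for e in exps]

# Each output lies in [0, 1] and the outputs sum to 1.
y = softmax([1.0, 2.0, 3.0])
```

The subtraction guards against overflow when activations are large, without altering the resulting probabilities.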
The softmax function is the output unit activation function after the last fully connected layer for multi-class classification problems:

$$P(c_r \mid x, \theta) = \frac{P(x, \theta \mid c_r)\,P(c_r)}{\sum_{j=1}^{k} P(x, \theta \mid c_j)\,P(c_j)} = \frac{\exp(a_r(x, \theta))}{\sum_{j=1}^{k} \exp(a_j(x, \theta))}$$

where $0 \le P(c_r \mid x, \theta) \le 1$ and $\sum_{j=1}^{k} P(c_j \mid x, \theta) = 1$. Moreover, $a_r = \ln\left(P(x, \theta \mid c_r)\,P(c_r)\right)$, where $P(x, \theta \mid c_r)$ is the conditional probability of the sample given class $r$, and $P(c_r)$ is the class prior probability.
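The identity above can be checked numerically: with hypothetical likelihoods and priors (the values below are illustrative, not from the source), the posterior computed directly by Bayes' rule matches the softmax of $a_r = \ln(P(x \mid c_r)\,P(c_r))$:

```python
import math

# Hypothetical likelihoods P(x | c_r) and priors P(c_r) for three classes.
likelihoods = [0.2, 0.5, 0.1]
priors = [0.3, 0.3, 0.4]

# Posterior via Bayes' rule: normalize the joint probabilities.
joint = [l * p for l, p in zip(likelihoods, priors)]
posterior = [j / sum(joint) for j in joint]

# Same posterior via the softmax of a_r = ln(P(x | c_r) * P(c_r)).
a = [math.log(j) for j in joint]
m = max(a)
exps = [math.exp(x - m) for x in a]
softmax_out = [e / sum(exps) for e in exps]

# posterior and softmax_out agree term by term.
```

This is why the final fully connected layer's activations can be read as unnormalized log-posteriors once the softmax is applied.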
The softmax function is also known as the normalized exponential and can be considered the multi-class generalization of the logistic sigmoid function [1].