TensorFlow's softmax

Rainbow_Heaven 2017-08-18


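The first time I ran into this function I was completely thrown: why is a perfectly good softmax layer not kept in inference, but pulled out on its own? Below, working from TensorFlow's official API documentation, I'll talk through this long and ugly function.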

The first thing I did was copy the official API documentation over, so it is easy to quote later:

Computes softmax cross entropy between logits and labels.

Measures the probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class). For example, each CIFAR-10 image is labeled with one and only one label: an image can be a dog or a truck, but not both.

NOTE: While the classes are mutually exclusive, their probabilities need not be. All that is required is that each row of labels is a valid probability distribution. If they are not, the computation of the gradient will be incorrect.

If using exclusive labels (wherein one and only one class is true at a time), see sparse_softmax_cross_entropy_with_logits.

WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.

Logits and labels must have the same shape [batch_size, num_classes] and the same dtype (either float32 or float64).

Args:

logits: Unscaled log probabilities.

labels: Each row labels[i] must be a valid probability distribution.

name: A name for the operation (optional).

Returns:

A 1-D Tensor of length batch_size of the same type as logits with the softmax cross entropy loss.

What follows is my own understanding.

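First, the input logits. Its shape is [batch_size, num_classes]; generally speaking, it is z, the input to the network's last layer, i.e. the raw scores before softmax is applied.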

The other input is labels. Its shape is also [batch_size, num_classes], and it is the output we expect from the network.
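To make the two shapes concrete, here is a minimal NumPy sketch (not TensorFlow code). The numbers, `class_ids`, and the variable names are made up for illustration; the one-hot rows are just one kind of valid probability distribution that labels can hold.

```python
import numpy as np

batch_size, num_classes = 4, 3

# Hypothetical logits: raw, unnormalized scores z, one row per example.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3],
                   [1.2, 0.7, 3.1],
                   [0.0, 0.0, 0.0]])

# Hypothetical class indices, turned into one-hot rows.
class_ids = np.array([0, 1, 2, 1])
labels = np.eye(num_classes)[class_ids]

print(logits.shape, labels.shape)  # both are (batch_size, num_classes)
print(labels.sum(axis=1))          # every row sums to 1
```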

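What this function does is compute the cross entropy for a network whose last layer is a softmax layer. TensorFlow simply puts the softmax computation and the cross-entropy computation together in a single function to make the program run faster. In the documentation's own words: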

it performs a softmax on logits internally for efficiency.
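As a rough illustration of what "softmax plus cross entropy in one function" can look like, here is a NumPy sketch. It is not TensorFlow's actual implementation; the function names and sample values are my own. It compares a two-step version (softmax, then cross entropy) with a fused version based on log-softmax, and also shows what goes wrong when the output of softmax is passed in where logits are expected, which is the case the WARNING in the documentation describes.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max before exponentiating to avoid overflow.
    shifted = z - z.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy_two_step(logits, labels):
    # Step 1: softmax. Step 2: -sum(labels * log(probabilities)) per row.
    return -(labels * np.log(softmax(logits))).sum(axis=1)

def softmax_cross_entropy_fused(logits, labels):
    # log softmax(z) = (z - max) - log(sum(exp(z - max))), computed directly.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

# Made-up example data: 2 examples, 3 classes, one-hot labels.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

loss_two_step = cross_entropy_two_step(logits, labels)
loss_fused = softmax_cross_entropy_fused(logits, labels)

# Mistake: passing softmax output as if it were logits applies softmax twice.
loss_double = softmax_cross_entropy_fused(softmax(logits), labels)

print(loss_two_step)
print(loss_fused)
print(loss_double)
```

The two-step and fused losses agree, each being a 1-D array of length batch_size, while the double-softmax loss is a different number.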

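When I first saw this function, my immediate reaction was: if softmax has been pulled out and computed separately, isn't the original network's inference incomplete, since the last layer's softmax output is never computed there? Thinking it over, this does not seem to affect the correctness of the final result or the accuracy computation, because the last-layer computation y = softmax(z) does not change the ordering of the output values. The main reason is that softmax preserves order: each y_i is exp(z_i) divided by the same positive sum, and exp is monotonically increasing, so sorting z and sorting y give the same order. Enough talk; to make this easier to understand, here is a figure!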
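The ordering claim can also be checked numerically. The following is a minimal NumPy sketch with made-up values of z: it compares the predicted class and the full ranking taken from the raw logits against those taken from softmax(z).

```python
import numpy as np

def softmax(z):
    # Row-wise softmax with the usual max subtraction for stability.
    shifted = z - z.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

# Made-up logits z: 2 examples, 3 classes, no ties within a row.
z = np.array([[2.0, -1.0, 0.5],
              [0.1, 0.2, 3.0]])
y = softmax(z)

# Predicted class from logits versus from probabilities.
pred_from_logits = z.argmax(axis=1)
pred_from_probs = y.argmax(axis=1)

# Full ranking of the classes within each row.
same_order = np.array_equal(np.argsort(z, axis=1), np.argsort(y, axis=1))

print(pred_from_logits, pred_from_probs, same_order)
```

Since argmax gives the same class either way, accuracy can be computed straight from the logits without running softmax inside inference.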