
CV之FRec之LF: Common loss functions in face recognition (Triplet Loss, Center Loss), an introduction and detailed usage guide

 处女座的程序猿 2021-09-28



T1. Triplet Loss

FaceNet: A Unified Embedding for Face Recognition and Clustering
https://arxiv.org/pdf/1503.03832.pdf

1. Explanation from the original paper

       Triplet Loss. The embedding is represented by $f(x) \in \mathbb{R}^d$. It embeds an image $x$ into a $d$-dimensional Euclidean space. Additionally, we constrain this embedding to live on the $d$-dimensional hypersphere, i.e. $\|f(x)\|_2 = 1$. This loss is motivated in [19] in the context of nearest-neighbor classification. Here we want to ensure that an image $x_i^a$ (anchor) of a specific person is closer to all other images $x_i^p$ (positive) of the same person than it is to any image $x_i^n$ (negative) of any other person. This is visualized in Figure 3. Thus we want

$$\|x_i^a - x_i^p\|_2^2 + \alpha < \|x_i^a - x_i^n\|_2^2, \quad \forall\, (x_i^a, x_i^p, x_i^n) \in \mathcal{T}, \tag{1}$$

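Incidentally, the unit-hypersphere constraint $\|f(x)\|_2 = 1$ is typically enforced by L2-normalizing the network output. A minimal TensorFlow 1.x sketch (the placeholder raw_features stands in for the unnormalized embedding tensor and is an assumption here):

import tensorflow as tf

raw_features = tf.placeholder(tf.float32, [None, 128])  # hypothetical unnormalized embeddings
# normalize each row so that ||f(x)||_2 = 1, i.e. the embedding lives on the unit hypersphere
embeddings = tf.nn.l2_normalize(raw_features, axis=1)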

       where $\alpha$ is a margin that is enforced between positive and negative pairs, and $\mathcal{T}$ is the set of all possible triplets in the training set, with cardinality $N$. The loss that is being minimized is then

$$L = \sum_{i}^{N} \Big[ \|f(x_i^a) - f(x_i^p)\|_2^2 - \|f(x_i^a) - f(x_i^n)\|_2^2 + \alpha \Big]_{+}. \tag{2}$$

Generating all possible triplets would result in many triplets that are easily satisfied (i.e. that fulfill the constraint in Eq. (1)). These triplets would not contribute to the training and would result in slower convergence, as they would still be passed through the network. It is crucial to select hard triplets that are active and can therefore contribute to improving the model. The following section talks about the different approaches we use for the triplet selection.

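The takeaway from this passage is that random triplets mostly satisfy Eq. (1) already and carry no gradient; the training signal comes from hard (or semi-hard) triplets mined within each batch. A minimal sketch of semi-hard mining under assumed inputs (a NumPy array of embeddings and integer labels; this illustrates the idea, it is not FaceNet's exact online selection procedure):

import numpy as np

def select_semihard_triplets(embeddings, labels, alpha=0.2):
    """For each anchor-positive pair, pick a random semi-hard negative:
    farther from the anchor than the positive, but still within the margin alpha."""
    # pairwise squared Euclidean distances, shape (n, n)
    dists = np.sum((embeddings[:, None, :] - embeddings[None, :, :]) ** 2, axis=-1)
    triplets = []
    for a in range(len(labels)):
        for p in np.where(labels == labels[a])[0]:
            if p == a:
                continue
            # semi-hard condition: d(a, p) < d(a, n) < d(a, p) + alpha
            cand = np.where((labels != labels[a])
                            & (dists[a] > dists[a, p])
                            & (dists[a] < dists[a, p] + alpha))[0]
            if len(cand):
                triplets.append((a, p, np.random.choice(cand)))
    return triplets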

2. Code implementation

import tensorflow as tf

def triplet_loss(anchor, positive, negative, alpha):
    """Calculate the triplet loss according to the FaceNet paper.

    anchor, positive, negative: features of randomly chosen face samples and of
    their positive/negative counterparts, each of shape (batch_size, feature_size),
    where feature_size is the dimensionality of the face features learned by the network.
    alpha: the margin enforced between positive and negative pairs.
    """
    with tf.variable_scope('triplet_loss'):
        # pos_dist: squared distance from each anchor to its positive sample
        pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), 1)
        # neg_dist: squared distance from each anchor to its negative sample
        neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), 1)

        # pos_dist - neg_dist + alpha; only the part greater than 0 contributes to the loss
        basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), alpha)
        loss = tf.reduce_mean(tf.maximum(basic_loss, 0.0), 0)
    return loss
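A minimal usage sketch of the function above, with assumed shapes (a batch of 32 triplets, 128-dimensional embeddings) and random stand-in features:

import numpy as np
import tensorflow as tf

anchor = tf.placeholder(tf.float32, [None, 128])
positive = tf.placeholder(tf.float32, [None, 128])
negative = tf.placeholder(tf.float32, [None, 128])
loss = triplet_loss(anchor, positive, negative, alpha=0.2)

with tf.Session() as sess:
    rnd = lambda: np.random.randn(32, 128).astype(np.float32)
    print(sess.run(loss, {anchor: rnd(), positive: rnd(), negative: rnd()}))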

T2. Center Loss

A Discriminative Feature Learning Approach for Deep Face Recognition

http://ydwen.github.io/papers/WenECCV16.pdf

1. Explanation from the original paper

      The Center Loss. So, how to develop an effective loss function to improve the discriminative power of the deeply learned features? Intuitively, minimizing the intra-class variations while keeping the features of different classes separable is the key. To this end, we propose the center loss function, as formulated in Eq. 2.

$$L_C = \frac{1}{2} \sum_{i=1}^{m} \left\| x_i - c_{y_i} \right\|_2^2 \tag{2}$$
 
     The $c_{y_i} \in \mathbb{R}^d$ denotes the $y_i$-th class center of the deep features. The formulation effectively characterizes the intra-class variations. Ideally, the $c_{y_i}$ should be updated as the deep features change. In other words, we need to take the entire training set into account and average the features of every class in each iteration, which is inefficient, even impractical. Therefore, the center loss cannot be used directly. This is possibly the reason that such a center loss has never been used in CNNs until now.

     To address this problem, we make two necessary modifications. First, instead of updating the centers with respect to the entire training set, we perform the update based on each mini-batch. In each iteration, the centers are computed by averaging the features of the corresponding classes (in this case, some of the centers may not update). Second, to avoid large perturbations caused by a few mislabelled samples, we use a scalar $\alpha$ to control the learning rate of the centers.
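For reference, the paper formalizes this mini-batch update (its Eqs. 3 and 4, reproduced here in the same notation; $\delta(\text{condition})$ is 1 when the condition holds and 0 otherwise): each center $c_j$ moves toward the mean of the current batch's features of class $j$, scaled by the learning rate $\alpha$:

$$\Delta c_j = \frac{\sum_{i=1}^{m} \delta(y_i = j)\,(c_j - x_i)}{1 + \sum_{i=1}^{m} \delta(y_i = j)}, \qquad c_j^{t+1} = c_j^{t} - \alpha \cdot \Delta c_j^{t}$$

Note that the implementation below is a common simplified variant of this rule: it scales the per-sample difference by $(1 - \text{alfa})$ and subtracts it from the centers directly, rather than averaging per class.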

2. Code implementation

import tensorflow as tf

def center_loss(features, label, alfa, nrof_classes):
    """Center loss as described in the paper (Wen et al., ECCV 2016).

    features: sample features of shape (batch_size, feature_size)
    label: class label of each sample
    alfa: hyperparameter controlling how far the centers move in each update
    nrof_classes: number of classes (identities)
    """
    nrof_features = features.get_shape()[1]  # feature_size: dimensionality of the learned face features
    # centers is a (non-trainable) variable holding the center of every class
    centers = tf.get_variable('centers', [nrof_classes, nrof_features], dtype=tf.float32,
        initializer=tf.constant_initializer(0), trainable=False)
    label = tf.reshape(label, [-1])
    # look up the class center of every sample according to its label;
    # centers_batch has the same shape as features: (batch_size, feature_size)
    centers_batch = tf.gather(centers, label)
    # diff is the update applied to the class centers; the hyperparameter alfa
    # controls how strongly each center is pulled toward the batch features
    diff = (1 - alfa) * (centers_batch - features)
    centers = tf.scatter_sub(centers, label, diff)  # update the centers with diff
    loss = tf.reduce_mean(tf.square(features - centers_batch))  # compute the loss
    return loss, centers  # return the loss and the updated centers
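In the paper, the center loss is used jointly with a softmax loss, $L = L_S + \lambda L_C$. A minimal sketch of wiring the function above into such a joint objective (the classification head, the value of lambda_center, and the optimizer settings are illustrative assumptions):

import tensorflow as tf

nrof_classes, feature_size = 10, 128
lambda_center = 0.003  # illustrative weight for the center loss term

features = tf.placeholder(tf.float32, [None, feature_size])
labels = tf.placeholder(tf.int32, [None])

# hypothetical classification head on top of the learned features
logits = tf.layers.dense(features, nrof_classes)
softmax_loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
c_loss, _ = center_loss(features, labels, alfa=0.5, nrof_classes=nrof_classes)

# joint supervision: L = L_S + lambda * L_C
total_loss = softmax_loss + lambda_center * c_loss
train_op = tf.train.AdamOptimizer(1e-3).minimize(total_loss)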
