HOG Features Explained: Histograms of Oriented Gradients for Human Detection

waston 2020-03-24


Reference paper: "Histograms of Oriented Gradients for Human Detection" (Dalal and Triggs, CVPR 2005)

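I spent a little over a day putting together these notes on HOG features. The explanation covers three topics:

What HOG means
How HOG is computed
HOG source code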

1. What HOG Means

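In the reference paper, the authors design the Histogram of Oriented Gradients (HOG) descriptor for pedestrian detection. It computes the gradient orientations inside a local region and accumulates them into a histogram, which then serves as the feature of that region.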

2. How HOG Is Computed

2.1 Gamma Normalization

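To reduce the influence of illumination, the whole image is normalized first. The paper's experiments show that square-root gamma compression of each color channel (gamma = 0.5) improves detection performance, although the reported gain at low false-positive rates is modest. The gamma normalization formula is:

$$H'(x, y) = H(x, y)^{\gamma}, \qquad \gamma = 0.5$$

where H(x, y) is the value of the pixel at (x, y) and H'(x, y) is the normalized value.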

2.2 Gradient Computation

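Compute the image gradients along the horizontal and vertical axes, and from them the gradient orientation at every pixel. The choice of gradient operator has a large effect on detector performance. The authors combined different amounts of Gaussian smoothing (including none) with several derivative masks, among them the 1-D masks [-1, 1], [-1, 0, 1] and [1, -8, 0, 8, -1]. The simplest option worked best: the mask [-1, 0, 1] without smoothing for the horizontal gradient, and its transpose for the vertical gradient.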

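The gradient at pixel (x, y) is therefore:

$$G_x(x, y) = H(x+1, y) - H(x-1, y)$$
$$G_y(x, y) = H(x, y+1) - H(x, y-1)$$

where G_x(x, y) is the horizontal gradient at pixel (x, y), G_y(x, y) is the vertical gradient at pixel (x, y), and H(x, y) is the pixel value.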

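The gradient magnitude and orientation at that pixel are computed from G_x(x, y) and G_y(x, y):

$$G(x, y) = \sqrt{G_x(x, y)^2 + G_y(x, y)^2}$$
$$\theta(x, y) = \arctan\left(\frac{G_y(x, y)}{G_x(x, y)}\right)$$

where G(x, y) is the gradient magnitude and θ(x, y) is the gradient orientation.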

2.3 Spatial / Orientation Binning

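The gradient information of each local image region is accumulated and quantized into a feature vector describing that region. This keeps the descriptor reasonably robust to small changes in the pose and appearance of the people in the image.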

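The unit of a local region is the cell, with a size of 8×8 pixels. Suppose 9 bins are used to summarize the gradients in a cell, that is, the range of gradient orientations is divided into 9 directions. With unsigned gradients, as in the worked example later in this section, opposite directions share a bin, so the range is 0 to 180 degrees and each bin spans 20 degrees.

[Figure: division of the gradient orientations into 9 bins]

Under these assumptions, and ignoring interpolation between bins, the quantization can be written as:

$$k(x, y) = \left\lfloor \frac{\theta(x, y)}{20^\circ} \right\rfloor \bmod 9$$

where θ(x, y) is the gradient orientation folded into [0°, 180°) and k(x, y) is the index of the bin that pixel (x, y) votes for.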

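For every pixel in a cell, the gradient is computed and casts a vote for an orientation bin, which builds up the histogram of oriented gradients. Cells can be rectangular or radial (sectors in polar coordinates). The orientation bins are evenly spaced over 0 to 180 degrees (unsigned gradients) or 0 to 360 degrees (signed gradients); for human detection the paper found 9 bins over 0 to 180 degrees to work best. To reduce aliasing, each vote is interpolated bilinearly between neighbouring bin centres in both orientation and position. The weight of a vote is a function of the gradient magnitude: the magnitude itself, its square, or its square root. The authors' experiments show that using the gradient magnitude itself as the vote weight works best.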

The worked example of the histogram computation below follows this blog post: https://blog.csdn.net/u011665459/article/details/60575107

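Take the 8×8-pixel cell above Usain Bolt's head in the example image. The previous two steps give the gradient magnitude and gradient orientation of every pixel in it.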

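Next, a 9-bin histogram is created for the 8×8 cell, with bins labelled 0, 20, 40, ..., 160. The pixel circled in blue has a gradient orientation of 80 degrees and a magnitude of 2, so 2 is added to the bin labelled 80. The pixel circled in red has an orientation of 10 degrees and a magnitude of 4. There is no bin labelled 10, only 0 and 20, and 10 lies exactly halfway between them, so the 4 is split equally: 2 goes to bin 0 and 2 goes to bin 20.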

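One more detail: when a pixel's orientation is greater than 160 degrees, the histogram wraps around (180 degrees is the same as 0 degrees), so the gradient magnitude is split between bin 160 and bin 0 in proportion to how close the angle is to each. For instance, an orientation of 170 degrees lies halfway between 160 and 180, so half of the magnitude goes to each of the two bins.

[Figure: a vote above 160 degrees split between bin 160 and bin 0]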

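Accumulating the votes of all pixels in the cell yields the cell's histogram of oriented gradients.

[Figure: the resulting 9-bin histogram for the cell]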

2.4 Block Normalization

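Local variations in illumination and in foreground-background contrast make gradient strengths vary over a very wide range. For example, if all pixel values in the image are halved, the gradient magnitudes are halved as well. We do not want such changes in pixel intensity to affect the descriptor, so the gradients need local contrast normalization.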

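Suppose an RGB color vector is [128, 64, 32]. Its length, computed with the L2 norm, is sqrt(128² + 64² + 32²) ≈ 146.64. Dividing every component by this length (that is, normalizing) gives the normalized vector [0.87, 0.44, 0.22]. If the color vector is now doubled, 2 × [128, 64, 32] = [256, 128, 64], the same procedure still gives [0.87, 0.44, 0.22]. Normalization therefore removes the effect of a uniform scaling of the pixel values, and applying it to the histograms makes the gradient magnitudes insensitive to such scaling.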

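In the paper, the authors tested several normalization schemes. Most of them group cells into larger spatial blocks and contrast-normalize each block separately. The final descriptor is the vector formed by the histograms of all cells in all blocks of the detection window. The blocks overlap, so the histogram of each cell contributes to the final descriptor several times, each time normalized with respect to a different block.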
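Because of the overlap, the descriptor is longer than the number of cells times the number of bins. The sketch below counts its length; the function name `hog_descriptor_length` and its parameters are made up for this illustration, and it assumes square cells, square blocks and a window whose sides are multiples of the cell size.

```c
#include <assert.h>

/* Number of values in a HOG descriptor built from overlapping blocks.
   win_w, win_h and cell_size are in pixels; block_cells is the block
   side in cells; block_stride_cells is how many cells the block moves
   per step (1 means neighbouring blocks overlap by half for 2x2 blocks). */
int hog_descriptor_length(int win_w, int win_h, int cell_size,
                          int block_cells, int block_stride_cells,
                          int num_bins)
{
    assert(cell_size > 0 && block_cells > 0 && block_stride_cells > 0);

    int cells_x = win_w / cell_size;
    int cells_y = win_h / cell_size;
    assert(cells_x >= block_cells && cells_y >= block_cells);

    int blocks_x = (cells_x - block_cells) / block_stride_cells + 1;
    int blocks_y = (cells_y - block_cells) / block_stride_cells + 1;

    return blocks_x * blocks_y * block_cells * block_cells * num_bins;
}
```

Assuming a 64×128-pixel detection window with 8×8 cells, 2×2-cell blocks, a stride of one cell and 9 bins, this gives 7 × 15 blocks of 36 values each, or 3780 values in total.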

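Below is an excerpt of the block normalization from the HOG implementation in the VLFeat library, which is written in C. The algorithm normalizes cell 5 with respect to each of the four blocks 1245, 2356, 4578 and 5689, then clips the normalized values at 0.2, so that each cell ends up with a HOG feature of size 4×9. Identifiers such as atNorm, features, hogStride, x, y and k are defined outside the excerpt.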

/*
   HOG block-normalisation.
   The Dalal-Triggs implementation computes a normalized descriptor for
   each block of 2x2 cells, by stacking the histograms of each cell
   into a vector and L2-normalizing and truncating the result.
   
   Each block-level descriptor is then decomposed back into cells
   and corresponding parts are stacked into cell-level descriptors.
   Each HOG cell is contained in exactly
   four 2x2 cell blocks. For example, the cell number 5 in the following
   figure is contained in blocks 1245, 2356, 4578, 5689:
   +---+---+---+
   | 1 | 2 | 3 |
   +---+---+---+
   | 4 | 5 | 6 |
   +---+---+---+
   | 7 | 8 | 9 |
   +---+---+---+
   Hence, when block-level descriptors are decomposed back
   into cells, each cell receives contributions from four blocks. So,
   if each cell started with a D-dimensional histogram, it
   ends up with a 4D-dimensional descriptor vector.
*/
{
    float const * iter = self->hog ;
    for (y = 0 ; y < (signed)self->hogHeight ; ++y) {
      for (x = 0 ; x < (signed)self->hogWidth ; ++x) {
        /* norm of upper-left, upper-right, ... cells */
        vl_index xm = VL_MAX(x - 1, 0) ;
        vl_index xp = VL_MIN(x + 1, (signed)self->hogWidth - 1) ;
        vl_index ym = VL_MAX(y - 1, 0) ;
        vl_index yp = VL_MIN(y + 1, (signed)self->hogHeight - 1) ;
 
        double norm1 = atNorm(xm,ym) ;
        double norm2 = atNorm(x,ym) ;
        double norm3 = atNorm(xp,ym) ;
        double norm4 = atNorm(xm,y) ;
        double norm5 = atNorm(x,y) ;
        double norm6 = atNorm(xp,y) ;
        double norm7 = atNorm(xm,yp) ;
        double norm8 = atNorm(x,yp) ;
        double norm9 = atNorm(xp,yp) ;
 
        double factor1, factor2, factor3, factor4 ;
        factor1 = 1.0 / VL_MAX(sqrt(norm1 + norm2 + norm4 + norm5), 1e-10) ;
        factor2 = 1.0 / VL_MAX(sqrt(norm2 + norm3 + norm5 + norm6), 1e-10) ;
        factor3 = 1.0 / VL_MAX(sqrt(norm4 + norm5 + norm7 + norm8), 1e-10) ;
        factor4 = 1.0 / VL_MAX(sqrt(norm5 + norm6 + norm8 + norm9), 1e-10) ;
        float * oiter = features + x + self->hogWidth * y ;
 
        for (k = 0 ; k < self->numOrientations ; ++k) {
          double ha = iter[hogStride * k] ;
          double hb = iter[hogStride * (k + self->numOrientations)] ;
          double hc ;
 
          double ha1 = factor1 * ha ;
          double ha2 = factor2 * ha ;
          double ha3 = factor3 * ha ;
          double ha4 = factor4 * ha ;
 
          double hb1 = factor1 * hb ;
          double hb2 = factor2 * hb ;
          double hb3 = factor3 * hb ;
          double hb4 = factor4 * hb ;
 
          double hc1 = ha1 + hb1 ;
          double hc2 = ha2 + hb2 ;
          double hc3 = ha3 + hb3 ;
          double hc4 = ha4 + hb4 ;
          
          /* truncation: clip every normalized value at 0.2 */
          ha1 = VL_MIN(0.2, ha1) ;
          ha2 = VL_MIN(0.2, ha2) ;
          ha3 = VL_MIN(0.2, ha3) ;
          ha4 = VL_MIN(0.2, ha4) ;
 
          hb1 = VL_MIN(0.2, hb1) ;
          hb2 = VL_MIN(0.2, hb2) ;
          hb3 = VL_MIN(0.2, hb3) ;
          hb4 = VL_MIN(0.2, hb4) ;
 
          hc1 = VL_MIN(0.2, hc1) ;
          hc2 = VL_MIN(0.2, hc2) ;
          hc3 = VL_MIN(0.2, hc3) ;
          hc4 = VL_MIN(0.2, hc4) ;
 
          *oiter = hc1 ;
          *(oiter + hogStride * self->numOrientations) = hc2 ;
          *(oiter + 2 * hogStride * self->numOrientations) = hc3 ;
          *(oiter + 3 * hogStride * self->numOrientations) = hc4 ;
          
          oiter += hogStride ;
 
        } /* next orientation */
 
        ++iter ;
      } /* next x */
    } /* next y */
  } /* block normalization */