Porting the Faster RCNN Python demo to a C++ demo

mscdj 2016-10-13

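I suddenly wanted to try porting the Faster RCNN demo.py to C++. I assumed it would not be too hard: there are no layers to write, just code to feed an image into the network, and that is about it. Precisely because it was "not hard", I ended up spending another whole evening debugging.
My first idea was to run a trivial C++ program that only builds the net, and then extend it into something with real inputs and outputs. But even that simplest program failed with an error. Here is the code: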

A test demo with a Python layer

#include <stdio.h>  // for snprintf
#include <string>
#include <vector>
#include <math.h>
#include <fstream>
#include "caffe/caffe.hpp"
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
using namespace caffe;
using namespace std;

class Detector {
public:
    Detector(const string& model_file, const string& trained_file);
private:
    shared_ptr<Net<float> > net_;
    Detector(){}
    float threshold;
};
Detector::Detector(const string& model_file, const string& trained_file)
{
    net_ = shared_ptr<Net<float> >(new Net<float>(model_file, caffe::TEST));
    net_->CopyTrainedLayersFrom(trained_file);
}
int main()
{
    string model_file = "/home/xyy/Desktop/doing/objectDetection/py-faster-rcnn/models/ZF/faster_rcnn_alt_opt/faster_rcnn_test.pt";
    string trained_file = "/home/xyy/Desktop/doing/objectDetection/py-faster-rcnn/data/faster_rcnn_models/ZF_faster_rcnn_final.caffemodel";
    Caffe::SetDevice(0);
    Caffe::set_mode(Caffe::GPU);
    Detector det = Detector(model_file, trained_file);
    return 0;
}

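I had even taken the precaution of adding the lib path directly to the PYTHONPATH environment variable,
but at run time I still got this bizarre error:
[Image: screenshot of the error message]
There was no way to tell where things went wrong. Python can call Caffe with a Python layer, so can C++ really not use a Python layer when it calls Caffe? I kept trying things just to settle that question. In the end I never found the root cause: mixed C++/Python programming makes my head spin, and this part of Caffe is not written very clearly either. What did work was to stop chasing the error itself and keep asking what actually differs between calling Caffe from Python and from C++. I finally added the python directory of caffe-fast-rcnn to PYTHONPATH as well, and the problem went away. Looking back, my guess is that the other Caffe build did not have WITH_PYTHON_LAYER set to 1, so that build could not call Python layers.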
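Since the fix came down to which directories were on PYTHONPATH, a small check run before the net is constructed can make this kind of failure easier to spot. The following is only a sketch under assumptions: the helper names are hypothetical, the separator is assumed to be ':' (POSIX), and the match is a plain string comparison with no path normalization.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Split a PATH-style string on ':' (POSIX separator), skipping empty entries.
std::vector<std::string> split_path_list(const std::string& value) {
    std::vector<std::string> entries;
    std::stringstream ss(value);
    std::string item;
    while (std::getline(ss, item, ':')) {
        if (!item.empty()) entries.push_back(item);
    }
    return entries;
}

// Return true if `dir` appears verbatim as one entry of `path_list`.
// No normalization is done: "/a/b" and "/a/b/" count as different entries.
bool path_list_contains(const std::string& path_list, const std::string& dir) {
    assert(dir.find(':') == std::string::npos);  // a single directory, not a list
    for (const std::string& entry : split_path_list(path_list)) {
        if (entry == dir) return true;
    }
    return false;
}

// Read PYTHONPATH from the environment; an unset variable is treated as empty.
std::string read_pythonpath() {
    const char* raw = std::getenv("PYTHONPATH");
    return raw ? std::string(raw) : std::string();
}
```

Calling path_list_contains(read_pythonpath(), dir) once for the py-faster-rcnn lib directory and once for the caffe-fast-rcnn python directory, and printing a warning when either is missing, would have pointed at the cause much earlier than the error above did.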
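Caffe really does have a lot of unclear code. When I have time I want to read MXNet properly and learn from it.
With such a bold title, I have padded long enough with the rambling above; time to go through the main content in detail. Note that I am not very familiar with makefiles, so the focus here is on changing the code, not on building it. I used the crudest possible way to compile, so please do not laugh.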

C++ version of the Faster RCNN demo

I mostly followed the flow of the Python code and wrote a C++ version. First I defined a Detector class:

class Detector {
public:
    Detector(const string& model_file, const string& trained_file);
    void Detection(const string& im_name, float img_scale);
    void bbox_transform_inv(const int num, const float* box_deltas, const float* pred_cls, float* boxes, float* pred, int img_height, int img_width);
    void vis_detections(cv::Mat image, int* keep, int num_out, float* sorted_pred_cls, float CONF_THRESH);
    void boxes_sort(int num, const float* pred, float* sorted_pred);
private:
    shared_ptr<Net<float> > net_;
    Detector(){}
};

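It consists of four functions: Detection, which runs the detection and obtains the bboxes; bbox_transform_inv, which maps the regression output back to image coordinates; vis_detections, for visualization; and boxes_sort, which sorts the candidate boxes by score.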

void Detector::boxes_sort(const int num, const float* pred, float* sorted_pred)

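Let me start with the sorting function. Its arguments are the number of boxes, a pointer to the unsorted boxes, and a pointer to the sorted boxes. The memory behind both pointers is allocated outside the function.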

void Detector::boxes_sort(const int num, const float* pred, float* sorted_pred)
{
    vector<myInfo> my;
    myInfo tmp;
    for (int i = 0; i< num; i++)
    {
        tmp.score = pred[i*5 + 4];
        tmp.head = pred + i*5;
        my.push_back(tmp);
    }
    std::sort(my.begin(), my.end(), compare);
    for (int i=0; i<num; i++)
    {
        for (int j=0; j<5; j++)
            sorted_pred[i*5+j] = my[i].head[j];
    }
}

This function sorts with the standard library's std::sort (from <algorithm>). First define a custom struct and a matching comparison function:

struct myInfo
{
    float score;
    const float* head;
};
bool compare(const myInfo& myInfo1, const myInfo& myInfo2)
{
    return myInfo1.score > myInfo2.score;
}

The boxes are then pushed into a vector as structs, sorted with std::sort, and copied back out in sorted order.
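As an alternative to the struct-plus-comparator approach, the same ordering can be obtained by sorting row indices instead of the rows themselves. This is a sketch, not the code used in this post: the function names are hypothetical, it assumes the same 5-floats-per-box layout [x1, y1, x2, y2, score], and it uses std::stable_sort so that boxes with equal scores keep their original order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Return the row indices of `pred` (num rows of [x1, y1, x2, y2, score])
// ordered by descending score.
std::vector<int> argsort_by_score(const float* pred, int num) {
    assert(num >= 0);
    std::vector<int> order(num);
    std::iota(order.begin(), order.end(), 0);  // 0, 1, ..., num-1
    std::stable_sort(order.begin(), order.end(), [pred](int a, int b) {
        return pred[a * 5 + 4] > pred[b * 5 + 4];
    });
    return order;
}

// Copy the rows of `pred` into `sorted_pred` in the given order.
// `sorted_pred` must have room for order.size() * 5 floats.
void gather_rows(const float* pred, const std::vector<int>& order, float* sorted_pred) {
    for (std::size_t i = 0; i < order.size(); ++i) {
        const float* row = pred + order[i] * 5;
        std::copy(row, row + 5, sorted_pred + i * 5);
    }
}
```

Keeping the index vector around also makes it possible to map a sorted box back to its original proposal, which the pointer-based version does not expose.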
Visualization is also simple: every box whose score is above the threshold is drawn with OpenCV's rectangle. The code is as follows:

void Detector::vis_detections(cv::Mat image, int* keep, int num_out, float* sorted_pred_cls, float CONF_THRESH)
{
    // keep[] holds indices into sorted_pred_cls, which is sorted by descending score,
    // so drawing can stop at the first box that falls below the threshold.
    // i < num_out is checked first so that keep[i] is never read out of bounds.
    int i=0;
    while(i < num_out && sorted_pred_cls[keep[i]*5+4]>CONF_THRESH)
    {
        const float* box = sorted_pred_cls + keep[i]*5;
        cv::rectangle(image,cv::Point(box[0], box[1]),cv::Point(box[2], box[3]),cv::Scalar(255,0,0));
        i++;
    }
}

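Next is mapping the bbox regression back onto the image. Anyone who has read the RCNN series will know that rbg does not regress raw pixel coordinates directly. The targets are normalized first (center offsets divided by the box width and height, sizes as log ratios), so their values stay small, the loss does not blow up, and training is easier to control. With a proposal of center (x_c, y_c), width w and height h, the network's δ output maps to a position in the image as follows:
\[
\hat{x}_c = x_c + w\,\delta_x, \qquad
\hat{y}_c = y_c + h\,\delta_y, \qquad
\hat{w} = w\,e^{\delta_w}, \qquad
\hat{h} = h\,e^{\delta_h}
\]
The predicted corners are then \hat{x}_c \mp 0.5\,\hat{w} and \hat{y}_c \mp 0.5\,\hat{h}, clipped to the image. The code follows this formula directly: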

void Detector::bbox_transform_inv(int num, const float* box_deltas, const float* pred_cls, float* boxes, float* pred, int img_height, int img_width)
{
    // 21 classes: the 20 PASCAL VOC classes plus background.
    const int num_classes = 21;
    // Clipping bounds as floats, so std::min/std::max see a single argument type.
    const float max_x = float(img_width - 1);
    const float max_y = float(img_height - 1);
    float width, height, ctr_x, ctr_y, dx, dy, dw, dh, pred_ctr_x, pred_ctr_y, pred_w, pred_h;
    for(int i=0; i< num; i++)
    {
        // Proposal size and center (the +1 follows the pixel-inclusive box convention).
        width = boxes[i*4+2] - boxes[i*4+0] + 1.0f;
        height = boxes[i*4+3] - boxes[i*4+1] + 1.0f;
        ctr_x = boxes[i*4+0] + 0.5f * width;
        ctr_y = boxes[i*4+1] + 0.5f * height;
        for (int j=0; j< num_classes; j++)
        {
            dx = box_deltas[(i*num_classes+j)*4+0];
            dy = box_deltas[(i*num_classes+j)*4+1];
            dw = box_deltas[(i*num_classes+j)*4+2];
            dh = box_deltas[(i*num_classes+j)*4+3];
            pred_ctr_x = ctr_x + width*dx;
            pred_ctr_y = ctr_y + height*dy;
            pred_w = width * std::exp(dw);
            pred_h = height * std::exp(dh);
            // Output is class-major: all boxes of class j are contiguous, 5 floats per box.
            pred[(j*num+i)*5+0] = std::max(std::min(pred_ctr_x - 0.5f* pred_w, max_x), 0.0f);
            pred[(j*num+i)*5+1] = std::max(std::min(pred_ctr_y - 0.5f* pred_h, max_y), 0.0f);
            pred[(j*num+i)*5+2] = std::max(std::min(pred_ctr_x + 0.5f* pred_w, max_x), 0.0f);
            pred[(j*num+i)*5+3] = std::max(std::min(pred_ctr_y + 0.5f* pred_h, max_y), 0.0f);
            pred[(j*num+i)*5+4] = pred_cls[i*num_classes+j];
        }
    }
}
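num is the number of boxes; box_deltas is the output of the regression layer; pred_cls is the output of the final Softmax layer; boxes holds the coordinates of the candidate boxes output by the ROI layer (divided by the resize scale); pred receives the final predicted box coordinates; img_height and img_width are the height and width of the image.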
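To make the arithmetic concrete, here is a single-box version of the same decoding step. It is a sketch under assumptions: the Box struct and decode_box are hypothetical names introduced for illustration, and the function handles one proposal and one set of deltas rather than the full num × 21 layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper type: corner coordinates of one box.
struct Box { float x1, y1, x2, y2; };

// Apply one set of regression deltas (dx, dy, dw, dh) to one proposal
// and clip the result to the image.
Box decode_box(const Box& roi, float dx, float dy, float dw, float dh,
               int img_width, int img_height) {
    assert(img_width > 0 && img_height > 0);
    const float width  = roi.x2 - roi.x1 + 1.0f;
    const float height = roi.y2 - roi.y1 + 1.0f;
    const float ctr_x  = roi.x1 + 0.5f * width;
    const float ctr_y  = roi.y1 + 0.5f * height;

    const float pred_ctr_x = ctr_x + width * dx;
    const float pred_ctr_y = ctr_y + height * dy;
    const float pred_w = width * std::exp(dw);
    const float pred_h = height * std::exp(dh);

    const float max_x = static_cast<float>(img_width - 1);
    const float max_y = static_cast<float>(img_height - 1);
    Box out;
    out.x1 = std::max(std::min(pred_ctr_x - 0.5f * pred_w, max_x), 0.0f);
    out.y1 = std::max(std::min(pred_ctr_y - 0.5f * pred_h, max_y), 0.0f);
    out.x2 = std::max(std::min(pred_ctr_x + 0.5f * pred_w, max_x), 0.0f);
    out.y2 = std::max(std::min(pred_ctr_y + 0.5f * pred_h, max_y), 0.0f);
    return out;
}
```

For the proposal (10, 20, 49, 59) the width and height are both 40 and the center is (30, 40). With all deltas at zero the result is (10, 20, 50, 60): the top-left corner is unchanged, while the bottom-right corner lands one pixel further out because of the +1 in the width and height. With dx = 0.5 the center moves right by half a box width, to x = 50.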
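Finally, the Detection function.
Its main tasks are:
1. Load the image and the other inputs
2. Convert the δ output of the regression into predicted coordinates in the original image
3. Sort all candidate boxes
4. Merge boxes with the NMS algorithm
5. Display the result
I will go through each part in turn. First, a big pile of definitions: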

    float CONF_THRESH = 0.7;
    float NMS_THRESH = 0.2;
    cv::Mat cv_img = cv::imread(im_name);
    // Bail out before doing any work if the image could not be read.
    if(cv_img.empty())
    {
        return ;
    }
    cv::Mat cv_new(cv_img.rows, cv_img.cols, CV_32FC3, cv::Scalar(0,0,0));
    int height = int(cv_img.rows * img_scale);
    int width = int(cv_img.cols * img_scale);
    int num_out;
    cv::Mat cv_resized;

    float im_info[3];
    // Heap storage instead of a variable-length array: VLAs are not standard C++,
    // and a large image could overflow the stack.
    vector<float> data_buf_storage(height*width*3);
    float *data_buf = &data_buf_storage[0];
    float *boxes = NULL;
    float *pred = NULL;
    float *pred_per_class = NULL;
    float *sorted_pred_cls = NULL;
    int *keep = NULL;
    const float* bbox_delt;
    const float* rois;
    const float* pred_cls;
    int num;

Next, load the image data and subtract the mean:

    // Subtract the per-channel (B, G, R) mean from every pixel of the original-size image.
    for (int h = 0; h < cv_img.rows; ++h )
    {
        for (int w = 0; w < cv_img.cols; ++w)
        {
            cv_new.at<cv::Vec3f>(cv::Point(w, h))[0] = float(cv_img.at<cv::Vec3b>(cv::Point(w, h))[0])-float(102.9801);
            cv_new.at<cv::Vec3f>(cv::Point(w, h))[1] = float(cv_img.at<cv::Vec3b>(cv::Point(w, h))[1])-float(115.9465);
            cv_new.at<cv::Vec3f>(cv::Point(w, h))[2] = float(cv_img.at<cv::Vec3b>(cv::Point(w, h))[2])-float(122.7717);
        }
    }
    // Resize to the network input size and fill im_info = (height, width, scale).
    cv::resize(cv_new, cv_resized, cv::Size(width, height));
    im_info[0] = cv_resized.rows;
    im_info[1] = cv_resized.cols;
    im_info[2] = img_scale;
    // Repack OpenCV's interleaved pixels (HWC) into the planar layout (CHW) of the data blob.
    for (int h = 0; h < height; ++h )
    {
        for (int w = 0; w < width; ++w)
        {
            data_buf[(0*height+h)*width+w] = float(cv_resized.at<cv::Vec3f>(cv::Point(w, h))[0]);
            data_buf[(1*height+h)*width+w] = float(cv_resized.at<cv::Vec3f>(cv::Point(w, h))[1]);
            data_buf[(2*height+h)*width+w] = float(cv_resized.at<cv::Vec3f>(cv::Point(w, h))[2]);
        }
    }

    // set_cpu_data does not copy: data_buf and im_info must stay alive until the forward pass is done.
    net_->blob_by_name("data")->Reshape(1, 3, height, width);
    net_->blob_by_name("data")->set_cpu_data(data_buf);
    net_->blob_by_name("im_info")->set_cpu_data(im_info);

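I did not use the C++ example that ships with the Caffe source; I went with an approach that is easier to follow. The image is read with OpenCV, the mean is subtracted at each position, and the result is resized to width × height (the original size scaled by img_scale). Then every pixel is visited and its values are copied into the data_buf buffer, and finally the blob's set_cpu_data function hands the whole image to the Caffe network. That is how I feed images into Caffe. The other inputs are loaded next.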
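The per-pixel copy described above is a layout conversion from interleaved (height, width, channel) to planar (channel, height, width). Below is a minimal sketch of that step on raw buffers, without OpenCV and without the resize. The function name is hypothetical, and unlike the code in this post it subtracts the mean and repacks in a single pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Convert interleaved 8-bit BGR pixels (HWC order) into a planar float
// buffer (CHW order), subtracting a per-channel mean on the way.
std::vector<float> hwc_to_chw_minus_mean(const unsigned char* bgr,
                                         int height, int width,
                                         const float mean[3]) {
    assert(height > 0 && width > 0);
    std::vector<float> chw(static_cast<std::size_t>(3) * height * width);
    for (int h = 0; h < height; ++h) {
        for (int w = 0; w < width; ++w) {
            for (int c = 0; c < 3; ++c) {
                // Source index: pixel (h, w), channel c, interleaved.
                // Target index: plane c, row h, column w.
                chw[(c * height + h) * width + w] =
                    static_cast<float>(bgr[(h * width + w) * 3 + c]) - mean[c];
            }
        }
    }
    return chw;
}
```

For a 1×2 image with pixels (10, 20, 30) and (40, 50, 60) and a mean of (1, 2, 3), the planar result is 9, 39, 18, 48, 27, 57: first both blue values, then both green values, then both red values.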
After the forward pass, the δ values are mapped back to the original image:

bbox_transform_inv(num, bbox_delt, pred_cls, boxes, pred, cv_img.rows, cv_img.cols);
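The boxes argument in this call has to be in original-image coordinates, which is why the ROI output is divided by the scale first. The post does not show that step, so the following is only a sketch under assumptions: rois_to_boxes is a hypothetical helper, and each ROI row is assumed to hold 5 floats, [batch_index, x1, y1, x2, y2], in the coordinates of the resized image.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Drop the leading batch index of each ROI row and divide the coordinates
// by the resize scale, giving num rows of [x1, y1, x2, y2] in the
// original image. Assumed ROI layout: [batch_index, x1, y1, x2, y2].
std::vector<float> rois_to_boxes(const float* rois, int num, float img_scale) {
    assert(num >= 0 && img_scale > 0.0f);
    std::vector<float> boxes(static_cast<std::size_t>(num) * 4);
    for (int i = 0; i < num; ++i) {
        for (int k = 0; k < 4; ++k) {
            boxes[i * 4 + k] = rois[i * 5 + 1 + k] / img_scale;
        }
    }
    return boxes;
}
```

With a scale of 2, an ROI row of (0, 20, 40, 60, 80) becomes the box (10, 20, 30, 40).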

Then, for each class, the boxes are sorted, non-maximum suppression is applied, and the detections are drawn on the image:

    bbox_transform_inv(num, bbox_delt, pred_cls, boxes, pred, cv_img.rows, cv_img.cols);
    for (int i = 1; i < 21; i ++)
    {
        for (int j = 0; j< num; j++)
        {
            for (int k=0; k<5; k++)
                pred_per_class[j*5+k] = pred[(i*num+j)*5+k];
        }
        boxes_sort(num, pred_per_class, sorted_pred_cls);
        _nms(keep, &num_out, sorted_pred_cls, num, 5, NMS_THRESH, 0);
        vis_detections(cv_img, keep, num_out, sorted_pred_cls, CONF_THRESH);
    }
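The _nms call in the loop above is rbg's GPU implementation and its source is not shown in this post. To illustrate what the step does, here is a plain CPU sketch of greedy non-maximum suppression. It is not the routine used above: nms_cpu is a hypothetical name, the input is assumed to be already sorted by descending score with 5 floats per box, and the +1 in the area terms follows the same pixel-inclusive convention as bbox_transform_inv.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Greedy NMS. `dets` holds num rows of [x1, y1, x2, y2, score], sorted by
// descending score. Returns the indices of the boxes that are kept.
std::vector<int> nms_cpu(const float* dets, int num, float thresh) {
    assert(num >= 0);
    std::vector<char> suppressed(num, 0);
    std::vector<int> keep;
    for (int i = 0; i < num; ++i) {
        if (suppressed[i]) continue;
        keep.push_back(i);  // highest-scoring box not yet suppressed
        const float* a = dets + i * 5;
        const float area_a = (a[2] - a[0] + 1.0f) * (a[3] - a[1] + 1.0f);
        for (int j = i + 1; j < num; ++j) {
            if (suppressed[j]) continue;
            const float* b = dets + j * 5;
            // Width and height of the intersection, zero if the boxes are disjoint.
            const float w = std::max(0.0f, std::min(a[2], b[2]) - std::max(a[0], b[0]) + 1.0f);
            const float h = std::max(0.0f, std::min(a[3], b[3]) - std::max(a[1], b[1]) + 1.0f);
            const float inter = w * h;
            const float area_b = (b[2] - b[0] + 1.0f) * (b[3] - b[1] + 1.0f);
            const float iou = inter / (area_a + area_b - inter);
            if (iou > thresh) suppressed[j] = 1;  // overlaps a better box too much
        }
    }
    return keep;
}
```

For two 10×10 boxes offset by one pixel in each direction, the intersection is 81 and the union is 119, so the IoU is about 0.68. With the NMS_THRESH of 0.2 used in this post the lower-scoring one is dropped; with a threshold of 0.7 both would survive.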

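That is the whole Faster RCNN demo code. Now a few words about the odd way I build it:
First, to use py_faster_rcnn you need to set WITH_PYTHON_LAYER to 1 in the Caffe that lives in its subdirectory and then compile it, and if possible add its path to PYTHONPATH. Second, since my makefile skills are weak and I do not know how to modify Caffe's makefile, I came up with a handy workaround: put this C++ demo into the tools directory of py_faster_rcnn's Caffe, which sidesteps the embarrassment of not knowing how to build it. One more thing: I use the _nms written by rbg, so to keep things simple I copied the .h file straight into tools. Not very pretty, is it? If you can do better, please leave a comment below so that I have a chance to improve. Thanks.
Also, I am a beginner with both Ubuntu and Git and have other things going on these days, so uploading to git may take a while. If anyone needs the C++ demo urgently, leave your email address and I will send it over directly.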