YOLO: Real-Time Object Detection

木俊 2018-06-22


https:///darknet/yolo/

You only look once (YOLO) is a state-of-the-art, real-time object detection system. On a Pascal Titan X it processes images at 30 FPS and has a mAP of 57.9% on COCO test-dev.


Comparison to Other Detectors

YOLOv3 is extremely fast and accurate. In mAP measured at .5 IOU, YOLOv3 is on par with Focal Loss but about 4x faster. Moreover, you can easily trade off speed against accuracy simply by changing the size of the model, no retraining required!
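For readers unfamiliar with the ".5 IOU" criterion, here is a minimal sketch of how intersection over union is commonly computed for two axis-aligned boxes. The function names and the (x_min, y_min, x_max, y_max) box layout are assumptions made for illustration; this is not code from Darknet or from the COCO evaluation tools.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, truth_box, threshold=0.5):
    """A prediction counts as a hit when its IOU with the truth reaches the threshold."""
    return iou(pred_box, truth_box) >= threshold
```

Under this criterion a predicted box that overlaps half of the combined area with a ground truth box is just barely counted as correct.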

Performance on the COCO Dataset

| Model | Train | Test | mAP | FLOPS | FPS | Cfg | Weights |
|---|---|---|---|---|---|---|---|
| SSD300 | COCO trainval | test-dev | 41.2 | - | - | - | link |
| SSD500 | COCO trainval | test-dev | 46.5 | - | - | - | link |
| YOLOv2 608x608 | COCO trainval | test-dev | 48.1 | 62.94 Bn | - | cfg | weights |
| Tiny YOLO | COCO trainval | test-dev | 23.7 | 5.41 Bn | 244 | cfg | weights |
| SSD321 | COCO trainval | test-dev | 45.4 | - | - | - | link |
| DSSD321 | COCO trainval | test-dev | 46.1 | - | - | - | link |
| R-FCN | COCO trainval | test-dev | 51.9 | - | - | - | link |
| SSD513 | COCO trainval | test-dev | 50.4 | - | - | - | link |
| DSSD513 | COCO trainval | test-dev | 53.3 | - | - | - | link |
| FPN FRCN | COCO trainval | test-dev | 59.1 | - | - | - | link |
| Retinanet-50-500 | COCO trainval | test-dev | 50.9 | - | - | - | link |
| Retinanet-101-500 | COCO trainval | test-dev | 53.1 | - | - | - | link |
| Retinanet-101-800 | COCO trainval | test-dev | 57.5 | - | - | - | link |
| YOLOv3-320 | COCO trainval | test-dev | 51.5 | 38.97 Bn | - | cfg | weights |
| YOLOv3-416 | COCO trainval | test-dev | 55.3 | 65.86 Bn | - | cfg | weights |
| YOLOv3-608 | COCO trainval | test-dev | 57.9 | 140.69 Bn | - | cfg | weights |
| YOLOv3-tiny | COCO trainval | test-dev | 33.1 | 5.56 Bn | 220 | cfg | weights |

How It Works

Prior detection systems repurpose classifiers or localizers to perform detection. They apply the model to an image at multiple locations and scales. High scoring regions of the image are considered detections.

We use a totally different approach. We apply a single neural network to the full image. This network divides the image into regions and predicts bounding boxes and probabilities for each region. These bounding boxes are weighted by the predicted probabilities.

Our model has several advantages over classifier-based systems. It looks at the whole image at test time, so its predictions are informed by global context in the image. It also makes predictions with a single network evaluation, unlike systems such as R-CNN, which require thousands of evaluations for a single image. This makes it extremely fast, more than 1000x faster than R-CNN and 100x faster than Fast R-CNN. See our paper for more details on the full system.
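One way to picture "bounding boxes weighted by the predicted probabilities" is the rough sketch below. It assumes each predicted box carries an objectness value and a set of per-class probabilities, and that the final score is their product. The dictionary keys and the function name are hypothetical and chosen only for illustration; Darknet's internal data structures look different.

```python
def score_boxes(predictions):
    """Attach a label and a score to every predicted box, best score first.

    Each prediction is a dict with the hypothetical keys 'box',
    'objectness', and 'class_probs' (a mapping from label to probability).
    """
    scored = []
    for pred in predictions:
        # Pick the most likely class for this box.
        label, class_prob = max(pred["class_probs"].items(), key=lambda kv: kv[1])
        # Assumed weighting: objectness times class probability.
        scored.append({
            "box": pred["box"],
            "label": label,
            "score": pred["objectness"] * class_prob,
        })
    return sorted(scored, key=lambda s: s["score"], reverse=True)
```

Because all boxes come out of a single pass over the image, scoring them is just a cheap loop like this one rather than thousands of separate classifier runs.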

What's New in Version 3?

YOLOv3 uses a few tricks to improve training and increase performance, including: multi-scale predictions, a better backbone classifier, and more. The full details are in our paper!

Detection Using A Pre-Trained Model

This post will guide you through detecting objects with the YOLO system using a pre-trained model. If you don't already have Darknet installed, you should do that first. Or instead of reading all that, just run:

    git clone https://github.com/pjreddie/darknet
    cd darknet
    make

Easy!

You already have the config file for YOLO in the cfg/ subdirectory. You will have to download the pre-trained weight file here (237 MB). Or just run this:

    wget https:///media/files/yolov3.weights

Then run the detector!

    ./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg

You will see some output like this:

    layer     filters    size              input                output
        0 conv     32  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  32  0.299 BFLOPs
        1 conv     64  3 x 3 / 2   416 x 416 x  32   ->   208 x 208 x  64  1.595 BFLOPs
        .......
      105 conv    255  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 255  0.353 BFLOPs
      106 detection
    truth_thresh: Using default '1.000000'
    Loading weights from yolov3.weights...Done!
    data/dog.jpg: Predicted in 0.029329 seconds.
    dog: 99%
    truck: 93%
    bicycle: 99%

Darknet prints out the objects it detected, its confidence, and how long it took to find them. We didn't compile Darknet with OpenCV so it can't display the detections directly. Instead, it saves them in predictions.png. You can open it to see the detected objects. Since we are using Darknet on the CPU it takes around 6-12 seconds per image. If we use the GPU version it would be much faster.

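If you want to use the printed detections from a script, something like the following rough sketch could work. It assumes the detection lines look exactly like the "dog: 99%" lines in the sample output and that Darknet writes them to standard output; the helper names are made up for this example and are not part of Darknet.

```python
import re
import subprocess

# Matches lines such as "dog: 99%" and skips everything else.
DETECTION_LINE = re.compile(r"^(?P<label>.+): (?P<percent>\d+)%$")


def parse_detections(output):
    """Return (label, confidence percent) pairs found in Darknet's text output."""
    detections = []
    for line in output.splitlines():
        match = DETECTION_LINE.match(line.strip())
        if match:
            detections.append((match.group("label"), int(match.group("percent"))))
    return detections


def run_darknet(image_path):
    """Run the detect command on one image and parse what it prints."""
    cmd = ["./darknet", "detect", "cfg/yolov3.cfg", "yolov3.weights", image_path]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_detections(result.stdout)
```

Calling run_darknet("data/dog.jpg") from the Darknet directory would then give you a list of labels and confidences to work with, provided the binary and the weights are in place.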

I've included some example images to try in case you need inspiration. Try data/eagle.jpg, data/dog.jpg, data/person.jpg, or data/horses.jpg!


The detect command is shorthand for a more general version of the command. It is equivalent to the command:

    ./darknet detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights data/dog.jpg

You don't need to know this if all you want to do is run detection on one image, but it's useful to know if you want to do other things like run on a webcam (which you will see later on).

Multiple Images

Instead of supplying an image on the command line, you can leave it blank to try multiple images in a row. You will then see a prompt when the config and weights are done loading:

    ./darknet detect cfg/yolov3.cfg yolov3.weights
    layer     filters    size              input                output
        0 conv     32  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  32  0.299 BFLOPs
        1 conv     64  3 x 3 / 2   416 x 416 x  32   ->   208 x 208 x  64  1.595 BFLOPs
        .......
      104 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256  1.595 BFLOPs
      105 conv    255  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 255  0.353 BFLOPs
      106 detection
    Loading weights from yolov3.weights...Done!
    Enter Image Path:

Enter an image path like data/horses.jpg to have it predict boxes for that image.


Once it is done it will prompt you for more paths to try different images. Use Ctrl-C to exit the program once you are done.


Changing The Detection Threshold

By default, YOLO only displays objects detected with a confidence of .25 or higher. You can change this by passing the -thresh <val> flag to the detect command. For example, to display all detections you can set the threshold to 0:

    ./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg -thresh 0

Which produces:

[Image: detections drawn with the threshold set to 0]

So that's obviously not super useful, but you can set the threshold to different values to control which detections get displayed.
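Conceptually, the threshold is just a cut-off on each detection's confidence. Here is a tiny sketch of that idea, assuming detections are available as (label, confidence) pairs with confidence between 0 and 1; the function name and the data layout are illustrative only and do not mirror Darknet's C code.

```python
DEFAULT_THRESH = 0.25  # the default confidence cut-off mentioned above


def filter_detections(detections, thresh=DEFAULT_THRESH):
    """Keep only detections whose confidence is at least thresh."""
    return [(label, conf) for label, conf in detections if conf >= thresh]
```

With a threshold of 0 nothing is filtered out, which is why the image fills up with boxes.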

Tiny YOLOv3

We also have a very small model for constrained environments, yolov3-tiny. To use this model, first download the weights:

    wget https:///media/files/yolov3-tiny.weights

Then run the detector with the tiny config file and weights:

    ./darknet detect cfg/yolov3-tiny.cfg yolov3-tiny.weights data/dog.jpg

Real-Time Detection on a Webcam

Running YOLO on test data isn't very interesting if you can't see the result. Instead of running it on a bunch of images, let's run it on the input from a webcam!

To run this demo you will need to compile Darknet with CUDA and OpenCV. Then run the command:

    ./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights

YOLO will display the current FPS and predicted classes as well as the image with bounding boxes drawn on top of it.


You will need a webcam connected to the computer that OpenCV can connect to or it won't work. If you have multiple webcams connected and want to select which one to use, you can pass the flag -c <num> to pick (OpenCV uses webcam 0 by default).

You can also run it on a video file if OpenCV can read the video:

    ./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights <video file>

That's how we made the YouTube video above.


Training YOLO on VOC

You can train YOLO from scratch if you want to play with different training regimes, hyper-parameters, or datasets. Here's how to get it working on the Pascal VOC dataset.

Get The Pascal VOC Data

To train YOLO you will need all of the VOC data from 2007 to 2012. You can find links to the data here. To get all the data, make a directory to store it all and from that directory run:

    wget https:///media/files/VOCtrainval_11-May-2012.tar
    wget https:///media/files/VOCtrainval_06-Nov-2007.tar
    wget https:///media/files/VOCtest_06-Nov-2007.tar
    tar xf VOCtrainval_11-May-2012.tar
    tar xf VOCtrainval_06-Nov-2007.tar
    tar xf VOCtest_06-Nov-2007.tar

There will now be a VOCdevkit/ subdirectory with all the VOC training data in it.


Generate Labels for VOC

Now we need to generate the label files that Darknet uses. Darknet wants a .txt file for each image, with a line for each ground truth object in the image that looks like:

    <object-class> <x> <y> <width> <height>

Where x, y, width, and height are relative to the image's width and height. To generate these files we will run the voc_label.py script in Darknet's scripts/ directory. Let's just download it again because we are lazy.

    wget https:///media/files/voc_label.py
    python voc_label.py
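To make "relative to the image's width and height" concrete, here is a small sketch of the conversion from a VOC-style pixel box to normalized label values. It assumes that x and y denote the box center and that each label line starts with a class index; the function names are my own, and this is an illustration rather than the actual contents of voc_label.py.

```python
def voc_box_to_darknet(image_w, image_h, xmin, ymin, xmax, ymax):
    """Convert pixel corner coordinates to normalized center x, y, width, height."""
    x = (xmin + xmax) / 2.0 / image_w   # box center, as a fraction of image width
    y = (ymin + ymax) / 2.0 / image_h   # box center, as a fraction of image height
    w = (xmax - xmin) / image_w         # box width, as a fraction of image width
    h = (ymax - ymin) / image_h         # box height, as a fraction of image height
    return x, y, w, h


def label_line(class_id, image_w, image_h, box):
    """Format one ground truth object as a line of a label file."""
    x, y, w, h = voc_box_to_darknet(image_w, image_h, *box)
    return f"{class_id} {x:.6f} {y:.6f} {w:.6f} {h:.6f}"
```

For example, label_line(11, 500, 400, (100, 100, 300, 300)) gives "11 0.400000 0.500000 0.400000 0.500000", so the same label stays valid if the image is resized.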

After a few minutes, this script will generate all of the requisite files. Mostly it generates a lot of label files in VOCdevkit/VOC2007/labels/ and VOCdevkit/VOC2012/labels/. In your directory you should see:

    ls
    2007_test.txt   VOCdevkit
    2007_train.txt  voc_label.py
    2007_val.txt    VOCtest_06-Nov-2007.tar
    2012_train.txt  VOCtrainval_06-Nov-2007.tar
    2012_val.txt    VOCtrainval_11-May-2012.tar

The text files like 2007_train.txt list the image files for that year and image set. Darknet needs one text file with all of the images you want to train on. In this example, let's train with everything except the 2007 test set so that we can test our model. Run:

    cat 2007_train.txt 2007_val.txt 2012_*.txt > train.txt

Now we have all the 2007 trainval and the 2012 trainval set in one big list. That's all we have to do for data setup!


Modify Cfg for Pascal Data

Now go to your Darknet directory. We have to change the cfg/voc.data config file to point to your data:

    classes= 20
    train  = <path-to-voc>/train.txt
    valid  = <path-to-voc>/2007_test.txt
    names = data/voc.names
    backup = backup

You should replace <path-to-voc> with the directory where you put the VOC data.
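If you want to double-check a .data file before kicking off a long training run, a quick sketch like this can read it back. It assumes the file is nothing more than key = value lines, as in the example; the parser is a hypothetical helper and is unrelated to Darknet's own option-reading code.

```python
def parse_data_cfg(text):
    """Parse 'key = value' lines into a dict, skipping blanks and # comments."""
    options = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an equals sign
            options[key.strip()] = value.strip()
    return options
```

You could then check, for instance, that the train and valid entries point at files that actually exist with os.path.exists before starting.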

Download Pretrained Convolutional Weights

For training we use convolutional weights that are pre-trained on ImageNet. We use weights from the darknet53 model. You can just download the weights for the convolutional layers here (76 MB).

    wget https:///media/files/darknet53.conv.74

Train The Model

Now we can train! Run the command:

    ./darknet detector train cfg/voc.data cfg/yolov3-voc.cfg darknet53.conv.74

Training YOLO on COCO

You can train YOLO from scratch if you want to play with different training regimes, hyper-parameters, or datasets. Here's how to get it working on the COCO dataset.

Get The COCO Data

To train YOLO you will need all of the COCO data and labels. The script scripts/get_coco_dataset.sh will do this for you. Figure out where you want to put the COCO data and download it, for example:

    cp scripts/get_coco_dataset.sh data
    cd data
    bash get_coco_dataset.sh

Now you should have all the data and the labels generated for Darknet.


Modify cfg for COCO

Now go to your Darknet directory. We have to change the cfg/coco.data config file to point to your data:

    classes= 80
    train  = <path-to-coco>/trainvalno5k.txt
    valid  = <path-to-coco>/5k.txt
    names = data/coco.names
    backup = backup

You should replace <path-to-coco> with the directory where you put the COCO data.

You should also modify your model cfg for training instead of testing. cfg/yolov3.cfg should look like this:

    [net]
    # Testing
    # batch=1
    # subdivisions=1
    # Training
    batch=64
    subdivisions=8
    ....
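As far as I understand these two settings, batch is the number of images used per weight update and subdivisions splits that batch into smaller chunks that are pushed through the network one at a time, which helps when GPU memory is tight. The sketch below just does that arithmetic; the function name is made up, and the interpretation is my reading of Darknet's behaviour rather than something stated on this page.

```python
def images_per_forward_pass(batch, subdivisions):
    """Number of images processed at once, assuming batch is split evenly."""
    if batch % subdivisions != 0:
        raise ValueError("batch should be divisible by subdivisions")
    return batch // subdivisions
```

With the training values above, images_per_forward_pass(64, 8) is 8, while the testing values of batch=1 and subdivisions=1 process a single image at a time.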

Train The Model

Now we can train! Run the command:

    ./darknet detector train cfg/coco.data cfg/yolov3.cfg darknet53.conv.74

If you want to use multiple GPUs, run:

    ./darknet detector train cfg/coco.data cfg/yolov3.cfg darknet53.conv.74 -gpus 0,1,2,3

If you want to stop and restart training from a checkpoint:

    ./darknet detector train cfg/coco.data cfg/yolov3.cfg backup/yolov3.backup -gpus 0,1,2,3

What Happened to the Old YOLO Site?

If you are using YOLO version 2 you can still find the site here: https:///darknet/yolov2/

Cite

If you use YOLOv3 in your work please cite our paper!

    @article{yolov3,
      title={YOLOv3: An Incremental Improvement},
      author={Redmon, Joseph and Farhadi, Ali},
      journal = {arXiv},
      year={2018}
    }