HALCON 20.11: Deep Learning Notes (7)

乘舟泛海赏雨 2021-03-27

HALCON 20.11.0.0 implements deep learning methods. Below, we describe the most important terms used in the context of deep learning:

anchor (锚)

Anchors are fixed bounding boxes. They serve as reference boxes (参考框), with the aid of which the network proposes bounding boxes for the objects to be localized (定位).

annotation (注释)

An annotation is the ground truth information about what a given instance in the data represents, in a form recognizable by the network. In object detection, for example, this is the bounding box and the corresponding label of an instance.

anomaly (异常)

An anomaly means something deviating from (偏离) the norm, something unknown.

backbone (骨干)

A backbone is a part of a pretrained classification network. Its task is to generate various feature maps (特征图), which is why the classifying layer has been removed.

batch size (批大小) - hyperparameter 'batch_size'

The dataset is divided into smaller subsets of data, which are called batches. The batch size determines the number of images taken into a batch and thus processed simultaneously (同时).
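
As a minimal sketch of how a batch size divides a dataset, the hypothetical helper `make_batches` below splits a list of stand-in image IDs into consecutive batches. It is plain Python for illustration, not HALCON code, and it assumes that a smaller final batch is kept.

```python
def make_batches(samples, batch_size):
    """Split a list of samples into consecutive batches of at most batch_size items."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


image_ids = list(range(10))  # stand-ins for 10 images
batches = make_batches(image_ids, batch_size=4)
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

With 10 images and a batch size of 4, the last batch holds only the 2 remaining images.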

bounding box (边界框)

Bounding boxes are rectangular boxes used to define a part within an image and to specify the localization of an object within an image.

class agnostic (类不可知论者)

Class agnostic means without knowledge of the different classes. In HALCON, we use it for reducing overlapping predicted bounding boxes: in a class agnostic bounding box suppression, overlapping instances are suppressed without regard to their classes, so strongly overlapping instances are suppressed independently of their class.

change strategy (改变策略)

A change strategy specifies when and how hyperparameters are changed during the training of a DL model.

class (类)

Classes are discrete categories (离散类别) (e.g., 'apple', 'peach', 'pear') that the network distinguishes. In HALCON, the class of an instance is given by its appropriate annotation.

classifier (分类器)

In the context (上下文) of deep learning, we use the term classifier as follows. The classifier takes an image as input and returns the inferred confidence values (推断置信值), which express how likely it is that the image belongs to each of the distinguished classes. For example, suppose the three classes 'apple', 'peach', and 'pear' are distinguished and we give an image of an apple to the classifier. As a result, the confidences 'apple': 0.92, 'peach': 0.07, and 'pear': 0.01 could be returned.
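
The sketch below illustrates how per-class confidences that sum to 1 can arise. It assumes that raw network outputs are normalized with a softmax function, which is a common choice but is not stated in the text above. The values in `raw_scores` are made up for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into confidences in [0, 1] that sum to 1."""
    largest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class_names = ['apple', 'peach', 'pear']
raw_scores = [4.0, 1.4, -0.5]  # hypothetical network outputs for one image
confidences = dict(zip(class_names, softmax(raw_scores)))
best_class = max(confidences, key=confidences.get)
print(confidences, best_class)
```

The class with the highest confidence, here 'apple', is the top-1 prediction.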

COCO (上下文常见对象)

COCO is an abbreviation (缩写) for "common objects in context", a large-scale object detection, segmentation, and captioning dataset. There is a common file format for each of the different annotation (注释) types.

confidence (置信度)

Confidence is a number expressing (表示) the affinity (亲缘关系) of an instance to a class. In HALCON, the confidence is a probability in the range [0,1]. Alternative name: score.

confusion matrix (混淆矩阵)

A confusion matrix is a table which compares the classes predicted by the network (top-1) with the ground truth class affiliations (从属关系). It is often used to visualize the performance of the network on a validation or test set.
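
A confusion matrix can be built by counting pairs of ground truth class and top-1 prediction. The following sketch is illustrative only. The function name and the layout (rows for ground truth, columns for predictions) are assumptions, not HALCON conventions.

```python
def confusion_matrix(ground_truth, predicted, class_names):
    """Count samples per (ground truth, prediction) pair.

    Rows index the ground truth class, columns the predicted class.
    """
    index = {name: i for i, name in enumerate(class_names)}
    matrix = [[0] * len(class_names) for _ in class_names]
    for gt, pred in zip(ground_truth, predicted):
        matrix[index[gt]][index[pred]] += 1
    return matrix


class_names = ['apple', 'peach', 'pear']
ground_truth = ['apple', 'apple', 'peach', 'pear', 'pear']
predicted = ['apple', 'peach', 'peach', 'pear', 'apple']
matrix = confusion_matrix(ground_truth, predicted, class_names)
for name, row in zip(class_names, matrix):
    print(name, row)
```

Entries on the diagonal are correct predictions. Every off-diagonal entry is a confusion between two classes.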

Convolutional Neural Networks (CNNs) (卷积神经网络)

Convolutional Neural Networks are neural networks used in deep learning, characterized by the presence of at least one convolutional layer (卷积层) in the network. They are particularly successful for image classification.

data (数据)

We use the term data in the context of deep learning for instances to be recognized (e.g., images) and their appropriate information concerning the predictable characteristics (可预测特征) (e.g., the labels in case of classification).

data augmentation (数据扩充)

Data augmentation is the generation of altered copies of samples within a dataset. This is done in order to augment the richness of the dataset, e.g., through flipping or rotating.
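
Flipping and rotating can be sketched on a tiny image stored as a nested list of pixel values. The helper names are hypothetical. A real pipeline would operate on actual image data.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


image = [[1, 2, 3],
         [4, 5, 6]]
augmented = [image, flip_horizontal(image), rotate_90_clockwise(image)]
print(augmented[1])  # [[3, 2, 1], [6, 5, 4]]
print(augmented[2])  # [[4, 1], [5, 2], [6, 3]]
```

Each altered copy keeps the label of the original sample, so the dataset grows without new annotation work.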

dataset (数据集): training (训练集), validation (验证集), and test set (测试集)

By dataset we mean the complete set of data used for a training. The dataset is split into three subsets, which should be disjoint if possible:

The training set contains the data on which the algorithm optimizes the network directly.
The validation set contains the data to evaluate the network performance during training.
The test set is used to test possible inferences (predictions), thus to test the performance on data without any influence on the network optimization.
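
A disjoint three-way split can be sketched as follows. The 70/15/15 proportions, the fixed seed, and the function name are assumptions chosen for illustration, not values prescribed by HALCON.

```python
import random


def split_dataset(samples, train_fraction=0.7, validation_fraction=0.15, seed=42):
    """Shuffle the samples and cut them into disjoint training, validation, and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    n_validation = round(len(shuffled) * validation_fraction)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]  # everything that is left
    return train, validation, test


train, validation, test = split_dataset(range(100))
print(len(train), len(validation), len(test))  # 70 15 15
```

Because every sample lands in exactly one subset, the test set has no influence on the optimization.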

deep learning (深度学习)

The term "deep learning" was originally used to describe the training of neural networks with multiple hidden layers. Today it is rather used as a generic term for several different concepts in machine learning. In HALCON, we use the term deep learning for methods using a neural network with multiple hidden layers.

epoch (世)

In the context of deep learning, an epoch is a single training iteration over the entire training data, i.e., over all batches. Iterations over epochs should not be confused with the iterations over single batches (e.g., within an epoch).
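
The difference between iterations over epochs and iterations over batches can be made concrete with a nested loop. This sketch assumes that a final, smaller batch still counts as one iteration. The numbers are arbitrary.

```python
import math


def iterations_per_epoch(num_training_samples, batch_size):
    """Number of batch iterations needed to see every training sample once."""
    return math.ceil(num_training_samples / batch_size)


num_epochs = 3
total_iterations = 0
for epoch in range(num_epochs):  # outer loop: epochs
    for iteration in range(iterations_per_epoch(1000, 64)):  # inner loop: batches
        total_iterations += 1  # one optimization step would happen here

print(iterations_per_epoch(1000, 64), total_iterations)  # 16 48
```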

errors (错误)

In the context of deep learning, we refer to error when the inferred class of an instance does not match the real class (e.g., the ground truth label in case of classification). Within HALCON, we use the term error in deep learning when we refer to the top-1 error.

feature map (特征图)

A feature map is the output of a given layer.

feature pyramid (特征金字塔)

A feature pyramid is simply a group of feature maps in which every feature map originates from a different level, i.e., it is smaller than the feature maps of the preceding levels.

head (头)

Heads are subnetworks. For certain architectures they are attached to selected pyramid levels. These subnetworks process information from previous parts of the total network in order to generate spatially resolved output, e.g., for the class predictions. From this they generate the output of the total network, which thereby constitutes the input of the losses.

hyperparameter (超参数)

Like every machine learning model, CNNs contain many formulas with many parameters. During training the model learns from the data in the sense of optimizing the parameters. However, such models can have other, additional parameters, which are not directly learned during the regular training. These parameters have values set before starting the training. We refer to this last type of parameters as hyperparameters in order to distinguish them from the network parameters that are optimized during training. Or from another point of view, hyperparameters are solver-specific parameters. Prominent examples are the initial learning rate or the batch size.

inference phase (推理阶段)

The inference phase is the stage when a trained network is applied to predict (infer) instances (which can be the total input image or just a part of it) and, where applicable, their localization. Unlike in the training phase, the network is no longer changed in the inference phase.

intersection over union (交集)

The intersection over union (IoU) is a measure to quantify (程度) the overlap of two areas. We can determine the part common to both areas, the intersection, as well as the combined area, the union. The IoU is the ratio of the area of the intersection to the area of the union. The application of this concept may differ between the methods.
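
For two axis-aligned rectangles, the IoU can be computed as in the sketch below. The box layout (row1, col1, row2, col2) and the function name are assumptions for this example.

```python
def intersection_over_union(box_a, box_b):
    """IoU of two axis-aligned boxes given as (row1, col1, row2, col2)."""
    row1 = max(box_a[0], box_b[0])
    col1 = max(box_a[1], box_b[1])
    row2 = min(box_a[2], box_b[2])
    col2 = min(box_a[3], box_b[3])
    # The intersection is empty if the boxes do not overlap in either direction.
    intersection = max(0.0, row2 - row1) * max(0.0, col2 - col1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Two 2x2 boxes sharing a 1x1 square: intersection 1, union 4 + 4 - 1 = 7.
print(intersection_over_union((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.143
```

Identical boxes give an IoU of 1 and non-overlapping boxes give 0.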

label (标签)

Labels are arbitrary strings used to define the class of an image. In HALCON, these labels are given by the image name (optionally followed by a combination of underscore and digits) or by the directory name, e.g., 'apple_01.png', 'pear.png', 'peach/01.png'.

layer and hidden layer (层和隐藏层)

A layer is a building block of a neural network that performs a specific task, e.g., convolution (卷积) or pooling (池化); for further details we refer to the “Solution Guide on Classification”. It can be seen as a container which receives weighted input, transforms it, and returns the output to the next layer. Input and output layers are connected to the dataset, i.e., the images or the labels, respectively. All layers in between are called hidden layers.

learning rate (学习率) - hyperparameter 'learning_rate'

The learning rate is the weighting (权重), with which the gradient (see the entry for the stochastic gradient descent SGD) is considered when updating the arguments of the loss function. In simple words, when we want to optimize a function, the gradient tells us the direction in which we shall optimize and the learning rate determines how far along this direction we step. Alternative names: step size

level (层次)

Within a feature pyramid network, the term level denotes the whole group of layers whose feature maps have the same width and height. The input image represents level 0.

loss (损失)

A loss function compares the prediction of the network with the given information about what it should find in the image (and, if applicable, also where), and penalizes deviations (惩罚偏差). This loss function is the function we optimize during the training process to adapt the network to a specific task. Alternative names: objective (目标) function, cost (成本) function, utility (效用) function.

momentum (动量) - hyperparameter 'momentum'

The momentum is used for the optimization of the loss function arguments. When the loss function arguments are updated (after having calculated the gradient), a fraction 𝜇 of the previous update vector (of the past iteration step) is added. This has the effect of damping oscillations (阻尼振荡). We refer to the hyperparameter 𝜇 as momentum. When 𝜇 is set to 0, the momentum method has no influence. In simple words, when we update the loss function arguments, we still remember the step we did for the last update. Now we go a step in direction of the gradient with a length according to the learning rate and additionally we repeat the step we did last time, but this time only 𝜇 times as long.
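
The update described above can be sketched in a few lines. The code assumes the classical momentum formulation: new update = 𝜇 times the previous update, minus the learning rate times the gradient. The function name and the toy loss f(w) = w² are illustrative.

```python
def momentum_step(weights, gradient, previous_update, learning_rate, momentum):
    """One update: a step against the gradient plus a fraction of the previous update."""
    update = [momentum * prev - learning_rate * g
              for prev, g in zip(previous_update, gradient)]
    new_weights = [w + u for w, u in zip(weights, update)]
    return new_weights, update


# Toy example: minimize f(w) = w^2, whose gradient is 2 * w.
weights, update = [1.0], [0.0]
for _ in range(2):
    gradient = [2.0 * w for w in weights]
    weights, update = momentum_step(weights, gradient, update,
                                    learning_rate=0.1, momentum=0.9)
print(weights)  # about [0.46]
```

In the second step the weight moves by about -0.34: -0.16 from the current gradient plus 0.9 times the previous step of -0.2. With a momentum of 0, only the gradient term remains.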

non-maximum suppression (非极大值抑制)

In object detection, non-maximum suppression is used to suppress (抑制) overlapping predicted bounding boxes. When different instances overlap more than a given threshold value, only the one with the highest confidence value is kept while the other instances, not having the maximum confidence value, are suppressed.
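
A greedy form of non-maximum suppression, with an optional class agnostic mode, could look like the following sketch. The detection format (a dictionary with 'box', 'class', and 'confidence'), the parameter names, and the threshold of 0.5 are assumptions for illustration. HALCON's own implementation may differ.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (row1, col1, row2, col2)."""
    inter = (max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
             * max(0.0, min(a[3], b[3]) - max(a[1], b[1])))
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(detections, max_overlap=0.5, class_agnostic=False):
    """Keep the most confident detection among strongly overlapping ones."""
    kept = []
    for det in sorted(detections, key=lambda d: d['confidence'], reverse=True):
        suppressed = any(
            (class_agnostic or det['class'] == other['class'])
            and iou(det['box'], other['box']) > max_overlap
            for other in kept
        )
        if not suppressed:
            kept.append(det)
    return kept


detections = [
    {'box': (0, 0, 10, 10), 'class': 'apple', 'confidence': 0.9},
    {'box': (1, 1, 11, 11), 'class': 'apple', 'confidence': 0.8},
    {'box': (1, 0, 11, 10), 'class': 'pear', 'confidence': 0.7},
    {'box': (50, 50, 60, 60), 'class': 'peach', 'confidence': 0.6},
]
per_class = non_maximum_suppression(detections)
agnostic = non_maximum_suppression(detections, class_agnostic=True)
print(len(per_class), len(agnostic))  # 3 2
```

In the per-class mode the 'pear' box survives although it strongly overlaps the best 'apple' box. In the class agnostic mode it is suppressed as well.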

overfitting (过拟合)

Overfitting happens when the network starts to 'memorize' training data instead of learning how to find general rules for the classification. This becomes visible when the model continues to minimize the error on the training set but the error on the validation set increases. Since most neural networks have a huge number of weights, these networks are particularly prone to overfitting.

regularization (正则化) - hyperparameter 'weight_prior'

Regularization is a technique to prevent neural networks from overfitting by adding an extra term to the loss function. It works by penalizing (惩罚) large weights, i.e., pushing the weights towards zero. Simply put, regularization favors (倾向于) simpler models that are less likely to fit to noise in the training data and generalize better. In HALCON, regularization is controlled via the parameter 'weight_prior'. Alternative names: regularization parameter, weight decay parameter, λ (note that in HALCON we use λ for the learning rate and within formulas the symbol α for the regularization parameter).
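
The extra term can be illustrated with a quadratic (L2) penalty on the weights. This specific form, half the regularization parameter times the sum of squared weights, is an assumption for this sketch. The argument `weight_prior` plays the role of α.

```python
def regularized_loss(data_loss, weights, weight_prior):
    """Add an L2 penalty on the weights to the data-dependent loss."""
    penalty = 0.5 * weight_prior * sum(w * w for w in weights)
    return data_loss + penalty


weights = [3.0, -4.0]  # sum of squares: 25
print(regularized_loss(1.0, weights, weight_prior=0.01))  # about 1.125
print(regularized_loss(1.0, weights, weight_prior=0.0))   # 1.0, no regularization
```

Larger weights increase the penalty, so minimizing the total loss pushes the weights towards zero.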

retraining (再训练)

We define retraining as updating the weights of an already pretrained network, i.e., during retraining the network learns the specific task. Alternative names: fine-tuning (微调).

solver (求解器)

The solver optimizes the network by updating the weights in a way to optimize (i.e., minimize) the loss.

stochastic gradient descent (SGD) (随机梯度下降法)

SGD is an iterative optimization algorithm for differentiable (可微) functions. In deep learning we use this algorithm to calculate the gradient to optimize (i.e., minimize) the loss function. A key feature of the SGD is to calculate the gradient only based on a single batch containing stochastically (随机) sampled (采样) data and not all data.
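
The key idea, computing the gradient on one randomly sampled batch at a time, can be shown on a toy problem: fitting the slope w of y = w · x with a squared-error loss. All names and values below are illustrative and unrelated to HALCON's solver.

```python
import random


def sgd_fit_slope(xs, ys, learning_rate=0.05, batch_size=4, num_epochs=50, seed=0):
    """Fit y = w * x by stochastic gradient descent on the mean squared error."""
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(num_epochs):
        rng.shuffle(indices)  # random assignment of samples to batches
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error, computed on this batch only.
            gradient = sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= learning_rate * gradient
    return w


xs = [k / 10 for k in range(1, 21)]  # 0.1, 0.2, ..., 2.0
ys = [2.0 * x for x in xs]           # noise-free data with true slope 2
w = sgd_fit_slope(xs, ys)
print(w)  # close to 2.0
```

Each update uses only 4 of the 20 samples, yet the estimate converges to the true slope.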

top-k error

For a given image, the classifier infers class confidences expressing how likely it is that the image belongs to each of the distinguished classes. Thus, for an image we can sort the predicted classes according to the confidence value the classifier assigned. The top-k error is the ratio of predictions where the ground truth class is not among the k predicted classes with the highest probability. In the case of top-1 error, we check if the target label matches the prediction with the highest probability. In the case of top-3 error, we check if the target label matches one of the top 3 predictions (the 3 labels getting the highest probability for this image). Alternative names: top-k score.
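
The definition translates directly into code. In this sketch, each prediction is a dictionary mapping class names to confidences. The function name and the sample values are made up for illustration.

```python
def top_k_error(confidences_per_image, ground_truth_labels, k):
    """Fraction of images whose ground truth class is not among the k most confident classes."""
    misses = 0
    for confidences, label in zip(confidences_per_image, ground_truth_labels):
        top_k = sorted(confidences, key=confidences.get, reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(ground_truth_labels)


predictions = [
    {'apple': 0.92, 'peach': 0.07, 'pear': 0.01},
    {'apple': 0.50, 'peach': 0.30, 'pear': 0.20},
    {'apple': 0.60, 'peach': 0.30, 'pear': 0.10},
    {'apple': 0.10, 'peach': 0.20, 'pear': 0.70},
]
labels = ['apple', 'peach', 'pear', 'pear']
print(top_k_error(predictions, labels, k=1))  # 0.5
print(top_k_error(predictions, labels, k=2))  # 0.25
print(top_k_error(predictions, labels, k=3))  # 0.0
```

The error can only decrease or stay the same as k grows, because a larger k accepts more candidate classes.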

transfer learning (迁移学习)

Transfer learning refers to the technique where a network is built upon the knowledge of an already existing network. In concrete terms, this means taking an already (pre)trained network with its weights and adapting the output layer to the respective application to obtain your network. In HALCON, we also regard the subsequent retraining step as part of transfer learning.

underfitting (欠拟合)

Underfitting occurs when the model over-generalizes (过度概括). In other words, it is not able to describe the complexity of the task. This is directly reflected in the error on the training set, which does not decrease significantly.

weights (权重)

In general, weights are the free parameters of the network, which are altered during the training due to the optimization of the loss. A layer with weights multiplies its input values by them or adds them to its input values. In contrast to hyperparameters, weights are optimized and thus changed during the training.