[Original] DL: Dilated Convolutions. Introduction to the Dilated Convolutions Algorithm (Paper Overview), Architecture Details, and Example Applications: A Detailed Illustrated Guide

处女座的程序猿 2021-09-28

Introduction to the Dilated Convolutions Algorithm (Paper Overview)

ABSTRACT
State-of-the-art models for semantic segmentation are based on adaptations of convolutional networks that had originally been designed for image classification. However, dense prediction problems such as semantic segmentation are structurally different from image classification. In this work, we develop a new convolutional network module that is specifically designed for dense prediction. The presented module uses dilated convolutions to systematically aggregate multiscale contextual information without losing resolution. The architecture is based on the fact that dilated convolutions support exponential expansion of the receptive field without loss of resolution or coverage. We show that the presented context module increases the accuracy of state-of-the-art semantic segmentation systems. In addition, we examine the adaptation of image classification networks to dense prediction and show that simplifying the adapted network can increase accuracy.
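The abstract's claim that dilated convolutions support exponential expansion of the receptive field can be checked with a little arithmetic. The sketch below is illustrative only: the helper name `receptive_field` is hypothetical, and it assumes stride-1 layers with a 3x3 kernel whose dilation doubles at each layer (1, 2, 4, 8). Under those assumptions each layer widens the receptive field by `(kernel_size - 1) * dilation` pixels per side.

```python
def receptive_field(kernel_size, dilations):
    """Return the receptive-field side length after each stride-1 layer.

    Hypothetical helper for illustration: every layer adds
    (kernel_size - 1) * dilation pixels to the side length.
    """
    size = 1  # a single input pixel sees only itself
    sizes = []
    for d in dilations:
        size += (kernel_size - 1) * d
        sizes.append(size)
    return sizes


# Doubling dilations (assumed schedule): side length roughly doubles per layer.
dilated = receptive_field(3, [1, 2, 4, 8])
# Ordinary 3x3 convolutions (dilation 1): side length grows by 2 per layer.
plain = receptive_field(3, [1, 1, 1, 1])

print(dilated)  # [3, 7, 15, 31]
print(plain)    # [3, 5, 7, 9]
```

With the same number of layers and the same number of weights per layer, the dilated stack covers a 31x31 window where the undilated stack covers only 9x9.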
CONCLUSION
We have examined convolutional network architectures for dense prediction. Since the model must produce high-resolution output, we believe that high-resolution operation throughout the network is both feasible and desirable. Our work shows that the dilated convolution operator is particularly suited to dense prediction due to its ability to expand the receptive field without losing resolution or coverage. We have utilized dilated convolutions to design a new network structure that reliably increases accuracy when plugged into existing semantic segmentation systems. As part of this work, we have also shown that the accuracy of existing convolutional networks for semantic segmentation can be increased by removing vestigial components that had been developed for image classification.
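To make "expanding the receptive field without losing resolution" concrete, here is a minimal 1-D sketch in plain Python. It is not the paper's code: the function name `dilated_conv1d` is hypothetical, zero padding is an assumption chosen so the output has as many samples as the input, and the loop computes a cross-correlation (no kernel flip), as deep learning frameworks usually do. For a symmetric kernel this equals a true convolution.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Zero-padded 1-D dilated correlation (illustrative sketch).

    Kernel taps are spaced `dilation` samples apart. The output has
    exactly len(signal) samples, so resolution is preserved.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel length")
    half = dilation * (len(kernel) - 1) // 2  # zero padding on each side
    padded = [0.0] * half + [float(x) for x in signal] + [0.0] * half
    return [
        sum(w * padded[n + t * dilation] for t, w in enumerate(kernel))
        for n in range(len(signal))
    ]


impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]  # single spike at index 4
kernel = [1, 1, 1]

dense = dilated_conv1d(impulse, kernel, dilation=1)
spread = dilated_conv1d(impulse, kernel, dilation=3)

print(dense)   # spike reaches indices 3, 4, 5
print(spread)  # spike reaches indices 1, 4, 7
print(len(dense), len(spread))  # both match len(impulse)
```

The three-tap kernel spans 3 samples at dilation 1 and 7 samples at dilation 3, while the number of weights and the output length stay the same. No pooling or striding is needed to see a wider context.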
We believe that the presented work is a step towards dedicated architectures for dense prediction that are not constrained by image classification precursors. As new sources of data become available, future architectures may be trained densely end-to-end, removing the need for pre-training on image classification datasets. This may enable architectural simplification and unification. Specifically, end-to-end dense training may enable a fully dense architecture akin to the presented context network to operate at full resolution throughout, accepting the raw image as input and producing dense label assignments at full resolution as output.
State-of-the-art systems for semantic segmentation leave significant room for future advances. Failure cases of our most accurate configuration are shown in Figure 4. We will release our code and trained models to support progress in this area.