
[Paper Express 02-14] High-Quality Autonomous Driving Papers with Code

InfoRich 2022-02-14

Autonomous Vehicles

1. Curriculum Domain Adaptation for Semantic Segmentation of Urban Scenes


Authors: Yang Zhang, Philip David, Boqing Gong

Link:

https://arxiv.org/abs/1707.09465v5

Code:

https://github.com/YangZhang4065/AdaptationSeg

Abstract:

During the last half decade, convolutional neural networks (CNNs) have triumphed over semantic segmentation, which is one of the core tasks in many applications such as autonomous driving. However, to train CNNs requires a considerable amount of data, which is difficult to collect and laborious to annotate. Recent advances in computer graphics make it possible to train CNNs on photo-realistic synthetic imagery with computer-generated annotations. Despite this, the domain mismatch between the real images and the synthetic data cripples the models' performance. Hence, we propose a curriculum-style learning approach to minimize the domain gap in urban scenery semantic segmentation. The curriculum domain adaptation solves easy tasks first to infer necessary properties about the target domain; in particular, the first task is to learn global label distributions over images and local distributions over landmark superpixels. These are easy to estimate because images of urban scenes have strong idiosyncrasies (e.g., the size and spatial relations of buildings, streets, cars, etc.). We then train a segmentation network while regularizing its predictions in the target domain to follow those inferred properties. In experiments, our method outperforms the baselines on two datasets and two backbone networks. We also report extensive ablation studies about our approach.

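The core regularizer can be sketched as matching the network's predicted label histogram against the global label distribution inferred in the easy first task. A minimal numpy sketch of that distribution-matching step (all names and values here are illustrative, not the authors' implementation):

```python
import numpy as np

def label_distribution(prob_maps):
    """Global label distribution: average per-class probability over
    all pixels. prob_maps has shape (H, W, C) with softmax outputs."""
    return prob_maps.reshape(-1, prob_maps.shape[-1]).mean(axis=0)

def distribution_match_loss(pred_dist, target_dist, eps=1e-8):
    """KL divergence from the inferred target-domain distribution to
    the predicted one, used to regularize target-domain predictions."""
    pred = np.clip(pred_dist, eps, 1.0)
    tgt = np.clip(target_dist, eps, 1.0)
    return float(np.sum(tgt * np.log(tgt / pred)))

# Example: 4x4 image, 3 classes, uniform predictions vs. a skewed
# urban prior (e.g. road pixels dominating the scene)
probs = np.full((4, 4, 3), 1.0 / 3.0)
target = np.array([0.6, 0.3, 0.1])
pred = label_distribution(probs)
loss = distribution_match_loss(pred, target)
```

A real training loop would add this loss, weighted, to the usual per-pixel segmentation loss on the source domain.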


2. Fast Scene Understanding for Autonomous Driving


Authors: Davy Neven, Bert De Brabandere, Stamatios Georgoulis, Marc Proesmans, Luc Van Gool

Link:

https://arxiv.org/abs/1708.02550v1

Code:

https://github.com/davyneven/fastSceneUnderstanding

Abstract:

Most approaches for instance-aware semantic labeling traditionally focus on accuracy. Other aspects like runtime and memory footprint are arguably as important for real-time applications such as autonomous driving. Motivated by this observation and inspired by recent works that tackle multiple tasks with a single integrated architecture, in this paper we present a real-time efficient implementation based on ENet that solves three autonomous driving related tasks at once: semantic scene segmentation, instance segmentation and monocular depth estimation. Our approach builds upon a branched ENet architecture with a shared encoder but different decoder branches for each of the three tasks. The presented method can run at 21 fps at a resolution of 1024x512 on the Cityscapes dataset without sacrificing accuracy compared to running each task separately.

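The shared-encoder/branched-decoder layout can be illustrated with stand-in functions: the encoder runs once, and each task head consumes the same feature map. This is only a data-flow sketch, not ENet itself; every function here is a hypothetical placeholder:

```python
import numpy as np

def shared_encoder(image):
    """Stand-in for the shared ENet encoder: 2x2 average pooling,
    just to illustrate a downsampled shared feature map."""
    h, w = image.shape[0] // 2, image.shape[1] // 2
    return image[:h * 2, :w * 2].reshape(h, 2, w, 2).mean(axis=(1, 3))

def segmentation_branch(feat, num_classes=3):
    # One decoder head per task; a real branch would upsample back.
    return np.stack([feat * (c + 1) for c in range(num_classes)], axis=-1)

def instance_branch(feat, embed_dim=2):
    return np.stack([feat + c for c in range(embed_dim)], axis=-1)

def depth_branch(feat):
    return feat * 0.5

def fast_scene_understanding(image):
    feat = shared_encoder(image)        # computed once, shared by all heads
    return (segmentation_branch(feat),  # semantic segmentation logits
            instance_branch(feat),      # instance embedding
            depth_branch(feat))         # monocular depth

img = np.ones((8, 8))
seg, inst, depth = fast_scene_understanding(img)
```

Sharing the encoder is what buys the runtime win: the expensive feature extraction is amortized over three tasks instead of repeated per task.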


3. Deep Steering: Learning End-to-End Driving Model from Spatial and Temporal Visual Cues


Authors: Lu Chi, Yadong Mu

Link:

https://arxiv.org/abs/1708.03798v1

Code:

https://github.com/abhileshborode/Behavorial-Clonng-Self-driving-cars

Abstract:

In recent years, autonomous driving algorithms using low-cost vehicle-mounted cameras have attracted increasing endeavors from both academia and industry. There are multiple fronts to these endeavors, including object detection on roads, 3-D reconstruction etc., but in this work we focus on a vision-based model that directly maps raw input images to steering angles using deep networks. This represents a nascent research topic in computer vision. The technical contributions of this work are three-fold. First, the model is learned and evaluated on real human driving videos that are time-synchronized with other vehicle sensors. This differs from many prior models trained from synthetic data in racing games. Second, state-of-the-art models, such as PilotNet, mostly predict the wheel angles independently on each video frame, which contradicts common understanding of driving as a stateful process. Instead, our proposed model strikes a combination of spatial and temporal cues, jointly investigating instantaneous monocular camera observations and vehicle's historical states. This is in practice accomplished by inserting carefully-designed recurrent units (e.g., LSTM and Conv-LSTM) at proper network layers. Third, to facilitate the interpretability of the learned model, we utilize a visual back-propagation scheme for discovering and visualizing image regions crucially influencing the final steering prediction. Our experimental study is based on about 6 hours of human driving data provided by Udacity. Comprehensive quantitative evaluations demonstrate the effectiveness and robustness of our model, even under scenarios like drastic lighting changes and abrupt turning. The comparison with other state-of-the-art models clearly reveals its superior performance in predicting the due wheel angle for a self-driving car.

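The stateful idea — blending each frame's evidence into a running hidden state instead of predicting from each frame independently — can be sketched as follows. The update rule is a deliberately simplified stand-in for the paper's LSTM/Conv-LSTM units, and all names are illustrative:

```python
import numpy as np

def frame_features(frame):
    """Stand-in for the CNN feature extractor: mean intensity."""
    return np.array([frame.mean()])

def recurrent_step(h, x, alpha=0.7):
    """Simplified stateful update: exponentially blend new evidence
    into the hidden state (a real model would use LSTM gates)."""
    return alpha * h + (1 - alpha) * x

def predict_steering(frames):
    """Map a clip of frames to one steering angle via the running
    state, so the prediction depends on history, not just the
    current frame."""
    h = np.zeros(1)
    for frame in frames:
        h = recurrent_step(h, frame_features(frame))
    return float(h[0])  # steering angle in illustrative units

clip = [np.full((4, 4), v) for v in (0.0, 0.5, 1.0)]
angle = predict_steering(clip)
```

Note that the final angle reflects the whole clip: a model that looked only at the last frame would give a different answer, which is exactly the stateless behavior the paper argues against.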


4. Free Space Estimation using Occupancy Grids and Dynamic Object Detection


Authors: Raghavender Sahdev

Link:

https://arxiv.org/abs/1708.04989v1

Code:

https://github.com/raghavendersahdev/Free-Space

Abstract:

In this paper we present an approach to estimate Free Space from a Stereo image pair using stochastic occupancy grids. We do this in the domain of autonomous driving on the famous benchmark dataset KITTI. Later based on the generated occupancy grid we match 2 image sequences to compute the top view representation of the map. We do this to map the environment. We compute a transformation between the occupancy grids of two successive images and use it to compute the top view map. Two issues need to be addressed for mapping are discussed - computing a map and dealing with dynamic objects for computing the map. Dynamic Objects are detected in successive images based on an idea similar to tracking of foreground objects from the background objects based on motion flow. A novel RANSAC based segmentation approach has been proposed here to address this issue.

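The stereo-to-occupancy-grid step rests on the standard disparity-to-depth relation depth = f·B/d. A minimal sketch of building a top-view grid from stereo measurements (calibration values, grid size, and the simple vote counting are illustrative assumptions, not KITTI's calibration or the paper's stochastic formulation):

```python
import numpy as np

# Illustrative stereo geometry, not KITTI calibration
FOCAL = 100.0    # focal length in pixels
BASELINE = 0.5   # camera baseline in meters

def disparity_to_depth(disparity):
    """Standard stereo relation: depth = f * B / disparity."""
    return FOCAL * BASELINE / np.maximum(disparity, 1e-6)

def occupancy_grid(disparities, xs, grid_shape=(10, 10), cell=1.0):
    """Accumulate obstacle evidence in a top-view grid.

    disparities, xs: per-point disparity and lateral position (m).
    Each measurement votes for the cell it falls into; the paper's
    stochastic grid would accumulate probabilities instead of counts.
    """
    grid = np.zeros(grid_shape)
    depths = disparity_to_depth(disparities)
    for z, x in zip(depths, xs):
        i, j = int(z // cell), int(x // cell)
        if 0 <= i < grid_shape[0] and 0 <= j < grid_shape[1]:
            grid[i, j] += 1
    return grid

# Two points: a near obstacle (depth 2 m) and a far one (depth 5 m)
disp = np.array([25.0, 10.0])
grid = occupancy_grid(disp, xs=np.array([3.0, 7.0]))
```

Cells with no votes are candidate free space; matching such grids across successive frames gives the transformation used to build the top-view map.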


5. Arguing Machines: Human Supervision of Black Box AI Systems That Make Life-Critical Decisions


Authors: Lex Fridman, Li Ding, Benedikt Jenik, Bryan Reimer

Link:

https://arxiv.org/abs/1710.04459v2

Code:

https://github.com/scope-lab-vu/deep-nn-car

Abstract:

We consider the paradigm of a black box AI system that makes life-critical decisions. We propose an 'arguing machines' framework that pairs the primary AI system with a secondary one that is independently trained to perform the same task. We show that disagreement between the two systems, without any knowledge of underlying system design or operation, is sufficient to arbitrarily improve the accuracy of the overall decision pipeline given human supervision over disagreements. We demonstrate this system in two applications: (1) an illustrative example of image classification and (2) on large-scale real-world semi-autonomous driving data. For the first application, we apply this framework to image classification achieving a reduction from 8.0% to 2.8% top-5 error on ImageNet. For the second application, we apply this framework to Tesla Autopilot and demonstrate the ability to predict 90.4% of system disengagements that were labeled by human annotators as challenging and needing human supervision.

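The framework's core decision rule — escalate to a human whenever the two independently trained systems disagree — can be sketched as follows. The distance measure and threshold here are illustrative choices, not the paper's exact disagreement criterion:

```python
import numpy as np

def disagreement(primary_pred, secondary_pred):
    """Treat the two black-box systems' outputs as distributions and
    measure disagreement as total variation distance."""
    return 0.5 * float(np.abs(primary_pred - secondary_pred).sum())

def arbitrate(primary_pred, secondary_pred, threshold=0.2):
    """If the systems agree, keep the primary decision; otherwise
    escalate to human supervision (the 'arguing machines' step)."""
    if disagreement(primary_pred, secondary_pred) > threshold:
        return "escalate_to_human"
    return int(np.argmax(primary_pred))

# Two-class example: the secondary system either confirms or contests
agree_a = np.array([0.9, 0.1])
agree_b = np.array([0.85, 0.15])
conflict = np.array([0.2, 0.8])
confirmed = arbitrate(agree_a, agree_b)
contested = arbitrate(agree_a, conflict)
```

Neither model needs to expose its internals: only the outputs are compared, which is what makes the scheme applicable to black-box systems like the Autopilot case study.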


AI&R is a comprehensive information platform for the artificial intelligence and robotics verticals. Our vision is to become the highway to AGI (artificial general intelligence), connecting people with people, people with information, and information with information, making AI and robotics accessible to everyone.

AI and robotics enthusiasts are welcome to follow us for in-depth content every day.
