The development and subsequent deployment of autonomous ground vehicles is a prominent instance of sensorimotor control in 3D environments. It is a perception-driven control task, and one of its most difficult scenarios is navigating densely populated urban environments, primarily because of (but not limited to)

  • complex multi-agent dynamics involving other vehicles, pedestrians, and static objects,

  • the need to adhere to traffic rules,

  • a long tail of rare events, and

  • the need to make instantaneous decisions in the presence of conflicting objectives.

While research on autonomous urban driving is also hindered by “the logistical difficulties of training and testing systems in the physical world”, simulated environments may provide a solution by enabling massive parallelism in training and evaluation, in addition to modeling scenarios that are rare, expensive, risky, or tedious to reproduce in the real world. Existing options in this regard are either

  • open-source racing simulators that do not sufficiently model the complexities associated with urban driving or

  • commercial games that do not support benchmarking of driving policies and offer limited options for customization.

This paper introduces CARLA (Car Learning to Act), an open-source simulator designed exclusively for training and evaluating autonomous driving policies in customizable urban environments.

Built upon Unreal Engine 4 (UE4), which allows for customizability and extension, CARLA is designed as a client-server system: the server is responsible for rendering the scene and running the simulation, while the client uses a Python API to send (a) commands to control the vehicle and (b) meta-commands to control the simulation, the environment, and the sensor suite. Two towns have been released, one used for training and one for testing; they are built from an asset library containing 40 buildings, 16 vehicle models, and 50 pedestrian models, along with 18 possible illumination-weather combinations. Realistic animation of non-player characters is handled by the UE4 models, and a basic controller governs the non-player vehicles. Vehicles and pedestrians can detect and avoid each other; pedestrians are encouraged to walk along the sidewalks but are permitted to cross roads at any time. If a pedestrian collides with a vehicle, it is removed from the simulation and, after some time, replaced by another pedestrian at a different location. CARLA gives the agent access to a range of measurements and sensors: its 3D location, 3D orientation, speed, acceleration, and impact from collisions; cameras with configurable field of view and depth of field; semantic segmentation maps (with labels for 12 classes) and depth maps; and bounding boxes and exact locations of all dynamic objects in the environment.

The authors train and evaluate 3 autonomous driving approaches: (a) a modular pipeline (MP), (b) conditional imitation learning (IL), and (c) deep reinforcement learning (RL).

  • (a) MP uses a RefineNet to estimate navigable regions and obstacles, an AlexNet to detect intersections, a local planner based on a state machine to synthesize waypoints, and a proportional-integral-derivative (PID) controller to drive the car at a cruise speed of 20 km/h.

  • (b) IL is trained on 14 hours of driving traces obtained from human drivers in the training town; given the perceptual input and one of 4 possible high-level commands (e.g., turn left at the next intersection), a deep network predicts the action.

  • (c) Finally, RL is trained using a reward function and no human driving data, with the reward being a weighted sum of 5 terms that collectively encourage accurate driving and penalize infractions such as collisions or veering off the lane.
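The low-level speed regulation in (a) can be sketched with a minimal PID controller driving a toy vehicle model toward the 20 km/h cruise set-point; the gains, anti-windup limit, and vehicle dynamics below are illustrative assumptions, not values from the paper:

```python
class PIDController:
    """Minimal discrete PID controller (illustrative gains, not from the paper)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        # Accumulate the integral term, clamped to avoid windup while the
        # actuator (throttle) is saturated.
        self.integral = max(-20.0, min(20.0, self.integral + error * self.dt))
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Toy longitudinal control loop: hold the 20 km/h cruise speed used by MP.
controller = PIDController(kp=0.5, ki=0.1, kd=0.05, dt=0.1)
speed = 0.0  # km/h
for _ in range(200):
    throttle = max(0.0, min(1.0, controller.step(20.0, speed)))
    # Crude vehicle model: throttle accelerates, drag decelerates.
    speed += (10.0 * throttle - 0.1 * speed) * 0.1
```

The clamp on the integral term matters in practice: during the initial ramp-up the throttle is saturated at 1, and an unbounded integral would cause a large overshoot once the set-point is reached.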
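The command-conditioned prediction in (b) can be pictured as a shared perception trunk followed by one output head per high-level command, with the command selecting which head's action is executed. All sizes, weights, and the command numbering below are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical command-conditioned policy: shared trunk + one head per command.
N_COMMANDS, OBS_DIM, HID, ACT_DIM = 4, 16, 32, 3  # action: steer/throttle/brake

W_trunk = rng.standard_normal((OBS_DIM, HID)) * 0.1
W_heads = rng.standard_normal((N_COMMANDS, HID, ACT_DIM)) * 0.1

def act(observation: np.ndarray, command: int) -> np.ndarray:
    """Return the action from the head selected by the high-level command."""
    hidden = np.tanh(observation @ W_trunk)    # shared perception features
    return np.tanh(hidden @ W_heads[command])  # command-specific action head

obs = rng.standard_normal(OBS_DIM)
action = act(obs, command=2)  # e.g., command 2 might mean "turn right"
```

The point of the branching is that the same observation can demand different actions depending on the route (e.g., at an intersection), so a single unconditioned head would average over conflicting targets.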
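The reward in (c) can be sketched as follows. The five terms below match the paper's description (progress toward the goal, speed, collision damage, sidewalk overlap, opposite-lane overlap), but the weight values and dictionary key names are illustrative placeholders, not the paper's coefficients:

```python
# Illustrative RL reward: a weighted sum of five per-step terms that reward
# progress and speed and penalize infractions. Weights are placeholders.
REWARD_WEIGHTS = {
    "distance_to_goal_delta": 1.0,     # progress toward the goal since last step
    "speed_delta": 0.05,               # change in speed
    "collision_damage_delta": -0.0001, # newly accumulated collision damage
    "sidewalk_overlap_delta": -2.0,    # newly overlapped sidewalk area
    "opposite_lane_overlap_delta": -2.0,
}

def reward(measurements: dict) -> float:
    """Weighted sum of the five per-step terms."""
    return sum(REWARD_WEIGHTS[k] * measurements[k] for k in REWARD_WEIGHTS)
```

Because every term is a difference between consecutive steps, the reward is dense: the agent receives a signal at each step rather than only at episode end.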

These 3 approaches are evaluated on 4 tasks of increasing difficulty, across different towns and weather conditions. None of the approaches achieves perfect performance on even the simplest task, which the authors attribute to a lack of generalization during training; this is also supported by the relatively poor performance of the best methods (MP and IL) on the most difficult task. Testing directly for generalization reveals that models generalize better to new weather conditions than to a new environment (Town 2), which can be explained by the fact that the latter contains different textures and 3D object models than the environment the models were trained on (Town 1). MP performs better than the learning-based approaches (IL and RL) under new weather conditions, but worse in the new town under new weather: the perception system's sub-optimal outputs degrade the motion planner downstream, demonstrating that MP is more fragile than end-to-end learned approaches. Between IL and RL, RL underperforms despite being trained on 12 days of driving experience (compared to 14 hours of data for IL), because it requires an extensive task-specific hyperparameter search that is infeasible for a task as large as autonomous driving; driving may simply be harder for RL than other tasks at which it has excelled. Finally, infraction analysis shows that all models perform better in Town 1, and among them MP collides with non-player characters least frequently, exposing the susceptibility of learning-based approaches to such rare events.

This paper proposes CARLA, an open-source autonomous driving simulator, and compares the performance of 3 popular autonomous driving approaches. The availability of such a unified simulation, experimentation, and benchmarking system is crucial to advancing self-driving research, and this paper is a milestone in that regard. It is written very clearly, with all system and experimentation details explicitly stated, and is therefore easy to read and follow. The authors outline several directions for future work:

  • allowing more fine-grained control of non-player characters,

  • improved RL training with data augmentation and techniques such as dropout for better generalization performance, and

  • evaluation using more model architectures and learning algorithms.

Since its release 3 years ago, CARLA has grown considerably: it now includes 10 towns (up from the original 2), a larger sensor suite, more customizable traffic simulation, the ability to trade off rendering quality against speed, and interoperability across multiple platforms, in addition to a public challenge leaderboard to which researchers can submit their own autonomous driving systems for evaluation. ‘Learning by Cheating’ (Chen et al., CoRL 2019) is an interesting related work that decomposes the driving task into two stages: the first stage trains a “privileged” agent with access to ground-truth environment state by imitating expert demonstrations, and the second stage trains a purely vision-based agent by imitating the privileged agent.

This summary was written in Fall 2020 as a part of the CMPT 757 Frontiers of Visual Computing course.