Autonomous Drifting using Reinforcement Learning
Project Overview
Hydroplaning occurs when a layer of water builds up between a car's tires and the road. The tires encounter more water than they can scatter, resulting in a loss of traction. This can cause the car to slip or drift, leading to loss of control and many accidents. According to the United States Department of Transportation, wet and icy roads are responsible for 19% of all vehicle crashes and 86% of all weather-related crashes in the United States.
Control systems in current self-driving cars (ABS, ESC, etc.) try to mitigate the chances of slipping because of its unpredictable nature. Keeping the speed low enough and avoiding overly tight turns will prevent slipping in most cases, but this does not cover situations where the system must make an evasive manoeuvre (in which the speed and steering angle will most likely be sharp) or where the car is already slipping due to driver error. To make these systems as robust and safe as possible, it is paramount to study drifting, and eventually deduce how such systems can respond quickly to unintentional drift states.
Our project is aimed at studying drifting. To do so, we use Reinforcement Learning (RL), which learns control policies through trial and error, much like how humans learn to solve problems by interacting with the environment. We explored two ways of obtaining a drift controller. One is a model-free approach with deep Q-networks (DQN), which makes no assumptions about the dynamics of the car and does not attempt to learn them. The other is model-based policy search with PILCO, which first builds a Gaussian process model of the car's forward dynamics before searching for a policy.
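To give a rough flavour of the model-based idea, the sketch below collects rollout data and fits a Gaussian process to predict how the state changes given the current state and action. It is a minimal illustration only: it uses scikit-learn's GP regressor rather than the project's actual PILCO implementation, and it assumes a classic Gym-style environment and policy passed in by the caller.

```python
# Minimal sketch of learning a GP forward-dynamics model from rollout data.
# Illustration only: the project itself uses PILCO; the env/policy objects
# here are assumed to follow the classic Gym reset()/step() interface.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def collect_rollout(env, policy, horizon=100):
    """Run one episode and record (state, action, next_state) transitions."""
    states, actions, next_states = [], [], []
    s = env.reset()
    for _ in range(horizon):
        a = policy(s)
        s_next, _, done, _ = env.step(a)
        states.append(s); actions.append(a); next_states.append(s_next)
        s = s_next
        if done:
            break
    return np.array(states), np.array(actions), np.array(next_states)

def fit_dynamics_model(states, actions, next_states):
    """Fit one GP per state dimension, predicting the change in that dimension."""
    X = np.hstack([states, actions])   # inputs: (state, action)
    Y = next_states - states           # targets: state deltas
    kernel = RBF(length_scale=np.ones(X.shape[1])) + WhiteKernel()
    models = []
    for d in range(Y.shape[1]):
        gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
        gp.fit(X, Y[:, d])
        models.append(gp)
    return models
```

Once such a model is available, a model-based method can plan or optimize a policy against the model's predictions instead of the real car, which is what makes it so much more data-efficient than the model-free approach.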
After obtaining drift controllers with both approaches on a simulated car, we made further attempts to prove the effectiveness of the learned controllers. If they are to be applied in the real world, it is imperative that the controllers can adapt to physical conditions different from those they were trained on. The results of the experiments conducted during the project, such as varying the surface friction and the chassis mass, are presented in the final reports.
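As an illustration of what such a robustness check can look like, the sketch below evaluates a fixed controller while sweeping over friction coefficients and chassis masses between runs. The helper callables and the parameter ranges are hypothetical placeholders, not the configuration used in the reported experiments.

```python
# Hypothetical robustness sweep: evaluate a trained drift controller while
# perturbing surface friction and chassis mass between runs. The parameter
# ranges below are illustrative placeholders, not the project's actual values.
import itertools

FRICTION_COEFFS = [0.6, 0.8, 1.0, 1.2]   # relative to the training surface
CHASSIS_MASSES  = [2.0, 2.5, 3.0]        # kg, assumed range for a small RC car

def robustness_sweep(make_env, evaluate, controller, n_episodes=5):
    """make_env(friction, chassis_mass) -> env; evaluate(controller, env) -> return.

    Both callables are assumed to be supplied by the caller, e.g. by rebuilding
    the Gazebo world with the requested friction and mass before each run.
    """
    results = {}
    for mu, mass in itertools.product(FRICTION_COEFFS, CHASSIS_MASSES):
        env = make_env(friction=mu, chassis_mass=mass)
        returns = [evaluate(controller, env) for _ in range(n_episodes)]
        results[(mu, mass)] = sum(returns) / len(returns)
    return results
```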
Project Objective
The objective of this project is to study the real-world problem of cars skidding when turning at high speed on wet roads. Drifting falls into two categories: sustained drift and transient drift. Due to the vast breadth of the two categories, our project focuses mainly on sustained drift, and more specifically on steady-state circular drift. We only aim to handle the 'studying' aspect of drifting in this project: by teaching the car to drift autonomously, we ensure that the car understands the concept of drifting, and can hence cope with the unpredictability inherent to the drifting state, as described above.
Project Results
The video shows the sustained circular drift obtained by the remote-controlled car in the Gazebo simulator using the PILCO algorithm after only 15 episodes of training, each 10 seconds long. We also explored using a DQN model with a double dueling architecture to obtain this sustained circular drift controller. The PILCO algorithm is much more data-efficient than the DQN model since it takes a model-based approach. In addition, although we achieved some success with the DQN model, the PILCO algorithm outperforms it in the robustness tests, as discussed further in the final reports for the project.
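For reference, the snippet below sketches the dueling Q-network head mentioned above, where the Q-value is decomposed into a state value and an advantage term. It is written in PyTorch with placeholder layer sizes and is not the exact network used in the project.

```python
# Sketch of a dueling Q-network head: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a).
# Layer sizes and the PyTorch framing are illustrative assumptions, not the
# exact architecture used in this project.
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        # Shared feature extractor over the car's state (e.g. velocities, slip angle).
        self.features = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Separate streams for the state value V(s) and the advantages A(s, a).
        self.value = nn.Linear(hidden, 1)
        self.advantage = nn.Linear(hidden, n_actions)

    def forward(self, state):
        h = self.features(state)
        v = self.value(h)          # shape (batch, 1)
        a = self.advantage(h)      # shape (batch, n_actions)
        # Subtract the mean advantage so V and A are identifiable.
        return v + a - a.mean(dim=1, keepdim=True)
```

The 'double' part of the architecture refers to using the online network to select the next action and a separate target network to evaluate it, which reduces overestimation of Q-values.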
Project Schedule
Week 1 - 4 (September 1 - 30, 2017)
- Research on project: Gaussian processes, PILCO, Reinforcement learning, Simulation-aided reinforcement learning, Inverse reinforcement learning, RBF networks.
- Create project plan.
Week 5 - 6 (October 1 - 15, 2017)
- 1st to 7th: Figure out how to work with an Arduino.
- 8th to 15th: Add an IMU to the car and understand the API to communicate with it.
Week 7 - 9 (October 16 - 29, 2017)
- 16th to 22nd: Get familiar with ROS and OpenAI Gym.
- 23rd to 29th: Integrate with Gazebo and learn to define robots and worlds.
Week 10 (October 30 - November 5, 2017)
- Create the simulation environment in Gazebo.
- Get more familiar with PILCO.
Week 11 - 12 (November 6 - 19, 2017)
- Implement the Deep Q-Networks (DQN) algorithm.
Week 13 (November 20 - 26, 2017)
- Buffer for overruns in plan.
- Start first iteration on the simulator by end of week to obtain initial simulated optimal policy.
Week 14 - 17 (November 27 - December 24, 2017)
- Preparation for exams.
- Transfer optimal policy learnt from simulator to physical car.
Week 18 - 19 (December 25, 2017 - January 7, 2018)
- Obtain data from the physical car and start another iteration on the simulator.
- Buffer for overruns in plan.
- Complete interim report.
Week 20 - 22 (January 8 - 21, 2018)
- Iterate on implementation of the DQN algorithm.
- Presentation on January 12th, 2018.
- Deliverable on January 21st, 2018.
Week 23 (January 22 - 28, 2018)
- Implement PILCO algorithm.
Week 24 - 28 (January 29 - February 25, 2018)
- Further iterations of the learning process.
- Try out the results on the physical RC car.
- Perform robustness and stability tests.
Week 29 - 31 (February 26 - March 11, 2018)
- Further testing and finalize implementation.
- Record finalized demo at the end of week 31.
Week 32 - 36 (March 12 - April 15, 2018)
- Work on final report.
- Buffer for overruns in plan.