Constraint-Based UGV Path Planning and Wireless Interaction with IoT Devices using Reinforcement Learning



Currently, IoT devices have taken centre stage in the technology world as one of the fastest growing markets: it has been predicted that there will be more than 30 billion connected devices by the end of 2020. Furthermore, the amount of data they produce is estimated at hundreds of trillions of gigabytes per year. In the near future, almost every device will be connected to the internet, ranging from sensors, vehicles and wearable electronics to other embedded systems like refrigerators. This tremendous reliance on IoT devices creates a situation where we must find efficient ways both to communicate with them and to charge them, particularly in the case of tiny IoT devices such as RFID tags or Bluetooth beacons. On one hand, a traditional method like a battery is not a viable option for minuscule IoT devices. On the other hand, charging cables are not suitable either: purchasing one for every device is expensive, and cabling is impractical in inaccessible areas. Hence, this project proposes deploying an unmanned ground vehicle (UGV) in designated areas to wirelessly charge, and collect data from, clusters of tiny IoT devices in an operation area, as shown below.



The objective is to explore different methods, namely Mixed Integer Non-Linear Programming (MINLP, used as a lower bound), Q-learning and deep reinforcement learning (deep Q-learning), to plan the path of the unmanned ground vehicle so that it charges the devices while optimising both the energy it consumes and the total path taken. Results from these methods are included and compared extensively on the basis of their efficiency and speed, and the one that gives the best result in a real-world environment is chosen.




Dr. C. Wu
PhD (Toronto), Associate Head and Associate Professor, HKU



Anushka Vashishtha
Year 4 CS Student, HKU


Project Schedule

September 30

Deliverable of Phase 1

  • Project Plan
  • Project Website


Getting familiar with MATLAB, Python and TensorFlow

Reading up on MINLP and reinforcement learning methods such as Q-learning, Dyna-Q and deep Q-learning
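To give a feel for the simplest of the methods above, the following is a minimal tabular Q-learning sketch for grid path planning. The UGV starts at (0, 0) on a hypothetical 5x5 grid and must reach a device at (4, 4); a -1 reward per move penalises long paths. The grid size, rewards and hyperparameters here are illustrative assumptions, not the project's actual settings.

```python
import random

N = 5
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # right, left, down, up
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1
GOAL = (N - 1, N - 1)

# Q-table over all (state, action) pairs, initialised to zero
Q = {((r, c), a): 0.0 for r in range(N) for c in range(N) for a in range(4)}

def step(state, a):
    """Move on the grid, clipping at the borders."""
    dr, dc = MOVES[a]
    nxt = (min(max(state[0] + dr, 0), N - 1),
           min(max(state[1] + dc, 0), N - 1))
    return nxt, (10.0 if nxt == GOAL else -1.0), nxt == GOAL

random.seed(0)
for episode in range(500):
    s, done = (0, 0), False
    while not done:
        # epsilon-greedy exploration
        a = (random.randrange(4) if random.random() < EPS
             else max(range(4), key=lambda x: Q[(s, x)]))
        s2, r, done = step(s, a)
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, x)] for x in range(4))
                              - Q[(s, a)])
        s = s2

# greedy rollout with the learned table: an 8-move shortest path (9 cells)
s, path = (0, 0), [(0, 0)]
while s != GOAL and len(path) < 50:
    a = max(range(4), key=lambda x: Q[(s, x)])
    s, _, _ = step(s, a)
    path.append(s)
```

After training, following the greedy policy from the start cell traces one of the shortest paths to the device, which is exactly the behaviour the project needs from the UGV on a small operation area.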

November - December

Development of demo application

  • Creating simulated environment
  • Applying MINLP and Q-Learning
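The simulated environment in the first bullet might look something like the sketch below: a grid world that tracks the UGV's position, the devices still to be charged, and the energy spent. The class name, device layout and energy model are hypothetical stand-ins, not the project's actual simulator.

```python
class OperationArea:
    """A grid world: the UGV moves cell by cell and wirelessly charges a
    device whenever it enters that device's cell."""

    MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # right, left, down, up

    def __init__(self, size, devices, move_cost=1.0, charge_cost=5.0):
        self.size = size
        self.devices = set(devices)       # cells containing IoT devices
        self.move_cost = move_cost        # energy per move
        self.charge_cost = charge_cost    # energy per wireless charge
        self.reset()

    def reset(self):
        self.pos = (0, 0)
        self.remaining = set(self.devices)
        self.energy_used = 0.0
        return self.pos

    def step(self, action):
        dr, dc = self.MOVES[action]
        self.pos = (min(max(self.pos[0] + dr, 0), self.size - 1),
                    min(max(self.pos[1] + dc, 0), self.size - 1))
        self.energy_used += self.move_cost
        if self.pos in self.remaining:
            self.remaining.remove(self.pos)
            self.energy_used += self.charge_cost
        # the -move_cost reward lets a learner trade path length for energy
        return self.pos, -self.move_cost, not self.remaining

env = OperationArea(size=5, devices=[(2, 2), (4, 4)])
state = env.reset()
```

An episode ends once every device has been charged, and `env.energy_used` then gives the quantity on which both the MINLP lower bound and the learned policies would be judged.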

Early January

Deliverable of Phase 2

  • Demo Application
  • Interim Report

Mid-January - February

Applying deep reinforcement learning to a large state space (40x40 grid)
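A 40x40 grid already has 1,600 positions, and once the set of remaining devices is folded into the state, a lookup table becomes impractical; that is what motivates function approximation. The sketch below shows the deep Q-learning machinery relevant here, experience replay plus a periodically synchronised target network, with a linear approximator standing in for the neural network to keep it short. All sizes and hyperparameters are illustrative assumptions.

```python
import random
import numpy as np

N, A, FEAT = 40, 4, 2                 # grid side, actions, feature dimension
GAMMA, LR, EPS, BATCH = 0.95, 0.01, 0.2, 16
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def features(state):
    # normalised (row, col) in [0, 1]^2; a real DQN would feed a richer
    # state encoding into a neural network here
    return np.array(state, dtype=float) / (N - 1)

def step(state, a):
    dr, dc = MOVES[a]
    nxt = (min(max(state[0] + dr, 0), N - 1),
           min(max(state[1] + dc, 0), N - 1))
    done = nxt == (N - 1, N - 1)
    return nxt, (10.0 if done else -1.0), done

rng = np.random.default_rng(0)
W = np.zeros((A, FEAT))               # online weights: Q(s, a) = W[a] @ features(s)
W_target = W.copy()                   # frozen target-network copy
replay = []                           # experience replay buffer

for episode in range(30):
    s = (0, 0)
    for t in range(100):
        # epsilon-greedy action from the online approximator
        a = (int(rng.integers(A)) if rng.random() < EPS
             else int(np.argmax(W @ features(s))))
        s2, r, done = step(s, a)
        replay.append((s, a, r, s2, done))
        if len(replay) > 5000:
            replay.pop(0)
        if len(replay) >= BATCH:
            # one TD step per sampled transition; targets use the frozen copy
            for bs, ba, br, bs2, bd in random.sample(replay, BATCH):
                phi = features(bs)
                target = br if bd else br + GAMMA * np.max(W_target @ features(bs2))
                W[ba] += LR * (target - W[ba] @ phi) * phi
        s = s2
        if done:
            break
    if episode % 5 == 0:
        W_target = W.copy()           # periodic target-network sync
```

Replay breaks the correlation between consecutive grid moves, and the frozen target copy keeps the bootstrapped targets stable; both tricks carry over unchanged when the linear approximator is replaced by a TensorFlow network.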

March - April

Comparing the results obtained from the different approaches


Deliverable of Phase 3

  • Finalized Implementation
  • Finalized Report

UGV learning path in a simulation

The comparison of methods outlined in the final report showed that Q-learning should be used for robot path planning when the operation area is small and there is not enough time to train the UGV. Moreover, Dyna-Q should be combined with Q-learning when no model of the environment is available, since Dyna-Q learns a model from experience and uses it for additional planning updates. In all other cases, deep Q-learning should be used, because it supports a large grid size and is more efficient at achieving an optimal path with low energy consumption. Finally, the results from deep Q-learning are also close to the lower bound provided by MINLP, making them suitable for real-world application. Therefore, deploying such a UGV to wirelessly charge, and communicate with, IoT devices is feasible. The algorithms presented in this project can also be run in a simulated environment, so that the UGV can train for many episodes without the additional costs incurred by physical interaction with the environment; after sufficient training, the UGV will have the required knowledge before deployment. This will not only make cables obsolete but also play a big role in data collection and charging in sectors ranging from manufacturing to retail.
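The Dyna-Q behaviour described above, learning a model from experience and planning with it, can be sketched as follows. Every real transition is stored in a learned model, and the agent replays random remembered transitions through the same Q-learning update, which is why it copes when no model is given up front. The 5x5 grid, rewards and planning budget here are illustrative assumptions, not the project's actual settings.

```python
import random

N = 5
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # right, left, down, up
ALPHA, GAMMA, EPS, PLAN_STEPS = 0.5, 0.9, 0.1, 20
GOAL = (N - 1, N - 1)
Q = {}       # (state, action) -> value, defaulting to 0
model = {}   # (state, action) -> (reward, next_state), learned online

def q(s, a):
    return Q.get((s, a), 0.0)

def step(s, a):
    dr, dc = MOVES[a]
    nxt = (min(max(s[0] + dr, 0), N - 1), min(max(s[1] + dc, 0), N - 1))
    return nxt, (10.0 if nxt == GOAL else -1.0)

def update(s, a, r, s2):
    best = max(q(s2, x) for x in range(4))
    Q[(s, a)] = q(s, a) + ALPHA * (r + GAMMA * best - q(s, a))

random.seed(0)
for episode in range(100):
    s = (0, 0)
    while s != GOAL:
        a = (random.randrange(4) if random.random() < EPS
             else max(range(4), key=lambda x: q(s, x)))
        s2, r = step(s, a)
        update(s, a, r, s2)           # direct RL update from real experience
        model[(s, a)] = (r, s2)       # model learning
        for _ in range(PLAN_STEPS):   # planning from the learned model
            (ps, pa), (pr, ps2) = random.choice(list(model.items()))
            update(ps, pa, pr, ps2)
        s = s2

# greedy rollout with the learned values
s, path = (0, 0), [(0, 0)]
while s != GOAL and len(path) < 50:
    a = max(range(4), key=lambda x: q(s, x))
    s, _ = step(s, a)
    path.append(s)
```

Because each real step is amplified by the planning replays, Dyna-Q typically needs far fewer real interactions with the environment than plain Q-learning, which is the property that makes it attractive when the UGV must be trained quickly.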