My Dissertation
Below is a video of 30 robots flocking around 50 objects to reach a desired target. These robots are trained using Reinforcment Learning. Specifically, they are trained using the Independent Proximal Policy Optimisation (IPPO) algorithm, using multi-layer perceptron policies and value functions. More details will be added to this page shortly, (once I actually finish writing my dissertation...)