Project Bella

Path Following Sadness

This week was also fairly busy, but I managed to set up the path-following agent and environment and train it in various ways. The agent I currently have is passed the following observations (see the code sketch after this list):

  • Information from eleven raycasts acting as its eyes:
    • A one-hot encoding of the tag the raycast hit (there are four observable tags, and the fifth spot in the encoding corresponds to a miss)
    • The direction to the object hit by the raycast
    • The distance to the object hit by the raycast
  • The direction the agent is facing (its normalized velocity)
  • Whether or not the agent is trying to jump
  • Whether or not the agent is on the ground
  • Whether or not the agent has reached the destination
  • Whether or not the agent has changed direction
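
As a rough sketch, collecting those observations ends up looking something like the code below. The class, field, and method names are illustrative placeholders rather than the exact ones in my script, and the ray-casting part is sketched separately further down.

```csharp
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;

public class PathFollowingAgent : Agent
{
    Rigidbody2D body;
    bool isTryingToJump;
    bool isGrounded;
    bool reachedDestination;
    bool changedDirection;

    public override void Initialize()
    {
        body = GetComponent<Rigidbody2D>();
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // Eleven rays, each contributing a one-hot tag encoding,
        // a direction, and a distance (sketched further down).
        AddRayObservations(sensor);

        // The direction the agent is facing (its normalized velocity).
        sensor.AddObservation(body.velocity.normalized);

        // Booleans are passed as 0/1 values.
        sensor.AddObservation(isTryingToJump);
        sensor.AddObservation(isGrounded);
        sensor.AddObservation(reachedDestination);
        sensor.AddObservation(changedDirection);
    }

    // Stub here; the full ray-casting version is sketched later in the post.
    void AddRayObservations(VectorSensor sensor) { }
}
```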

These totaled 105 observations. Every time the agent encountered the trigger collider of a path node along the user-drawn path, it received a small reward of 0.1. If it reached the destination trigger, it received a much larger reward and continued to be rewarded in tiny amounts for as long as it stayed in contact with it.
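In code, that reward logic amounts to a couple of trigger callbacks, something like the sketch below. The tag names and the destination reward magnitudes are placeholders; only the 0.1 node reward is the real value.

```csharp
// These callbacks live in the same Agent subclass as the earlier sketch.
void OnTriggerEnter2D(Collider2D other)
{
    if (other.CompareTag("PathNode"))
    {
        // Small reward for each node touched along the user-drawn path.
        AddReward(0.1f);
    }
    else if (other.CompareTag("Destination"))
    {
        // Much larger reward for reaching the destination.
        AddReward(1.0f);
    }
}

void OnTriggerStay2D(Collider2D other)
{
    // Keep trickling in tiny rewards while the agent stays in contact
    // with the destination trigger.
    if (other.CompareTag("Destination"))
    {
        AddReward(0.001f);
    }
}
```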

Information like the ray direction and the distance to the hit object is not normally passed to the agent by ML-Agents' RayPerceptionSensorComponent2D; all that component passes are the one-hot tag encodings. So I ripped some of the code out of that component and modified it to pass all of the desired information, hoping that would let the agent be more precise about its choices.
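The result looks roughly like the sketch below: cast each ray by hand and add the one-hot tag encoding, the ray direction, and the hit distance. The tag names, ray length, and angles here are illustrative, not the actual values from the component I started from.

```csharp
// These members live in the same Agent subclass as the earlier sketch
// (using UnityEngine; using Unity.MLAgents.Sensors;). Layer masks and
// self-hits are ignored for brevity.
static readonly string[] ObservableTags = { "Ground", "PathNode", "Destination", "Obstacle" };

void AddRayObservations(VectorSensor sensor)
{
    const int rayCount = 11;
    const float rayLength = 10f;

    for (int i = 0; i < rayCount; i++)
    {
        // Fan the rays across a 180-degree arc in front of the agent.
        float angle = Mathf.Lerp(-90f, 90f, i / (float)(rayCount - 1));
        Vector2 direction = Quaternion.Euler(0f, 0f, angle) * transform.up;

        RaycastHit2D hit = Physics2D.Raycast(transform.position, direction, rayLength);

        // One-hot encoding of the tag that was hit; the fifth slot means "miss".
        int tagIndex = ObservableTags.Length;
        if (hit.collider != null)
        {
            tagIndex = System.Array.IndexOf(ObservableTags, hit.collider.tag);
            if (tagIndex < 0) tagIndex = ObservableTags.Length;
        }
        sensor.AddOneHotObservation(tagIndex, ObservableTags.Length + 1);

        // The direction of the ray and the distance to whatever it hit
        // (or the full ray length on a miss).
        sensor.AddObservation(direction);
        sensor.AddObservation(hit.collider != null ? hit.distance : rayLength);
    }
}
```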

Needless to say, as with most things in this project, it did not work.

Perhaps I needed to train each agent for longer before getting frustrated and giving up (which typically happened after about 100 thousand iterations), or to change the layout of the environment: it's very congested right now, so randomly jumping around can achieve more than it should. Whatever the case, I glanced at some other ML-Agents examples that seemed similar to mine to see how their agents were being rewarded and punished.

The wall-jump example tries to train an agent to jump over a wall and reach a goal. The agent can push a block up against the wall and jump on top of it to help itself over. Looking at this agent's script, I was surprised to find that it was just using a RayPerceptionSensorComponent3D for its eyes (i.e., the only "visual" data it received were the tags of the objects in front of it), and its rewards were incredibly sparse: it was only rewarded upon reaching the goal, and at no other time. And yet it was performing really well.
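That sparse scheme boils down to a pattern like this (my paraphrase of the idea, not the actual wall-jump script):

```csharp
// Sparse rewards: the agent gets nothing until it touches the goal.
void OnTriggerEnter(Collider other)
{
    if (other.CompareTag("Goal"))
    {
        SetReward(1.0f);
        EndEpisode();
    }
}
```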

My goal for spring break is pretty much the same as it was this week: to get a working path-following agent. I'll try tweaking the environment, searching through the provided ML-Agents examples for hints, looking through online tutorials with goals similar to mine, and letting my agent train for much longer (honestly, its hidden layers had 512 nodes each AND it was a recurrent neural network; the likelihood of it learning quickly in 100 thousand iterations was pretty low given that the wall-jump agent had a significantly simpler brain and still needed a million iterations to learn).