This week was also fairly busy, but I managed to set up the path-following agent and its environment and train it in various ways. The agent I currently have is passed the following observations:
These totaled 105 observations. Every time the agent encountered the trigger collider of a path node along the user-drawn path, it received a small reward of 0.1. If it encountered the destination trigger, it received a much larger reward and continued to be rewarded in tiny amounts as long as it stayed in contact with the destination trigger.
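The reward logic described above can be sketched roughly as follows. This is a minimal sketch, not my actual script: the class and tag names are hypothetical, the 0.1 node reward comes from the text, and the destination and "stay" reward values are placeholders.

```csharp
using UnityEngine;
using Unity.MLAgents;

public class PathFollowerAgent : Agent
{
    // 0.1 per path node is from the text; the other two values are placeholders.
    const float NodeReward = 0.1f;
    const float DestinationReward = 1.0f;
    const float StayReward = 0.001f;

    void OnTriggerEnter2D(Collider2D other)
    {
        if (other.CompareTag("PathNode"))
            AddReward(NodeReward);        // small reward for each node on the user-drawn path
        else if (other.CompareTag("Destination"))
            AddReward(DestinationReward); // much larger reward on reaching the goal
    }

    void OnTriggerStay2D(Collider2D other)
    {
        // Tiny ongoing reward for as long as the agent touches the destination trigger.
        if (other.CompareTag("Destination"))
            AddReward(StayReward);
    }
}
```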
Information like the direction of a raycast and the distance to the object it hits is not normally passed to the agent by ML-Agents' RayPerceptionSensorComponent2D; all that component passes along are one-hot encodings of the hit objects' tags. So I ripped some of the code out of that component and modified it to pass all of the desired information, hoping it would allow the agent to be more precise about its choices.
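A simplified version of that idea, assuming the agent casts its own rays in CollectObservations instead of using the sensor component (the ray count, ray length, and tag check here are all placeholders, not my real values):

```csharp
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;

public class RayObservingAgent : Agent
{
    const int RayCount = 15;     // placeholder
    const float RayLength = 10f; // placeholder

    public override void CollectObservations(VectorSensor sensor)
    {
        for (int i = 0; i < RayCount; i++)
        {
            // Spread rays evenly in a circle around the agent.
            float angle = (360f / RayCount) * i;
            Vector2 dir = Quaternion.Euler(0f, 0f, angle) * Vector2.right;
            RaycastHit2D hit = Physics2D.Raycast(transform.position, dir, RayLength);

            // Pass the ray's direction and normalized hit distance,
            // in addition to the tag info the stock sensor would give.
            sensor.AddObservation(dir);                                        // 2 values
            sensor.AddObservation(hit.collider ? hit.distance / RayLength : 1f);
            sensor.AddObservation(hit.collider && hit.collider.CompareTag("Wall") ? 1f : 0f);
        }
    }
}
```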
Needless to say, as with most things in this project, it did not work.
Perhaps I needed to train each agent for longer before getting frustrated and giving up (which would typically happen after 100 thousand iterations), or to change the layout of the environment, which is very congested right now; randomly jumping around can achieve more than it should. Whatever the case, I glanced at some other ML-Agents examples that seemed similar to mine to see how their agents were being rewarded and punished.
The wall-jump example tries to train an agent to jump over a wall and reach a goal. The agent can jump on top of a block and push it against the wall to help itself jump over. Looking at this agent's script, I was surprised to find that it used just a RayPerceptionSensorComponent3D for its eyes (i.e., the only "visual" data it received were the tags of the objects in front of it), and its rewards were incredibly sparse: it was rewarded only upon reaching the goal, and at no other time. And yet it performed really well.
My goal for spring break is pretty much the same as it was this week: to get a working path-following agent. I'll try tweaking the environment, searching through the provided ML-Agents examples for hints, looking through online tutorials with goals similar to mine, and letting my agent train for much longer. (Honestly, its hidden layers had 512 nodes each AND it was a recurrent neural network; the likelihood of it learning quickly within 100 thousand iterations was pretty low if the wall-jump agent, with a significantly simpler brain, needed a million iterations to learn.)
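For reference, a brain like the one I described would correspond to a trainer config fragment roughly like the following. This assumes the newer ML-Agents YAML schema; the behavior name and most numbers are placeholders, with hidden_units and the memory block matching the 512-node recurrent setup mentioned above.

```yaml
behaviors:
  PathFollower:          # placeholder behavior name
    trainer_type: ppo
    network_settings:
      hidden_units: 512  # the 512-node hidden layers mentioned above
      num_layers: 2      # placeholder
      memory:            # including this block is what makes the network recurrent
        memory_size: 256
        sequence_length: 64
    max_steps: 1000000   # closer to the million steps wall-jump needed
```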