This week, I did what I've done pretty much every week and didn't do anything close to what I said I'd do last week. I took my professor's suggestion to try to fashion one of the existing Unity ML-Agents examples into a game, and I believe I'm making decent progress. I took the Food Collector example, since it was so similar to my forager agent (but in 3D!), and messed around with it. In the original example, the agents were capable of picking up food and poison, and of temporarily freezing other agents with a laser. I modified it so that each agent is now a sort of combination of a penguin agent (tbt my girl Joundoom) and a foraging agent:
Each of these agents receives the following observations:
Finally, each agent received the following rewards and punishments (on the first attempt):
I trained sixteen of these agents for 20 thousand iterations, and by the end they were grabbing food and avoiding poison with precision. Even so, they still weren't returning to their nests to drop the food off. They were also going aggressive as often as possible, because it made them move faster. Since energy and food were linked during this run, going aggressive burned both energy and food, which let them collect more food and sent the agent score skyrocketing even though they weren't performing the crucial step of returning to the nest. On the next attempt, I added a very small punishment for going aggressive and stopped resetting energy when the round resets. With these modifications, they STILL weren't returning to the nest, but they went aggressive far less often (only as a small boost to grab a piece of food right in front of them). They also started displaying some SUPER cute behavior, like stopping, looking around, and charging.
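The second-attempt reward tweak can be sketched in a few lines of Python. This is a hypothetical illustration of the shaping described above; the function name and the magnitudes (food reward, poison punishment, aggression cost) are assumptions for the sketch, not the values from my project.

```python
# Illustrative reward shaping: a tiny aggression cost on top of the
# food/poison rewards. All constants are assumed, not the real ones.
FOOD_REWARD = 1.0       # reward for grabbing a piece of food
POISON_PENALTY = -1.0   # punishment for grabbing poison
AGGRO_PENALTY = -0.01   # very small cost for going aggressive

def step_reward(grabbed_food: bool, grabbed_poison: bool,
                went_aggressive: bool) -> float:
    """Accumulate one agent's reward for a single step."""
    reward = 0.0
    if grabbed_food:
        reward += FOOD_REWARD
    if grabbed_poison:
        reward += POISON_PENALTY
    if went_aggressive:
        # Small enough that a quick speed boost to snag nearby food
        # still pays off, but constant aggression no longer does.
        reward += AGGRO_PENALTY
    return reward
```

The key design point is the scale: the aggression cost is two orders of magnitude smaller than the food reward, so it shifts the policy without dominating it.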
Finally, I added a curriculum and a stronger penalty for getting touched by an aggressive agent: the victim is returned to their nest, temporarily frozen, AND stripped of all their food. I also decoupled food and energy, so an agent can now lose energy without losing food, but still gains energy from eating food. This forces them to return to their nest if they want to keep collecting food and build up energy. The curriculum I used modulated two parameters: "nest radius" and "max food". Nest radius determines how close an agent has to be to its nest for food drop-off to occur, and max food determines how much food it can hold at once. Nest radius gradually decreases while max food gradually increases. Here is the curriculum I ultimately used:
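ML-Agents normally expresses a curriculum in the trainer config file, but the mechanism boils down to a progress-keyed lookup. Here is a sketch of that idea in Python, with hypothetical lesson boundaries and values (not the actual table from my run): nest radius shrinks and max food grows as training progresses.

```python
# Hypothetical curriculum sketch: each lesson is
# (progress threshold, nest_radius, max_food). Values are illustrative.
LESSONS = [
    (0.00, 12.0, 1),
    (0.25,  9.0, 2),
    (0.50,  6.0, 3),
    (0.75,  3.0, 4),
]

def lesson_params(progress: float) -> tuple[float, int]:
    """Return (nest_radius, max_food) for training progress in [0, 1].

    Picks the last lesson whose threshold has been reached, so the
    task gets harder (smaller nest, bigger carry capacity) over time.
    """
    nest_radius, max_food = LESSONS[0][1], LESSONS[0][2]
    for threshold, radius, food in LESSONS:
        if progress >= threshold:
            nest_radius, max_food = radius, food
    return nest_radius, max_food
```

Early on, the huge nest radius means agents get drop-off credit almost anywhere near home, which bootstraps the return-to-nest behavior before the radius tightens.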
200 thousand iterations (and an hour and fifteen minutes) later, they were finally returning to the nest:
This game is playable if the Behavior Type in the agent's Behavior Parameters is switched from Default to Heuristic Only, but it is obviously not polished enough to play yet. The AIs are formidable and play like a very focused human, but I don't think the game is engaging enough yet. These agents clearly function as challenging and somewhat sympathetic (they're really cute) AIs with a complex machine learning model for a brain. I have an idea of exactly what I want to do next week, but I won't reveal it yet; it has to do with theming for the game, for which I have some very specific inspiration that I'm excited to try out.