Project Bella

Cubic Cannibalism

After thinking about how Bella's brain should be structured, I found my thoughts gravitating closer and closer to the design of Black & White's Creature AI: a collection of neural nets to guide its fundamental behavior, and much faster-learning models (like decision trees) to represent its developing feelings about various objects and people. With this in mind, I started training one of the "brains" I had in mind for Bella: a foraging brain. This brain would help Bella find food when she grows hungry, and even help the player help Bella find food, as I don't know if I intend to show the player everything Bella can see.

I first wanted to try training a foraging brain using only scent data, so my first task was to figure out how to represent a scent. After looking through all of the code files related to Ray Perception Sensor Component 3D (the script used as the "eyes" of many an ML-Agents example agent), I realized that these ray casts did not pass any distance or direction information to the agent's brain. They only communicated the tag of the game object they intersected. This didn't seem like a good representation of scent data, so I decided to create one on my own.

I ultimately decided on the following scent-detection system:

  • A scented object is merely a sprite with a tag.
  • The training area holds on to references of all scented objects in the scene (this week, I only had two types of scented objects: "food" and "badFood").
  • Every time the agent collected observations to send to its brain, it would ask the environment to sort the scented objects by proximity to the agent. It would then grab the three nearest scents.
  • These scents were encoded as: an integer value for the tag (-1 for "badFood", 1 for "food"), and a Vector3 for the direction from the agent to the scented object (see the sketch after this list).
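Here's a rough sketch of what that encoding could look like inside an ML-Agents CollectObservations override. The class and field names are placeholders rather than my exact code, and it assumes the modern ML-Agents API (VectorSensor):

```csharp
using System.Collections.Generic;
using System.Linq;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class ForagingAgent : Agent
{
    // Hypothetical: the training area hands the agent every scented object in the scene.
    public List<GameObject> scentedObjects;
    const int NearestScentCount = 3;

    public override void CollectObservations(VectorSensor sensor)
    {
        // Sort the scented objects by proximity and keep the three nearest.
        var nearest = scentedObjects
            .Where(o => o.activeSelf)
            .OrderBy(o => Vector3.Distance(o.transform.position, transform.position))
            .Take(NearestScentCount)
            .ToList();

        foreach (var scent in nearest)
        {
            // Tag encoding: -1 for "badFood", +1 for "food".
            sensor.AddObservation(scent.CompareTag("food") ? 1f : -1f);
            // Direction from the agent to the scented object.
            sensor.AddObservation((scent.transform.position - transform.position).normalized);
        }

        // ML-Agents expects a fixed-size observation vector, so missing scents get zero-padded.
        for (int i = nearest.Count; i < NearestScentCount; i++)
        {
            sensor.AddObservation(0f);
            sensor.AddObservation(Vector3.zero);
        }
    }
}
```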

After much experimentation, I chose to give the agent the option to pick something up when it's colliding with it, rather than automatically picking it up upon collision. So in addition to the scent data, the agent's brain was passed the direction of the agent's velocity, a boolean indicating whether or not it was jumping, and another boolean indicating whether or not it was trying to pick something up.
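Sketched below is how the opt-in pickup and those extra observations might hang together. The action layout, field names, and reward values are assumptions for illustration, not my exact code (and in reality this would live in the same agent class as the scent observations above):

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class ForagingAgentActions : Agent
{
    Rigidbody2D body;         // hypothetical: assigned in Initialize()
    bool isJumping;           // hypothetical: maintained by the movement code
    bool wantsPickup;         // the most recent "try to pick up" decision
    GameObject touchedScent;  // hypothetical: tracked via OnCollisionEnter2D/Exit2D

    public override void CollectObservations(VectorSensor sensor)
    {
        // Alongside the scent data: direction of travel, jump flag, pickup flag.
        sensor.AddObservation(body.velocity.normalized);
        sensor.AddObservation(isJumping);
        sensor.AddObservation(wantsPickup);
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // Hypothetical action layout: one discrete branch decides "try to pick up".
        wantsPickup = actions.DiscreteActions[0] == 1;

        // Nothing is collected automatically; the agent has to ask while touching it.
        if (wantsPickup && touchedScent != null)
        {
            AddReward(touchedScent.CompareTag("food") ? 1f : -1f);
            touchedScent.SetActive(false);
            touchedScent = null;
        }
    }
}
```

Making pickup a deliberate action means the brain has to learn a little impulse control: touching something is no longer the same as eating it.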

For the first several hours, I found myself perplexed at how difficult it was to teach a square to go toward food, and away from bad food. Like, how hard could it be? Just GO TOWARD THE FOOD.

I tried changing the number of nodes in the hidden layer from 64 to 128 to 256; I tried allowing the agent to jump versus only allowing it to strafe; I tried penalizing and not penalizing jumping, penalizing and not penalizing moving, including or not including bad food, including only one food in the scene or including four, only allowing the agent to detect one scent or allowing it to detect three, and the list goes on. Time and time again, I found that the agent would ultimately learn to "just go left" or "just go right".

Ultimately I realized that randomly distributing the food and bad food was not a good idea. With everything scattered at random, blindly charging in one direction gave the agent roughly a 50% chance of grabbing good food, so "just go left" was as good a policy as any. To fix this, I developed the following curriculum:

  • To start off, the bad food was clustered on one side, and the good food on the other. I continuously tracked a weighted average of the agent's position and placed the good food on the opposite side, so that it would be forced to learn that it can't simply go in one direction all the time (see the sketch after this list).
  • As the agent progressed through lessons, the overlap between the bad food zone and the good food zone would gradually increase, forcing the agent to become more and more precise about moving and picking things up.
  • The agent also faced an increasing jumping penalty with every lesson, to encourage jumping only when necessary (say, a big swath of bad food to jump over).
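To make the curriculum concrete, here's a hedged sketch of the placement logic. The parameter name, smoothing constant, and zone math are all stand-ins; the only part taken from ML-Agents itself is Academy.Instance.EnvironmentParameters for reading lesson values:

```csharp
using Unity.MLAgents;
using UnityEngine;

public class FoodPlacer : MonoBehaviour
{
    // Hypothetical references and tuning values.
    public Transform agent;
    public GameObject[] goodFood;
    public GameObject[] badFood;
    public float arenaHalfWidth = 10f;

    float agentAverageX;            // running weighted average of the agent's x position
    const float Smoothing = 0.99f;  // made-up decay constant

    void FixedUpdate()
    {
        // Exponentially weighted average: recent positions count the most.
        agentAverageX = Smoothing * agentAverageX + (1f - Smoothing) * agent.position.x;
    }

    public void ResetFood()
    {
        // Curriculum lesson value: how far the two zones overlap (0 = fully separated).
        float overlap = Academy.Instance.EnvironmentParameters.GetWithDefault("zone_overlap", 0f);

        // Good food goes on the side opposite the agent's habitual position.
        float goodSide = agentAverageX >= 0f ? -1f : 1f;

        foreach (var food in goodFood)
            food.transform.position = RandomInZone(goodSide, overlap);
        foreach (var food in badFood)
            food.transform.position = RandomInZone(-goodSide, overlap);
    }

    Vector3 RandomInZone(float side, float overlap)
    {
        // Each zone is one half of the arena, widened past the center as overlap grows.
        float reachPastCenter = -overlap * arenaHalfWidth;
        float x = side * Random.Range(reachPastCenter, arenaHalfWidth);
        return new Vector3(x, 0f, 0f);
    }
}
```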

After one million iterations of training the agent like this, it performed... okay.

The brown squares are food, the purple squares are poison. The pink square is my disappointment.

I was honestly disheartened to find that even after one million iterations of training, this agent would still occasionally make the mistake of grabbing a poisonous square, or get stuck between two poisonous squares even though a food square was somewhere nearby. Of course, this is due to the information I chose to pass it, and my own design of the training environment. The agent can only be aware of three scents at a time, so even if all four food squares are in the scene, it won't be able to detect them if the three closest scented objects are poisonous squares. I plan to tackle this in the coming week by passing scents to the agent using a probability distribution rather than simply grabbing the three nearest scents: the closest scents would still be the most likely to reach the agent, but more distant scents could occasionally make their way to its nose by chance. I've also considered giving the agent the option to "focus" on a scent; that is, it could deliberately hold onto a scent even after it stops being among the three nearest, if it wants to.
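As a sketch of what that sampling might look like: weight each scent by something like inverse distance and draw three without replacement, so the nearest scents usually win but a distant one can sneak through. The 1/(d+1) weighting below is just one plausible choice, not a settled design:

```csharp
using System.Collections.Generic;
using System.Linq;
using UnityEngine;

public static class ScentSampler
{
    // Draw `count` scents without replacement, weighted by 1 / (distance + 1),
    // so nearer scents are most likely but distant ones can still be sampled.
    public static List<GameObject> Sample(List<GameObject> scents, Vector3 nose, int count)
    {
        var pool = new List<GameObject>(scents);
        var picked = new List<GameObject>();

        while (picked.Count < count && pool.Count > 0)
        {
            float total = pool.Sum(s => Weight(s, nose));
            float roll = Random.Range(0f, total);

            for (int i = 0; i < pool.Count; i++)
            {
                roll -= Weight(pool[i], nose);
                if (roll <= 0f || i == pool.Count - 1)  // last index guards against float drift
                {
                    picked.Add(pool[i]);
                    pool.RemoveAt(i);
                    break;
                }
            }
        }
        return picked;
    }

    static float Weight(GameObject scent, Vector3 nose) =>
        1f / (Vector3.Distance(scent.transform.position, nose) + 1f);
}
```

The agent would then call something like ScentSampler.Sample each time it collects observations, in place of the sort-and-take-three approach above.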

I also need to keep in mind that this ultimately has to be an actual game, so in the coming week I hope to prototype one of the game's primary mechanics: reclaimable control over the titular character. When the main character's emotions are very strong, the player has to put in more effort to control it. I've also decided that I should probably avoid any fancy graphics until I have an MVP. If I can get people emotional about a particularly unimpressive square, then surely I'll have no trouble doing the same with a dog.