Project Bella

Project Joundoom

Although I was eager to look into some of the reinforcement learning methods I had identified last week, such as Q-learning and SARSA, I figured it would be best to first identify what machine learning tools are currently available in Unity, and how powerful they are. I quickly found the Unity ML-Agents Toolkit, a plugin first released in September of 2017. Since then, it has been frequently updated, which I realized only after starting several usage tutorials and finding that my version of ML-Agents had none of the components the instructor was talking about. The most recent tutorial to date shows you how to create an intelligent penguin that retrieves fish for its baby, and although it, too, was created with an earlier version of the plugin, the instructor was kind enough to post an accompanying page with all the necessary changes for later versions.

ML-Agents is easy to set up and get started with. Whatever environment you build, you will need an agent that does the learning, and an academy that I presume manages the learning (since the tutorials don't expand much on the academy, I have yet to fully understand it). The penguin learning environment consists of an icy slab and a shallow pool of water with a few fish. The penguin mother, whom I believe I have aptly named Joundoom[1], goes back and forth between the slab and the water, collecting fish and regurgitating them in front of her baby, who waits, perched safely on the slab.

Joundoom was trained using the Proximal Policy Optimization (PPO) learning algorithm. Her brain is a two-layer neural network with 256 hidden units. After ten minutes of attempted motherhood, here are some of her best moments.

  • About a minute into training:
    Truly one of nature's greatest
  • A brief moment in which she asked herself who should really be swimming with the fishes:
    I would lose it too after 35,000 attempts at feeding an incompetent child
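For anyone curious what PPO is actually optimizing, here is a minimal Python sketch of its clipped surrogate objective. It is not the ML-Agents implementation: the function name, the argument names, and the sample numbers are all illustrative, and the clip range of 0.2 is only an assumed, commonly used default. The idea is that each update is scored by how much more likely the new policy makes an action (the probability ratio) times how good that action turned out to be (the advantage), with the ratio clipped so that no single update can push the policy too far.

```python
def ppo_clipped_objective(ratios, advantages, epsilon=0.2):
    """Mean PPO clipped surrogate objective over a batch of samples.

    ratios:     new-policy / old-policy probability of each action taken
    advantages: estimated advantage of each action
    epsilon:    clip range (0.2 is an assumed default, not a tuned value)
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("need at least one sample")

    total = 0.0
    for ratio, advantage in zip(ratios, advantages):
        # Keep the ratio inside [1 - epsilon, 1 + epsilon].
        clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * advantage, clipped_ratio * advantage)
    return total / len(ratios)


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    sample_ratios = [1.5, 0.5, 1.0]
    sample_advantages = [1.0, -1.0, 2.0]
    print(ppo_clipped_objective(sample_ratios, sample_advantages))
```

Training maximizes this quantity. Taking the minimum of the clipped and unclipped terms means a large change in the policy never earns extra credit, which is what keeps each update "proximal" to the previous policy.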

The tutorial recommends training for 6 million iterations, but my computer began to protest after a measly 50,000. In the coming weeks, I plan on figuring out whether I can continue training from a previous model, and on exploring more of ML-Agents. I'd like to try training an agent using Soft Actor-Critic (SAC), another reinforcement learning algorithm that ML-Agents supports, which can be more sample-efficient than PPO. I have not been disappointed with ML-Agents so far, and I expect it to become a central tool in my project this semester.

[1] Joundoom is the name I gave my first Club Penguin avatar after my friend convinced me that the name of my favorite Pokémon, Houndoom, was in fact spelled with a 'J'.