Learning World Graphs to Accelerate Hierarchical Reinforcement Learning

Wendy Shang    Alex Trott*    Stephan Zheng*    Caiming Xiong    Richard Socher

An autonomous agent often encounters a variety of tasks within a single complex environment. Our two-stage framework first builds a simple directed weighted graph abstraction of the world in an unsupervised, task-agnostic manner, and then uses this abstraction to accelerate hierarchical reinforcement learning on a diversity of downstream tasks. For details, please refer to the paper and its appendix.
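To make the graph abstraction concrete, below is a minimal, illustrative sketch of the kind of directed weighted graph the framework builds over pivotal states, together with shortest-path planning over it. The state names and edge costs are hypothetical placeholders, not the paper's actual environments or API; edge weights stand in for the cost (e.g., number of low-level actions) of traveling between pivotal states.

```python
import heapq

# Hypothetical world graph: adjacency map from each pivotal state to
# (neighbor, edge_cost) pairs. States and costs are illustrative only.
world_graph = {
    "A": [("B", 2), ("C", 5)],
    "B": [("C", 1), ("D", 4)],
    "C": [("D", 1)],
    "D": [],
}

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm over the directed weighted world graph.

    Returns (total_cost, path); (inf, []) if the goal is unreachable.
    """
    frontier = [(0, start, [start])]  # (accumulated cost, node, path so far)
    visited = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nbr, w in graph.get(node, []):
            if nbr not in visited:
                heapq.heappush(frontier, (cost + w, nbr, path + [nbr]))
    return float("inf"), []

cost, path = shortest_path(world_graph, "A", "D")
# → (4, ['A', 'B', 'C', 'D'])
```

A high-level policy can use such paths as a sequence of subgoals, which is the intuition behind planning over the world graph rather than over raw states.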

Overview of Proposed Two-Stage Framework

Stage 1: World Graph Discovery

Stage 2: Hierarchical Reinforcement Learning
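As a rough illustration of how the second stage can exploit the graph, the sketch below shows a high-level "manager" that plans over the world graph and emits the next pivotal state as a subgoal, which a low-level goal-conditioned "worker" policy would then pursue. All names (`world_graph`, `next_subgoal`, the state labels) are hypothetical and for illustration only; they are not the paper's actual implementation.

```python
from collections import deque

# Hypothetical world graph over pivotal states of a Door-Key-style maze.
world_graph = {
    "start": ["hall"],
    "hall": ["door", "key"],
    "key": ["door"],
    "door": ["goal"],
    "goal": [],
}

def next_subgoal(graph, current, target):
    """BFS from `current`; return the first pivotal state along a shortest
    path toward `target`, or None if `target` is unreachable."""
    queue = deque([(current, [])])
    seen = {current}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path[0] if path else None
        for nbr in graph.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, path + [nbr]))
    return None

# The manager repeatedly emits subgoals until the target is reached; in the
# full system, a trained goal-conditioned worker would execute each leg.
state = "start"
trajectory = [state]
while state != "goal":
    state = next_subgoal(world_graph, state, "goal")  # worker reaches subgoal
    trajectory.append(state)
# trajectory → ['start', 'hall', 'door', 'goal']
```

Replanning one subgoal at a time keeps the manager's decisions short-horizon, which is what makes the downstream hierarchical learning easier to accelerate.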

Control Experiment Results

Compare All Variations and Baseline on Small MultiGoal

[Figure: learning curves on Small MultiGoal comparing the A2C and FN baselines against the proposed method, with goal sets drawn from all feasible states, random states, and pivotal states.]

Compare Different Wide Goal Sets on Medium Door-Key

[Figure: learning curves on Medium Door-Key for goal sets drawn from all feasible states, pivotal states, and random states.]

Compare Initialization with Goal-Conditioned Policy on Large MultiGoal-Sparse

[Figure: learning curves on Large MultiGoal-Sparse with and without goal-conditioned policy initialization.]