Student Seminars


Huitian Lei
PhD student, Statistics, University of Michigan

Online Contextual Bandits with Stochastic Policy

The majority of the literature on contextual bandits has focused on learning and planning with deterministic policies. In this project, we design algorithms that learn optimal stochastic policies. Stochastic policies have several advantages when applied in the medical field, including ease of interpretability and the ability to avoid habituation to a fixed intervention. We propose an actor-critic algorithm in which the critic iteratively estimates the value function while the actor uses the estimated value function to optimize and update the online policy.
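
To make the actor-critic loop in the abstract concrete, here is a minimal Python sketch. It is not the speaker's algorithm: the two-action setup, the logistic policy, the ridge-regression critic, the L2 penalty that keeps the policy stochastic, the simulated rewards, and every name (`run_actor_critic`, `true_w`, `penalty`, and so on) are assumptions chosen only for illustration.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def run_actor_critic(n_rounds=500, step=0.5, penalty=0.1, seed=0):
    """Toy two-action contextual bandit; every modeling choice is an assumption."""
    rng = np.random.default_rng(seed)
    d = 2                                   # context x = [1, z]
    true_w = np.array([[0.0, 0.0],          # hypothetical reward weights, action 0
                       [0.0, 0.5]])         # action 1 pays more when z > 0
    theta = np.zeros(d)                     # actor: P(a=1 | x) = sigmoid(theta @ x)
    A = np.stack([np.eye(d), np.eye(d)])    # critic: per-action ridge Gram matrices
    b = np.zeros((2, d))                    # critic: per-action reward-weighted sums
    contexts = []

    for _ in range(n_rounds):
        x = np.array([1.0, rng.normal()])
        contexts.append(x)

        # Act: sample an action from the current stochastic policy.
        p1 = sigmoid(theta @ x)
        a = int(rng.random() < p1)
        r = true_w[a] @ x + 0.1 * rng.normal()

        # Critic: refit the ridge estimate of each action's expected reward.
        A[a] += np.outer(x, x)
        b[a] += r * x
        w_hat = np.stack([np.linalg.solve(A[k], b[k]) for k in range(2)])

        # Actor: one gradient-ascent step on the estimated average reward
        # over the contexts seen so far, minus an L2 penalty on theta.
        X = np.asarray(contexts)
        p = sigmoid(X @ theta)
        advantage = X @ (w_hat[1] - w_hat[0])
        grad = (p * (1 - p) * advantage) @ X / len(X) - penalty * theta
        theta = theta + step * grad

    return theta, w_hat

if __name__ == "__main__":
    theta, w_hat = run_actor_critic()
    # Probability of choosing action 1 at the context z = 1.
    print(sigmoid(theta @ np.array([1.0, 1.0])))
```

In this sketch the penalty term bounds `theta`, so the learned policy never collapses to a deterministic rule; this is one simple way to keep a policy stochastic and is not necessarily the approach presented in the talk.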


Student Seminar Archive

For questions regarding the Statistics Student Seminar or if you are interested in presenting, please contact Joonha Park or Jingshen Wang.