Department of Statistics
439 West Hall, 1085 South University Ave., Ann Arbor, MI 48109-1107
Phone: 734.763.3519 Fax: 734.763.4676
Huitian Lei
PhD student, Statistics, University of Michigan
Online Contextual Bandits with Stochastic Policy
Most of the contextual bandit literature has focused on learning and planning with deterministic policies. In this project, we design algorithms that learn optimal stochastic policies. Stochastic policies have several advantages when applied to the medical field, including ease of interpretability and the capability to avoid habituation to a fixed intervention. We propose an actor-critic algorithm in which the critic iteratively estimates the value function while the actor uses the estimated value function to optimize and update the online policy.
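To make the actor-critic loop above concrete, here is a minimal sketch in Python. It is not the speaker's algorithm: the linear reward model, the ridge-regression critic, the softmax policy class, the simulated environment, and every name and parameter (`softmax_policy`, `run_actor_critic`, `lr`, `ridge`, `true_w`) are assumptions chosen only for illustration.

```python
import numpy as np


def softmax_policy(theta, x):
    """Action probabilities of a stochastic softmax policy (assumed policy class)."""
    logits = theta @ x                 # one logit per action
    logits = logits - logits.max()     # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()


def run_actor_critic(n_rounds=2000, dim=3, n_actions=2, lr=0.05, ridge=1.0, seed=0):
    """Illustrative online actor-critic loop on a simulated contextual bandit."""
    rng = np.random.default_rng(seed)
    true_w = rng.normal(size=(n_actions, dim))   # hypothetical environment weights
    theta = np.zeros((n_actions, dim))           # actor (policy) parameters

    # Critic state: per-action ridge regression of reward on context.
    gram = np.stack([ridge * np.eye(dim) for _ in range(n_actions)])
    moment = np.zeros((n_actions, dim))
    w_hat = np.zeros((n_actions, dim))

    for _ in range(n_rounds):
        x = rng.normal(size=dim)                 # observe a context
        probs = softmax_policy(theta, x)
        a = rng.choice(n_actions, p=probs)       # sample an action from the policy
        r = true_w[a] @ x + 0.1 * rng.normal()   # noisy linear reward (assumption)

        # Critic step: refresh the reward estimate for the chosen action.
        gram[a] += np.outer(x, x)
        moment[a] += r * x
        w_hat[a] = np.linalg.solve(gram[a], moment[a])

        # Actor step: gradient ascent on the estimated expected reward
        # sum_k pi_k(x) * q_k(x), using the critic's estimates q_k.
        q = w_hat @ x
        advantage = q - probs @ q
        theta += lr * (probs * advantage)[:, None] * x[None, :]

    return theta, w_hat, true_w


if __name__ == "__main__":
    theta, w_hat, true_w = run_actor_critic()
    x_new = np.array([1.0, -0.5, 0.2])
    print("action probabilities:", softmax_policy(theta, x_new))
```

Because the policy is a softmax, every action keeps a nonzero probability in every context, which is one simple way to obtain the stochastic behavior the abstract motivates. A practical version would likely constrain or regularize the policy parameters so the policy does not drift toward a deterministic one.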
For questions regarding the Statistics Student Seminar, or if you are interested in presenting, please contact Joonha Park (joonhap@umich.edu) or Jingshen Wang (jshwang@umich.edu).