The Optimal Level of Experimentation
by Giuseppe Moscarini and Lones Smith
(Econometrica, 11-2001, pp. 1629--1644)

Bandit models are the bread-and-butter of the optimal Bayesian experimentation literature. In bandit models, the cost of acquiring information is purely implicit, namely forgone opportunities. Yet so much of actual R&D is purely explicit cost experimentation. Indeed, we acquire a vast amount of information in life by paying for it with currency. Surprisingly, there has to date been no canonical learning model with this obvious feature. This paper introduces a simple optimal experimentation model with purely explicit cost experimentation. If you have a model of R&D, you definitely should consider using this simple paradigm.

To do so, we go back in time to the grand-daddy of all experimentation models. We revisit Wald's famous 1947 sequential experimentation story, upon which dynamic programming and later the optimal experimentation literature are really founded. Wald's problem was a pure optimal stopping exercise, and by now rather rote: continue to buy information at a given rate until you hit one of two stopping regions. We wish to endogenize this information acquisition level, by way of an increasing cost function for different levels of experimentation. But by itself, that makes for a trivial variation, since the optimal rule is then to pick the experimentation level with the lowest average cost and proceed at that level. For instance, with a convex cost function, this might mean a very low experimentation level. We need some reason for the experimenter to hurry up. We draw inspiration from the real world: the decision maker is impatient. Thus, we assume that an impatient decision maker runs variable-size experiments at an increasing, strictly convex cost before choosing an irreversible action. Notice that the forces of impatience and convex information costs work at cross purposes. Fleshing out this interplay is at the heart of our paper.
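The stopping rule in Wald's problem described above can be sketched as follows. This is only an illustration under assumptions that are not taken from the paper: observations are Gaussian with known standard deviation, the two hypotheses are two candidate means, and the stopping thresholds use Wald's standard approximations. The function name `sprt` and all parameter names are hypothetical.

```python
import math

def sprt(draw, mu0, mu1, sigma=1.0, alpha=0.05, beta=0.05, max_n=100_000):
    """Sequential probability ratio test of mean mu0 versus mean mu1.

    Assumption: each call to draw() returns one Gaussian observation with
    known standard deviation sigma. alpha and beta are the target error rates.
    """
    upper = math.log((1 - beta) / alpha)   # crossing it: stop, accept mu1
    lower = math.log(beta / (1 - alpha))   # crossing it: stop, accept mu0
    llr = 0.0                              # cumulative log-likelihood ratio
    for n in range(1, max_n + 1):
        x = draw()
        # Gaussian log-likelihood ratio of one observation, mu1 against mu0.
        llr += (mu1 - mu0) * (x - 0.5 * (mu0 + mu1)) / sigma ** 2
        if llr >= upper:
            return "accept mu1", n
        if llr <= lower:
            return "accept mu0", n
    return "undecided", max_n
```

Here information arrives at a fixed rate of one observation per period; the only choice is when to stop. The paper's question is what happens when that rate is itself chosen.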

This paper makes one other critical modelling innovation. The discrete time version of Wald's model is somewhat intractable. We instead formulate a tractable continuous time limit version of the sequential information acquisition problem: controlling the level of experimentation becomes controlling the variance of a diffusion of unknown drift. The state of the world is the drift, either high or low in the initial simple model. To halve the diffusion variance, and thereby learn more about the state, corresponds exactly to doubling the experimentation level. Usually one looks for explicit solutions of stochastic calculus problems. We in fact solve this model by indirect means, without producing any closed form. We hope this technique will be useful elsewhere.
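As a rough illustration of this formulation, the sketch below simulates Bayesian learning about a two-point drift from a diffusion whose variance is inversely proportional to the experimentation level. The parametrization (variance `sigma**2 / level`), the Euler time step, the fixed level along the path, and every name in the code are assumptions made for the example, not details taken from the paper.

```python
import math
import random

def _sigmoid(z):
    # Numerically stable logistic function mapping log odds to a probability.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_odds_drift(level, mu_high=1.0, mu_low=-1.0, sigma=1.0):
    """Expected growth rate of the log odds when the drift is truly high."""
    return (mu_high - mu_low) ** 2 * level / (2.0 * sigma ** 2)

def simulate_belief(p0, level, mu_high=1.0, mu_low=-1.0, sigma=1.0,
                    true_mu=1.0, dt=0.01, steps=1000, seed=0):
    """Euler simulation of the posterior probability that the drift is high.

    Assumption: the signal follows dx = true_mu*dt + sqrt(sigma^2/level)*dW,
    so doubling `level` halves the diffusion variance.
    """
    rng = random.Random(seed)
    var = sigma ** 2 / level              # signal variance per unit of time
    llr = math.log(p0 / (1.0 - p0))       # prior log odds of the high drift
    path = []
    for _ in range(steps):
        dx = true_mu * dt + math.sqrt(var * dt) * rng.gauss(0.0, 1.0)
        # Log-likelihood ratio of the increment dx, high drift against low.
        llr += (mu_high - mu_low) * (dx - 0.5 * (mu_high + mu_low) * dt) / var
        path.append(_sigmoid(llr))
    return path
```

Under these assumptions, `log_odds_drift` is linear in `level`, which is one way to see that halving the variance is equivalent to doubling the speed of learning.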

We (a) prove that the optimal experimentation level is an increasing function of the Bellman value. This general finding allows us to quickly deduce some new testable implications, like (b) the experimentation cost time series has a secular upward drift in almost all cases; and (c) assuming a more impatient decision maker experiments at all, he experiments at a higher intensity, given lump-sum final payoffs. This prediction is new to the experimentation literature: In any implicit cost experimentation setting, a more impatient decision maker always experiments less. Finally, we show that our predictions (a) and (b) are robust to countable-state and continuum-state normal models. There, we also extend an R&D interpretation of the model, where experimentation is monotonic not only in the Bellman value, but also in beliefs.
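A heuristic way to see why result (a) is plausible is sketched below. The Bellman equation written here is an assumed form for illustration, not a quotation from the paper: r is the discount rate, v(p) the Bellman value at belief p, c(n) the strictly convex cost of experimentation level n, and \Sigma(p) a hypothetical term for the belief variance generated per unit of experimentation. The argument applies only at an interior optimum n^* > 0.

```latex
% Heuristic sketch only; the HJB form is an assumption, not taken from the paper.
r\,v(p) = \max_{n \ge 0} \bigl[ -c(n) + n\,\Sigma(p)\,v''(p) \bigr]
\;\Longrightarrow\;
c'(n^*) = \Sigma(p)\,v''(p)
\;\Longrightarrow\;
r\,v(p) = n^*\,c'(n^*) - c(n^*) \equiv g(n^*),
\qquad
g'(n) = n\,c''(n) > 0 .
```

Since g is strictly increasing under strict convexity of c, inverting it gives n^* = g^{-1}(r v(p)), which rises with the value v(p).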
