STATS 413

Outcome regression

In this post, we consider the task of estimating treatment effects. This is the basic problem in causal inference, and it arises in many areas of science and engineering. As a running example, we consider the task of estimating the efficacy of a vaccine booster. We begin by mathematically defining treatment effects using the potential outcomes framework.

To keep things simple, we focus on estimating the effect of a binary treatment (e.g. booster vs no booster). We define two potential outcomes \(Y_i(1)\) and \(Y_i(0)\) for each subject in the study. In the running example, \(Y_i(1)\) is the viral load in the \(i\)-th subject if the subject got the booster, and \(Y_i(0)\) is the viral load if the subject did not get the booster. The effect of the treatment on the \(i\)-th subject is

\[\Delta_i \triangleq Y_i(1) - Y_i(0).\]

The fundamental challenge in causal inference is that only one treatment can be assigned to a subject, so only one of \(Y_i(1)\) and \(Y_i(0)\) can be observed. Thus \(\Delta_i\) is never observed. Nevertheless, it is possible (as we shall see) to estimate the average treatment effect (ATE)

\[\tau \triangleq \Ex\big[\Delta_i\big] = \Ex\big[Y_i(1)\big] - \Ex\big[Y_i(0)\big]\]

by performing randomized experiments.

In a randomized experiment, we randomly assign treatments to the subjects and record the outcomes. Let \(W_i\in\{0,1\}\) and \(Y_i\) be the treatment assignment and observed outcome of the \(i\)-th subject. In the running example, \(W_i\) indicates whether the \(i\)-th subject got the booster and \(Y_i\) is the (observed) viral load in the \(i\)-th subject. Mathematically, in a randomized experiment, we have

\[\begin{aligned} Y_i = Y_i(W_i) && \text{(SUTVA),}\\ (Y_i(1),Y_i(0)) \ind W_i && \text{(random treatment assignment).} \end{aligned}\]

The first condition (SUTVA) relates the observed outcomes to the potential outcomes: the observed outcome of the \(i\)-th subject \(Y_i\) is \(Y_i(1)\) (resp. \(Y_i(0)\)) if \(W_i = 1\) (resp. \(W_i = 0\)). The second condition says treatments are assigned in a way that does not depend on the potential outcomes. It implies that the distributions of potential outcomes in the treated and untreated groups are identical:

\[(Y_i(1),Y_i(0)) \mid\{W_i = 1\} \overset{d}{=} (Y_i(1),Y_i(0)) \mid\{W_i = 0\}.\]

In practice, treatments are often assigned randomly (e.g. by flipping a coin) to satisfy this condition.
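To make the setup concrete, here is a minimal Python sketch that simulates a randomized experiment. The data-generating process (normally distributed viral loads with an average effect of `tau = -1.0`) and the function name `simulate_experiment` are assumptions made purely for illustration; they do not come from real booster data.

```python
import random


def simulate_experiment(n, tau=-1.0, p_treat=0.5, seed=0):
    """Simulate a randomized experiment with n subjects.

    Hypothetical model: Y_i(0) ~ N(5, 1) and Y_i(1) = Y_i(0) + N(tau, 0.5^2),
    so the average treatment effect equals tau.
    Returns the treatment assignments W_i and the observed outcomes Y_i.
    """
    rng = random.Random(seed)
    w, y = [], []
    for _ in range(n):
        y0 = rng.gauss(5.0, 1.0)          # potential outcome without the booster
        y1 = y0 + rng.gauss(tau, 0.5)     # potential outcome with the booster
        # Coin flip that ignores (y0, y1): random treatment assignment.
        w_i = 1 if rng.random() < p_treat else 0
        # SUTVA: we only get to see the potential outcome for the assigned arm.
        y_i = y1 if w_i == 1 else y0
        w.append(w_i)
        y.append(y_i)
    return w, y


w, y = simulate_experiment(10)
print(list(zip(w, y)))
```

Note that the simulation knows both `y0` and `y1` for every subject, but only one of them is returned, mirroring the fact that \(\Delta_i\) is never observed.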

Difference-in-means

A simple estimate of the ATE in a randomized experiment is the difference between the (sample) mean outcomes in treated and untreated subjects:

\[\def\DM{\text{DM}} \def\htau{\widehat{\tau}} \htau_{\DM} = \frac{1}{n_1}\sum_{i=1}^nY_i\ones\{W_i = 1\} - \frac{1}{n_0}\sum_{i=1}^nY_i\ones\{W_i = 0\},\]

where \(n_w \triangleq \sum_{i=1}^n\ones\{W_i = w\}\) is the number of subjects assigned treatment \(w\in\{0,1\}\). This is called the difference-in-means estimator, and it is motivated by the observation that the (sample) mean outcome in a treatment group is an unbiased estimate of the expected potential outcome in a randomized experiment:
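The estimator is straightforward to compute. The following is a small sketch in Python; the function name `difference_in_means` and the toy data are hypothetical, chosen only to illustrate the formula.

```python
def difference_in_means(w, y):
    """Difference between the mean outcomes of treated and untreated subjects.

    w: list of treatment assignments in {0, 1}
    y: list of observed outcomes, same length as w
    """
    treated = [y_i for w_i, y_i in zip(w, y) if w_i == 1]
    control = [y_i for w_i, y_i in zip(w, y) if w_i == 0]
    if not treated or not control:
        raise ValueError("both treatment groups must be non-empty")
    return sum(treated) / len(treated) - sum(control) / len(control)


# Toy data: treated mean is 3.0, untreated mean is 4.5.
w = [1, 0, 1, 0, 1]
y = [2.0, 5.0, 3.0, 4.0, 4.0]
print(difference_in_means(w, y))  # -1.5
```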

\[\begin{aligned} &\Ex\left[\frac{1}{n_w}\sum_{i=1}^nY_i\ones\{W_i = w\}\right] \\ &\quad= \Ex\big[Y_i\mid W_i = w\big] \\ &\quad= \Ex\big[Y_i(w)\mid W_i = w\big] & & \text{(SUTVA)} \\ &\quad= \Ex\big[Y_i(w)\big] & & \text{(random treatment assignment)}. \end{aligned}\]

In light of this observation, it is not hard to see that the difference-in-means estimator is unbiased:

\[\begin{aligned} \Ex\big[\htau_{\DM}\big] &= \Ex\left[\frac{1}{n_1}\sum_{i=1}^nY_i\ones\{W_i = 1\}\right] - \Ex\left[\frac{1}{n_0}\sum_{i=1}^nY_i\ones\{W_i = 0\}\right] \\ &= \Ex\big[Y_i(1)\big] - \Ex\big[Y_i(0)\big] \\ &= \tau. \end{aligned}\]
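Unbiasedness can also be checked numerically. The sketch below repeats a simulated randomized experiment many times and averages the difference-in-means estimates; the average should land close to the true ATE. The data-generating process (with true ATE `tau = -1.0`) and all function names are assumptions made for illustration.

```python
import random


def difference_in_means(w, y):
    """Mean outcome of the treated minus mean outcome of the untreated."""
    treated = [y_i for w_i, y_i in zip(w, y) if w_i == 1]
    control = [y_i for w_i, y_i in zip(w, y) if w_i == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def simulate_experiment(n, tau, rng):
    """Hypothetical model: Y_i(0) ~ N(5, 1), Y_i(1) = Y_i(0) + N(tau, 0.5^2)."""
    w, y = [], []
    for _ in range(n):
        y0 = rng.gauss(5.0, 1.0)
        y1 = y0 + rng.gauss(tau, 0.5)
        w_i = 1 if rng.random() < 0.5 else 0   # fair coin flip
        w.append(w_i)
        y.append(y1 if w_i == 1 else y0)       # SUTVA: Y_i = Y_i(W_i)
    return w, y


def monte_carlo_mean(n=200, tau=-1.0, reps=1000, seed=0):
    """Average the difference-in-means estimate over many simulated experiments."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        w, y = simulate_experiment(n, tau, rng)
        estimates.append(difference_in_means(w, y))
    return sum(estimates) / reps


avg_estimate = monte_carlo_mean()
print(avg_estimate)  # should be close to the true ATE of -1.0
```

Any single estimate fluctuates around \(\tau\), but the fluctuations average out across replications.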

We leave it as an exercise to show that \(\htau_{\DM}\) is asymptotically normal:

\[\sqrt{n_1 + n_0}(\htau_{\DM} - \tau) \dto N(0,\frac{\sigma_1^2}{\pi_1} + \frac{\sigma_0^2}{\pi_0}),\]

where \(\sigma_w^2 \triangleq \var\big[Y_i(w)\big]\) and \(\pi_w \triangleq \Pr\{W_i = w\}\) for \(w\in\{0,1\}\). This result allows us to form confidence intervals and test hypotheses regarding the ATE.
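As a sketch of how such a confidence interval might be formed, we can plug sample quantities into the asymptotic variance: replacing \(\sigma_w^2\) by the sample variance \(s_w^2\) in group \(w\) and \(\pi_w\) by \(n_w/n\) gives the standard error \(\sqrt{s_1^2/n_1 + s_0^2/n_0}\). The function name `dm_confidence_interval` and the toy data below are hypothetical, and the plug-in choice is one reasonable option rather than the only one.

```python
import math
from statistics import NormalDist, mean, variance


def dm_confidence_interval(w, y, level=0.95):
    """Normal-approximation confidence interval for the ATE.

    Returns (estimate, lower, upper), where the standard error is
    sqrt(s_1^2 / n_1 + s_0^2 / n_0) with s_w^2 the sample variance in group w.
    """
    treated = [y_i for w_i, y_i in zip(w, y) if w_i == 1]
    control = [y_i for w_i, y_i in zip(w, y) if w_i == 0]
    n1, n0 = len(treated), len(control)
    if n1 < 2 or n0 < 2:
        raise ValueError("need at least two subjects in each group")
    estimate = mean(treated) - mean(control)
    se = math.sqrt(variance(treated) / n1 + variance(control) / n0)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.96 for level=0.95
    return estimate, estimate - z * se, estimate + z * se


# Toy data: four treated and four untreated subjects.
w = [1, 0, 1, 0, 1, 0, 1, 0]
y = [2.0, 5.0, 3.0, 4.0, 4.0, 6.0, 3.0, 5.0]
estimate, lower, upper = dm_confidence_interval(w, y)
print(estimate, lower, upper)
```

With so few subjects the normal approximation is rough; the interval is justified only asymptotically.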

Posted on December 11, 2021 from Ann Arbor, MI