Chuck Manski and John Pepper have issued a new working paper on the effect of the death penalty on homicide rates. If you're interested in the issue, the paper is of course something you should read, but it's also a great, readable, and not overly technical exposition, by way of example, of Manski's work over the last couple of decades on partial identification.
The data are simple: homicide rates across U.S. states in 1975, when there was a national moratorium on the death penalty, and in 1977, when the death penalty was legal in 32 states (the treatment group) and illegal in the others (the control group).
Most researchers (like, me) would proceed by writing down some parametric, probably linear model and generating a point estimate. If you do that, your estimate will indicate that the death penalty increases the homicide rate by 0.5 per 100,000 population. But that estimate imposes all sorts of structure across states and across time.
To illustrate the other extreme, suppose you're not willing to assume anything at all about the process that assigns treatments to states, or to make any homogeneity assumptions: the causal effect of the death penalty may vary arbitrarily across states and years, and the death penalty is not assumed to be randomized with respect to other determinants of homicide rates. The observed homicide rate is never greater than 32.8 per 100,000, so suppose it's bounded from above at 35 (note that this is itself an assumption; raising the upper bound would give an even wider interval). Now, in standard notation, let Y1 denote the potential outcome in a state if the death penalty is in place and Y0 the potential outcome if there is no death penalty. By the law of total probability, we can always decompose E[ Y1 ] as

$$E[Y_1] = E[Y_1 \mid D=1]\,P(D=1) + E[Y_1 \mid D=0]\,P(D=0)$$
where D=1 denotes death penalty states. The counterfactual outcome is E[ Y1 | D=0 ]: what the average homicide rate would be in states without the death penalty if those states did have the death penalty. The data tell us nothing about that conditional expectation. If we use only the fact that it lies between 0 and 35, an upper bound on E[ Y1 ] is the first term in the equation above plus (35)(P(D=0)), and a lower bound is the first term plus (0)(P(D=0)). We can do likewise for E[ Y0 ]. We get bounds on the average treatment effect E[ Y1 – Y0 ] by considering the extreme cases: for example, it can be no greater than the upper bound of E[ Y1 ] minus the lower bound of E[ Y0 ]. Here, that interval is [ -9.6, 25.4 ] in 1975 and [ -19.8, 15.2 ] in 1977. Both intervals are necessarily 35 units wide, because the bound on E[ Y1 ] has width (35)(P(D=0)), the bound on E[ Y0 ] has width (35)(P(D=1)), and these sum to 35: the data alone tell us very little about the effect of interest.
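The no-assumption (worst-case) bounds described above take only a few lines to compute. The sketch below is a minimal illustration, assuming the outcome lies in [0, 35] and that the inputs for a single year are the observed mean homicide rate among treated states, the observed mean among control states, and the treated share. The function names (`mean_bounds`, `ate_bounds`) and the example numbers are hypothetical; they are not the Manski-Pepper data and will not reproduce the intervals quoted in the post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    @property
    def width(self) -> float:
        return self.hi - self.lo


def mean_bounds(mean_observed: float, p_observed: float,
                y_min: float, y_max: float) -> Interval:
    """Worst-case bounds on E[Y_d] when Y_d is seen only for a share p_observed of states.

    E[Y_d] = E[Y_d | observed] * p_observed + E[Y_d | unobserved] * (1 - p_observed),
    and the unobserved conditional mean can be anything in [y_min, y_max].
    """
    # When nobody is observed (e.g. a moratorium year for Y1), the observed mean is irrelevant.
    known = mean_observed * p_observed if p_observed > 0 else 0.0
    p_missing = 1.0 - p_observed
    return Interval(known + y_min * p_missing, known + y_max * p_missing)


def ate_bounds(mean_y_treated: float, mean_y_control: float, p_treated: float,
               y_min: float = 0.0, y_max: float = 35.0) -> Interval:
    """Worst-case bounds on E[Y1 - Y0] with no assumptions on treatment assignment."""
    ey1 = mean_bounds(mean_y_treated, p_treated, y_min, y_max)        # Y1 seen only where D=1
    ey0 = mean_bounds(mean_y_control, 1.0 - p_treated, y_min, y_max)  # Y0 seen only where D=0
    # Extreme cases: smallest E[Y1] minus largest E[Y0], and largest E[Y1] minus smallest E[Y0].
    return Interval(ey1.lo - ey0.hi, ey1.hi - ey0.lo)


# Hypothetical inputs, for illustration only (not the paper's data).
moratorium_year = ate_bounds(mean_y_treated=0.0, mean_y_control=9.0, p_treated=0.0)
mixed_year = ate_bounds(mean_y_treated=10.0, mean_y_control=8.0, p_treated=0.6)
print(moratorium_year, moratorium_year.width)
print(mixed_year, mixed_year.width)
```

Whatever inputs you feed in, the width of the resulting interval equals y_max - y_min, which is the arithmetic behind both intervals in the post being 35 units wide.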
The authors then work through a battery of increasingly restrictive assumptions, showing how adding more and more structure affects the estimates. If we assume only that the average causal effect of the death penalty is the same in 1975 and 1977, the effect must lie in both years' intervals, so it is between -9.6 and 15.2, which actually rules out some of the estimates in the literature based on parametric models. Various other assumptions produce estimates with different signs. The conclusion is that these data alone do not allow us to sign the effect, and different common, plausible assumptions give answers with different signs.
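Assuming the average effect is the same in both years amounts to intersecting the two worst-case intervals. Here is a minimal sketch using the intervals reported in the post; the helper names `intersect` and `sign_identified` are hypothetical. If the intersection came out empty, the data would refute the constant-effect assumption itself.

```python
from typing import Optional, Tuple

Bounds = Tuple[float, float]


def intersect(a: Bounds, b: Bounds) -> Optional[Bounds]:
    """Intersection of two closed intervals, or None if they do not overlap."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def sign_identified(bounds: Bounds) -> bool:
    """True only if the interval excludes zero, i.e. the data pin down the sign."""
    lo, hi = bounds
    return lo > 0 or hi < 0


# Worst-case intervals for the average effect, as reported in the post.
bounds_1975 = (-9.6, 25.4)
bounds_1977 = (-19.8, 15.2)
common = intersect(bounds_1975, bounds_1977)

if common is None:
    print("constant-effect assumption is refuted by the data")
else:
    print(common, sign_identified(common))
```

The intersection narrows the interval but still straddles zero, which matches the post's point that this assumption alone cannot sign the effect.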