## Remarks on Chen and Pearl on causality in econometrics textbooks

Bryant Chen and Judea Pearl have published an interesting piece in which they critically examine the discussions (or lack thereof) of causal interpretations of regression models in six econometrics textbooks. In this post, I provide brief assessments of the discussion of causality in nine additional econometrics texts of various levels and vintages, and close with a few remarks about causality in textbooks from the perspective of someone who does, and teaches, applied econometrics. Like Chen and Pearl, I find some of these textbooks provide weak or misleading discussion of causality, but I also find one very good and one excellent discussion in relatively recent texts. I argue that the discussion of causality in econometrics textbooks appears to be improving over time, and that the oral tradition in economics is not well-reflected in econometrics textbooks.

The Chen and Pearl paper has been around for a while in working paper form and recently came out in the Real World Economics Review, also available here from the authors with much clearer typesetting.

The additional textbooks I discuss below are: Amemiya (1985), Kmenta (1986), Davidson and MacKinnon (1993), Gujarati (1999), Hayashi (2000), Wooldridge (2002), Davidson and MacKinnon (2004), Dielman (2005), and Cameron and Trivedi (2005).

The Issue: Causality in regression models.

A scientist is attempting to understand the relationship between, say, health and smoking. Let y denote some measure of health and let x denote a measure of smoking intensity, say, number of cigarettes smoked per day. A simple model for health supposes the two outcomes are related by,

$y = \beta x + u$.

In short, Chen and Pearl consider these issues: do econometrics textbooks clearly explain what the parameter $\beta$ means in this model, are they consistent in that interpretation, and generally how well are issues of causality addressed?

That simple-looking equation is much trickier than it appears, as first formally discussed in the econometrics literature by Trygve Haavelmo during the Second World War. For recent discussions, see for example Heckman (2005, 2008), Heckman and Pinto (2013), or blog discussions such as on Pearl’s blog or Andrew Gelman’s blog (note comments from Pearl and from Guido Imbens). First suppose we define the random variable u as the difference between y and its conditional expectation:

$u \equiv y - E[ y | x]$,

then it is easy to show that the error term must be mean-independent of $x$. In econometric jargon, we obtain exogeneity by definition. In this interpretation, the parameter $\beta$ is implicitly defined through,

$\beta x \equiv E[ y | x ]$ ,

that is, $\beta$ is by definition the gradient of $E[y|x]$. In the smoking and health example, $\beta$ is by definition how much health changes on average as we consider a person who smokes one more cigarette per day (specifically without the caveat, “other things being equal”).
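To spell out the mean-independence step mentioned above, here is a one-line sketch that uses nothing beyond the definition of $u$:

$E[u | x] = E[\, y - E[y|x] \;|\; x \,] = E[y|x] - E[y|x] = 0$.

Nothing about how $y$ is actually generated enters this argument, which is why exogeneity holds here by definition rather than by assumption.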

This interpretation of the model is merely “agnostic” or “predictive.” An insurance agency, for example, might be interested in estimating $\beta$ under this interpretation: the answer might help them understand how their payouts will vary if they accept customers who smoke more. But econometricians and other scientists are only rarely interested in such a predictive relationship. Instead, we want to know the causal effect of smoking on health, and the predictive regression generally does not recover that causal effect. Suppose for example we lived in a universe in which a given person’s health is unaffected by their smoking, but also that behaviors and characteristics which lead to low health also tend to lead to more smoking. Then we would tend to estimate negative values for $\beta$ even though by assumption (in whatever universe we’re discussing) smoking does not affect any person’s health.
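That hypothetical universe is easy to simulate. The following is only a minimal sketch: the unobserved `trait`, the coefficients, the sample size, and the helper `ols_slope` are all made up for illustration, and the regression includes an intercept, unlike the stripped-down model above.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 100_000

# Hypothetical unobserved trait that raises smoking and lowers health.
trait = rng.normal(size=n)

# Smoking depends on the trait. Health depends on the trait but, by
# construction, NOT on smoking: the causal effect of x on y is zero.
x = 10 + 3 * trait + rng.normal(size=n)   # stylized cigarettes-per-day index
y = 50 - 2 * trait + rng.normal(size=n)   # stylized health index


def ols_slope(x, y):
    """Slope from an OLS regression of y on x (with an intercept)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


# Observational data: the population slope is Cov(x, y) / Var(x) = -6 / 10.
slope_observational = ols_slope(x, y)

# Hypothetical experiment: smoking is assigned at random, independently of
# the trait. Health is generated exactly as before, so y is unchanged.
x_assigned = 10 + np.sqrt(10) * rng.normal(size=n)
slope_experimental = ols_slope(x_assigned, y)

print(f"observational slope: {slope_observational:.3f}")
print(f"experimental slope:  {slope_experimental:.3f}")
```

The observational slope is clearly negative while the experimental slope is close to zero, even though smoking has no effect on health in either data set. The insurance agency would care about the first number, and the scientist about the second.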

For this reason econometricians rarely interpret the error term as simply the deviation between the outcome and its conditional expectation. Rather, in a structural interpretation of the equation, $\beta$ takes a causal interpretation and u is interpreted as summarizing all causes of y other than x. It is well-known that any of: (1) “reverse” causation, (2) omitted variables correlated with the regressors, or (3) measurement error in the regressors, lead to correlation between u and x, which in turn means that the parameter $\beta$ is not defined as the derivative of $E[y|x]$ with respect to $x$. We would like to know how a randomly selected person’s health changes if we could intervene and exogenously flip smoking status; the problem is that the correlation between smoking and health calculated from observational data does not generally give us any answer to that question.
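One standard way to see the problem, sketched here for the single-regressor model above under the added assumption that $Var(x) > 0$, is to write out the slope that a regression of $y$ on $x$ recovers in the population:

$\frac{Cov(x, y)}{Var(x)} = \frac{Cov(x, \beta x + u)}{Var(x)} = \beta + \frac{Cov(x, u)}{Var(x)}$.

The regression slope equals the structural $\beta$ only when the second term is zero, and each of the three problems listed above makes it nonzero.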

Textbook discussion of the issue.

The seemingly straightforward issue is not straightforward at all, and exactly what we mean by “causal,” even in the context of simple regression models such as above, is a subject of ongoing multidisciplinary research. Nonetheless, since inferring causal relations from observational data is the defining characteristic of econometric analysis, it seems very reasonable to require that econometrics textbooks should contain lucid discussions of causal relationships and, in so doing, define parameters clearly and unambiguously. Disturbingly, Chen and Pearl find that six popular econometrics textbooks fail, to a greater or lesser extent, to do so.

Chen and Pearl evaluate texts on 10 criteria, which amount to: does the textbook provide at least as much information about causal interpretation as this post does very briefly above, is the text consistent on those interpretations, and does the text provide the equivalent of Pearl’s “do(x)” operator to define causal effects? Other than the “do(x)” criterion, which I don’t think is fair because Pearl’s concept has not caught on in the econometrics literature and (even if it ought to catch on) should therefore not (yet?) appear in current econometrics textbooks, the criteria seem very fair to me. Pity the poor student who attempts to understand how to interpret a structural econometric model after reading this startling passage in Kennedy, for example:

Using the dictionary meaning of causality, it is impossible to test for causality. Granger developed a special definition of causality which econometricians use in place of the dictionary definition: strictly speaking, econometricians should say “Granger-cause” in place of “cause,” but usually they do not. A variable x is said to Granger-cause y if prediction of the current value of y is enhanced by using past values of x.

This is the only passage in the book in which the word “causality” is used, and the claims in that passage are not correct, in no small part because so-called Granger causality is not a causal concept. Although in my view that passage is by far the worst discussion in the six texts discussed, Chen and Pearl show persuasively that each of the discussed textbooks is at times at least vague in its discussion of causal relations. On the other hand, Chen and Pearl are perhaps somewhat uncharitable in some of their discussion. For example, they make much of this passage from Greene,

[ In the model $earnings = X\beta + \delta C + \epsilon$ ] does $\delta$ measure the value of a college education (assuming the rest of the regression model is correctly specified)? The answer is no if the typical individual who chooses to go to college would have relatively high earnings whether or not he or she went to college…

but in context this appears to be a typo: the passage is rescued if “the OLS estimate of” is inserted in front of $\delta$, and the passage makes no sense if that or an equivalent edit is not made, and Greene in many, many other places clearly differentiates between mere correlations and causal parameters. Chen and Pearl, however, are not satisfied with an answer Greene gave them in a personal communication as to the meaning of a structural parameter:

In a personal correspondence (2012), Greene wrote, “The precise definition of effect of what on what is subject to interpretation and some ambiguity depending on the setting. I find that model coefficients are […] query about exactly, precisely carved in stone, what $\beta$ should be.”

I tentatively side with Greene here, although Chen and Pearl do not specify exactly what question Greene was asked. In structural models, the structural parameters are not necessarily causal effects in and of themselves; rather, they are assumed to be invariant with respect to some well-specified class of disturbances. For example, the deep parameters characterizing Harold Zurcher’s replacement of bus engines are not themselves causal effects, but given estimates of those parameters, the model can answer meaningful causal questions. Exactly what a structural coefficient means is model-dependent.

Some results from other textbooks.

Without going into nearly as much detail as Chen and Pearl, I took a look through some other econometrics textbooks to check to see how they discuss, or do not discuss, causality. Specifically, I looked to see whether the regression parameters are anywhere incorrectly defined as gradients of the conditional expectation of the dependent variable, and I tried to find explicit discussions of causal interpretation of estimated models. The texts surveyed below vary widely in level and vintage, including everything from introductory undergraduate to advanced graduate texts, from 1985 through 2005.

Amemiya (1985), Advanced Econometrics

This textbook is now old, well, ancient, by academic standards, and is relatively technically demanding. Opens, on page 1, by dubiously asserting that the goal of econometrics is to estimate parameters which define the joint distribution of a set of random variables $\{y, X\}$. As far as I can tell, the word “causal” does not appear anywhere, nor are there examples of predictive vs causal interpretation of parameters. Any notions of causality are implicit and framed in purely statistical terms. However, does not incorrectly define $\beta$ as the gradient of $E[y|x]$.

Kmenta (1986), Elements of Econometrics

Does not incorrectly define $\beta$ as the gradient of $E[y|x]$.

There is a fairly long, yet confusing discussion of causality at the start of the chapter on simultaneous systems.

Although the concepts of causality and exogeneity are not identical, it is nevertheless possible to conclude that if a variable Y is–in some sense–caused by a variable X, Y cannot be considered exogenous in a system in which X also occurs. A widely discussed definition of causality has been proposed by Granger.

This is the textbook that I learned undergraduate econometrics from. I don’t remember how I thought of causality in econometric models at the time (possibly because I really didn’t like econometrics as an undergraduate). But it’s hard to see how a student could make much headway in understanding causality from that passage. Causality is first introduced “in some sense,” deliberately avoiding a definition. An incorrect claim follows, that if one variable causes another they cannot both be treated as exogenous in a system. That is simply not true: nothing in regression models precludes causal relationships between exogenous variables (as a trivial example, the square of an exogenous covariate is routinely used to capture nonlinear relationships between variables, which is a deterministic and monocausal relationship). And then the notion of Granger-causality is introduced as the only formally defined causal concept in econometrics.

Davidson and MacKinnon (1993), Estimation and Inference in Econometrics

The parameters in the linear regression model are defined in Chapter 1 very abstractly as the set of real numbers defining the subspace spanned by the column vectors of the regressors. $\beta$ is never incorrectly defined as the gradient of $E(y|x)$. Simultaneity and omitted variable bias are discussed in purely statistical, as opposed to causal, terms in Chapter 7.

Discusses causality explicitly in section 18.2, “Exogeneity and causality.” The clearest passage is,

But we have not yet discussed the conditions under which one can validly treat a variable as explanatory. This includes the use of such variables as regressors in least squares estimation and as instruments in instrumental variables or GMM estimation. For conditional inference to be valid, the explanatory variables must be predetermined or exogenous in one or other of a variety of senses to be defined below.

which is not very clear at all: the authors intend, I think, the first sentence to mean, “But we have not yet discussed the conditions under which one can treat the coefficient on a variable as reflecting a causal effect.” The matter is then further muddied as later in this subsection the concept of Granger causality is introduced, without clearly differentiating between so-called Granger-causality and causality.
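To see why the two concepts need to be kept apart, here is a small simulated sketch in which $x$ Granger-causes $y$ even though $x$ has no effect on $y$ at all. Everything in it is hypothetical: the unobserved driver `z`, the lag structure, and the noise variances are chosen only to make the point.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is reproducible
T = 50_000

# Hypothetical unobserved driver z. It reaches x after one period and y
# after two periods; x never appears in the equation that generates y.
z = rng.normal(size=T + 2)
x = z[1:T + 1] + rng.normal(size=T)   # x_t = z_{t-1} + noise
y = z[0:T] + rng.normal(size=T)       # y_t = z_{t-2} + noise

# Granger-style regression: y_t on a constant, y_{t-1}, and x_{t-1}.
regressors = np.column_stack([np.ones(T - 1), y[:-1], x[:-1]])
coef, *_ = np.linalg.lstsq(regressors, y[1:], rcond=None)

print(f"coefficient on lagged y: {coef[1]:.3f}")
print(f"coefficient on lagged x: {coef[2]:.3f}")
```

Past values of $x$ improve the prediction of $y$ (the population coefficient on lagged $x$ is 0.5 in this design), so $x$ Granger-causes $y$, yet intervening on $x$ would leave $y$ unchanged by construction.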

There is an implicit discussion of causality when estimation of supply and demand functions is introduced as an issue to motivate instrumental variable estimation: if we remember from theory that the slopes of these functions are indeed causal effects, then the discussion amounts to asserting that OLS does not recover causal effects in this context.

Gujarati (1999), Essentials of Econometrics, second edition.

Does not incorrectly define $\beta$ as the gradient of $E[y|x]$.

Implicitly defines regression parameters as causal effects (without using the word “causal”) on page 7. On page 8, correctly defines the error term as unobserved causes of the dependent variable, and notes,

Before proceeding further, a warning regarding causation is in order…. Does regression imply causation? Not necessarily. As Kendall and Stuart note, “A statistical relationship, however strong and however suggestive, can never establish causal connection: our ideas of causation must come from outside statistics, ultimately from some theory or other.”

A variant of this warning is repeated on page 124, although somewhat oddly then proceeds to give uses for regression analysis which do not include the estimation of causal effects.

Gives examples of omitted variables bias and simultaneity bias which implicitly define the structural parameters as causal effects, and refers again to these parameters when introducing instrumental variables, a topic not pursued in this introductory-level text.

Hayashi (2000), Econometrics.

Defines regression parameters as causal effects (without using the word “causal”) on page 4, but also claims on the same page that an econometric model is a “set of joint distributions satisfying a set of assumptions,” which leaves it unclear whether the author intends regression parameters to reflect causal effects or parameters defining statistical distributions.

Introduces the issue of endogeneity noting that, “The most important assumption made for the OLS [sic] is the orthogonality between the error term and the regressors. Without it, the OLS estimator is not even consistent.” Much like Davidson and MacKinnon (1993), differentiates between causation and mere correlation using estimation of the slopes of supply and demand curves as an example, albeit without using any variant of the word, “cause.”
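The supply-and-demand example used in both texts can be sketched with simulated data. This is an illustration under assumptions of my own, not code from either book: the slopes, intercepts, and the observed supply shifter `z` (think of an input cost that does not enter demand) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed so the sketch is reproducible
n = 200_000

# Hypothetical structural parameters: demand slope -1, supply slope +1.
demand_slope, supply_slope = -1.0, 1.0

e_d = rng.normal(size=n)   # unobserved demand shifter
e_s = rng.normal(size=n)   # unobserved supply shifter
z = rng.normal(size=n)     # observed supply shifter, excluded from demand

# Demand: q = 10 + demand_slope * p + e_d
# Supply: q =  2 + supply_slope * p - z + e_s
# Market clearing: solve the two equations for equilibrium price and quantity.
p = (10 - 2 + z + e_d - e_s) / (supply_slope - demand_slope)
q = 10 + demand_slope * p + e_d


def cov(a, b):
    """Sample covariance of two arrays."""
    return float(np.mean((a - a.mean()) * (b - b.mean())))


ols = cov(p, q) / cov(p, p)   # slope from regressing quantity on price
iv = cov(z, q) / cov(z, p)    # simple instrumental-variables estimate using z

print(f"OLS slope: {ols:.3f}")
print(f"IV slope:  {iv:.3f}")
```

Because price is determined jointly with quantity, it is correlated with the demand error, and the OLS slope (about $-1/3$ in this design) is far from the demand slope of $-1$. The instrumental-variables estimate, which uses only the variation in price induced by the supply shifter, recovers it.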

Wooldridge (2002), Econometric Analysis of Cross-Section and Panel Data.

Chen and Pearl discuss “baby” Wooldridge, the undergrad text. Does Papa Wooldridge fare better?

The opening passage of the text, Section 1.1 of the Introduction, begins,

The goal of most empirical studies in economics and other social sciences is to determine whether a change in one variable, say w, causes a change in another variable, say y…. Because economic variables are properly interpreted as random variables, we should use ideas from probability to formalize the sense in which change in w causes a change in y. The notion of ceteris paribus… is the crux of establishing a causal relationship. Simply finding that two variables are correlated is rarely enough….

Goes on to define regression parameters as partial derivatives of conditional expectations, although not of $E[y|X]$ but (in our notation) of $E[y|X, u]$.
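The distinction matters. As a sketch in the notation of the simple model $y = \beta x + u$ used earlier in this post (not Wooldridge’s own notation), and assuming $E[u|x]$ is differentiable in $x$:

$E[y | x, u] = \beta x + u$, so $\partial E[y | x, u] / \partial x = \beta$,

$E[y | x] = \beta x + E[u | x]$, so $\partial E[y | x] / \partial x = \beta + \partial E[u | x] / \partial x$.

Holding $u$ fixed is the formal counterpart of ceteris paribus, and the two gradients coincide only if $E[u|x]$ does not vary with $x$.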

Includes the first, to the best of my knowledge, lengthy discussion of the counterfactuals/treatment effects literature (Chapter 18), and links the preceding discussion of regression models to the treatment effects literature.

Davidson and MacKinnon (2004), Econometric Theory and Methods.

We can make a fixed-effects type observation here, as we have another text from James and Russell, about a decade later than the 1993 text discussed above. How do the 1993 and 2004 books differ? The introductory passage on page 1 introduces regression parameters and implies their definition depends on how the error term is defined, although at this point exactly what $\beta$ means is deliberately left vague: its interpretation is “quite arbitrary,” the authors correctly note. After introducing the equivalent of the model $y=\beta x +u$, the text states (in our notation),

At this stage we should note that, as long as we say nothing about the unobserved quantity $u$, [the equation] does not tell us anything. In fact, we can allow $\beta$ to be quite arbitrary, since for any given [value] the model… can always be made to be true by defining $u$ suitably.

A similar passage on page 313 notes that, when a regressor is measured with error, OLS estimation gives the desired result if the error term is defined as simply the difference between the observed outcome and its expectation with respect to the observed regressor, but “in most cases” in econometrics that definition does not allow us to estimate the parameters we wish to estimate.
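The measurement-error point can be illustrated with a short simulation. This is a sketch of the classical errors-in-variables setup under assumptions of my own (a hypothetical structural coefficient of 2, unit variances, and noise independent of everything else), not an example taken from the text.

```python
import numpy as np

rng = np.random.default_rng(3)  # fixed seed so the sketch is reproducible
n = 200_000
beta = 2.0   # hypothetical structural effect of the true regressor

x_true = rng.normal(size=n)
x_obs = x_true + rng.normal(size=n)    # regressor observed with error
y = beta * x_true + rng.normal(size=n)


def ols_slope(x, y):
    """Slope from an OLS regression of y on x (with an intercept)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


slope_true = ols_slope(x_true, y)   # infeasible: uses the unobserved regressor
slope_obs = ols_slope(x_obs, y)     # feasible: uses the mismeasured regressor

print(f"slope on true regressor:     {slope_true:.3f}")
print(f"slope on observed regressor: {slope_obs:.3f}")
```

The regression on the observed regressor is a perfectly good estimate of the gradient of the expectation of $y$ given the observed regressor, which is 1 in this design. It is simply not the structural coefficient of 2 that we set out to estimate.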

More or less the same discussion of supply and demand as in the 1993 text can again be interpreted as an implicit discussion of causality.

Dielman (2005), Applied Regression Analysis, 4th ed.

Incorrectly defines $\beta$ as the slope of $E[y| x]$ on page 75, although in the context of a model explicitly described as a “descriptive regression.” Does not immediately clarify, however, when a regression model should be interpreted as merely descriptive.

Discusses “causal” versus “extrapolative” regression models in the narrow context of time series modeling on page 112, but does not make it clear what the intended difference between these concepts is, nor is it clear why this discussion is limited to time series models. Claims that the issue with causal models is, “causal models require the identification of variables that are related to the dependent variable in a causal manner. Then data must be gathered on these explanatory variables to use the model.” This makes it seem that simple correlations can be used to infer causal relations so long as we can observe both the variables. However, also notes on page 118 that “A common mistake made when using regression analysis is to assume that a strong fit (a high $R^2$) of a regression of y on x automatically means ‘x causes y.’” There is then a brief discussion of endogeneity through simultaneity and through omitted variables, which is quite clear, particularly for an introductory text.

Cameron and Trivedi (2005), Microeconometrics: Methods and Applications.

A few sentences into the introduction on page 1, notes that,

A key distinction in econometrics is between essentially descriptive models and data summaries at various levels of statistical sophistication and models that go beyond mere associations and attempt to estimate causal parameters. The classic definitions of causality in econometrics derive from the Cowles Commission simultaneous equations model that draw sharp distinctions between exogenous and endogenous variables, and between structural and reduced form parameters. Although reduced form models are useful for some purposes, knowledge of structural or causal parameters is essential for policy analysis.

This focus on causal parameters is maintained throughout. Chapter 2 is titled “Causal and noncausal models,” and provides a quite high-level formal discussion of causality in the context of classical simultaneous models, and introduces topics in causal modeling which will be covered through the remainder of the book, including the Rubin Causal Model and a variety of methods researchers use to identify causal parameters. Given this emphasis, it is unsurprising that regression parameters are not incorrectly defined as the gradient of $E[y|x]$. Discusses counterfactual modeling in Chapter 25, “Treatment Evaluation,” at length, linking the methods in this literature to previous discussions of single-equation regression, matching, instrumental variables, and regression discontinuity designs.

Remarks.

The additional textbooks briefly surveyed suffer, to a greater or lesser extent, from the same weak discussions of causality as the texts surveyed by Chen and Pearl, with the exceptions of Wooldridge (2002) and particularly Cameron and Trivedi (2005), which I think would only fail Chen and Pearl’s criterion that the equivalent of the “do(x)” concept should be included (and arguably, an equivalent is included).

There is something of a puzzle here in that the oral tradition in applied econometrics heavily emphasizes causation, but it would seem that relatively few textbooks explicitly discuss the matter. In journal articles, seminars, and economics classrooms, there is consensus that the goal of econometric analysis is almost always to estimate a model which can answer causal questions. Overcoming the various serious challenges that arise in making such attempts is the core of most papers in applied econometrics, and how successful a paper is in achieving that goal is the target of sharp-eyed readers and referees. What explains the discrepancy between how economists think about causation and what appears in most econometrics textbooks?

First, econometrics textbooks tend to be authored by theoretical econometricians, who tend to be situated much closer to the interface between statistics and econometrics than applied researchers. Since statisticians do not tend to think in terms of causality, perhaps some of that statistical tradition makes its way over to econometrics textbooks.

Second, statistical concepts which in the context of applied econometrics refer to causal concepts are nonetheless presented as statistical concepts in econometrics textbooks, but it is understood that the underlying objects of inference are still causal. A “biased estimate of $\beta$” is a purely statistical concept, but if a referee or seminar attendee were to use that phrase they almost certainly mean, “the estimate you present is not a good estimate of the causal effect in which we are interested.” Similarly, a remark like, “your data doesn’t credibly identify $\beta$” appears to be a claim about a purely statistical matter, but the person making that claim almost certainly means, “the causal parameter we would like to estimate is hopelessly confounded, given the data we have and the model you’ve developed.” Further to this point, I note that way back in the old-timey days of the 1990s, I took a sequence of econometrics courses from MacKinnon and Davidson based on their 1993 textbook. Even though this text does not include a good discussion of causality using that term, and it is notably lacking in applied examples, it was always very clear to me (and, I think, my classmates) that we are ultimately interested in estimating models which allow us to make causal inferences, as opposed to merely characterizing the joint distribution of some set of variables.

Third, the language of counterfactuals in which the literature on causation is currently being developed is a relatively recent development. As noted above, Wooldridge (2002) is, to the best of my knowledge, the first econometrics textbook to include an extended discussion written in this language. What amounts to the same concepts were previously, as in the examples in the previous point, discussed using language borrowed from statistics. The slightly more recent text by Cameron and Trivedi (2005) is substantially more oriented towards causal modeling than any of the other texts, and also includes lengthy discussion of the recent literature on modeling heterogeneous causal effects. My impression from reading Chen and Pearl and flipping through the texts above is that the textbooks tend to be getting better over time in terms of discussing causation, presumably in part because these ideas are permeating the applied econometrics literature. Notably, the oldest textbooks discussed above (Amemiya 1985 and Kmenta 1986) present the vaguest discussions of causal concepts.

The oral tradition in economics is not well-reflected in current, or particularly in outdated, textbooks. Chen and Pearl do those of us who teach or study econometrics a service in highlighting this problem, and hopefully discussion in future textbooks will continue to improve.