This post was written jointly with David Drukker, Director of Econometrics, StataCorp.

In our last post, we introduced the concept of treatment effects and demonstrated four of the treatment-effects estimators that were introduced in Stata 13. Today, we will talk about two more treatment-effects estimators that use matching.

Introduction

Last time, we introduced four estimators for estimating the average treatment effect (ATE) from observational data. Each of these estimators has a different way of solving the missing-data problem that arises because we observe only the potential outcome for the treatment level received. Today, we introduce estimators for the ATE that solve the missing-data problem by matching.

Matching pairs the observed outcome of a person in one treatment group with the outcome of the “closest” person in the other treatment group. The outcome of the closest person is used as a prediction for the missing potential outcome. The average difference between the observed outcome and the predicted outcome estimates the ATE.

What we mean by “closest” depends on our data. Matching subjects based on a single binary variable, such as sex, is simple: males are paired with males and females are paired with females. Matching on two categorical variables, such as sex and race, isn’t much more difficult. Matching on continuous variables, such as age or weight, can be trickier because of the sparsity of the data. It is unlikely that there are two 45-year-old white males who weigh 193 pounds in a sample. It is even less likely that one of those men self-selected into the treated group and the other self-selected into the untreated group. So, in such cases, we match subjects who have approximately the same weight and approximately the same age.

This example illustrates two points.
First, there is a cost to matching on continuous covariates; the inability to find good matches with more than one continuous covariate causes large-sample bias in our estimator because our matches become increasingly poor.

Second, we must specify a measure of similarity. When matching directly on the covariates, distance measures are used and the nearest neighbor selected. An alternative is to match on an estimated probability of treatment, known as the propensity score.

Before we discuss estimators for observational data, we note that matching is sometimes used in experimental data to define pairs, with the treatment subsequently randomly assigned within each pair. This use of matching is related but distinct.

Nearest-neighbor matching

Nearest-neighbor matching (NNM) uses distance between covariate patterns to define “closest”. There are many ways to define the distance between two covariate patterns. We could use squared differences as a distance measure, but this measure ignores problems with scale and covariance. Weighting the differences by the inverse of the sample covariance matrix handles these issues. Other measures are also used, but these details are less important than the costs and benefits of NNM dropping the functional-form assumptions (linear, logit, probit, etc.) used in the estimators discussed last time.

Dropping the functional-form assumptions makes the NNM estimator much more flexible; it estimates the ATE for a much wider class of models. The cost of this flexibility is that the NNM estimator requires much more data, and the amount of data it needs grows with each additional continuous covariate.

In the previous blog entry, we used an example of mother’s smoking status on birthweight. Let’s reconsider that example.

. webuse cattaneo2.dta, clear

Now, we use teffects nnmatch to estimate the ATE by NNM.

. teffects nnmatch (bweight mmarried mage fage medu prenatal1) (mbsmoke)

The estimated ATE is -211, meaning that infants would weigh 211 grams less when all mothers smoked than when no mothers smoked.

The output also indicates that ties in distance caused at least one observation to be matched with 16 other observations, even though we requested only one match per observation. NNM averages the outcomes of all the tied-in-distance observations, as it should. (They are all equally good and using all of them will reduce bias.)

NNM on discrete covariates does not guarantee exact matching. For example, some married women could be matched with single women. We probably prefer exact matching on discrete covariates, which we do now.

. teffects nnmatch (bweight mmarried mage fage medu prenatal1) (mbsmoke), ///
        ematch(mmarried prenatal1)

Exact matching on mmarried and prenatal1 changed the results a little bit.

Using more than one continuous covariate introduces large-sample bias, and we have three. The option biasadj() uses a linear model to remove the large-sample bias, as suggested by Abadie and Imbens (2006, 2011).

. teffects nnmatch (bweight mmarried mage fage medu prenatal1) (mbsmoke), ///
        ematch(mmarried prenatal1) biasadj(mage fage medu)

In this case, the results changed by a small amount. In general, they can change a lot, and the amount increases with the number of continuous covariates.

Propensity-score matching

NNM uses bias adjustment to remove the bias caused by matching on more than one continuous covariate. The generality of this approach makes it very appealing, but it can be difficult to think about issues of fit and model specification. Propensity-score matching (PSM) matches on an estimated probability of treatment known as the propensity score. There is no need for bias adjustment because we match on only one continuous covariate. PSM has the added benefit that we can use all the standard methods for checking the fit of binary regression models prior to matching.

We estimate the ATE by PSM using teffects psmatch.

. teffects psmatch (bweight) (mbsmoke mmarried mage fage medu prenatal1)

The estimated ATE is now -229, larger in magnitude than the NNM estimates but not significantly so.

How to choose among the six estimators

We now have six estimators: the four introduced in our last post, plus NNM and PSM.
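To make the mechanics of matching concrete, here is a minimal Python sketch of nearest-neighbor matching. It is not Stata's implementation: the function name nnm_ate, the toy data, and the inverse-variance distance (only the diagonal of the sample covariance matrix, with no exact matching, bias adjustment, or standard errors) are all assumptions made for illustration. It does show the two ideas described earlier: each subject's missing potential outcome is predicted from the closest subject in the other treatment group, and matches that are tied in distance are averaged.

```python
def nnm_ate(rows):
    """Illustrative nearest-neighbor matching estimate of the ATE.

    rows: list of (treated, outcome, covariates) tuples, treated in {0, 1}.
    Assumes every covariate varies in the sample (nonzero variance) and
    that both treatment groups are non-empty.
    """
    n = len(rows)
    k = len(rows[0][2])
    # Sample variances, used to put covariates on a comparable scale.
    means = [sum(r[2][j] for r in rows) / n for j in range(k)]
    variances = [
        sum((r[2][j] - means[j]) ** 2 for r in rows) / (n - 1) for j in range(k)
    ]

    def dist(a, b):
        # Squared differences weighted by the inverse of each variance.
        return sum((a[j] - b[j]) ** 2 / variances[j] for j in range(k))

    effects = []
    for t, y, x in rows:
        # Candidate matches come from the opposite treatment group.
        others = [(dist(x, xo), yo) for to, yo, xo in rows if to != t]
        dmin = min(d for d, _ in others)
        # Average the outcomes of all matches tied at the minimum distance.
        tied = [yo for d, yo in others if d - dmin <= 1e-12]
        y_hat = sum(tied) / len(tied)
        # Difference is always (treated outcome) - (untreated outcome).
        effects.append(y - y_hat if t == 1 else y_hat - y)
    return sum(effects) / n


# Made-up toy data: (smoker, birthweight in grams, [mother's age]).
rows = [
    (1, 3000, [20]),
    (1, 3100, [30]),
    (0, 3300, [20]),
    (0, 3200, [20]),
    (0, 3400, [30]),
]
print(nnm_ate(rows))  # -270.0
```

In the toy data, the 20-year-old smoker is tied in distance with two nonsmokers, so her predicted untreated outcome is the average of their birthweights. Passing a single estimated propensity score as the only covariate would turn the same routine into a crude form of matching on the propensity score.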
The ATEs we estimated are
Which estimator should we use? We would never suggest searching the above table for the result that most closely fits your wishes and biases. The choice of estimator needs to be made beforehand. So, how do we choose? Here are some rules of thumb:
Final thoughts

Before we go, we reiterate the cautionary note from our last entry. Nothing about the mathematics of treatment-effects estimators magically extracts causal relationships from observational data. We cannot thoughtlessly analyze our data using Stata’s teffects commands and infer a causal relationship. The models must be supported by scientific theory.

If you would like to learn more about treatment effects in Stata, there is an entire manual devoted to the treatment-effects features in Stata 14; it includes a basic introduction, an advanced introduction, and many worked examples. In Stata, type help teffects:

. help teffects

Title

     [TE] teffects -- Treatment-effects estimation for observational data

Syntax

… <output omitted> …

The title [TE] teffects will be in blue, which means it’s clickable. Click on it to go to the Treatment-Effects Reference Manual.

Or download the manual from our website; visit http://www./manuals14/te/

References

Abadie, A., and G. W. Imbens. 2006. Large sample properties of matching estimators for average treatment effects. Econometrica 74: 235–267.

Abadie, A., and G. W. Imbens. 2011. Bias-corrected matching estimators for average treatment effects. Journal of Business and Economic Statistics 29: 1–11.

Cattaneo, M. D. 2010. Efficient semiparametric estimation of multi-valued treatment effects under ignorability. Journal of Econometrics 155: 138–154.