
Propensity Score Matching and Difference-in-Differences Propensity Score Matching (PSM, PSM-DID)

 HUSTKP 2020-05-22

Source: https://www.bilibili.com/read/cv2545056/

First revised on January 3, 2020


Outline

1. Motivation: why choose PSM

2. PSM: an effective tool against selection bias

3. Computing the p-score and matching: methods and steps (Matching methods explained)

  • Step 1: Estimating a Model of Program Participation

  • Step 2: Defining the Region of Common Support and Balancing Tests

  • Step 3: Matching Participants to Nonparticipants

  • Step 4: Calculating the Average Treatment Impact

  • Summary of the PSM matching steps

  • Limitations of PSM

4. DID-PSM: PSM for panel data

1. Motivation: why choose PSM

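In public policy research we often want to evaluate the effect of a policy after it has been implemented; such research is called policy evaluation, and the effect of the policy on the cities that adopt it is called the treatment effect. The units that adopt the policy form the experimental group, or treatment group, and the units that do not adopt it form the control group.

Clearly, in a social science such as our own field, public administration (PA), randomized experiments are hard to carry out, so we rely mostly on observational and quasi-experimental studies. Because of selection bias, and because in the counterfactual framework the outcome a treated unit would have had without the policy is never observed, directly comparing the two groups may give a biased estimate of the policy effect.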

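What is selection bias?

Selection bias arises because the treatment and control groups do not start from the same initial conditions. Moreover, individuals usually decide for themselves whether to join a program based on the benefits they expect from it, which makes the average treatment effect hard to estimate. This is known as the selection problem.

What is a counterfactual?

Rubin (1974) proposed the counterfactual framework.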

The main challenge of an impact evaluation is to determine what would have happened to the beneficiaries if the program had not existed. That is, one has to determine the per capita household income of beneficiaries in the absence of the intervention. A beneficiary’s outcome in the absence of the intervention would be its counterfactual. (World Bank, p. 22)

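How can selection bias and the counterfactual problem be addressed?

The selection problem is not insurmountable; one solution is the matching estimator.

The core idea of matching is to use statistical techniques to construct a comparison group artificially: using observable characteristics, it tries to pair each participant (treated) with a nonparticipant (untreated). In other words, with respect to the observable variables, the control group constructed by matching has the same distribution as the treatment group.

Suppose individual i belongs to the treatment group. The basic idea of the matching estimator is to find an individual j in the control group whose observable variables are as similar as possible to those of i. Under the ignorability assumption, i and j then have similar probabilities of entering the treatment group and are comparable. Matching every individual in the treatment group in the same way (and likewise every individual in the control group) and then averaging the individual treatment effects gives the matching estimator.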

If one assumes that differences in participation are based solely on differences in observed characteristics, and if enough nonparticipants are available to match with participants, the corresponding treatment effect can be measured even if treatment is not random.

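Why use PSM for matching?

More generally, x_i may contain several variables, for example a K-dimensional vector. Matching on x_i then means matching in a high-dimensional space, where data become sparse: it is hard to find an x_j close enough to x_i. One therefore usually uses a function f(x_i) that compresses the information in the K-dimensional vector into one dimension and matches on f(x_i) instead. One approach is a vector norm, that is, a distance function defined on the vector space.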

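A common way to measure the similarity, or distance, between x_i and x_j is the Mahalanobis distance, $d(x_i, x_j) = \sqrt{(x_i - x_j)' \hat{\Sigma}_x^{-1} (x_i - x_j)}$, where $\hat{\Sigma}_x$ is the sample covariance matrix of x; matching on it is called Mahalanobis matching. Sometimes the weight matrix is instead the inverse of a diagonal matrix whose diagonal elements are the variances of the variables. Matching on any such distance function of the covariates is called covariate matching.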
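To make the distance concrete, here is a minimal Python sketch for the two-covariate case (K = 2), using only the standard library. The function names (`sample_cov_2d`, `inverse_2x2`, `diagonal_weight`, `weighted_distance`) and the toy data are hypothetical. With the inverse sample covariance matrix as the weight matrix W, `weighted_distance` is the Mahalanobis distance; with `diagonal_weight` it uses the inverse-variance diagonal weighting mentioned above.

```python
import math

def sample_cov_2d(points):
    """Sample covariance matrix (n - 1 denominator) of a list of (x1, x2) points."""
    n = len(points)
    m1 = sum(p[0] for p in points) / n
    m2 = sum(p[1] for p in points) / n
    s11 = sum((p[0] - m1) ** 2 for p in points) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in points) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / (n - 1)
    return [[s11, s12], [s12, s22]]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix; raises ZeroDivisionError if it is singular."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def diagonal_weight(cov):
    """Alternative weight matrix: inverse of diag(variances), ignoring covariances."""
    return [[1.0 / cov[0][0], 0.0], [0.0, 1.0 / cov[1][1]]]

def weighted_distance(xi, xj, w):
    """sqrt((xi - xj)' W (xi - xj)); W = inverse covariance gives the Mahalanobis distance."""
    diff = [xi[0] - xj[0], xi[1] - xj[1]]
    q = sum(diff[r] * w[r][c] * diff[c] for r in range(2) for c in range(2))
    return math.sqrt(q)

# Hypothetical usage: find the control closest to one treated unit (covariate matching).
sample = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 3.0)]
W = inverse_2x2(sample_cov_2d(sample))
treated_x = (1.9, 0.2)
controls = {"A": (0.0, 0.0), "B": (2.0, 2.0), "C": (2.0, 0.0)}
best = min(controls, key=lambda name: weighted_distance(treated_x, controls[name], W))
```

With many covariates the same idea needs a general matrix inverse, and the sparsity problem described above quickly makes every "nearest" control far away.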

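The drawback of Mahalanobis matching is that when x contains many variables or the sample is not large enough, good matches may not exist: even if individual j is the closest to individual i in relative terms, the absolute Mahalanobis distance between them can still be large. To address this, the statisticians Rosenbaum and Rubin (1983) proposed measuring distance with the propensity score (p-score).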

2. PSM: an effective tool against selection bias

Propensity score matching

Propensity score matching (PSM) constructs a statistical comparison group that is based on a model of the probability of participating in the treatment, using observed characteristics. 

In PSM, each participant is matched to a nonparticipant on the basis of a single propensity score, reflecting the probability of participating conditional on their different observed characteristics X (see Rosenbaum and Rubin 1983). PSM therefore avoids the “curse of dimensionality” associated with trying to match participants and nonparticipants on every possible characteristic when X is very large.

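Bai Xiaogui's comment: the "curse of dimensionality" really hits home for me. Three years ago I wanted to use a multidimensional cross-sectional dataset and a Latin square design to test the efficiency of different samples, and the curse of dimensionality kept me from ever getting a successful match. Oh my, the moment I saw the idea behind PSM, I fell in love with it~~~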

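Definition: the propensity score of individual i is the conditional probability that i enters the treatment group given x_i: p(x_i) = P(D_i = 1 | x = x_i).

When estimating p(x) from sample data, one can use parametric estimation (probit or logit) or nonparametric estimation; logit is the most popular. The advantage of using the p-score to measure the distance between individuals is that it is one-dimensional and takes values in [0, 1].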
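A minimal sketch of estimating p(x) with a logit model, assuming plain Python and a simple gradient-ascent fit of the logit log-likelihood. Statistical packages maximize the same likelihood with faster Newton-type algorithms, so this only illustrates the idea; `fit_logit`, `propensity_scores` and the toy data are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic CDF: maps a linear index to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(X, d, lr=0.1, iters=20000):
    """Fit P(D = 1 | x) = sigmoid(b0 + b'x) by plain gradient ascent on the
    mean log-likelihood. Returns [b0, b1, ..., bK]. Illustrative only."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)  # beta[0] is the intercept
    for _ in range(iters):
        grad = [0.0] * (k + 1)
        for xi, di in zip(X, d):
            resid = di - sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
            grad[0] += resid
            for m in range(k):
                grad[m + 1] += resid * xi[m]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, beta):
    """Fitted p-scores p(x_i) for every observation."""
    return [sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi))) for xi in X]

# Hypothetical data: one covariate x and a treatment indicator d.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0]]
d = [0, 0, 1, 0, 1, 0, 1, 1]
beta = fit_logit(X, d)
scores = propensity_scores(X, beta)
```

Every fitted score lies strictly between 0 and 1. At the maximum-likelihood solution of a logit with an intercept, the fitted scores also sum to the number of treated units, which is a quick check that the fit has converged.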

Matching with the p-score as the distance function is called propensity score matching (PSM).

Conditions for PSM to be valid (assumptions)

The validity of PSM depends on two or three conditions: (a) conditional independence (namely, that unobserved factors do not affect participation); (b) sizable common support or overlap in propensity scores across the participant and nonparticipant samples; and (c) the balancing condition.

(a) conditional independence

Conditional independence states that given a set of observable covariates X that are not affected by treatment, potential outcomes Y are independent of treatment assignment. If $Y_i^D$ represents outcomes for participants and $Y_i^C$ outcomes for nonparticipants, conditional independence implies $(Y_i^D, Y_i^C) \perp T_i \mid X_i$, where $T_i$ denotes treatment assignment.

The conditional independence assumption is also called the ignorability assumption.

It means that, given x, treatment assignment is strictly exogenous, so there is no endogeneity problem.

  • For random experiments, the outcomes are independent of treatment: (y0, y1) ⊥ D. The treatment variable needs to be exogenous. (In a randomized experiment, treatment is strictly exogenous: whether a unit is assigned to the treatment or the control group is unrelated to its potential outcomes.)

  • For observational studies, the outcomes are independent of treatment, conditional on x: (y0, y1) ⊥ D | x. We need treatment assignment that ignores the outcomes. (For observational data, e.g., quasi-experimental designs, treatment is strictly exogenous given x.)

This assumption is also called unconfoundedness (Rosenbaum and Rubin 1983), and it implies that uptake of the program is based entirely on observed characteristics. To estimate the treatment effect on the treated (TOT) as opposed to the average treatment effect (ATE), a weaker assumption is needed:

  • Conditional independence of the control group outcome and treatment.

  • This is weaker than the conditional independence assumption: y0 ⊥ D | x.

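The conditional independence assumption is very strong: it means that the regression equation includes all relevant variables, with no omitted variables. Moreover, we do not know whether x_i enters the equation in a nonlinear form. Conditional independence is a strong assumption and is not a directly testable criterion; it depends on specific features of the program itself. If unobserved characteristics determine program participation, conditional independence will be violated, and PSM is not an appropriate method.

What should we do if the conditional independence assumption is violated?

All matching estimators rely on the ignorability assumption, i.e., selection on observables, so they do not apply when selection is on unobservables. With observational data, if you suspect selection on unobservables, there are several options:

(1) Use as many relevant observable variables as possible. (If x_i contains a rich set of covariates, ignorability can be regarded as plausibly satisfied.)

(2) If the unobservables that drive the treatment variable D_i do not change over time and panel data are available, use DID-PSM.

(3) Use regression discontinuity, especially fuzzy regression discontinuity.

(4) Use instrumental variables (IV) estimation.

(5) Use the influence of selection on observables to gauge the likely influence of selection on unobservables.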

On its own, PSM is a useful approach when only observed characteristics are believed to affect program participation. Whether this belief is actually the case depends on the unique features of the program itself, in terms of targeting as well as individual takeup of the program. Assuming selection on observed characteristics is sufficiently strong to determine program participation, baseline data on a wide range of preprogram characteristics will allow the probability of participation based on observed characteristics to be specified more precisely. Some tests can be conducted to assess the degree of selection bias or participation on unobserved characteristics.

(b) Overlap assumption: sizable common support or overlap

For each value of x, there are both treated and control observations. For each treated observation, there is a matched control observation with similar x. This assumption means that the treatment and control subsamples overlap; since it is also a precondition for matching, it is also called the matching assumption. It guarantees that the p-score ranges of the treatment and control groups share a common region (the common support).

Overlap assumption: for every value of x, 0 < p(x) < 1.

This condition ensures that treatment observations have comparison observations “nearby” in the propensity score distribution (Heckman, LaLonde, and Smith 1999). Specifically, the effectiveness of PSM also depends on having a large and roughly equal number of participant and nonparticipant observations so that a substantial region of common support can be found. For estimating the TOT, this assumption can be relaxed to P(T_i = 1 | X_i) < 1.

[Figure: overlap between the p-scores of participants and nonparticipants]

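When matching, to improve match quality one usually keeps only the individuals whose p-scores lie in the overlap region (at the cost of losing some observations). If the common range of propensity scores is too small, the estimates will be biased.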
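A minimal sketch of restricting the sample to the common support, assuming the simple "minima and maxima" rule: keep only observations whose p-score lies between the larger of the two groups' minimum scores and the smaller of their maximum scores. The rule is one common choice among several, and `common_support` and the data are hypothetical.

```python
def common_support(ps, d):
    """Return (lo, hi, keep): the overlap interval of treated and control
    p-scores, and a flag per observation for whether it lies inside it."""
    treated = [p for p, di in zip(ps, d) if di == 1]
    control = [p for p, di in zip(ps, d) if di == 0]
    lo = max(min(treated), min(control))  # below lo only one group is present
    hi = min(max(treated), max(control))  # above hi only one group is present
    keep = [lo <= p <= hi for p in ps]
    return lo, hi, keep

# Hypothetical p-scores: four treated units (d = 1), then four controls (d = 0).
ps = [0.1, 0.3, 0.5, 0.9, 0.05, 0.2, 0.4, 0.6]
d = [1, 1, 1, 1, 0, 0, 0, 0]
lo, hi, keep = common_support(ps, d)
```

Here the treated unit at 0.9 and the control at 0.05 fall outside [0.1, 0.6] and are dropped, which is exactly the loss of observations mentioned above.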

Bias may also result from dropping nonparticipant observations that are systematically different from those retained; this problem can also be alleviated by collecting data on a large sample of nonparticipants, with enough variation to allow a representative sample. Otherwise, examining the characteristics of the dropped nonparticipant sample can refine the interpretation of the treatment effect.

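Unlike the conditional independence assumption, common support is a precondition for matching, and there is no workaround. If the common support is too small, your data are not suited to matching, and you have to give up on it.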

(c) Balancing condition

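The balancing condition is really part of the common support requirement: it addresses the possible sampling bias caused by deleting the observations whose p-scores fall outside the overlap region.

  • Assignment to treatment is independent of the x characteristics, given the same propensity score: D ⊥ x | P(x).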

  • The balancing condition is testable.

Treatment units will therefore have to be similar to nontreatment units in terms of observed characteristics unaffected by participation; thus, some nontreatment units may have to be dropped to ensure comparability. However, sometimes a nonrandom subset of the treatment sample may have to be dropped if similar comparison units do not exist (Ravallion 2008). This situation is more problematic because it creates a possible sampling bias in the treatment effect. Examining the characteristics of dropped units may be useful in interpreting potential bias in the estimated treatment effects.

Heckman, Ichimura, and Todd (1997) encourage dropping treatment observations with weak common support. Only in the area of common support can inferences be made about causality. Figure 4.2 reflects a scenario where the common support is weak.

[Figure 4.2: weak common support]

3. Computing the p-score and matching: methods and steps (Matching methods explained)

To calculate the program treatment effect, one must first calculate the propensity score P(X) on the basis of all observed covariates X that jointly affect participation and the outcome of interest. The aim of matching is to find the closest comparison group from a sample of nonparticipants to the sample of program participants. “Closest” is measured in terms of observable characteristics not affected by program participation.

Step 1: Estimating a Model of Program Participation

When one is interested only in comparing outcomes for those participating (T = 1) with those not participating (T = 0), this estimate can be constructed from a probit or logit model of program participation. Caliendo and Kopeinig (2008) also provide examples of estimations of the participation equation with a nonbinary treatment variable, based on work by Bryson, Dorsett, and Purdon (2002); Imbens (2000); and Lechner (2001). In this situation, one can use a multinomial probit (which is computationally intensive but based on weaker assumptions than the multinomial logit) or a series of binomial models.

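The predicted outcome represents the estimated probability of participation or propensity score. Every sampled participant and nonparticipant will have an estimated propensity score, $\hat{P}(X \mid T = 1) = \hat{P}(X)$. Note that the participation equation is not a determinants model, so estimation outputs such as t-statistics and the adjusted $R^2$ are not very informative and may be misleading. For this stage of PSM, causality is not of as much interest as the correlation of X with T.

Nevertheless, including too many X variables in the participation equation should also be avoided; overspecification of the model can result in higher standard errors for the estimated propensity score $\hat{P}(X)$ and may also result in perfectly predicting participation for many households ($\hat{P}(X) = 1$). In short, keep the number of x variables modest.

Estimating the propensity score with a logit model

Stata command (the user-written pscore command of Becker and Ichino; $x is a global macro holding the covariate list, mypscore and myblock store the estimated scores and block numbers, comsup restricts the analysis to the common support, numblo(5) starts the balancing tests with five blocks, level(0.05) sets their significance level, and logit uses a logit instead of the default probit):

pscore treat $x, pscore(mypscore) blockid(myblock) comsup numblo(5) level(0.05) logit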

Step 2: Defining the Region of Common Support and Balancing Tests

Next, the region of common support needs to be defined where distributions of the propensity score for treatment and comparison group overlap. As mentioned earlier, some of the nonparticipant observations may have to be dropped because they fall outside the common support. Sampling bias may still occur, however, if the dropped nonparticipant observations are systematically different in terms of observed characteristics from the retained nonparticipant sample; these differences should be monitored carefully to help interpret the treatment effect. 

Balancing tests can also be conducted to check whether, within each quantile of the propensity score distribution, the average propensity score and mean of X are the same. For PSM to work, the treatment and comparison groups must be balanced in that similar propensity scores are based on similar observed X. Although a treated group and its matched nontreated comparator might have the same propensity scores, they are not necessarily observationally similar if misspecification exists in the participation equation. The distributions of the treated group and the comparator must be similar, which is what balance implies. Formally, one needs to check if $\hat{P}(X \mid T = 1) = \hat{P}(X \mid T = 0)$.
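A rough sketch of the block-wise balancing check, assuming fixed p-score blocks of width 0.2 and a single covariate x. For each block that contains both groups it reports the treated-minus-control difference in mean p-score and in mean x; a formal test would add standard errors (the Stata pscore command, for instance, runs t-tests within blocks). `block_balance`, the cutoffs and the data are hypothetical.

```python
def block_balance(ps, d, x, cutoffs=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)):
    """Map each p-score block (lo, hi) to (mean p diff, mean x diff), treated minus control."""
    def mean(ids, values):
        return sum(values[i] for i in ids) / len(ids)

    result = {}
    last = len(cutoffs) - 2
    for k, (lo, hi) in enumerate(zip(cutoffs[:-1], cutoffs[1:])):
        # Half-open blocks [lo, hi); the last block also includes its upper edge.
        ids = [i for i, p in enumerate(ps) if lo <= p < hi or (k == last and p == hi)]
        treated = [i for i in ids if d[i] == 1]
        control = [i for i in ids if d[i] == 0]
        if not treated or not control:
            continue  # nothing to compare in this block
        result[(lo, hi)] = (mean(treated, ps) - mean(control, ps),
                            mean(treated, x) - mean(control, x))
    return result

# Hypothetical data: p-scores, treatment indicator and one covariate.
ps = [0.10, 0.15, 0.50, 0.55, 0.90]
d = [1, 0, 1, 0, 1]
x = [1.0, 1.0, 3.0, 2.0, 5.0]
balance = block_balance(ps, d, x)
```

In this toy example the block [0.4, 0.6) shows a gap of 1.0 in mean x even though the p-scores are close, which is the kind of imbalance that would suggest re-specifying the participation equation or splitting the block.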

Matching with common support: restrict matching to the common range of propensity scores.

[Figure: matching with common support]

Step 3: Matching Participants to Nonparticipants

Different approaches are used to match participants and nonparticipants on the basis of the propensity score. They include nearest-neighbor (NN) matching, caliper and radius matching, stratification and interval matching, and kernel matching and local linear matching (LLM). Regression-based methods on the sample of participants and nonparticipants, using the propensity score as weights, can lead to more efficient estimates.

[Figure: propensity scores for the treated and control groups]

Matching methods: for each treated observation i, we need to find matches of control observation(s) j with similar characteristics.

Kernel matching

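Each treated observation i is matched with several control observations, with weights that decrease as the distance between the treated and control observations increases. (Kernel matching constructs a virtual comparison unit for each treated unit as a weighted average of the existing control units; the weight of a control unit falls as the gap between its p-score and that of the treated unit grows.)

With matching based on propensity scores, the weights are defined as:

$$w(i,j) = \frac{K\left(\frac{p_j - p_i}{h}\right)}{\sum_{k \in C} K\left(\frac{p_k - p_i}{h}\right)}$$

where K(·) is a kernel function, p_i and p_j are the propensity scores of treated observation i and control observation j, and C is the set of control observations.

Here h is the bandwidth parameter.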

[Figure: kernel matching]
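A minimal sketch of kernel matching for the ATT, assuming a Gaussian kernel and an illustrative bandwidth (both are choices, not recommendations). `kernel_weights` computes w(i, j) as the kernel value for control j divided by the sum of kernel values over all controls, so each treated unit's weights sum to 1 and closer controls get more weight. The function names and data are hypothetical.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used as the kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_weights(p_i, control_ps, h):
    """w(i, j) = K((p_j - p_i) / h) / sum over controls k of K((p_k - p_i) / h).
    A very small h can make every kernel value underflow to 0."""
    k = [gaussian_kernel((p_j - p_i) / h) for p_j in control_ps]
    total = sum(k)
    return [v / total for v in k]

def kernel_att(ps, d, y, h=0.06):
    """ATT where each treated unit's counterfactual is a kernel-weighted mean of control outcomes."""
    control_ps = [p for p, di in zip(ps, d) if di == 0]
    control_y = [v for v, di in zip(y, d) if di == 0]
    effects = []
    for p_i, d_i, y_i in zip(ps, d, y):
        if d_i == 1:
            w = kernel_weights(p_i, control_ps, h)
            effects.append(y_i - sum(wj * yj for wj, yj in zip(w, control_y)))
    return sum(effects) / len(effects)

# Hypothetical data: two treated units (d = 1) and three controls (d = 0).
ps = [0.30, 0.60, 0.20, 0.50, 0.70]
d = [1, 1, 0, 0, 0]
y = [5.0, 7.0, 2.0, 2.0, 2.0]
att = kernel_att(ps, d, y, h=0.1)
```

Because every control here has the same outcome, the weighted counterfactual is 2 whatever the weights, so the ATT is the mean of 3 and 5, i.e., 4. A larger h spreads the weight over more distant controls (lower variance, more bias); a smaller h does the opposite.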

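Nearest neighbor matching

For each treated observation i, select a control observation j that has the closest x.

[Figure: nearest neighbor matching]

Meaning: nearest neighbor matching is the most commonly used matching method. For each treated individual, the control individual whose propensity score differs least from its own becomes its comparison unit.

Advantage: since a control is sought for every treated individual, all treated individuals are matched and the information in the treatment group is fully used.

Disadvantage: because no treated individual is discarded, some pairs with very different propensity scores are matched anyway. Match quality then suffers, the ATT estimate absorbs these gaps, and its accuracy falls.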
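A minimal sketch of 1-to-1 nearest neighbor matching on the p-score, with replacement and no caliper, so every treated unit gets a match, as the advantage and disadvantage above describe. `nn_match`, `att_from_pairs` and the data are hypothetical; when two controls are equally close, `min` keeps the first one in data order.

```python
def nn_match(ps, d):
    """1-to-1 nearest neighbor matching on the p-score, with replacement.
    Returns {treated index: matched control index}."""
    controls = [j for j, dj in enumerate(d) if dj == 0]
    return {
        i: min(controls, key=lambda j: abs(ps[j] - ps[i]))
        for i, di in enumerate(d) if di == 1
    }

def att_from_pairs(pairs, y):
    """Average of y(treated) - y(matched control) over all matched pairs."""
    return sum(y[i] - y[j] for i, j in pairs.items()) / len(pairs)

# Hypothetical data: units 0 and 1 are treated, units 2 to 4 are controls.
ps = [0.30, 0.70, 0.25, 0.50, 0.72]
d = [1, 1, 0, 0, 0]
y = [10.0, 12.0, 8.0, 9.0, 11.0]
pairs = nn_match(ps, d)          # {0: 2, 1: 4}
att = att_from_pairs(pairs, y)   # ((10 - 8) + (12 - 11)) / 2 = 1.5
```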


Radius matching

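Each treated observation i is matched with the control observations j that fall within a specified radius. (Radius matching sets a positive radius in advance and takes all control units whose p-scores lie within that radius of the treated unit's p-score; the smaller the radius, the stricter the matching.)

[Figure: radius matching]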
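A minimal sketch of radius (caliper) matching for the ATT, assuming each treated unit is compared with the simple mean outcome of all controls inside the radius and that treated units with no control inside it are dropped. `radius_att` and the data are hypothetical.

```python
def radius_att(ps, d, y, r):
    """Radius matching: compare each treated unit with the mean outcome of all
    controls whose p-score is within r of its own. Treated units with no control
    inside the radius are dropped. Returns (ATT, number of treated units used)."""
    controls = [j for j, dj in enumerate(d) if dj == 0]
    effects = []
    for i, di in enumerate(d):
        if di != 1:
            continue
        neighbours = [j for j in controls if abs(ps[j] - ps[i]) <= r]
        if not neighbours:
            continue  # no control within the radius: unit i stays unmatched
        effects.append(y[i] - sum(y[j] for j in neighbours) / len(neighbours))
    if not effects:
        return None, 0
    return sum(effects) / len(effects), len(effects)

# Hypothetical data: units 0 and 1 are treated, units 2 to 4 are controls.
ps = [0.30, 0.70, 0.25, 0.35, 0.90]
d = [1, 1, 0, 0, 0]
y = [10.0, 12.0, 8.0, 6.0, 11.0]
strict = radius_att(ps, d, y, r=0.06)   # (3.0, 1): no control within 0.06 of 0.70
loose = radius_att(ps, d, y, r=0.25)    # (2.0, 2): the control at 0.90 now qualifies
```

The two calls show the trade-off in the text: a smaller radius gives closer matches but leaves more treated units unmatched.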

Stratification or interval matching

Compare the outcomes within intervals/blocks of propensity scores

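Description: stratification matching divides the whole sample into blocks according to the estimated propensity score, so that within each block the average propensity score is the same for the treatment and control groups.

Advantage: Cochran and Chambers (1965) show that five blocks are often enough to remove 95% of the bias associated with a single covariate. The method also accounts for stratification or clustering in the sample, assuming that observations are correlated within a block but not across blocks.

Disadvantage: if no control individual can be found in a block, the information in that block's treated individuals is discarded, so the total number of matches falls.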
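A minimal sketch of the stratification estimate of the ATT, assuming fixed p-score blocks of width 0.2: within each block take the treated-minus-control difference in mean outcomes, then average the block differences with weights equal to the number of treated units in each block. Blocks without controls are discarded, as the disadvantage above notes. `stratified_att`, the cutoffs and the data are hypothetical.

```python
def stratified_att(ps, d, y, cutoffs=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)):
    """Stratification (interval) matching estimate of the ATT."""
    total, n_treated_used = 0.0, 0
    last = len(cutoffs) - 2
    for k, (lo, hi) in enumerate(zip(cutoffs[:-1], cutoffs[1:])):
        # Half-open blocks [lo, hi); the last block also includes its upper edge.
        ids = [i for i, p in enumerate(ps) if lo <= p < hi or (k == last and p == hi)]
        yt = [y[i] for i in ids if d[i] == 1]
        yc = [y[i] for i in ids if d[i] == 0]
        if not yt or not yc:
            continue  # no comparison possible: this block's treated units are discarded
        total += len(yt) * (sum(yt) / len(yt) - sum(yc) / len(yc))
        n_treated_used += len(yt)
    return total / n_treated_used

# Hypothetical data: the treated unit at 0.90 has no control in its block.
ps = [0.10, 0.15, 0.12, 0.50, 0.55, 0.90]
d = [1, 1, 0, 1, 0, 1]
y = [5.0, 7.0, 4.0, 9.0, 6.0, 10.0]
att = stratified_att(ps, d, y)  # (2 * (6 - 4) + 1 * (9 - 6)) / 3 = 7 / 3
```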

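Technical details of matching

  • Ties: suppose control individuals j and k are equally close to treated individual i in terms of the observables. If ties are allowed, the average of y_j and y_k is used as the estimate of y_0i. If ties are not allowed, j or k is chosen according to the order of the data, so the result may depend on how the data are sorted; it is therefore generally recommended to sort the sample randomly before matching.

  • Matching with or without replacement: Matching without replacement: each control observation is used no more than one time as a match for a treated observation; every matched pair (i, j) is removed from the sample and takes no part in the remaining matches. Matching with replacement: each control observation can be used as a match to several treated observations; matched individuals stay in the sample and take part in the remaining matches.

  • 1-to-1 or 1-to-many matching: besides 1-to-1 matching, one can match each individual with several nearest individuals from the other group, for example the four nearest. In general, matching estimators are biased except under exact matching, where x_i = x_j for every match. More common is inexact matching, where x_i ≈ x_j.

  • Under inexact matching, 1-to-1 matching gives smaller bias but larger variance, while 1-to-many matching reduces the variance at the cost of larger bias. Abadie et al. (2004) suggest 1-to-4 matching, which in general performs well in terms of mean squared error (MSE).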
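A minimal sketch of two of these choices, assuming matching on the p-score: `knn_att` does 1-to-k matching with replacement, and `greedy_match_without_replacement` does 1-to-1 matching without replacement. Both names and the data are hypothetical. Note that ties are broken by data order here (Python's `sorted` and `min` are order-preserving), not averaged.

```python
def knn_att(ps, d, y, k=4):
    """1-to-k matching with replacement: each treated unit is compared with the
    mean outcome of its k closest controls on the p-score."""
    controls = [j for j, dj in enumerate(d) if dj == 0]
    effects = []
    for i, di in enumerate(d):
        if di == 1:
            nearest = sorted(controls, key=lambda j: abs(ps[j] - ps[i]))[:k]
            effects.append(y[i] - sum(y[j] for j in nearest) / len(nearest))
    return sum(effects) / len(effects)

def greedy_match_without_replacement(ps, d):
    """1-to-1 matching without replacement: once a control is used it leaves the
    pool. Treated units are processed in data order, so results depend on that order."""
    available = [j for j, dj in enumerate(d) if dj == 0]
    pairs = {}
    for i, di in enumerate(d):
        if di == 1 and available:
            j = min(available, key=lambda c: abs(ps[c] - ps[i]))
            pairs[i] = j
            available.remove(j)
    return pairs

# Hypothetical data: one treated unit (index 0) and five controls.
ps = [0.50, 0.00, 0.45, 0.52, 0.60, 0.35]
d = [1, 0, 0, 0, 0, 0]
y = [10.0, 100.0, 4.0, 6.0, 8.0, 2.0]
one_to_one = knn_att(ps, d, y, k=1)   # 10 - 6 = 4.0
one_to_four = knn_att(ps, d, y, k=4)  # 10 - mean(6, 4, 8, 2) = 5.0
```

For example, with p-scores [0.50, 0.52, 0.51, 0.90] and d = [1, 1, 0, 0], matching without replacement gives {0: 2, 1: 3}: the second treated unit must take the distant control at 0.90 because the close one is already used, whereas with replacement both would be matched to the control at 0.51.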

Step 4: Calculating the Average Treatment Impact

Average treatment effect (ATE)

$$ATE = E[y_1 - y_0] = E[y_1] - E[y_0]$$

A simple t-test between the outcomes for the treated and control groups.

ATE is fine for random experiments but in observational studies, it may be biased if treated and control observations are not similar.
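A minimal sketch of that simple comparison: the difference in mean outcomes between treated and control observations, with a t statistic. It assumes the Welch (unequal-variance) standard error; Stata's `ttest` assumes equal variances unless the `unequal` option is given. `difference_in_means` and the data are hypothetical.

```python
import math
import statistics

def difference_in_means(y, d):
    """Treated-minus-control difference in mean outcomes and its Welch t statistic.
    Each group needs at least two observations."""
    y1 = [v for v, di in zip(y, d) if di == 1]
    y0 = [v for v, di in zip(y, d) if di == 0]
    diff = statistics.mean(y1) - statistics.mean(y0)
    se = math.sqrt(statistics.variance(y1) / len(y1) + statistics.variance(y0) / len(y0))
    return diff, diff / se

# Hypothetical data: two treated and two control outcomes.
diff, t = difference_in_means([3.0, 5.0, 1.0, 3.0], [1, 1, 0, 0])
# diff = 2.0; each group has variance 2, so se = sqrt(2) and t = sqrt(2)
```

In an observational study this number mixes the treatment effect with selection bias, which is why the matching steps above come first.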

Average treatment effect on the treated (ATET)

ATET is the difference between the outcomes of treated and the outcomes of the treated observations if they had not been treated.

$$ATET = E[y_1 - y_0 \mid D = 1] = E[y_1 \mid D = 1] - E[y_0 \mid D = 1]$$

The second term is a counterfactual so it is not observable and needs to be estimated.
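Adding and subtracting the counterfactual mean $E[y_0 \mid D=1]$ shows why a naive comparison of group means is not the ATET: the observed difference equals the ATET plus a selection-bias term. That term is zero under random assignment, and matching on x (or on p(x)) aims to remove it under conditional independence.

```latex
\begin{aligned}
E[y \mid D=1] - E[y \mid D=0]
  &= E[y_1 \mid D=1] - E[y_0 \mid D=0] \\
  &= \underbrace{E[y_1 \mid D=1] - E[y_0 \mid D=1]}_{ATET}
   + \underbrace{E[y_0 \mid D=1] - E[y_0 \mid D=0]}_{\text{selection bias}}
\end{aligned}
```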

Propensity score method

After matching on propensity scores, we can compare the outcomes of treated and control observations.

$$ATET = E[y_1 \mid D = 1, p(x)] - E[y_0 \mid D = 0, p(x)]$$

Empirical estimation

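Each treated observation i is matched with control observations j, whose outcomes $y_0$ are weighted by $w(i,j)$:

$$ATET = \frac{1}{N_T}\sum_{i \in T}\Big[y_{1,i} - \sum_{j \in C} w(i,j)\, y_{0,j}\Big]$$

where T is the set of $N_T$ treated observations and C is the set of control observations.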
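A minimal sketch of this empirical estimator, assuming the matching step has already produced a weight matrix: ATET = (1/N_T) × sum over treated i of [y_1,i minus the sum over controls j of w(i, j) · y_0,j]. Nearest neighbor matching puts weight 1 on a single control, while kernel or 1-to-many matching spreads the weight. `atet_from_weights` and the data are hypothetical.

```python
def atet_from_weights(y_treated, y_control, w):
    """ATET = mean over treated i of [y1_i - sum_j w[i][j] * y0_j].
    w[i][j] is the weight of control j for treated unit i; each row must sum to 1."""
    for row in w:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of weights must sum to 1")
    gaps = [
        y1 - sum(wij * y0 for wij, y0 in zip(row, y_control))
        for y1, row in zip(y_treated, w)
    ]
    return sum(gaps) / len(gaps)

y_treated = [10.0, 12.0]
y_control = [8.0, 11.0]
nn_weights = [[1.0, 0.0], [0.0, 1.0]]      # each treated unit matched to its own control
shared_weights = [[1.0, 0.0], [1.0, 0.0]]  # both treated units matched to control 0
```

With `nn_weights` the estimate is ((10 - 8) + (12 - 11)) / 2 = 1.5; with `shared_weights` (matching with replacement reusing one control) it is ((10 - 8) + (12 - 8)) / 2 = 3.0.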

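Tips

If there are not many control individuals (N0 is small), match with replacement.

If there are many comparable control individuals, consider 1-to-many matching or kernel matching to improve efficiency.

Summary of the PSM matching steps:

1. Assign the observations into two groups: the treated group that received the treatment and the control group that did not.

  • Treatment D is a binary variable that determines if the observation has the treatment or not (it can be coded as a dummy variable).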

  • D=1 for treated observations and D=0 for control observations

2. Estimate a probit/logit model for the propensity of observations to be assigned into the treated group. Use x variables that may affect the likelihood of being assigned into the  treated group.

  • The propensity score model is a probit/logit model with D as the dependent variable and x as independent variables

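$$P(D = 1 \mid x) = F(x'\beta)$$

where F is the standard normal CDF for a probit model or the logistic CDF for a logit model.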

  • The propensity score is the conditional (predicted) probability of receiving treatment given pre-treatment characteristics x.

3. Match observations from treated and control groups based on their propensity scores

  • Several matching methods are available: kernel, nearest neighbor, radius, stratification

4. Calculate the treatment effects: compare the outcomes y between the treated and control observations after matching.


  • Counterfactual situation: compare the outcome of the treated observations with the outcome of the treated observations if they were not treated (find a close match using the control observations and use their outcome)

Limitations of PSM

① Heavy reliance on the conditional independence assumption:

The main advantage (and drawback) of PSM relies on the degree to which observed characteristics drive program participation. If selection bias from unobserved characteristics is likely to be negligible, then PSM may provide a good comparison with randomized estimates. To the degree participation variables are incomplete, the PSM results can be suspect. This condition is, as mentioned earlier, not a directly testable criterion; it requires careful examination of the factors driving program participation (through surveys, for example).

② Violating the overlap assumption can bias the estimated treatment effect: PSM is also a semiparametric method, imposing fewer constraints on the functional form of the treatment model, as well as fewer assumptions about the distribution of the error term. Although observations are dropped to achieve the common support, PSM increases the likelihood of sensible comparisons across treated and matched control units, potentially lowering bias in the program impact. This outcome is true, however, only if the common support is large; sufficient data on nonparticipants are essential in ensuring a large enough sample from which to draw matches. Bias may also result from dropping nonparticipant observations that are systematically different from those retained; this problem can also be alleviated by collecting data on a large sample of nonparticipants, with enough variation to allow a representative sample. Otherwise, examining the characteristics of the dropped nonparticipant sample can refine the interpretation of the treatment effect.

So keep in mind:

  • PSM usually requires a fairly large sample to achieve good match quality.

  • PSM requires a large common support.

  • PSM controls only for observable variables; if there is selection on unobservables, hidden bias remains.

4. DID-PSM: PSM for panel data

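Difference-in-differences (DID) solves the endogeneity problem by differencing, provided its key assumption holds (the treatment and control groups follow the same trend). But DID by itself cannot solve the selection problem.

DID and PSM complement each other naturally; like DEA and Tobit, they were born to be a pair.

A quick digression: let's write a three-line love poem.

“If You Are DID”

If you are DID, then I must be PSM
If I am DEA, then you must be Tobit
We are destined to be a perfect pair

DID-PSM was proposed by Heckman et al. (1997, 1998). Its advantage is that it can control for between-group differences in variables that are unobservable but time-invariant, for example when the treatment and control groups come from two different regions, or when they were surveyed with different questionnaires. Specifically: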

  • First, the combination of the two methods is most robust and efficient in removing the biases due to covariates and in estimating the treatment effect on the treated (Abadie and Imbens 2006; Heckman, Ichimura, and Todd 1997; Rubin 1973, 1979). Matched sampling substantially reduces differences in observed covariates, and model-based adjustment can further control for residual differences.

  • Second, matched sampling relaxes the DID identification restrictions, making model-based adjustments less sensitive to the model specification. This reduced sensitivity again facilitates the estimation of parsimonious parametric approximations of the average treatment effect on the treated (Abadie 2005; Ho et al. 2007).

The propensity score can be used to match participant and control units in the base (preprogram) year, and the treatment impact is calculated across participant and matched control units within the common support.

The difference-in-differences model is applied when panel data on outcomes are available before (b) and after (a) the experiment occurs.

The difference-in-differences model is an improvement over the one-period model.

The difference-in-differences average treatment effect on the treated is specified as:

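$$ATET^{DID} = E[y_{1a} - y_{1b} \mid D = 1] - E[y_{0a} - y_{0b} \mid D = 0]$$

where $y_1$ and $y_0$ are the outcomes of treated and control observations, and a and b denote after and before the treatment.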

The first term refers to the differences in outcomes before and after the treatment for the treated group. This term may be biased if there are time trends. The second term uses the differences in outcomes for the control group to eliminate this bias.

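To apply the difference-in-differences model: instead of the outcomes for the treated and control groups, we use the differences in outcomes after the treatment and before the treatment. The rest of the analysis is the same.

The steps are:

  • Estimate the p-score from the treatment variable D_i and the covariates x_i.

  • For each treated individual i, determine the control units it will be matched with.

  • Compute the before-after change in the outcome for each treated individual i.

  • Compute the before-after change in the outcome for the control units matched to i.

  • Apply the PSM estimator to these changes: average, over the treated individuals, the difference between each individual's change and the change of its matched controls.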
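A minimal sketch of these steps, assuming the p-scores have already been estimated on baseline (preprogram) characteristics and that each treated unit is matched 1-to-1 with replacement to its nearest control. `did_psm_att` and the two-period data are hypothetical; a real application would also restrict to the common support and could use any of the matching weights above.

```python
def did_psm_att(ps, d, y_before, y_after):
    """DID-PSM sketch: nearest neighbor match on the baseline p-score, then average
    (treated unit's before-after change) - (matched control's before-after change)."""
    change = [a - b for a, b in zip(y_after, y_before)]
    controls = [j for j, dj in enumerate(d) if dj == 0]
    effects = []
    for i, di in enumerate(d):
        if di == 1:
            j = min(controls, key=lambda c: abs(ps[c] - ps[i]))
            effects.append(change[i] - change[j])
    return sum(effects) / len(effects)

# Hypothetical panel: units 0 and 1 are treated, units 2 to 4 are controls.
ps = [0.30, 0.60, 0.28, 0.62, 0.90]
d = [1, 1, 0, 0, 0]
y_before = [10.0, 20.0, 5.0, 15.0, 0.0]
y_after = [14.0, 27.0, 7.0, 18.0, 50.0]
att = did_psm_att(ps, d, y_before, y_after)  # ((4 - 2) + (7 - 3)) / 2 = 3.0
```

Comparing only the after-period levels of the same pairs would give ((14 - 7) + (27 - 18)) / 2 = 8.0, because the treated units start from higher levels; differencing removes such time-invariant gaps, which is the point of combining DID with PSM.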

For PSM-DID in detail, see Part 2.

Main sources: Chen Qiang, Ani Katchova

Classic books:

On policy evaluation, the World Bank's Handbook on Impact Evaluation.


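Morgan and Winship, Counterfactuals and Causal Inference, has two editions, the first in 2007 and the second in 2015. The book can be found on Google Books and on the Jingguan Zhijia (经管之家) forum.

This book is all algorithms and theory; I just can't understand it.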


Rosenbaum (2010), Design of Observational Studies, can be downloaded from Springer.

Guo and Fraser (2010), Propensity Score Analysis: Statistical Methods and Applications.

The end, let's throw confetti~~✿✿ヽ(°▽°)ノ✿

