R Data Analysis: Cross-Lagged Models, Basics and a Worked Example

Codewar 2021-07-19


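Lately a lot of people have been asking about longitudinal data analysis. I have already written about latent growth models, GEE, and multilevel models, so today I am squeezing in a quick tutorial on cross-lagged models; hopefully it is useful. Whenever you run into terms like causal models, cross-lagged panel models, linear panel models, or autoregressive cross-lagged models, recognize that they all refer to the same thing: panel models. What these data have in common is that the variables are measured repeatedly over several waves, and the goal is to examine the relations among the variables. The simplest case is two waves, as in the path diagram: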

Looking at this diagram, the two equations practically write themselves:
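The equations were embedded as an image and are not available in this text. A form consistent with the coefficient labels used in the following paragraphs would be the pair below; the assignment of β1 and β2 to the X equation and β3 and β4 to the Y equation, and the residual symbols ζ, are assumptions made here for illustration.

$$
\begin{aligned}
X_2 &= \beta_1 X_1 + \beta_2 Y_1 + \zeta_X \\
Y_2 &= \beta_3 Y_1 + \beta_4 X_1 + \zeta_Y
\end{aligned}
$$

Under this labeling, β1 and β3 are the paths from a construct to itself one wave later, and β2 and β4 are the paths that cross from one construct to the other.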

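In the equations above, β1 and β3 are the autoregressive coefficients; they describe the stability of a construct. The larger the coefficient, the more stable the construct, which is easy enough to understand.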

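β2 and β4 are the cross-lagged coefficients; each represents the effect of one construct on the other construct one lag later. The coefficient is estimated after controlling for the outcome's own earlier level, which is why it is called a lagged effect. Its advantage over ordinary regression is precisely this control for the autoregressive effect. With panel data we can let x1 affect y2 and also let y1 affect x2; the two paths cross in the diagram, hence the name cross-lagged model: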

The fact that prior levels of the outcome construct are controlled for allows one to rule out the possibility that a cross-lagged effect is due simply to the fact that X and Y were correlated at time 1.
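A small simulation can make this point concrete. The sketch below is illustrative only: the data are synthetic, and every number in it (sample size, the 0.6 stability, the strength of the shared time-1 component) is an assumption, not a value from this article. Y2 is generated from Y1 alone, yet X1 appears to predict Y2 until Y1 is controlled for.

```python
import random


def slope(y, x):
    """Slope from a simple regression of y on x (with intercept)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def ols2(y, x1, x2):
    """Slopes of y on x1 and x2 (with intercept) via the 2x2 normal equations."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


rng = random.Random(2021)
n = 20000  # assumed sample size, large so the estimates are stable
x1, y1, y2 = [], [], []
for _ in range(n):
    shared = rng.gauss(0, 1)  # makes X1 and Y1 correlated at time 1
    xi = 0.7 * shared + rng.gauss(0, 0.7)
    yi = 0.7 * shared + rng.gauss(0, 0.7)
    x1.append(xi)
    y1.append(yi)
    y2.append(0.6 * yi + rng.gauss(0, 0.8))  # X1 has NO effect on Y2

naive = slope(y2, x1)                   # Y2 ~ X1, ignoring Y1
cross_lag, autoreg = ols2(y2, x1, y1)   # Y2 ~ X1 + Y1

print(f"naive slope of Y2 on X1:      {naive:.3f}")
print(f"cross-lagged path, Y1 held:   {cross_lag:.3f}")
print(f"autoregressive path Y1 -> Y2: {autoreg:.3f}")
```

The naive slope is clearly positive even though X1 plays no role in generating Y2, while the cross-lagged estimate with Y1 in the model sits near zero.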

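Of course, the example above has two constructs and two time points. The model extends to more constructs and more time points, and the coefficients keep the same meaning.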

The preceding model can be extended to more than two occasions and more than two constructs. The autoregressive and cross-lagged effects retain the same meaning.
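To see how the specification grows with more waves, here is a minimal sketch that writes out lavaan-style model syntax for an observed-variable cross-lagged model. The helper name `clpm_syntax` and the variable names `x1`, `y1`, ... are hypothetical; only the `~` (regression) and `~~` (covariance) operators come from lavaan's model syntax.

```python
def clpm_syntax(x="x", y="y", waves=3):
    """Build lavaan-style syntax for a cross-lagged panel model on observed variables."""
    if waves < 2:
        raise ValueError("a panel model needs at least two waves")
    lines = []
    for t in range(2, waves + 1):
        # each variable is regressed on both variables from the previous wave:
        # the first term is the autoregressive path, the second the cross-lagged path
        lines.append(f"{x}{t} ~ {x}{t - 1} + {y}{t - 1}")
        lines.append(f"{y}{t} ~ {y}{t - 1} + {x}{t - 1}")
    for t in range(1, waves + 1):
        # within-wave covariance (a residual covariance from wave 2 onward)
        lines.append(f"{x}{t} ~~ {y}{t}")
    return "\n".join(lines)


print(clpm_syntax(waves=3))
```

The resulting string could then be passed to a lavaan fitting function such as `sem()` in R, assuming the data frame has columns with these names.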

Advantages of cross-lagged models

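When working with longitudinal data we really need a hypothesis or theory about how a variable changes over time, but the cross-lagged model carries no such assumption: we simply add the autoregressive effects. For that reason some researchers object to the approach and prefer methods that model the change process explicitly, such as latent growth models or GEE: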

path models, such as the panel model, should be avoided because they do not begin with an explicit statement of the expected change process

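But when you are not concerned with the specific form of change in the variables, the cross-lagged model is still a good option. Its advantages show up in: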

The study of reciprocal effects

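Reciprocal relations are common: mother and child influence each other, person and environment influence each other, and so on. The cross-lagged model makes such relations easier to study. For example, it lets you see whether x predicts later y, y predicts later x, or both, and how strong each path is: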

Results from a panel analysis can be used to determine whether cross-lagged effects occur in both directions (i.e., whether X1 predicts Y2 and Y1 predicts X2) and to assess the relative strength of the cross-lagged effects. For example, data based on the observation of a parent–child dyad could be analyzed to see whether a parent’s behavior affects the child’s subsequent behavior or the child’s behavior affects the parent’s subsequent behavior and even to see which of the two cross-lagged effects is stronger.

The study of mediation

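Plenty of people grab any three related variables and start running a mediation analysis. I will not judge; papers need to get written. But to demonstrate mediation more convincingly and more clearly, you should analyze panel data: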

The longitudinal nature of the data from the panel design provides an advantage over mediation models estimated using cross-sectional data
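One common way to use the panel design for mediation is to let X at wave 1 predict M at wave 2, and M at wave 2 predict Y at wave 3, each time controlling for the outcome's previous level; the indirect effect is then the product of the two paths. The sketch below simulates such data and estimates that product with ordinary least squares. Everything in it is an assumption for illustration: the path values (0.4 and 0.3), the 0.5 stabilities, the sample size, and the use of equation-by-equation OLS instead of a full structural equation model.

```python
import random


def ols2(y, x1, x2):
    """Slopes of y on x1 and x2 (with intercept) via the 2x2 normal equations."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


rng = random.Random(11)
n = 20000                   # assumed sample size
a_true, b_true = 0.4, 0.3   # assumed paths X1 -> M2 and M2 -> Y3

x1, m1, m2, y2, y3 = [], [], [], [], []
for _ in range(n):
    shared = rng.gauss(0, 1)           # makes the earlier-wave variables correlated
    xi = shared + rng.gauss(0, 1)      # X at wave 1
    mi = shared + rng.gauss(0, 1)      # M at wave 1
    yi = shared + rng.gauss(0, 1)      # Y at wave 2
    mj = 0.5 * mi + a_true * xi + rng.gauss(0, 1)   # M at wave 2
    yj = 0.5 * yi + b_true * mj + rng.gauss(0, 1)   # Y at wave 3
    x1.append(xi)
    m1.append(mi)
    y2.append(yi)
    m2.append(mj)
    y3.append(yj)

a_hat, _ = ols2(m2, x1, m1)   # X1 -> M2, controlling for M1
b_hat, _ = ols2(y3, m2, y2)   # M2 -> Y3, controlling for Y2
indirect = a_hat * b_hat

print(f"a = {a_hat:.3f}, b = {b_hat:.3f}, indirect effect a*b = {indirect:.3f}")
```

Because the cause, the mediator, and the outcome are each measured at a different wave, the temporal ordering that mediation implies is built into the design, which cross-sectional data cannot offer.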

The study of moderation

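The usual way to test moderation is to put the product of the predictor and the moderator into a regression model, but that approach only works when the variables are observed. If your moderator is a latent variable, a cross-lagged panel model makes the analysis easier.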

Measurement invariance in cross-lagged models

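Measurement invariance was covered in an earlier post. It does not mean testing that the construct itself stays unchanged; it is a check on whether our measurement can be trusted: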

It addresses only the equivalence of measurement of the construct to ensure that the differences in the constructs are true differences

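The basic logic of measurement invariance is this: if a construct changes over time, all of its indicators should change in the same direction and by the same amount: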

The basic idea of factorial invariance is that if the construct changes over time, then this change is conveyed as changes in all the indicators in the same direction and the same amount.

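If the indicators change in contradictory ways, measurement invariance no longer holds. Note that measurement invariance is a property of latent variables; in a structural model with only observed variables there is no measurement invariance to speak of.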
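In standard factor-model notation the idea can be written down as follows. The symbols are assumptions introduced here, not taken from the article: x_{jt} is indicator j at wave t, η_t the latent construct with mean α_t, λ the loading, τ the intercept, and ε the unique error.

$$
\begin{aligned}
x_{jt} &= \tau_{jt} + \lambda_{jt}\,\eta_t + \varepsilon_{jt} \\
\text{invariant loadings:}\quad \lambda_{j1} &= \lambda_{j2} = \lambda_j \\
\text{invariant intercepts:}\quad \tau_{j1} &= \tau_{j2} = \tau_j \\
\Rightarrow\quad E[x_{j2}] - E[x_{j1}] &= \lambda_j\,(\alpha_2 - \alpha_1)
\end{aligned}
$$

Under these constraints every indicator's mean shift is driven by the same latent change, scaled by that indicator's loading. An indicator that moves against the others cannot be reproduced by the constrained model, which is what a failed invariance test signals.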

Cross-lagged panel models and causal inference

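For a while people took it for granted that panel models could establish causality, because measuring several waves of data satisfies two important prerequisites for causal inference: temporal precedence, and control for the outcome's own prior level and for other confounders: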

Two fundamental aspects of causal inference:
First, by measuring putative causes prior to the effects, temporal precedence of the cause is supported, and
Second, by simultaneously modeling the unique effect of several causes, it may be possible to support a causal explanation of one variable over another.

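But inferring causality from panel data alone is also problematic. First, you cannot independently manipulate the putative cause, so you cannot test it experimentally. Second, you may have left out other predictors. So causal inference from a cross-lagged model calls for caution: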

the putative causes often cannot be manipulated or cannot be manipulated independently from other variables in the model. In addition, proper causal inference rests on model assumptions such as including all relevant predictors. As noted earlier, this assumption can be difficult to establish.

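The overall verdict: the method is a reasonable way to look for hints of a causal relation, but causal conclusions should be drawn with care.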

Time intervals in cross-lagged models

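Usually our data are collected at equal intervals, for example every 3 months or every 2 weeks. Measuring everything on the same schedule carries an implicit assumption: the lagged effect of x on y and the lagged effect of y on x unfold over the same length of time: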

Most panel designs measure all variables on a fixed lag schedule. The fact that all variables are measured at the same time implicitly assumes that the time for the cross-lagged effect of X on Y and Y on X is the same

Is that assumption necessarily true? Hard to say.

So when you choose follow-up times, pay attention to the interval as well; at the very least, check the literature beforehand.

Worked example

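In this example I want to study the relation between mothers' depressive symptoms and children's internalizing problems. The literature offers two views: one is that children of depressed mothers are more likely to have problems, the other is that mothers of children with problems are more likely to become depressed. Which one is right?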

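Today we use a cross-lagged panel model to address this question. In our data, depression was measured with the 21-item CES-D and children's internalizing problems with the CBCL. Before fitting the structural model we have to verify measurement invariance (see the earlier posts "Literature walkthrough: measurement invariance and cross-lagged models for longitudinal data (1)" and "Literature walkthrough: measurement invariance and cross-lagged models for longitudinal data (2)"). The results are as follows: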

The results of the cross-lagged analysis are shown in the figure (all p < 0.01). The model can be fitted with lavaan or with Mplus:

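The results show that the constructs are fairly stable over time. The lagged effect of maternal depression on child problems is significant at 0.12: controlling for the child's earlier problem level, the more depressed the mother, the more likely the child is to have problems. At the same time, child problems have a lagged effect of 0.2 on maternal depression: controlling for the mother's earlier depression level, mothers of children with problems are still more likely to be depressed.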

In other words, maternal depression and child problems influence each other, that is, reciprocity.
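The reciprocal pattern can be mimicked with synthetic data. The sketch below generates two waves in which both cross-lagged paths are nonzero, using the 0.12 and 0.20 reported above, and recovers them one equation at a time with ordinary least squares. The stabilities (0.6), the time-1 correlation, the residual spread, and the sample size are all assumptions, since the figure with the actual estimates is not available here; OLS on observed scores is also a stand-in for the latent-variable model one would fit in lavaan or Mplus.

```python
import random


def ols2(y, x1, x2):
    """Slopes of y on x1 and x2 (with intercept) via the 2x2 normal equations."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


rng = random.Random(7)
n = 20000  # assumed sample size
dep1, chi1, dep2, chi2 = [], [], [], []
for _ in range(n):
    shared = rng.gauss(0, 1)                 # time-1 correlation (assumed)
    d1 = 0.6 * shared + rng.gauss(0, 0.8)    # maternal depression, wave 1
    c1 = 0.6 * shared + rng.gauss(0, 0.8)    # child internalizing, wave 1
    dep1.append(d1)
    chi1.append(c1)
    # wave 2: assumed stability 0.6 plus the cross-lagged paths from the text
    dep2.append(0.6 * d1 + 0.20 * c1 + rng.gauss(0, 0.75))
    chi2.append(0.6 * c1 + 0.12 * d1 + rng.gauss(0, 0.75))

dep_stability, child_to_mother = ols2(dep2, dep1, chi1)
child_stability, mother_to_child = ols2(chi2, chi1, dep1)

print(f"mother -> child: {mother_to_child:.3f}")
print(f"child -> mother: {child_to_mother:.3f}")
```

Both paths come out nonzero, and the ordering of the two estimates matches the values that generated the data. Comparing their sizes this way is only meaningful because the two simulated variables are on roughly the same scale; with real data one would compare standardized estimates.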

So after all that work, the cross-lagged model still has not pointed to a definite causal direction:

Consistent with our previous discussion of the use of panel models for causal inference, we do not see these results as support for a causal effect of maternal depressive symptoms on child internalizing behavior or of child internalizing behavior on maternal depressive symptoms.

That is acceptable and easy to explain: after all, we considered only two variables.

The present analyses identify an interesting association that warrants further research, but with only two variables in the model and given the impossibility of manipulating either maternal depressive symptoms or child internalizing behavior, the results should not be used to bolster a causal claim without further supporting evidence.

References for this post:

Selig, James & Little, Todd. (2012). Autoregressive and cross-lagged panel analysis for longitudinal data.
Little, Todd & Preacher, K & Selig, James & Card, N. (2007). New developments in latent variable panel analyses of longitudinal data. International Journal of Behavioral Development. 31. 357-365.

Summary