【原】常用混合线性模型演示：lme4和asreml

育种数据分析 2021-11-18

展开全文

这里使用sleepstudy数据集，看一下免费的R包lme4和付费包asreml如何处理不同的混合线性模型，以加深对混合线性模型的理解。

共进行下面几种演示：

随机斜率，相同截距（Random slopes with same intercept）
随机斜率，随机截距，没有相关性（Random slopes and random intercepts (without correlation)）
随机斜率，随机截距，有相关性（Random slopes and random intercepts (with correlation)
随机斜率，不同截距（Random slopes with a different intercept）
其它lme4不能实现的功能

0. 数据描述

❝
睡眠剥夺研究中受试者每天的平均反应时间。第0天，受试者有正常的睡眠时间。从那天晚上开始，他们每晚只能睡3个小时。观察结果代表了每天对每个受试者进行的一系列测试的平均反应时间。
❞

> head(dat)
  Reaction Days Subject
1 249.5600    0     308
2 258.7047    1     308
3 250.8006    2     308
4 321.4398    3     308
5 356.8519    4     308
6 414.6901    5     308

Reaction:为观测值，遇到刺激的反应时间
Days：剥夺睡眠的天数
Subject：实验对象（ID）

> table(dat$Days)

 0  1  2  3  4  5  6  7  8  9 
18 18 18 18 18 18 18 18 18 18 
> table(dat$Subject)

308 309 310 330 331 332 333 334 335 337 349 350 351 352 369 370 371 372 
 10  10  10  10  10  10  10  10  10  10  10  10  10  10  10  10  10  10

共有18个人，处理天数是0~9天。

这里，Subject为因子，Days为数字类型，Reaction为数字类型。

1. 随机斜率，相同截距

通用的混线性模型，包括固定因子和随机因子。育种中常用的混线性模型，一般是针对于随机因子关系矩阵，而其它领域的一般是针对于不同截距的定义。

固定因子：Days
随机因子：Subject

「lme4：」

这里，Reaction为y变量，Days为固定因子，随机因子用(1 | Subject)表示。

lme4的基本语法：


library(lme4)
mod1a = lmer(Reaction ~ Days + (1 | Subject), data=dat)
summary(mod1a)

结果：

> summary(mod1a)
Linear mixed model fit by REML ['lmerMod']
Formula: Reaction ~ Days + (1 | Subject)
   Data: dat

REML criterion at convergence: 1786.5

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.2257 -0.5529  0.0109  0.5188  4.2506 

Random effects:
 Groups   Name        Variance Std.Dev.
 Subject  (Intercept) 1378.2   37.12   
 Residual              960.5   30.99   
Number of obs: 180, groups:  Subject, 18

Fixed effects:
            Estimate Std. Error t value
(Intercept) 251.4051     9.7467   25.79
Days         10.4673     0.8042   13.02

Correlation of Fixed Effects:
     (Intr)
Days -0.371

这里，

随机因子的方差组分为1378.2，残差的方差组分为960.5。
固定因子中，Days为数值协变量，截距的值为251.4，Days的值为10.46

「asreml：」

代码：

library(asreml)
mod1b = asreml(Reaction ~ Days, random = ~ Subject, data=dat)
summary(mod1b)$varcomp
summary(mod1b,coef=T)$coef.fixed

结果：

> mod1b = asreml(Reaction ~ Days, random = ~ Subject, data=dat)
Model fitted using the gamma parameterization.
ASReml 4.1.0 Tue May 11 19:06:29 2021
          LogLik        Sigma2     DF     wall    cpu
 1      -756.229      1572.711    178 19:06:29    0.0
 2      -746.076      1354.228    178 19:06:29    0.0
 3      -736.843      1164.250    178 19:06:29    0.0
 4      -731.820      1050.393    178 19:06:29    0.0
 5      -729.920       986.583    178 19:06:29    0.0
 6      -729.670       964.724    178 19:06:29    0.0
 7      -729.661       960.621    178 19:06:29    0.0
> summary(mod1b)$varcomp
        component std.error  z.ratio bound %ch
Subject 1378.4099  504.5367 2.732031     P 0.2
units!R  960.6208  107.0758 8.971412     P 0.0
> summary(mod1b,coef=T)$coef.fixed
             solution std error  z.ratio
Days         10.46729 0.8042902 13.01432
(Intercept) 251.40510 9.7400376 25.81151

这里，

可以看到Subject的方差组分为1378.4，残差的方差组分为960.6,
Days的值为10.46，截距的值为251.405

计算随机因子的BLUP值：

两者结果一致！

2. 随机斜率，随机截距，没有相关性

这里模型更复杂一点，假定不同的人（项目）有各自的截距，并且他们之间不相关。

❝
This is the a model where you assume that the random effect has different intercepts based on the levels of another variable. In addition the || in lme4 assumes that slopes and intercepts have no correlation.
❞

「lme4」

mod2a =  lmer(Reaction ~ Days + (Days || Subject), data=dat)
summary(mod2a)

结果：

> summary(mod2a)
Linear mixed model fit by REML ['lmerMod']
Formula: Reaction ~ Days + ((1 | Subject) + (0 + Days | Subject))
   Data: dat

REML criterion at convergence: 1743.7

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.9626 -0.4625  0.0204  0.4653  5.1860 

Random effects:
 Groups    Name        Variance Std.Dev.
 Subject   (Intercept) 627.57   25.051  
 Subject.1 Days         35.86    5.988  
 Residual              653.58   25.565  
Number of obs: 180, groups:  Subject, 18

Fixed effects:
            Estimate Std. Error t value
(Intercept)  251.405      6.885  36.513
Days          10.467      1.560   6.712

Correlation of Fixed Effects:
     (Intr)
Days -0.184

这里，随机因子有了交互效应。

「asreml：」这里，Days是数字变量，Subject/Days，相当于是Subject + Subject:Days，即Subject为随机因子，每个Subject都有一个Days的系数。

mod2b =  asreml(Reaction ~ Days, random = ~ Subject/Days, data=dat)
summary(mod2b)$varcomp

结果：

> summary(mod2b)$varcomp
             component std.error  z.ratio bound %ch
Subject      627.67842 282.67774 2.220474     P 0.3
Subject:Days  35.86583  14.54252 2.466273     P 0.1
units!R      653.70536  76.67803 8.525328     P 0.0

方差组分结果一致：

看一下两者随机因子的效应值：结果一致。

3. 随机斜率，随机截距，有相关性

❝
This is the a model where you assume that the random effect has different intercepts based on the levels of another variable. In addition a single | in lme4 assumes that slopes and intercepts have a correlation to be estimated
❞

「lme4:」

mod3a =  lmer(Reaction ~ Days + (Days | Subject), data=dat)
summary(mod3a)

「asreml:」

这里，asreml写法比较复杂，用了str函数进行定义。

mod3b =  asreml(Reaction ~ Days, 
                random = ~ str(~Subject + Subject:Days, ~us(2):id(Subject)),
                data=dat)
summary(mod3b)$varcomp

结果：结果一致。

看一下随机因子的效应值：可以看到，结果一致！

4. 随机斜率，不同截距（Random slopes with a different intercept）

模型描述：

❝
This is the a model where you assume that the random effect has different intercepts based on the levels of another variable but there’s not a main effect. The 0 in the intercept in lme4 assumes that random slopes interact with an intercept but without a main effect.
❞

「lme4:」

mod4a =  lmer(Reaction ~ Days + (0 + Days | Subject), data=dat)
summary(mod4a)

「asreml:」这里，只考虑Subject:Days交互作用，不考虑Subject作为随机因子了。

mod4b =  asreml(Reaction ~ Days, 
                random = ~ Subject:Days,
                data=dat)
summary(mod4b)$varcomp
summary(mod4b,coef=T)$coef.fixed

结果比较：结果一致。