【原】单因素组间差异检验

哈_有条鱼 2022-04-30 发布于湖北

展开全文

三个或多个均值（one-way anova or Kruskal-Wallis test）

前提：

与两个均值的情况差不多，残差正态性，同分布，方差齐性

正态性：如果数据不满足正态性，可尝试进行数据转换，实在不行也不建议直接使用Kruskal-Wallis test，该方法同样要求数据满足同分布的前提条件。建议仍然使用单因素方差分析，但如果你通过该方法得到的p值接近显著性水平（如0.05）则应谨慎下结论。

方差齐性：对于平衡数据（即每组样本量相等），one-way anova对方差是否相等并不敏感（除非其中一个样本的标准差是其他样本的3倍及以上且每组样本量小于10）。非平衡数据（样本量较小时）中的方差不等会使得犯I型错误的概率增大，样本量较大时则会与之相反。

也就是说数据好才是真的好（努力获得平衡数据吧😂 ）

对于平衡数据来说，在样本量小于10且组间方差相差较大时，可使用 Welch's anova。对于非平衡数据，更应该关注是否满足方差齐性，在方差不齐时，如果相差不大，也可使用Welch's anova，此时相对one-way anova更加准确。

下面仅用数据进行代码演示，并不一定真的适用该方法

使用iris数据中的Sepal.Length进行展示

head(iris)
library(rstatix)
library(dplyr)
iris %>% 
  #select("Sepal.Length","Species") %>% 
  group_by(Species) %>% 
  get_summary_stats(Sepal.Length,type = "mean_sd")#描述性统计，平衡数据，每组50个样本

方法1：

这里先使用onewaytests包进行分析，关于其详细信息，可参考下面这篇文章

library(onewaytests)#install.packages("onewaytests")
#残差正态性检验
nor.test(Sepal.Length~Species,data = iris)
##方差齐性检验
homog.test(Sepal.Length~Species,data = iris,method = "Bartlett")

one-way anova

mod1 <- aov.test(Sepal.Length~Species,data = iris,alpha = 0.05,verbose = TRUE)

其中alpha为显著性水平，verbose为检验结果是否显示

paircomp(mod1,adjust.method = "bonferroni")#多重比较

Welch's anova

welch.test(Sepal.Length~Species,data = iris,alpha = 0.05,verbose = TRUE)，与one-way anova差不多，不再赘述

Kruskal-Wallis test

kw.test(Sepal.Length~Species,data = iris,alpha = 0.05,verbose = TRUE)

关于这个包所涉及功能的详细解释，可查看下面这篇文献：

onewaytests

方法2：

使用rstatix包进行分析:

加载数据

data("PlantGrowth")
library(rstatix)
PlantGrowth <- PlantGrowth %>%
  reorder_levels(group, order = c("ctrl", "trt1", "trt2"))#对因子型变量排序
#描述性统计
PlantGrowth %>%
  group_by(group) %>%
  get_summary_stats(weight, type = "mean_sd")
#可视化
library(ggpubr)
ggboxplot(PlantGrowth, x = "group", y = "weight")

前提条件检验

#正态性检验
# Build the linear model
model  <- lm(weight ~ group, data = PlantGrowth)
# Create a QQ plot of residuals
ggqqplot(residuals(model))
# Compute Shapiro-Wilk test of normality
shapiro_test(residuals(model))
#或者分组检验正态性
PlantGrowth %>%
  group_by(group) %>%
  shapiro_test(weight)
#方差齐性
# Levene’s test to check the homogeneity of variances:
PlantGrowth %>% levene_test(weight ~ group)

显著性检验及多重比较

one-way anova和TukeyHSD事后检验

res.aov <- PlantGrowth %>% anova_test(weight ~ group)
res.aov
#多重比较
# Pairwise comparisons
pwc <- PlantGrowth %>% tukey_hsd(weight ~ group)
pwc
# 结果可视化
pwc <- pwc %>% add_xy_position(x = "group")
ggboxplot(PlantGrowth, x = "group", y = "weight") +
  stat_pvalue_manual(pwc, hide.ns = TRUE) +
  labs(
    subtitle = get_test_label(res.aov, detailed = TRUE),
    caption = get_pwc_label(pwc)
  )

Welch one-way test and Games-Howell post hoc test

# Welch One way ANOVA test
res.aov2 <- PlantGrowth %>% welch_anova_test(weight ~ group)
# Pairwise comparisons (Games-Howell)
pwc2 <- PlantGrowth %>% games_howell_test(weight ~ group)
# Visualization: box plots with p-values
pwc2 <- pwc2 %>% add_xy_position(x = "group", step.increase = 1)
ggboxplot(PlantGrowth, x = "group", y = "weight") +
  stat_pvalue_manual(pwc2, hide.ns = TRUE) +
  labs(
    subtitle = get_test_label(res.aov2, detailed = TRUE),
    caption = get_pwc_label(pwc2)
  )
#pairwise_t_test()也可以实现多重比较，要进行p值校正
pwc3 <- PlantGrowth %>% 
  pairwise_t_test(
    weight ~ group, pool.sd = FALSE,
    p.adjust.method = "bonferroni"
  )
pwc3

Kruskal-Wallis test和Dunn’s test多重比较

#Kruskal-Wallis test
(res.kruskal <- PlantGrowth %>% kruskal_test(weight ~ group))

# Pairwise comparisons using Dunn’s test:
(pwc <- PlantGrowth %>% 
  dunn_test(weight ~ group, p.adjust.method = "bonferroni") )

# Visualization: box plots with p-values
pwc <- pwc %>% add_xy_position(x = "group")
ggboxplot(PlantGrowth, x = "group", y = "weight") +
  stat_pvalue_manual(pwc, hide.ns = TRUE) +
  labs(
    subtitle = get_test_label(res.kruskal, detailed = TRUE),
    caption = get_pwc_label(pwc)
  )

方法3：

使用car包进行分析，Ⅱ型平方和与Ⅰ型平方和后续会讲到，单因素方差分析结果相同

one-way anova和TukeyHSD事后检验

#残差正态性检验
mod1 <- lm(Sepal.Length~Species,data = iris)
shapiro.test(resid(mod1))

##方差齐性检验
library(car)
leveneTest(Sepal.Length~Species,data = iris)#方差不齐

Anova(mod1)#Ⅱ型平方和
anova(mod1)#I型平方和
summary(mod1)

#多重比较
TukeyHSD(aov(mod1))

题外：

summary结果与anova的对应关系可看图中的注释

summary结果的进一步解读，让我们回到数据描述性统计部分：

可看到图中ctrl组的均值（mean）的与summary结果中Estimate的Intercept对应，都为5.03，trl1的均值4.66为summary结果中Estimate的grouptrl1的值与Intercept的值相加（4.66=5.0320-0.3710），另一组以此类推。
这是因为在最开始我们将因子型变量的顺序定义为这个顺序，而在summary中就以排在第一个水平的组为基础（Intercept）进行一一比较，得出trl1比ctrl低，以及相应的p值为多少。

理解summary结果的含义有利于后续其他部分的讲解，这里正好提一下。

回到正题：TukeyHSD, HSD.test 和 LSD.test 可能并不是很适合非平衡数据，建议使用 multcomp and lsmeans packages，此外 DTK package不要求数据样本量相等及方差齐性。

Kruskal-Wallis test (基于stats包)和Dunn’s test多重比较:

#Kruskal-Wallis test
kruskal.test(weight ~ group,data = PlantGrowth)
#多重比较
library(FSA)
dunnTest(weight ~ group,data = PlantGrowth,method = "bonferroni")