诹图——ggcor简介（十四）

生物_医药_科研 2020-04-02

展开全文

这一期会很简单，主要是提供一种mantel检验的替代方案——procrutes 检验。procrutes 检验在图像识别领域用的比较多，用来比较两组数据之间的相似性，本质上说是通过平移、旋转、缩放等一系列变换，使原数据集的点匹配到目标数据集上，并使其和对应点的误差平方和最小。和mantel类似，p值也是通过置换检验得到的。原理的东西确实懂得不多，只能班门弄斧，强行解释。

安装

这部分内容应该是0.9.4就加进去了，一直没有介绍，若是0.9.4版本，可以不用更新。

if(!require(ggcor)) {
  devtools::install_github('houyunhuang/ggcor')
}
## 若之前安装过老版本，加上`force = TRUE`参数

packageVersion('ggcor')

## [1] '0.9.5'

检验

ggcor并不会像vegan、ade4包一样，详细的分析一组procrutes 检验的结果，而是用来处理多组之间比较和检验。默认情况下，spec整体当成一组，env是每列一组，然后分析每个spec组和每个env组之间的关系，也就是说默认情况下是比较的spec整体和env每个单列之间的相似性。procrutes_test()和mantel_test()函数之所以有如此变态的设置，是因为science的那个组合图，但是这样也会误导，让人觉得这两种检验就该这样使用。

library(ggcor)
args(procrutes_test)

## function (spec, env, procrutes.fun = 'protest', spec.select = NULL,

## env.select = NULL, spec.pre.fun = 'identity',

## spec.pre.params = list(),

## env.pre.fun = spec.pre.fun,

## env.pre.params = spec.pre.params,

##           ...)
## NULL

data('varespec', package = 'vegan')
data('varechem', package = 'vegan')
procrutes_test(varespec, varechem)

## # A tibble: 14 x 4
##    spec  env          r p.value
##  * <chr> <chr>    <dbl>   <dbl>
##  1 spec  N        0.326   0.072
##  2 spec  P        0.329   0.041
##  3 spec  K        0.301   0.092
##  4 spec  Ca       0.324   0.059
##  5 spec  Mg       0.302   0.087
##  6 spec  S        0.251   0.21
##  7 spec  Al       0.452   0.003
##  8 spec  Fe       0.410   0.01
##  9 spec  Mn       0.443   0.003
## 10 spec  Zn       0.269   0.152
## 11 spec  Mo       0.139   0.76
## 12 spec  Baresoil 0.428   0.006
## 13 spec  Humdepth 0.436   0.008
## 14 spec  pH       0.358   0.016

若我们想自己设置spec中的分组顺序，这和mantel_test()完全一致，通过spec.select参数设置，同样的道理也适用于env。再次强调，每个分组都要指定名字，否则内部直接自动命名。

## 自定义spec分组
procrutes_test(varespec, varechem, 
               spec.select = list(A = 1:7, B = 8:40))

## # A tibble: 28 x 4
##    spec  env       r p.value
##  * <chr> <chr> <dbl>   <dbl>
##  1 A     N     0.301   0.073
##  2 A     P     0.107   0.926
##  3 A     K     0.260   0.191
##  4 A     Ca    0.219   0.314
##  5 A     Mg    0.205   0.378
##  6 A     S     0.211   0.415
##  7 A     Al    0.392   0.006
##  8 A     Fe    0.309   0.08
##  9 A     Mn    0.202   0.424
## 10 A     Zn    0.104   0.895
## # … with 18 more rows

## 自定义spec和env分组
procrutes_test(varespec, varechem, 
               spec.select = list(A = 1:7, B = 8:40),
               env.select = list(Env01 = 1:7, Env02 = 8:14))

## # A tibble: 4 x 4
##   spec  env       r p.value
## * <chr> <chr> <dbl>   <dbl>
## 1 A     Env01 0.382   0.063
## 2 A     Env02 0.367   0.119
## 3 B     Env01 0.476   0.007
## 4 B     Env02 0.524   0.001

预处理

procrutes检验比mantel检验复杂的地方是，mantel是先进行距离变换，而procrutes可以做任何变换，只要保证输出结果是矩阵即可。目前我看到的有主成份降维和monoMDS处理。procrutes_test()函数提供了非常灵活的接口，支持自定义预处理方式。默认情况下，spec和env的预处理函数完全一样，都是identity，即不做任何变换。

我们先来看看主成份变换的例子。dudi_pca()函数是ggcor封装的ade4::dudi.pca()函数，若是要自定义主成份分析参数，可以通过spec.pre.params处理。

procrutes_test(varespec, varechem, 
               spec.select = list(A = 1:7, B = 8:40),
               env.select = list(Env01 = 1:7, Env02 = 8:14), 
               spec.pre.fun = 'dudi_pca')

## # A tibble: 4 x 4
##   spec  env       r p.value
## * <chr> <chr> <dbl>   <dbl>
## 1 A     Env01 0.454   0.106
## 2 A     Env02 0.445   0.252
## 3 B     Env01 0.496   0.036
## 4 B     Env02 0.557   0.003

## 求pca之前先中心标准化
procrutes_test(varespec, varechem, 
               spec.select = list(A = 1:7, B = 8:40),
               env.select = list(Env01 = 1:7, Env02 = 8:14), 
               spec.pre.fun = 'dudi_pca',
               spec.pre.params = list(center = TRUE))

## # A tibble: 4 x 4
##   spec  env       r p.value
## * <chr> <chr> <dbl>   <dbl>
## 1 A     Env01 0.454   0.096
## 2 A     Env02 0.445   0.245
## 3 B     Env01 0.496   0.044
## 4 B     Env02 0.557   0.001

monoMDS变换也很简单，只需要调整下spec.pre.fun参数就行。

procrutes_test(varespec, varechem, 
               spec.select = list(A = 1:7, B = 8:40),
               env.select = list(Env01 = 1:7, Env02 = 8:14), 
               spec.pre.fun = 'mono_mds')

## # A tibble: 4 x 4
##   spec  env       r p.value
## * <chr> <chr> <dbl>   <dbl>
## 1 A     Env01 0.416   0.029
## 2 A     Env02 0.358   0.091
## 3 B     Env01 0.521   0.006
## 4 B     Env02 0.532   0.002

也有可能，ggcor直接支持的上述三种预处理方式都不能满足你的需求，那么你需要自定义预处理函数。这里我们定义一个行极值标准化的预处理函数作为例子，这个只是表面可以这样定义，不具备理论意义。

row_scale <- function(x, na.rm = TRUE) {
  if(!is.matrix(x))
    x <- as.matrix(x)
  row.max <- apply(x, 1, max, na.rm = na.rm)
  row.min <- apply(x, 1, min, na.rm = na.rm)
  (x - row.min) / (row.max - row.min)
}

procrutes_test(varespec, varechem, 
               spec.select = list(A = 1:7, B = 8:40),
               env.select = list(Env01 = 1:7, Env02 = 8:14), 
               spec.pre.fun = 'row_scale')

## # A tibble: 4 x 4
##   spec  env       r p.value
## * <chr> <chr> <dbl>   <dbl>
## 1 A     Env01 0.359   0.109
## 2 A     Env02 0.320   0.389
## 3 B     Env01 0.474   0.006
## 4 B     Env02 0.512   0.003

组合图

既然procrutes检验是作为mantel检验的替代，那么当然也是可以很方便的来做组合图的，看个例子，不啰嗦了。

library(ggplot2)
corr <- fortify_cor(varechem, type = 'upper') ## 只能是上下三角 我们保留上三角
procrutes <- procrutes_test(varespec, varechem,
                            spec.select = list(Spec01 = 1:7,
                                               Spec02 = 8:18,
                                               Spec03 = 19:37,
                                               Spec04 = 38:44),
                            spec.pre.fun = 'mono_mds') %>%
  combination_layout(cor_tbl = corr) %>%
  mutate(xend = xend + 1,
         rd = cut(r, breaks = c(-Inf, 0.2, 0.4, Inf),
                  labels = c('< 0.2', '0.2 - 0.4', '>= 0.4')),
         pd = cut(p.value, breaks = c(-Inf, 0.01, 0.05, Inf),
                  labels = c('< 0.01', '0.01 - 0.05', '>= 0.05')))

## 先关闭映射继承
options(ggcor.link.inherit.aes = FALSE)
quickcor(corr) + geom_square() +
  geom_link(aes(colour = pd, size = rd), data = procrutes,
            curvature = 0.05) +
  geom_link_point(data = procrutes) +
  geom_start_label(aes(x = x - 0.5), hjust = 1, data = procrutes) +
  scale_size_manual(values = c(0.5, 1, 2)) +
  scale_colour_manual(values = c('#D95F02', '#1B9E77', '#A2A2A288')) +
  guides(size = guide_legend(title = 'Procrutes' r',
                             override.aes = list(colour = 'grey35'),
                             order = 2),
         colour = guide_legend(title = 'Procrutes' p',
                               override.aes = list(size = 3),
                               order = 1),
         fill = guide_colorbar(title = 'Pearson's r', order = 3)) +
  expand_axis(x = -6)