Yesterday's Science combination plot has a new approach

Biology_Medicine_Research 2020-03-26

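Yesterday's Science-style combination plot has a new approach. The package author, 厚缊, has reworked the combination-plot module to address the problems in the first and second versions of ggcor. Here is the latest official version, straight from the author: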

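Since the first public release of ggcor (ggcor | correlation matrix visualization), the least stable part has probably been the combination-plot module. There have been at least two completely different versions. In the first, I tried to do everything up front so that users only had to add the layer, but the number of parameters was nauseating. I was quite happy with the second at first, because it hid that pile of parameters inside extra.params and let add_link() pretend to have very few arguments; the more I used it, the more it felt like a terrible design. Even so, I never found the courage to rethink it and planned to just live with it. Here I have to thank Y叔, who said the implementation was "not gg enough". That convinced me I had to find a more gg way to do it, which is the current version. It should also be more or less the final one; at least I am fairly satisfied with it for now (hopefully I won't be proven wrong too soon).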

Installation

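The new approach requires the latest version, 0.9.4.1 (if the code below fails to run, check your version number first). If you don't think you need it, don't force an update. The add_link()-related code has not been removed yet; once this round of testing and tuning is finished it will all be deleted, and only the new approach will be supported.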

## install.packages('devtools')
devtools::install_github('houyunhuang/ggcor')
## if an older version is already installed, add force = TRUE

packageVersion('ggcor')

## [1] '0.9.4'
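
Because the required version (0.9.4.1) and an older one (0.9.4) differ only in the last component, it is easy to misread the output. Below is a minimal sketch of a version comparison in base R. The helper name needs_update() is made up for illustration and is not part of ggcor.

```r
## Hypothetical helper (not part of ggcor):
## TRUE when `installed` is older than `required`.
needs_update <- function(installed, required = "0.9.4.1") {
  package_version(installed) < package_version(required)
}

needs_update("0.9.4")     # three-component version string
needs_update("0.9.4.1")   # the version the text asks for

## With ggcor installed, the same check on the real package would be:
## needs_update(as.character(packageVersion("ggcor")))
```

If the check returns TRUE, reinstalling with force = TRUE as noted above should bring in the newer code.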

Design philosophy

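In essence, the link part of the Science-style combination plot is nothing more than points, lines, and labels. I fell into such an un-gg, unfriendly design trap because I was too eager to do everything in one step, thinking that would be simpler. In reality it is the opposite: doing everything in one step means one function does more work, more work means more parameters, and more parameters means an unfriendly interface.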

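During the refactoring I completely abandoned the one-function-does-everything design. Data processing and the addition of graphical elements (points, lines, labels) are now split into purpose-built components: find the function you need, add it to the plot, adjust one or two parameters at most, and the figure is done. To my mind, only this kind of design has a bit of a gg flavor. Admittedly, leaving the data step exposed outside the plotting call is still a little ugly; a friendlier shortcut may come later.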

Data processing

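To put it plainly, drawing a plot like this starts with placing the points, and the data step is my attempt to place them for you. Once the points are placed, everything else is easy; if they are placed badly, probably nothing works. There are two main functions, parallel_layout() and combination_layout(). (I had planned a circle_layout() as well, but on reflection a network plot handles that case more conveniently, so it is not needed for now.) As the names suggest, parallel_layout() handles parallel-coordinate link plots and combination_layout() handles the Science-style combination plot.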

The `parallel_layout()` function

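It looks like a lot of parameters, but the commonly used ones are just the first three: data specifies the data, and start.var and end.var specify the column names of the start and end nodes. By default the start is the first column and the end is the second. Both are treated as character; anything else is coerced.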

library(ggcor)

args(parallel_layout)

## function (data, start.var = NULL, end.var = NULL, horiz = FALSE,
##     stretch = TRUE, sort.start = NULL, sort.end = NULL, start.x = NULL,
##     start.y = NULL, end.x = NULL, end.y = NULL)
## NULL

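Parallel coordinates are mainly for cases where the start and end nodes are different sets. Let's look at a simple example first. In the result, the first four columns are the start and end coordinates and columns 5 and 6 are the node labels; the rest will be explained as it comes up.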

## build some data
m1 <- matrix(rnorm(15*10), nrow = 15, dimnames = list(NULL, paste0('mat', 1:10)))
m2 <- matrix(rnorm(15*12), nrow = 15, dimnames = list(NULL, paste0('matrix', 1:12)))
correlate(m1, m2) %>%
  as_cor_tbl() %>%
  parallel_layout()
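
To make the idea of "placing the points" concrete, here is a rough base-R sketch of what a parallel layout has to compute: one position per unique start node on the left, one per unique end node on the right, and a coordinate pair for every link. This is only an illustration under my own assumptions and is not ggcor's implementation; the function name sketch_parallel_layout() and the label column names are made up.

```r
## Hypothetical illustration only; this is NOT how ggcor implements it.
sketch_parallel_layout <- function(data, start.var = 1, end.var = 2) {
  ## Coerce both node columns to character, as described in the text.
  start <- as.character(data[[start.var]])
  end   <- as.character(data[[end.var]])

  ## Spread the unique names of one side evenly over [0, 1].
  spread <- function(nm) {
    u <- unique(nm)
    pos <- if (length(u) > 1) seq(0, 1, length.out = length(u)) else 0.5
    setNames(pos, u)
  }
  ys <- spread(start)
  ye <- spread(end)

  ## Four coordinate columns first, then the two label columns
  ## (label column names are assumed for this sketch).
  data.frame(x = 0, y = unname(ys[start]),
             xend = 1, yend = unname(ye[end]),
             start.label = start, end.label = end,
             stringsAsFactors = FALSE)
}

edges <- data.frame(from = c("a", "a", "b"), to = c("u", "v", "w"))
sketch_parallel_layout(edges)
```

Each row of the result is one link, so the same start node appears on several rows with identical coordinates, which is why the point and label layers can be drawn from the same table as the lines.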

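The points are now placed, but how do we turn such a messy table into a figure? It is actually easy: a set of layer functions does much of the work you would otherwise do yourself. (A friend once told me I am definitely not trypophobic, because everything I make deserves a trypophobia warning.)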

library(ggplot2)
correlate(m1, m2) %>%
  as_cor_tbl() %>%
  parallel_layout() %>%
  ggplot() +
  geom_link(aes(colour = r)) +
  geom_start_point(fill = 'red', shape = 23, size = 4) +
  geom_end_point(fill = 'blue', shape = 21, size = 4) +
  geom_start_label(aes(x = x - 0.05), hjust = 1, size = 5) +
  geom_end_label(aes(x = xend + 0.05), hjust = 0, size = 5) +
  coord_cartesian(xlim = c(-0.15, 1.15)) +
  theme_void()

We can also lay the plot on its side so that the nodes are arranged horizontally.

correlate(m1, m2) %>%
  as_cor_tbl() %>%
  parallel_layout(horiz = TRUE) %>%
  ggplot() +
  geom_link(aes(colour = r)) +
  geom_start_point(fill = 'red', shape = 23, size = 4) +
  geom_end_point(fill = 'blue', shape = 21, size = 4) +
  geom_start_label(aes(y = y - 0.05), angle = 90, hjust = 1, size = 5) +
  geom_end_label(aes(y = yend + 0.05), angle = 270, hjust = 1, size = 5) +
  coord_cartesian(ylim = c(-0.2, 1.25)) +
  theme_void()

The `combination_layout()` function

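I should say up front that I honestly don't know what that Science combination plot is supposed to be called, so I was at a loss for a name and picked one more or less at random for now. Apart from its parameters, the function does almost the same job as parallel_layout().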

args('combination_layout')

## function (data, type = NULL, show.diag = NULL, row.names = NULL,
##     col.names = NULL, start.var = NULL, end.var = NULL, cor_tbl)
## NULL

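type, show.diag, row.names, and col.names are all parameters related to the correlation matrix. If that feels like too many to bother with, you can simply pass a cor_tbl object to the cor_tbl parameter and all of them are taken care of.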

library(vegan) ## for the example data

data('varespec')

data('varechem')
mantel <- mantel_test(varespec, varechem)
combination_layout(mantel, type = 'upper', show.diag = FALSE,
                   col.names = names(varechem))

Instead of setting those tedious parameters by hand, let the cor_tbl object work its magic.

corr <- correlate(varechem, cor.test = TRUE) %>%
  as_cor_tbl(type = 'upper', show.diag = FALSE)
combination_layout(mantel, cor_tbl = corr)

Looking at the data alone doesn't really show what the function has done, so let's look at the plot.

combination_layout(mantel, cor_tbl = corr) %>%
  ggplot() +
  geom_link(aes(colour = r), curvature = 0.05) +
  geom_start_point(fill = 'red', shape = 23, size = 4) +
  geom_end_point(fill = 'blue', shape = 21, size = 4) +
  geom_start_label(aes(x = x - 0.5), hjust = 1, size = 5) +
  geom_end_label(aes(x = xend + 0.5), hjust = 0, size = 5) +
  coord_cartesian() +
  theme_void()

Multiple community groups work in exactly the same way, as long as the groups are set up when running the Mantel test.

mantel2 <- mantel_test(varespec, varechem,
                       spec.select = list(Spec01 = 1:7,
                                          Spec02 = 8:18,
                                          Spec03 = 19:30))

combination_layout(mantel2, cor_tbl = corr) %>%
  ggplot() +
  geom_link(aes(colour = r), curvature = 0.05) +
  geom_start_point(fill = 'red', shape = 23, size = 4) +
  geom_end_point(fill = 'blue', shape = 21, size = 4) +
  geom_start_label(aes(x = x - 0.5), hjust = 1, size = 5) +
  geom_end_label(aes(x = xend + 0.5), hjust = 0, size = 5) +
  coord_cartesian(xlim = c(-5, 14)) +
  theme_void()

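One detail to note here: for this plot, the default start nodes are the communities and the end nodes are the environmental variables, though this can be changed with the start.var and end.var parameters. Let's see how to handle it together with the lower-triangle case. Note: this layout is designed specifically for the Science-style combination plot, so the end nodes (the environmental variables) must correspond to the environmental matrix, which is also why a cor_tbl object can be passed in directly. When they don't correspond, the matching fails and the result may be just a few isolated points.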

corr2 <- correlate(varechem, cor.test = TRUE) %>%
  as_cor_tbl(type = 'lower', show.diag = FALSE)
mantel2$env2 <- paste0(mantel2$spec, '_A')
combination_layout(mantel2, cor_tbl = corr2,
                   start.var = env2) %>%
  ggplot() +
  geom_link(aes(colour = r), curvature = 0.05) +
  geom_start_point(fill = 'red', shape = 23, size = 4) +
  geom_end_point(fill = 'blue', shape = 21, size = 4) +
  geom_start_label(aes(x = x + 0.5), hjust = 0, size = 5) +
  geom_end_label(aes(x = xend - 0.5), hjust = 1, size = 5) +
  coord_cartesian(xlim = c(0.5, 22)) +
  theme_void()
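
If a plot comes out as a handful of isolated points, a quick sanity check is to compare the end-node names against the column names of the environmental matrix before calling the layout function. The sketch below uses made-up stand-in vectors rather than real ggcor output, so the object names are assumptions.

```r
## Toy stand-ins: `env_names` plays the role of names(varechem),
## `link_ends` the end-node column of a Mantel result (both made up here).
env_names <- c("N", "P", "K", "Ca")
link_ends <- c("N", "P", "pH")

## End nodes that have no counterpart in the environmental matrix.
unmatched <- setdiff(link_ends, env_names)
if (length(unmatched) > 0) {
  message("End nodes with no matching environmental variable: ",
          paste(unmatched, collapse = ", "))
}
```

An empty result means every end node can be matched; anything left over is a candidate for the failed matching described above.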

Combining

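This is probably the relatively troublesome part of the current design, because it feels like stitching two different plots together. For conventional rectangular plots, stitching just means dragging things around in AI (Adobe Illustrator), but lining things up against this slanted triangle can take a bit more time. It comes down to personal preference: I think R can do it, so I do it in R; if you find AI better, use AI. Neither is superior, as long as you get the figure quickly.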

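In essence, the correlation-matrix heatmap is drawn from a cor_tbl object and the link plot is drawn from the coordinates computed by combination_layout(), so the two can certainly be combined; some layers just need their data specified separately.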

Option 1

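Keep the original style: the correlation-matrix heatmap is handled by quickcor(), and the link plot is added as extra layers. A small trick: the global option ggcor.link.inherit.aes controls whether the link-related layers inherit the plot's aesthetic mappings.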

df <- combination_layout(mantel2, cor_tbl = corr) ## store the result for later use
options(ggcor.link.inherit.aes = FALSE) ## global option
quickcor(corr) + geom_square() +
  geom_link(aes(colour = r), data = df) + # add the links
  geom_start_point(fill = 'red', shape = 23, size = 4, data = df) +
  geom_end_point(fill = 'blue', shape = 21, size = 4, data = df) +
  geom_start_label(aes(x = x - 0.5), hjust = 1, size = 5, data = df) +
  geom_end_label(aes(x = xend + 0.5), hjust = 0, size = 3.8, data = df) +
  expand_axis(x = c(-6, 14.5)) +
  remove_y_axis()
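
Since options() changes a global setting for the rest of the session, it can be worth keeping the previous value around and restoring it once the combined plot is built. A minimal base-R sketch follows; the option name comes from the text above, while how ggcor reads it internally is not shown here.

```r
## Set the option and keep the previous value so it can be restored later.
old <- options(ggcor.link.inherit.aes = FALSE)
current <- getOption("ggcor.link.inherit.aes")
current

## ... build the combined plot here ...

options(old)  # restore whatever was set before
```

options() returns the old values as a list, so passing that list back restores the earlier state, including the case where the option was not set at all.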

Option 2

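We can also draw the link plot first and then overlay the correlation-matrix heatmap on top of it (that part is almost too easy: a single function computes and draws the correlation heatmap).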

ggplot(df) +
  geom_link(aes(colour = r)) + # add the links
  geom_start_point(fill = 'red', shape = 23, size = 4) +
  geom_end_point(fill = 'blue', shape = 21, size = 4) +
  geom_start_label(aes(x = x - 0.5), hjust = 1, size = 5) +
  geom_end_label(aes(x = xend + 0.5), hjust = 0, size = 3.8) +
  geom_square(mapping = aes(x = .col.id, y = .row.id, r0 = r, fill = r),
              data = corr) +
  geom_panel_grid(data = corr) +
  scale_x_continuous(breaks = 2:14, labels = names(varechem)[-1],
                     position = 'top') +
  scale_fill_gradient2n() +
  coord_fixed(xlim = c(-6, 14.5), ylim = c(0.5, 14.5), expand = FALSE) +
  remove_y_axis() +
  theme_cor()

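Next, how to control more of the details. The material falls into two broad parts: one on controlling the link plot when it is drawn on its own, the other on controlling it inside a combination plot. The former is the foundation; the latter builds on it with a few ggcor-specific details.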

Mapping additional aesthetics

Everything above explained how to get the plot drawn. What follows is about making it look good and carry more information.

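For a link plot like this (similar to a network plot), we usually adjust the colour, width, and line type of the edges. Anything edge-related is set in geom_link(), and anything related to nodes (or endpoints) is set in the geom_*_point() layers. A simple example: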

library(ggcor)
library(ggplot2)
library(dplyr) ## provides filter() and mutate() used below
data('varespec', package = 'vegan')
data('varechem', package = 'vegan')

correlate(varespec[1:20], varechem, cor.test = TRUE) %>%
  as_cor_tbl() %>%
  parallel_layout() %>%
  ggplot() +
  geom_link(aes(colour = r, size = r, linetype = r < 0),
            data = function(data) filter(data,
                                         abs(r) > 0.5, p.value < 0.05)) +
  geom_start_point(fill = 'red', shape = 23, size = 4) +
  geom_end_point(fill = 'blue', shape = 21, size = 4) +
  geom_start_label(aes(x = x - 0.05), hjust = 1, size = 5) +
  geom_end_label(aes(x = xend + 0.05), hjust = 0, size = 5) +
  scale_size_area(max_size = 2,
                  breaks = c(0.6, 0.3, -0.3, -0.6)) +
  scale_color_viridis_c() +
  scale_linetype_manual(values = c('TRUE' = 'dashed', 'FALSE' = 'solid')) +
  guides(linetype = guide_legend(order = 1, override.aes = list(size = 2)),
         size = guide_legend(order = 2),
         colour = guide_colorbar(order = 3)) +
  coord_cartesian(xlim = c(-0.2, 1.2)) +
  theme_void()
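
The data = function(data) ... argument used in geom_link() above is the standard ggplot2 mechanism for giving one layer a filtered view of the plot data. Here is a small self-contained sketch with made-up links, using geom_segment() and geom_point() as stand-ins for the ggcor layers.

```r
library(ggplot2)

## Made-up links between two columns of nodes.
links <- data.frame(
  x = 0, xend = 1,
  y = c(1, 2, 3, 4), yend = c(4, 3, 2, 1),
  r = c(0.8, -0.2, -0.7, 0.1)
)

p <- ggplot(links) +
  ## Only this layer sees the filtered rows ...
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend, colour = r),
               data = function(d) d[abs(d$r) > 0.5, ]) +
  ## ... while the node layers still receive every row.
  geom_point(aes(x = x, y = y)) +
  geom_point(aes(x = xend, y = yend))

nrow(layer_data(p, 1))  # rows drawn as segments
nrow(layer_data(p, 2))  # rows drawn as start points
```

Because the function is applied per layer, the weak links disappear from the drawing while all nodes remain in place.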

Of course, you can also filter out all the nodes that have no edges beforehand.

correlate(varespec[1:20], varechem, cor.test = TRUE) %>%
  as_cor_tbl() %>%
  filter(abs(r) > 0.5, p.value < 0.05) %>%
  parallel_layout() %>%
  ggplot() +
  geom_link(aes(colour = r, size = r, linetype = r < 0),
            curvature = 0.08) +
  geom_start_point(fill = 'red', shape = 23, size = 4) +
  geom_end_point(fill = 'red', shape = 21, size = 4) +
  geom_start_label(aes(x = x - 0.05), hjust = 1, size = 5) +
  geom_end_label(aes(x = xend + 0.05), hjust = 0, size = 5) +
  scale_size_area(max_size = 2,
                  breaks = c(0.6, 0.3, -0.3, -0.6)) +
  scale_color_viridis_c() +
  scale_linetype_manual(values = c('TRUE' = 'dashed', 'FALSE' = 'solid')) +
  guides(linetype = guide_legend(order = 1, override.aes = list(size = 2)),
         size = guide_legend(order = 2),
         colour = guide_colorbar(order = 3)) +
  coord_cartesian(xlim = c(-0.2, 1.2)) +
  theme_void()

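The two datasets in the vegan package are rubbish for this purpose and the result is painful to look at, so if you are interested, try it with your own data. You can also turn the node points into rectangles stacked together, similar to a stacked bar chart (an Excel chart? No, an R chart!).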

Combination plots

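ggcor provides these two kinds of plots so that they can be combined with a correlation plot and convey more information. Let's start with parallel coordinates: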

mantel <- mantel_test(varespec, varechem,
                      spec.select = list(Spec01 = 1:7,
                                         Spec02 = 8:18,
                                         Spec03 = 19:37,
                                         Spec04 = 38:44)) %>%
  parallel_layout(start.var = env, end.var = spec,
                  start.x = 15, end.x = 20, stretch = FALSE) %>%
  mutate(rd = cut(r, breaks = c(-Inf, 0.2, 0.4, Inf),
                  labels = c('< 0.2', '0.2 - 0.4', '>= 0.4')),
         pd = cut(p.value, breaks = c(-Inf, 0.01, 0.05, Inf),
                  labels = c('< 0.01', '0.01 - 0.05', '>= 0.05')))
corr <- fortify_cor(varechem, cor.test = TRUE)

## turn off aesthetic inheritance first
options(ggcor.link.inherit.aes = FALSE)
quickcor(corr) + geom_square() +
  geom_link(aes(colour = pd, size = rd), data = mantel) +
  geom_link_point(data = mantel) +
  geom_end_label(aes(x = xend + 0.5), hjust = 0, data = mantel) +
  scale_size_manual(values = c(0.5, 1, 2)) +
  scale_colour_manual(values = c('#D95F02', '#1B9E77', '#A2A2A288')) +
  guides(size = guide_legend(title = "Mantel's r",
                             override.aes = list(colour = 'grey35'),
                             order = 2),
         colour = guide_legend(title = "Mantel's p",
                               override.aes = list(size = 3),
                               order = 1),
         fill = guide_colorbar(title = "Pearson's r", order = 3)) +
  expand_axis(x = 23)
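
The rd and pd columns above are built with base R's cut(), which by default uses intervals that are closed on the right. The short sketch below, with made-up p-values, shows how boundary values are binned under the default and under right = FALSE; the latter is my own suggestion for anyone who wants the bins to follow the "<" and ">=" labels exactly, not something the original code does.

```r
p <- c(0.001, 0.01, 0.03, 0.05, 0.2)  # made-up p-values, including the boundaries

## Default: intervals are (a, b], so 0.01 falls into the first bin.
pd <- cut(p, breaks = c(-Inf, 0.01, 0.05, Inf),
          labels = c("< 0.01", "0.01 - 0.05", ">= 0.05"))
data.frame(p, pd)

## right = FALSE: intervals are [a, b), so boundary values follow the labels.
pd2 <- cut(p, breaks = c(-Inf, 0.01, 0.05, Inf),
           labels = c("< 0.01", "0.01 - 0.05", ">= 0.05"), right = FALSE)
data.frame(p, pd2)
```

For real Mantel results the difference only matters when a value lands exactly on a break, so it is mostly a labelling detail.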

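Once the parallel-coordinate combination makes sense, the upper- and lower-triangle cases are almost identical; only the way the coordinates are computed changes. A few small things need adjusting to your own data: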

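First, quickcor() sets the coordinate range automatically from the correlation-matrix heatmap, but the added Mantel test information clearly overflows that range, so you need to extend it manually with expand_axis().
Second, to get a good-looking figure you still have to tune it bit by bit with the usual ggplot2 machinery.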

corr2 <- get_upper_data(corr) ## must be an upper or lower triangle; here we keep the upper one
mantel <- mantel_test(varespec, varechem,
                      spec.select = list(Spec01 = 1:7,
                                         Spec02 = 8:18,
                                         Spec03 = 19:37,
                                         Spec04 = 38:44)) %>%
  combination_layout(cor_tbl = corr2) %>%
  mutate(xend = xend + 1,
         rd = cut(r, breaks = c(-Inf, 0.2, 0.4, Inf),
                  labels = c('< 0.2', '0.2 - 0.4', '>= 0.4')),
         pd = cut(p.value, breaks = c(-Inf, 0.01, 0.05, Inf),
                  labels = c('< 0.01', '0.01 - 0.05', '>= 0.05')))

## turn off aesthetic inheritance first
options(ggcor.link.inherit.aes = FALSE)
quickcor(corr2) + geom_square() +
  geom_link(aes(colour = pd, size = rd), data = mantel,
            curvature = 0.05) +
  geom_link_point(data = mantel) +
  geom_start_label(aes(x = x - 0.5), hjust = 1, data = mantel) +
  scale_size_manual(values = c(0.5, 1, 2)) +
  scale_colour_manual(values = c('#D95F02', '#1B9E77', '#A2A2A288')) +
  guides(size = guide_legend(title = "Mantel's r",
                             override.aes = list(colour = 'grey35'),
                             order = 2),
         colour = guide_legend(title = "Mantel's p",
                               override.aes = list(size = 3),
                               order = 1),
         fill = guide_colorbar(title = "Pearson's r", order = 3)) +
  expand_axis(x = -6)

Filtering out non-significant links

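One question has come up quite often recently, and it needs to be considered case by case. With parallel coordinates, if you don't want to show non-significant nodes at all and are not making a combination plot, filter right after computing the cor_tbl and then compute the coordinates. If you still want to show every node, filter at plotting time (that is, in geom_link()), which leaves the nodes untouched. For the upper- and lower-triangle combination plots, it essentially doesn't matter at which stage you filter.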

mantel <- mantel_test(varespec, varechem,
                      spec.select = list(Spec01 = 1:7,
                                         Spec02 = 8:18,
                                         Spec03 = 19:37,
                                         Spec04 = 38:44)) %>%
  combination_layout(cor_tbl = corr2) %>%
  mutate(xend = xend + 1,
         rd = cut(r, breaks = c(-Inf, 0.2, 0.4, Inf),
                  labels = c('< 0.2', '0.2 - 0.4', '>= 0.4')),
         pd = cut(p.value, breaks = c(-Inf, 0.01, Inf),
                  labels = c('< 0.01', '0.01 - 0.05')))

quickcor(corr2) + geom_square() +
  geom_link(aes(colour = pd, size = rd),
            data = filter(mantel, p.value < 0.05), curvature = 0.05) +
  geom_link_point(data = mantel) +
  geom_start_label(aes(x = x - 0.5), hjust = 1, data = mantel) +
  scale_size_manual(values = c(0.5, 1, 2)) +
  scale_colour_manual(values = c('#D95F02', '#1B9E77')) +
  guides(size = guide_legend(title = "Mantel's r",
                             override.aes = list(colour = 'grey35'),
                             order = 2),
         colour = guide_legend(title = "Mantel's p",
                               override.aes = list(size = 3),
                               order = 1),
         fill = guide_colorbar(title = "Pearson's r", order = 3)) +
  expand_axis(x = -6)
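
The difference between the two filtering stages comes down to which table the node layers end up seeing. A toy base-R sketch with made-up links (not real Mantel output) makes this visible by counting the start nodes that survive in each case.

```r
links <- data.frame(
  spec = c("Spec01", "Spec01", "Spec02", "Spec03"),
  env  = c("N", "P", "N", "K"),
  p.value = c(0.01, 0.20, 0.03, 0.60),
  stringsAsFactors = FALSE
)

## Filtering before the layout step: nodes without a significant link disappear.
early <- links[links$p.value < 0.05, ]
unique(early$spec)

## Filtering only the link layer: the full table still supplies every node.
drawn_links <- links[links$p.value < 0.05, ]
unique(links$spec)
```

In the second case the same two links are drawn, but Spec03 is still available to the point and label layers because they use the unfiltered table.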

Does it have to be combined?

Previously combining was the only option; now I think it looks good even without it.

mantel_test(varespec, varechem,
            spec.select = list(Spec01 = 1:7,
                               Spec02 = 8:18,
                               Spec03 = 19:37,
                               Spec04 = 38:44)) %>%
  combination_layout(cor_tbl = corr2) %>%
  mutate(xend = xend + 1,
         rd = cut(r, breaks = c(-Inf, 0.2, 0.4, Inf),
                  labels = c('< 0.2', '0.2 - 0.4', '>= 0.4')),
         pd = cut(p.value, breaks = c(-Inf, 0.01, 0.05, Inf),
                  labels = c('< 0.01', '0.01 - 0.05', '>= 0.05'))) %>%
ggplot() +
  geom_link(aes(colour = pd, size = rd), curvature = 0.1) +
  geom_link_point() +
  geom_start_label(aes(x = x - 0.5), hjust = 1) +
  geom_end_label(aes(x = xend + 0.5), hjust = 0) +
  scale_size_manual(values = c(0.5, 1, 2)) +
  scale_colour_manual(values = c('#D95F02', '#1B9E77', '#A2A2A288')) +
  guides(size = guide_legend(title = "Mantel's r",
                             override.aes = list(colour = 'grey35'),
                             order = 2),
         colour = guide_legend(title = "Mantel's p",
                               override.aes = list(size = 3),
                               order = 1)) +
  coord_fixed(xlim = c(-5, 15)) +
  theme_void() +
  theme(legend.position = c(0.75, 0.7))