R绘制稀释曲线

生物_医药_科研 2021-12-23

展开全文

alpha多样性指数的大小是与使用的ASV/OTU表的抽平深度有关，为探究样本alpha多样性随抽平深度的变化曲线，可绘制稀释曲线（rarefaction curve），这是生态领域的一种常用方法。
稀释曲线通过从每个样本中随机抽取一定数量的序列（即在不超过现有样本测序量的某个深度下进行重新抽样），可以预测在一系列给定的测序深度下，所可能包含的物种总数及其中每个物种的相对丰度。因此，通过绘制稀释曲线，还可以在相同测序深度下，通过比较样本中ASV/OTU数的多少，从而在一定程度上衡量每个样本的多样性高低。

Richness指数（物种丰富度指数）的Alpha多样性曲线在很多情况下等同于稀释曲线（rarefaction curve）。下面小编开始绘制简单的稀释曲线和Richness指数曲线。

数据文件长这样：
在这里插入图片描述
1.调用相关R包，读取otu物种丰度表；

library(vegan)	#用于计算 Shannon 熵指数、Simpson 指数、Chao1 指数、ACE 指数等，同时用于抽样
library(picante)	#用于计算 PD_whole_tree，若不计算它就无需加载。
library(ggplot2)	#用于 ggplot2 作图
library(doBy) 	#用于分组统计
library(ggalt)	#用于绘制拟合曲线

otu <- read.delim('feature-table_taxonomy.txt', row.names = 1, sep = '\t', stringsAsFactors = FALSE, check.names = FALSE)
otu=otu[,-dim(otu)[2]]
otu <- t(otu)

2.定义函数；

#计算多种 Alpha 多样性指数，结果返回至向量
alpha_index <- function(x, method = 'richness', tree = NULL, base = exp(1)) {
  if (method == 'richness') result <- rowSums(x > 0)    #丰富度指数
  else if (method == 'chao1') result <- estimateR(x)[3, ]    #Chao1 指数
  else if (method == 'ace') result <- estimateR(x)[5, ]    #ACE 指数
  else if (method == 'shannon') result <- diversity(x, index = 'shannon', base = base)    #Shannon 指数
  else if (method == 'simpson') result <- diversity(x, index = 'simpson')    #Gini-Simpson 指数
  else if (method == 'pielou') result <- diversity(x, index = 'shannon', base = base) / log(estimateR(x)[1, ], base)    #Pielou 均匀度
  else if (method == 'gc') result <- 1 - rowSums(x == 1) / rowSums(x)    #goods_coverage
  else if (method == 'pd' & !is.null(tree)) {    #PD_whole_tree
    pd <- pd(x, tree, include.root = FALSE)
    result <- pd[ ,1]
    names(result) <- rownames(pd)
  }
  result
}


#根据抽样步长（step），统计每个稀释梯度下的 Alpha 多样性指数，结果返回至列表
alpha_curves <- function(x, step, method = 'richness', rare = NULL, tree = NULL, base = exp(1)) {
  x_nrow <- nrow(x)
  if (is.null(rare)) rare <- rowSums(x) else rare <- rep(rare, x_nrow)
  alpha_rare <- list()
  
  for (i in 1:x_nrow) {
    step_num <- seq(0, rare[i], step)
    if (max(step_num) < rare[i]) step_num <- c(step_num, rare[i])
    
    alpha_rare_i <- NULL
    for (step_num_n in step_num) alpha_rare_i <- c(alpha_rare_i, alpha_index(x = rrarefy(x[i, ], step_num_n), method = method, tree = tree, base = base))
    names(alpha_rare_i) <- step_num
    alpha_rare <- c(alpha_rare, list(alpha_rare_i))
  }
  
  names(alpha_rare) <- rownames(x)
  alpha_rare
}

3.vegan包rarecurve()可以直接绘制稀释曲线；

rarecurve(otu, step = 2000, col = c('red', 'green', 'blue', 'orange', 'purple', 'black'))

在这里插入图片描述
4.下面绘制Richness指数曲线。

##多计算几次以获取均值 ± 标准差，然后再展示出也是一个不错的选择

plot_richness <- data.frame()

for (n in 1:5) {
  richness_curves <- alpha_curves(otu, step = 2000, method = 'richness')
  
  for (i in names(richness_curves)) {
    richness_curves_i <- (richness_curves[[i]])
    richness_curves_i <- data.frame(rare = names(richness_curves_i), alpha = richness_curves_i, sample = i, stringsAsFactors = FALSE)
    plot_richness <- rbind(plot_richness, richness_curves_i)
  }
}

#计算均值 ± 标准差（doBy 包中的 summaryBy() 函数）
library(doBy)
plot_richness_stat <- summaryBy(alpha~sample+rare, plot_richness, FUN = c(mean, sd))
plot_richness_stat$rare <- as.numeric(plot_richness_stat$rare)
plot_richness_stat[which(plot_richness_stat$rare == 0),'alpha.sd'] <- NA

#ggplot2 作图，可自行修改 ggplot2 作图细节
p1<-ggplot(plot_richness_stat, aes(rare, alpha.mean, color = sample)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin = alpha.mean - alpha.sd, ymax = alpha.mean + alpha.sd), width = 500) +
  labs(x = 'Number of sequences', y = 'Richness', color = NULL) +
  theme(panel.grid = element_blank(), panel.background = element_rect(fill = 'transparent', color = 'black'), legend.key = element_rect(fill = 'transparent')) +
  geom_vline(xintercept = min(rowSums(otu)), linetype = 2) +
  scale_x_continuous(breaks = seq(0, 30000, 5000), labels = as.character(seq(0, 30000, 5000)))
p1

在这里插入图片描述
稀释曲线可直接反映测序数据量的合理性，并间接反映样品中物种的丰富程度，当曲线趋向平坦时，说明测序数据量渐进合理，更多的数据量只会产生少量新OTUs（物种）；反之表明不饱和，增加数据量可以发现更多OTUs。