alpha多样性指数的大小是与使用的ASV/OTU表的抽平深度有关,为探究样本alpha多样性随抽平深度的变化曲线,可绘制稀释曲线(rarefaction curve),这是生态领域的一种常用方法。 稀释曲线通过从每个样本中随机抽取一定数量的序列(即在不超过现有样本测序量的某个深度下进行重新抽样),可以预测在一系列给定的测序深度下,所可能包含的物种总数及其中每个物种的相对丰度。因此,通过绘制稀释曲线,还可以在相同测序深度下,通过比较样本中ASV/OTU数的多少,从而在一定程度上衡量每个样本的多样性高低。
Richness指数(物种丰富度指数)的Alpha多样性曲线在很多情况下等同于稀释曲线(rarefaction curve)。下面小编开始绘制简单的稀释曲线和Richness指数曲线。
数据文件长这样: 1.调用相关R包,读取otu物种丰度表;
library(vegan) #用于计算 Shannon 熵指数、Simpson 指数、Chao1 指数、ACE 指数等,同时用于抽样
library(picante) #用于计算 PD_whole_tree,若不计算它就无需加载。
library(ggplot2) #用于 ggplot2 作图
library(doBy) #用于分组统计
library(ggalt) #用于绘制拟合曲线
otu <- read.delim('feature-table_taxonomy.txt', row.names = 1, sep = '\t', stringsAsFactors = FALSE, check.names = FALSE)
otu=otu[,-dim(otu)[2]]
otu <- t(otu)
2.定义函数;
#计算多种 Alpha 多样性指数,结果返回至向量
alpha_index <- function(x, method = 'richness', tree = NULL, base = exp(1)) {
if (method == 'richness') result <- rowSums(x > 0) #丰富度指数
else if (method == 'chao1') result <- estimateR(x)[3, ] #Chao1 指数
else if (method == 'ace') result <- estimateR(x)[5, ] #ACE 指数
else if (method == 'shannon') result <- diversity(x, index = 'shannon', base = base) #Shannon 指数
else if (method == 'simpson') result <- diversity(x, index = 'simpson') #Gini-Simpson 指数
else if (method == 'pielou') result <- diversity(x, index = 'shannon', base = base) / log(estimateR(x)[1, ], base) #Pielou 均匀度
else if (method == 'gc') result <- 1 - rowSums(x == 1) / rowSums(x) #goods_coverage
else if (method == 'pd' & !is.null(tree)) { #PD_whole_tree
pd <- pd(x, tree, include.root = FALSE)
result <- pd[ ,1]
names(result) <- rownames(pd)
}
result
}
#根据抽样步长(step),统计每个稀释梯度下的 Alpha 多样性指数,结果返回至列表
alpha_curves <- function(x, step, method = 'richness', rare = NULL, tree = NULL, base = exp(1)) {
x_nrow <- nrow(x)
if (is.null(rare)) rare <- rowSums(x) else rare <- rep(rare, x_nrow)
alpha_rare <- list()
for (i in 1:x_nrow) {
step_num <- seq(0, rare[i], step)
if (max(step_num) < rare[i]) step_num <- c(step_num, rare[i])
alpha_rare_i <- NULL
for (step_num_n in step_num) alpha_rare_i <- c(alpha_rare_i, alpha_index(x = rrarefy(x[i, ], step_num_n), method = method, tree = tree, base = base))
names(alpha_rare_i) <- step_num
alpha_rare <- c(alpha_rare, list(alpha_rare_i))
}
names(alpha_rare) <- rownames(x)
alpha_rare
}
3.vegan包rarecurve()可以直接绘制稀释曲线;
rarecurve(otu, step = 2000, col = c('red', 'green', 'blue', 'orange', 'purple', 'black'))
4.下面绘制Richness指数曲线。
##多计算几次以获取均值 ± 标准差,然后再展示出也是一个不错的选择
plot_richness <- data.frame()
for (n in 1:5) {
richness_curves <- alpha_curves(otu, step = 2000, method = 'richness')
for (i in names(richness_curves)) {
richness_curves_i <- (richness_curves[[i]])
richness_curves_i <- data.frame(rare = names(richness_curves_i), alpha = richness_curves_i, sample = i, stringsAsFactors = FALSE)
plot_richness <- rbind(plot_richness, richness_curves_i)
}
}
#计算均值 ± 标准差(doBy 包中的 summaryBy() 函数)
library(doBy)
plot_richness_stat <- summaryBy(alpha~sample+rare, plot_richness, FUN = c(mean, sd))
plot_richness_stat$rare <- as.numeric(plot_richness_stat$rare)
plot_richness_stat[which(plot_richness_stat$rare == 0),'alpha.sd'] <- NA
#ggplot2 作图,可自行修改 ggplot2 作图细节
p1<-ggplot(plot_richness_stat, aes(rare, alpha.mean, color = sample)) +
geom_line() +
geom_point() +
geom_errorbar(aes(ymin = alpha.mean - alpha.sd, ymax = alpha.mean + alpha.sd), width = 500) +
labs(x = 'Number of sequences', y = 'Richness', color = NULL) +
theme(panel.grid = element_blank(), panel.background = element_rect(fill = 'transparent', color = 'black'), legend.key = element_rect(fill = 'transparent')) +
geom_vline(xintercept = min(rowSums(otu)), linetype = 2) +
scale_x_continuous(breaks = seq(0, 30000, 5000), labels = as.character(seq(0, 30000, 5000)))
p1
稀释曲线可直接反映测序数据量的合理性,并间接反映样品中物种的丰富程度,当曲线趋向平坦时,说明测序数据量渐进合理,更多的数据量只会产生少量新OTUs(物种);反之表明不饱和,增加数据量可以发现更多OTUs。
|