Biodiversity is measured at three main spatial scales: α diversity, β diversity, and γ diversity.
via铁汉1990
This script calls the following steps: alpha_rarefaction.py
-i,
-m,
-o,
-p,
-n, --num_steps Number of steps (or rarefied OTU table sizes) to make between min and max counts [default: 10]
-f,
-w,
-a,
-t,
--min_rare_depth The lower limit of rarefaction depths [default: 10]
-e, --max_rare_depth The upper limit of rarefaction depths [default: median sequence/sample count]
-O, --jobs_to_start Number of jobs to start. NOTE: you must also pass -a to run in parallel, this defines the number of jobs to be started if and only if -a is passed [default: 2]
--retain_intermediate_files Retain intermediate files: rarefied OTU tables (rarefaction) and alpha diversity results (alpha_div). By default these will be erased [default: False]
Example:
(1) First, write the diversity indices you want to compute into a txt file:
echo "alpha_diversity:metrics shannon,PD_whole_tree,chao1,observed_species,goods_coverage,simpson" > alpha_params.txt
(2) Then run the script (it may need several hours):
alpha_rarefaction.py -i otu_table/otu_table.biom -m map.txt
-o
# Input files: otu_table.biom, rep_phylo.tre
# Output is written to div_alpha/. Open div_alpha/alpha_rarefaction_plots/rarefaction_plots.html in a web browser to choose which plots to display.
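The log file shows the commands that were called:
python /usr/lib/qiime/bin//multiple_rarefactions.py
# Randomly subsamples the sequences: by default the smallest depth is 10 sequences, the largest is 16544 sequences, each step adds 1653 sequences, and the subsampling at each depth is repeated 10 times.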
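The depth series described above can be reproduced with a few lines of Python. This is only a sketch: the step rule (integer division of the range by the number of steps) is an assumption that happens to match the 1653 in the log, and `rarefaction_depths`, `rarefy`, and the sample counts are hypothetical names and values, not QIIME's own code.

```python
import random

def rarefaction_depths(min_depth, max_depth, num_steps):
    """Return the subsampling depths from min_depth up to max_depth."""
    # Assumed rule: the step is the range divided by the number of steps.
    step = max((max_depth - min_depth) // num_steps, 1)
    return list(range(min_depth, max_depth + 1, step))

def rarefy(counts, depth, rng):
    """Draw `depth` sequences without replacement from one sample.

    counts maps an OTU id to its sequence count in the sample.
    """
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of sequences in the sample")
    rarefied = {}
    for otu in rng.sample(pool, depth):
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied

depths = rarefaction_depths(10, 16544, 10)

# Hypothetical sample, rarefied 10 times at a depth of 50 sequences.
sample = {"otu_a": 60, "otu_b": 30, "otu_c": 10}
rng = random.Random(0)
replicates = [rarefy(sample, 50, rng) for _ in range(10)]
```

Each replicate keeps exactly 50 sequences, so the diversity values computed from different samples at the same depth are comparable.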
#
python /usr/lib/qiime/bin//alpha_diversity.py
sam@sam-Precision-WorkStation-T7500[mtt3]
Known metrics are: ACE, berger_parker_d, brillouin_d, chao1, chao1_confidence, dominance, doubles, equitability, esty_ci, fisher_alpha, gini_index, goods_coverage, heip_e, kempton_taylor_q, margalef, mcintosh_d, mcintosh_e, menhinick, michaelis_menten_fit, observed_species, osd, simpson_reciprocal, robbins, shannon, simpson, simpson_e, singles, strong, PD_whole_tree
This lists all of the alpha_diversity metrics that are available.
# Collate alpha command
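python /usr/lib/qiime/bin//collate_alpha.py
# The previous step produces a folder of files, each of which contains many alpha diversity metrics. This step pulls the values for the same metric out of every file in that folder and stores them under the name of that metric in a new folder.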
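The regrouping that the collate step performs can be pictured with an in-memory sketch. The nested dictionaries, the file names, and the values below are all hypothetical stand-ins; the real script works on files, and its exact input and output layout is not shown in this post.

```python
def collate(per_file_results):
    """Regroup {file: {sample: {metric: value}}} into {metric: {file: {sample: value}}}."""
    collated = {}
    for file_name, samples in per_file_results.items():
        for sample_id, metrics in samples.items():
            for metric, value in metrics.items():
                collated.setdefault(metric, {}).setdefault(file_name, {})[sample_id] = value
    return collated

# Hypothetical results for two rarefied tables of one sample.
per_file = {
    "rarefied_table_0": {"S1": {"chao1": 8.0, "shannon": 2.1}},
    "rarefied_table_1": {"S1": {"chao1": 9.0, "shannon": 2.3}},
}
by_metric = collate(per_file)
```

After regrouping, `by_metric["chao1"]` holds every chao1 value across the rarefied tables, which is the shape a per-metric rarefaction curve needs.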
# Rarefaction plot: All metrics command
python /usr/lib/qiime/bin//make_rarefaction_plots.py
The metrics mentioned above:
shannon,
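Pi = the proportion of individuals in the sample that belong to species i; if the sample contains N individuals in total and species i has ni individuals, then Pi = ni/N. The more evenly the individuals are distributed among species, the larger H is. If every individual belongs to a different species, the diversity index is at its maximum; if every individual belongs to the same species, it is at its minimum.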
Dominance
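simpson, Simpson's diversity index = the probability that two randomly sampled individuals belong to different species = 1 - the probability that two randomly sampled individuals belong to the same species. The more even the community, the larger the value.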
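PD_whole_tree,
Phylogenetic alpha diversity (phylogenetic diversity, Faith 1992): examines how much evolutionary history is preserved; applied to populations, communities, biogeography, and conservation biology.
Phylogenetic beta diversity (phylobetadiversity, Webb 2002): examines the phylogenetic distance between communities or regions and its causes.
Phylogenetic signal and phylogenetic structure: examines the mechanisms of species coexistence in communities and regions.
Phylogenetic diversity (PD): the sum of the shortest evolutionary branch lengths connecting all species at a site, as a proportion of the total branch length over all nodes (Faith, 1992).
Community phylogenetic distance: the mean of the pairwise phylogenetic branch lengths between the species of community I and those of community II (Webb, 2002).
PD_whole_tree: sum of branch lengths between all representatives ????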
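One common reading of Faith's PD is the total length of all branches that lie on a path from an observed tip to the root. The sketch below implements that reading on a toy tree; whether PD_whole_tree matches it in every detail is an assumption, and the `parent` dictionary layout and the tree itself are hypothetical.

```python
def faith_pd(parent, observed_tips):
    """Sum the lengths of every branch on a path from an observed tip to the root.

    parent maps each non-root node to (parent_node, branch_length).
    """
    seen = set()
    total = 0.0
    for tip in observed_tips:
        node = tip
        # Climb towards the root, counting each branch only once.
        while node in parent and node not in seen:
            seen.add(node)
            up, length = parent[node]
            total += length
            node = up
    return total

# Toy tree: ((A:1.0, B:2.0)n1:0.5, C:3.0)root
tree = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "n1": ("root", 0.5),
    "C": ("root", 3.0),
}
pd_ab = faith_pd(tree, ["A", "B"])        # 1.0 + 2.0 + 0.5
pd_all = faith_pd(tree, ["A", "B", "C"])  # adds the 3.0 branch to C
```

A sample that contains only A and B misses the long branch leading to C, so its PD is smaller than that of a sample containing all three tips.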
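chao1, S_chao1 = S_obs + n1(n1-1) / (2(n2+1)), where S_chao1 is the estimated number of OTUs, S_obs is the observed number of OTUs, n1 is the number of OTUs that contain only one sequence, and n2 is the number of OTUs that contain only two sequences.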
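The chao1 formula is easy to check by hand on a small count vector. The function below is a minimal sketch of the formula exactly as written here, not QIIME's implementation, and the counts are made up for illustration.

```python
def chao1(counts):
    """S_chao1 = S_obs + n1(n1-1) / (2(n2+1)) for a list of per-OTU sequence counts."""
    s_obs = sum(1 for n in counts if n > 0)  # observed OTUs
    n1 = sum(1 for n in counts if n == 1)    # OTUs with exactly one sequence
    n2 = sum(1 for n in counts if n == 2)    # OTUs with exactly two sequences
    return s_obs + n1 * (n1 - 1) / (2 * (n2 + 1))

# Hypothetical sample: 7 observed OTUs, 3 singletons, 2 doubletons.
counts = [10, 5, 2, 2, 1, 1, 1]
estimate = chao1(counts)  # 7 + 3*2 / (2*3) = 8.0
```

The more singletons a sample has relative to doubletons, the more unseen OTUs the estimator adds on top of the observed count.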
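observed_species, the number of OTUs.
goods_coverage, sequencing depth (coverage): C = 1 - n1/N, where n1 is the number of OTUs that contain only one sequence and N is the total number of sequences in the sample.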
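Both of these metrics are simple counts, as the following sketch shows. The function names mirror the metric names for readability only; they are not QIIME's functions, and the count vector is hypothetical.

```python
def observed_species(counts):
    """Number of OTUs with at least one sequence."""
    return sum(1 for n in counts if n > 0)

def goods_coverage(counts):
    """C = 1 - n1/N, with n1 the singleton OTUs and N the total sequences."""
    n1 = sum(1 for n in counts if n == 1)
    total = sum(counts)
    return 1 - n1 / total

# Hypothetical sample: 22 sequences in 7 OTUs, 3 of them singletons.
counts = [10, 5, 2, 2, 1, 1, 1]
richness = observed_species(counts)
coverage = goods_coverage(counts)  # 1 - 3/22
```

A coverage close to 1 means that few OTUs were seen only once, which suggests that sequencing deeper would reveal few new OTUs.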
References: http:///scripts/alpha_rarefaction.html
multiple_rarefactions.py notes
alpha_diversity.py notes
collate_alpha.py
make_rarefaction_plots.py
http://blog.sina.com.cn/s/blog_670445240102uw6s.html
——————
Diversity index
A diversity index is a quantitative measure that reflects how many
different types (such as species) there are in a dataset, and
simultaneously takes into account how evenly the basic entities
(such as individuals) are distributed among those types. The value
of a diversity index increases both when the number of types
increases and when evenness increases. For a given number of types,
the value of a diversity index is maximized when all types are
equally abundant.
When diversity indices are used in ecology, the types of interest are
usually species, but they can also be other categories, such as
genera, families, functional types or haplotypes. The entities of
interest are usually individual plants or animals, and the measure
of abundance can be, for example, number of individuals, biomass or
coverage. In demography, the entities of interest can be people,
and the types of interest various demographic groups. In
information science, the entities can be characters and the types
the different letters of the alphabet. The most commonly used
diversity indices are simple transformations of the effective
number of types (also known as 'true diversity'), but each
diversity index can also be interpreted in its own right as a
measure corresponding to some real phenomenon (but a different one
for each diversity index).
Shannon index
The Shannon index has been
a popular diversity index in the ecological literature, where it is
also known as Shannon's diversity index, the Shannon–Wiener
index, the Shannon–Weaver index and the Shannon
entropy. The measure was originally proposed by Claude Shannon to
quantify the entropy (uncertainty or information content) in
strings of text. The idea is that the more different letters there
are, and the more equal their proportional abundances in the string
of interest, the more difficult it is to correctly predict which
letter will be the next one in the string. The Shannon entropy
quantifies the uncertainty (entropy or degree of surprise)
associated with this prediction.
Simpson index
The Simpson index was introduced in
1949 by Edward H. Simpson to measure the degree of concentration
when individuals are classified into types. The same index was
rediscovered by Orris C. Herfindahl in 1950. The square root of the
index had already been introduced in 1945 by the economist Albert
O. Hirschman. As a result, the same measure is usually known as
the Simpson index in ecology, and as the Herfindahl index or the
Herfindahl–Hirschman index (HHI) in economics.
The measure equals the probability that two entities taken at
random from the dataset of interest represent the same type.
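To reflect microbial diversity more intuitively, the Shannon-Wiener Index and Simpson's diversity index are also needed.
First, a note: a diversity index is a composite measure of richness and evenness. When a diversity index is applied, a community with low richness and high evenness may receive the same value as a community with high richness and low evenness.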
Shannon-Wiener Index
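The diversity indices given by Fisher's and Preston's methods cover only one aspect, the number of species. The Shannon-Wiener index and the Simpson index also measure the heterogeneity of the community. The Shannon-Wiener index borrows a method from information theory, whose main object of measurement is the amount of order or disorder in a system. In communication engineering, one has to predict which letter comes next in a message and how uncertain that prediction is. For example, in a stream such as b b b b b b b, where every symbol is the same letter, there is no uncertainty about the next letter, so the uncertainty content of the message is zero. If the stream is a, b, c, d, e, f, g, with every letter different, the uncertainty content is large. Community diversity is measured with this same information-theoretic measure of uncertainty: one predicts which species the next collected individual belongs to, and the more diverse the community, the greater the uncertainty.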
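The formula for the Shannon-Wiener index is: H = -∑ (Pi)(log2 Pi)
where H = the information content of the sample (bits/individual) = the diversity index of the community, S = the number of species, and Pi = the proportion of individuals in the sample that belong to species i; if the sample contains N individuals in total and species i has ni individuals, then Pi = ni/N.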
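The Shannon-Wiener index has two components: (1) the number of species; (2) the evenness (equitability) with which individuals are distributed among species. The more evenly the individuals are distributed among species, the larger H is. If every individual belongs to a different species, the diversity index is at its maximum; if every individual belongs to the same species, it is at its minimum. How, then, is evenness measured? Estimate the theoretical maximum diversity index of the community (Hmax), then take the ratio of the actual diversity index to Hmax to obtain the evenness index E = H/Hmax. The steps are:
Hmax = -S (1/S) log2(1/S) = log2 S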
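The letter-stream example and the evenness ratio can be checked numerically. The sketch below uses the formula H = -∑ (Pi)(log2 Pi) and takes Hmax = log2 S, which is what the formula gives when all S species are equally abundant; the function names and the 99-to-1 sample are hypothetical.

```python
import math
from collections import Counter

def shannon(counts):
    """H = -sum(Pi * log2(Pi)), in bits per individual."""
    total = sum(counts)
    return -sum((n / total) * math.log2(n / total) for n in counts if n > 0)

def evenness(counts):
    """E = H / Hmax with Hmax = log2(S); defined only when S >= 2."""
    s = sum(1 for n in counts if n > 0)
    if s < 2:
        raise ValueError("evenness needs at least two species")
    return shannon(counts) / math.log2(s)

same = list(Counter("bbbbbbb").values())   # one letter repeated seven times
mixed = list(Counter("abcdefg").values())  # seven different letters
h_same = shannon(same)     # no uncertainty about the next letter
h_mixed = shannon(mixed)   # log2(7) bits, the maximum for seven types
e_uneven = evenness([99, 1])
```

The stream of identical letters has zero entropy, the stream of seven distinct letters reaches Hmax, and a two-species sample split 99 to 1 has an evenness well below 1.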
Simpson's diversity index
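In 1949 Simpson posed this question: in a community of infinite size, what is the probability that two specimens taken at random belong to the same species? In the boreal forests of northern Canada, for example, two trees sampled at random are very likely to belong to the same species. In a tropical rainforest, by contrast, the probability that two randomly sampled trees belong to the same species is very low. From this idea he derived a diversity index, expressed as:
D = 1 - ∑ (Pi)^2
The lowest value of Simpson's diversity index is 0;
the highest value is Dmax:
Dmax = 1 - 1/S
The former occurs when all individuals belong to a single species, the latter when every individual belongs to a different species.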
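Simpson index of community A: D_A = 1 - (0.99^2 + 0.01^2) = 0.0198
Simpson index of community B: D_B = 1 - (0.5^2 + 0.5^2) = 0.5
Community B is more diverse than community A. The main cause of the difference in diversity between the two communities is the unevenness of the species: in terms of richness the two communities are the same, but their evenness differs.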
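The two community values can be reproduced with a short function for D = 1 - ∑ (Pi)^2. The individual counts (99 and 1, 50 and 50) are assumed here only because they give the proportions used in the calculation; any counts with the same proportions would do.

```python
def simpson_diversity(counts):
    """D = 1 - sum(Pi^2): chance that two random individuals differ in species."""
    total = sum(counts)
    return 1 - sum((n / total) ** 2 for n in counts)

d_a = simpson_diversity([99, 1])   # community A, proportions 0.99 and 0.01
d_b = simpson_diversity([50, 50])  # community B, proportions 0.5 and 0.5
```

Both communities have two species, so the gap between `d_a` and `d_b` comes entirely from evenness.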