Literature Reading 1
Title: Experimental validation of methods for differential gene expression analysis and sample pooling in RNA-seq
Source: BMC Genomics, 2015
Abstract (translated):
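Background: Massively parallel cDNA sequencing (RNA-seq) is gradually replacing microarrays for quantifying gene expression. However, many biologists are uncertain how reliable the methods for differentially expressed gene (DEG) analysis are, and how reliable cost-saving sample pooling strategies are in RNA-seq experiments. We therefore validated the DEGs identified by Cuffdiff2, edgeR, DESeq2 and the Two-stage Poisson Model (TSPM) in an RNA-seq experiment on micropunched mouse amygdalae, using high-throughput qPCR on independent biological replicate samples. In addition, we sequenced pooled RNA samples and compared the results with those of the corresponding individually sequenced samples.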
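Results: Cuffdiff2 had a high false-positive rate, and DESeq2 and TSPM had high false-negative rates. Among the four DEG analysis methods investigated, edgeR showed relatively high sensitivity and accuracy. We documented a pooling bias, and the DEGs identified in pooled samples had very low positive predictive values.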
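Conclusions: Our results suggest that future RNA-seq experiments need to combine more sensitive DEG analysis methods with high-throughput validation of the identified DEGs. They also indicate that pooling strategies should be used only sparingly in RNA-seq experiments with similar settings, and that the number of biological replicates should be increased.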
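Findings consistent with previous studies:
(1) DESeq has low sensitivity;
(2) Cuffdiff has a high false-positive rate;
(3) edgeR has high sensitivity;
(4) the false-positive and false-negative rates of TSPM depend on the number of replicates.
For these reasons edgeR and DESeq2 are now widely recommended, while Cuffdiff is no longer recommended.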
Differences among the DEG analysis methods:
| Method | edgeR | DESeq2 | Cuffdiff2 | TSPM |
|---|---|---|---|---|
| Normalisation | A model that incorporates normalisation factors as offsets, estimated by the trimmed mean of M-values (TMM) for each contig | Relative log expression (RLE) method | Considers the total number of reads, gene length, variability within and between conditions, and differential isoform expression | Accommodates various normalisation procedures, but works without normalisation by default |
| Distribution | Negative binomial | Negative binomial | (not stated) | Poisson |
| Dispersion estimation | Moderates its dispersion estimates using the dispersion-mean relationship | Detects outliers stringently and excludes genes with extreme read counts by default; uses maximum a posteriori dispersion estimates | Includes covariances between different isoforms | Estimates dispersion per gene, without using information across genes |
| p-value calculation | Generalized linear model (GLM) likelihood ratio test | GLM likelihood ratio test | GLM likelihood ratio test | Quasi- or standard likelihood ratio test, depending on whether a gene is over-dispersed |
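The normalisation row lists the relative log expression approach for DESeq2. A minimal sketch of that idea, the median-of-ratios size factor, is shown below under simplifying assumptions: raw counts arrive as a genes x samples matrix, and genes with a zero in any sample are skipped. The function name `rle_size_factors` is hypothetical; this is not the DESeq2 code itself.

```python
import numpy as np

def rle_size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix (sketch)."""
    counts = np.asarray(counts, dtype=float)
    # Genes with a zero in any sample have an undefined log geometric mean; skip them.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # The per-gene log geometric mean across samples acts as a pseudo-reference sample.
    log_ref = log_counts.mean(axis=1, keepdims=True)
    # Each sample's size factor is its median ratio to the pseudo-reference.
    return np.exp(np.median(log_counts - log_ref, axis=0))

# Hypothetical counts: sample 2 was sequenced about twice as deeply as sample 1.
counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
factors = rle_size_factors(counts)
print(factors)  # about [0.707, 1.414]
normalised = np.asarray(counts) / factors
```

Using the median rather than the mean keeps a few strongly differentially expressed genes from dragging the size factor away from the true sequencing-depth difference.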
The main difference among these methods lies in how they estimate dispersion.
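To make the dispersion point concrete, the sketch below simulates negative binomial counts with mean mu and variance mu + alpha * mu**2 (Poisson would give variance equal to mu) and then computes a naive per-gene method-of-moments dispersion from 3 or 8 replicates. The mean, dispersion and gene count are hypothetical, and the moment estimator is a toy stand-in, not the estimator used by edgeR, DESeq2 or TSPM. It only illustrates why per-gene estimates from few replicates are noisy, which is what motivates sharing information across genes.

```python
import numpy as np

def nb_counts(mu, alpha, size, rng):
    """Negative binomial counts with mean mu and variance mu + alpha * mu**2.

    NumPy parameterises the NB by (n, p); n = 1/alpha and p = n / (n + mu)
    give the mean/dispersion form used in RNA-seq count models.
    """
    n = 1.0 / alpha
    p = n / (n + mu)
    return rng.negative_binomial(n, p, size=size)

def moment_dispersion(counts):
    """Per-gene method-of-moments dispersion (s^2 - mean) / mean^2, floored at 0."""
    mean = counts.mean(axis=-1)
    var = counts.var(axis=-1, ddof=1)
    return np.maximum((var - mean) / mean**2, 0.0)

rng = np.random.default_rng(42)
mu, alpha = 100.0, 0.1  # hypothetical mean count and true dispersion

# Poisson would give variance == mean (100); the NB variance is 100 + 0.1 * 100**2 = 1100.
draws = nb_counts(mu, alpha, 200_000, rng)
print(f"mean {draws.mean():.1f}, variance {draws.var():.0f}")

# 2000 hypothetical genes sharing the same true dispersion: per-gene estimates
# from 3 replicates scatter more widely than estimates from 8 replicates.
spread = {}
for reps in (3, 8):
    est = moment_dispersion(nb_counts(mu, alpha, (2000, reps), rng))
    spread[reps] = float(est.std())
    print(f"{reps} replicates: median {np.median(est):.3f}, sd {spread[reps]:.3f}")
```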
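Panels A and C both show that pooled samples yield markedly more DEGs than the corresponding independent samples. Pooling is effectively averaging, so it discards information about outliers and about the size of within-group variation. Sequencing pooled libraries therefore underestimates within-group variance, and many DEGs with low positive predictive values are called.

Comparing panels A and C, 8 pooled samples identified more DEGs (18055) than 3 pooled samples (15745), and the same holds for independent samples (82 vs 16). Increasing the number of biological replicates can reduce the bias that pooling introduces into DEG prediction. The comparison of panels B and D likewise indicates that more biological replicates improve the estimation of population variance and reduce both pooling bias and the false-positive rate.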
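As a rough illustration of why pooling hides within-group variation, the sketch below assumes (hypothetically) normally distributed log-expression per animal and ideal pooling in which every animal contributes equal RNA, so a pooled library behaves like the mean of its members. It compares the variance between animals with the variance between pools; for pools of k animals the latter shrinks to about 1/k of the former. The function name and parameters are illustrative only, and the sketch does not reproduce the paper's analysis.

```python
import random
import statistics

def pooling_variance(pool_size, n_pools=2000, sd=1.0, seed=0):
    """Return (between-animal variance, between-pool variance) for simulated data.

    Assumes normally distributed log-expression per animal and ideal pooling,
    where each pool equals the mean of its pool_size members.
    """
    rng = random.Random(seed)
    animals = [rng.gauss(0.0, sd) for _ in range(pool_size * n_pools)]
    pools = [statistics.fmean(animals[i:i + pool_size])
             for i in range(0, len(animals), pool_size)]
    return statistics.variance(animals), statistics.variance(pools)

for k in (3, 8):
    var_animals, var_pools = pooling_variance(k)
    print(f"pool of {k}: animal variance {var_animals:.3f}, pool variance {var_pools:.3f}")
```

If the between-pool spread is mistaken for biological variability, the groups look far less variable than they are, which inflates the number of genes called significant.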
RNA-seq analysis
RNA quality control:
NanoDrop 1000: 264 ng RNA/sample
Agilent 2100 Bioanalyzer: RNA Integrity Number (RIN) 7.53 (SD 0.31)
Quality criteria:
Total RNA: RIN >= 7 is good; RIN between 6 and 7 can sometimes still give good results and is worth trying if the samples are extremely precious; 28S/18S > 0.7; fluorescence units (FU) > 1.
Small RNA: the proportion of miRNA (10-40 nt) within small RNA (10-150 nt).
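The thresholds above can be turned into a simple triage helper. This is a hypothetical sketch: the function names are illustrative, and the rule that all total-RNA criteria must hold at once (with RIN 6-7 flagged as borderline rather than rejected) is an assumption, since the notes do not say how the criteria combine. No cutoff is given for the miRNA proportion, so it is only computed.

```python
def classify_total_rna(rin, ratio_28s_18s, fluorescence_units):
    """Triage a total-RNA sample with the thresholds listed above (hypothetical helper).

    Assumption: 28S/18S > 0.7 and FU > 1 are both required; RIN >= 7 is 'good',
    6 <= RIN < 7 is 'borderline' (worth trying only for precious samples).
    """
    if ratio_28s_18s <= 0.7 or fluorescence_units <= 1:
        return "fail"
    if rin >= 7:
        return "good"
    if rin >= 6:
        return "borderline"
    return "fail"

def mirna_fraction(mirna_amount, small_rna_amount):
    """Percentage of miRNA (10-40 nt) within small RNA (10-150 nt)."""
    return 100.0 * mirna_amount / small_rna_amount

# Hypothetical samples: (RIN, 28S/18S ratio, fluorescence units).
for sample in [(7.5, 1.4, 12.0), (6.4, 1.1, 8.0), (8.1, 0.5, 9.0)]:
    print(sample, classify_total_rna(*sample))
```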
Reference link: http://www.docin.com/p-769106334.html
DEG analysis pipeline:
Quality control > Alignment to the mouse genome (TopHat 2.0.6) > Counting of aligned reads (HTSeq 0.5.4) > DEG analysis (edgeR 3.2.4, Cuffdiff 2.1.1, DESeq2 1.0.19 and TSPM)
Genes with adjusted p-values below 0.05 (Benjamini-Hochberg false discovery rate correction) were considered DEGs.
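For reference, the Benjamini-Hochberg adjustment can be sketched in a few lines: each p-value is multiplied by m/rank, and a running minimum taken from the largest p-value downward keeps the adjusted values monotone. The function name and the example p-values are hypothetical; the DEG tools apply this adjustment internally.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for five genes.
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = benjamini_hochberg(raw)
print([round(q, 5) for q in adjusted])  # [0.005, 0.02, 0.05125, 0.05125, 0.6]
degs = [i for i, q in enumerate(adjusted) if q < 0.05]
print(degs)  # [0, 1]
```

Note that the gene with raw p = 0.039 is not called: its raw ratio 0.039 * 5 / 3 = 0.065 is lowered to 0.05125 by the monotonicity step, which is still above 0.05.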
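Criteria for validating DEGs by qPCR
A DEG identified in the RNA-seq analysis was considered a true-positive DEG if it met the following criteria:
1. RNA-seq and qPCR show the same direction of differential expression (up- or down-regulation).
2. The fold change estimated by qPCR is either above 1.25 or below 0.8 (log2 fold change (LFC) cutoff of ±0.3219).
Spearman correlation coefficients, root-mean-square deviations and kappa statistics were calculated with STATA 13.1.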
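The two validation criteria can be expressed as a small check per gene. This is a hypothetical helper: it assumes the RNA-seq result is given as a log2 fold change and the qPCR result as a linear fold change that has already been computed (for example from ΔΔCt values); neither input format is specified in the notes.

```python
import math

def is_true_positive(rnaseq_log2fc, qpcr_fold_change):
    """Apply the two qPCR validation criteria to one gene (hypothetical helper).

    rnaseq_log2fc: log2 fold change reported by the RNA-seq DEG tool.
    qpcr_fold_change: linear fold change measured by qPCR (assumed precomputed).
    """
    if rnaseq_log2fc == 0 or qpcr_fold_change <= 0:
        return False
    # Criterion 1: both assays agree on the direction of change.
    same_direction = (rnaseq_log2fc > 0) == (qpcr_fold_change > 1)
    # Criterion 2: the qPCR fold change lies above 1.25 or below 0.8.
    beyond_cutoff = qpcr_fold_change > 1.25 or qpcr_fold_change < 0.8
    return same_direction and beyond_cutoff

# The fold-change limits correspond to log2 cutoffs of about +/-0.3219.
print(round(math.log2(1.25), 4), round(math.log2(0.8), 4))  # 0.3219 -0.3219

print(is_true_positive(1.2, 1.6))   # True: same direction, beyond 1.25
print(is_true_positive(1.2, 0.6))   # False: opposite directions
print(is_true_positive(-0.9, 0.9))  # False: within the 0.8-1.25 band
```

Comparing fold changes directly, instead of their rounded log2 values, avoids borderline floating-point disagreements at the ±0.3219 cutoff.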
Original article:
https://www.ncbi.nlm./pmc/articles/PMC4515013/