Here we recommend a recently published paper titled "scGENA: A Single-Cell Gene Coexpression Network Analysis Framework for Clustering Cell Types and Revealing Biological Mechanisms". The code for this workflow is on GitHub (https://github.com/zpliulab/scGENA), but note that it is not an R package. The workflow consists of five main steps:
Phase 1 sets up and preprocesses the scRNA-seq dataset to filter low-dimensional and noisy single-cell expression genes.
Phase 2 performs a differentially expressed (DE) genes analysis to determine which genes are expressed significantly differently under different conditions. These genes can reveal biological information about the processes that are influenced by the conditions of interest.
Phase 3 applies the SAVER imputation method to estimate and replace dropout values, recovering each gene's missing expression level across cells and reducing technical differences while preserving biological variability across cells [23].
Phase 4 constructs a coexpression network analysis to shed light on the transcriptional regulatory mechanisms underpinning numerous biological processes [24].
Phase 5 performs further analyses, including a functional enrichment analysis, a differential coexpression network analysis, and the identification of overlapping genes across different cell types, to better interpret the biological insights.
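To make Phases 1 and 4 more concrete, here is a minimal toy sketch in plain Python. It is not the authors' code and does not use Seurat, MAST, or SAVER. The gene names, the counts, the `min_cells` cutoff, and the correlation threshold are all made up for illustration, and a simple Pearson correlation with a hard threshold stands in for whatever network construction method the pipeline actually uses. It only shows the general idea: drop genes detected in too few cells, measure how sparse the matrix is, then link gene pairs whose expression is strongly correlated across cells.

```python
from itertools import combinations
from math import sqrt

# Toy count matrix: one row per gene, one column per cell (made-up numbers).
counts = {
    "GeneA": [5, 0, 3, 8, 0, 6],
    "GeneB": [4, 0, 2, 9, 1, 5],
    "GeneC": [0, 0, 0, 1, 0, 0],
    "GeneD": [7, 6, 0, 1, 8, 2],
}


def zero_fraction(matrix):
    """Fraction of entries equal to zero (a rough measure of dropout)."""
    values = [v for row in matrix.values() for v in row]
    return sum(v == 0 for v in values) / len(values)


def filter_genes(matrix, min_cells=3):
    """Keep genes detected (count > 0) in at least `min_cells` cells."""
    return {
        gene: row
        for gene, row in matrix.items()
        if sum(v > 0 for v in row) >= min_cells
    }


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def coexpression_edges(matrix, threshold=0.8):
    """Return (gene1, gene2, r) for every pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(matrix), 2):
        r = pearson(matrix[g1], matrix[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


kept = filter_genes(counts)
edges = coexpression_edges(kept)
print(f"zero fraction before filtering: {zero_fraction(counts):.3f}")
print(f"genes kept: {sorted(kept)}")
print(f"coexpression edges: {edges}")
```

On this toy matrix, GeneC is dropped because it is detected in only one cell, and GeneA and GeneB are the only pair correlated strongly enough to be connected. On real data, dropout zeros weaken such correlations, which is the motivation for imputing before building the network.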
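As you can see, the workflow starts by calling the quality-control steps in Seurat to remove low-quality cells and uninformative genes. The cells are then grouped by biological origin for differential expression analysis, which again only needs Seurat functions and relies on the MAST (model-based analysis of single-cell transcriptomics) package. Next, the statistically significant DE genes are imputed: because of the technical limitations of single-cell sequencing, many genes are wrongly recorded as 0, and imputation amounts to correcting those zeros. Here the authors chose the "SAVER method in this pipeline because it imputed original zero values to actual values". By this point the expression matrix contains very few zeros. The effect of these three steps is shown below: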
We thank Dr. Jianming Zeng (University of Macau) and all the members of his bioinformatics team, biotrainee, for generously sharing their experience and code.