【原】简单直接的拟时序分析方法，R包SCORPIUS推荐

健明 2022-02-03

展开全文

我看到了去年《单细胞天地》公众号的一个文献解读：单细胞图谱揭示发育中的小鼠大脑中的脑膜白细胞异质性，提到了一个有意思的拟时序方法，这篇文章没有用monocle2/3、PAGA等，而是选用了SCORPIUS，现在我们就来介绍一下它：

能做拟时序分析的方法实在是太多了，这个SCORPIUS方法也是2016丢在预印本的，文章标题是：《SCORPIUS improves trajectory inference and identifies novel modules in dendritic cell development》，有意思的是它直到现在也没有在任何期刊上面正式发表，但是都有了54个引用（截止至2022-02-02），预印本链接是：https://www./content/10.1101/079509v2

在其文章里面对流程的介绍蛮清晰的：

主要是3个步骤：

Dimensionality reduction involves calculating the correlation distance, optionally filtering out outliers, and performing multi-dimensional scaling.
Trajectory inference creates an initial path by calculating the shortest path through k cluster centres, and by iteratively fitting this path to the data using the principal curves algorithm.
During feature selection, a Random Forest is trained using the expression data to predict the ordering of cells as outputted by the principal curves

核心是使用机器学习里面的随机森林方法来挑选最重要的基因。而且作者在最经典的血液细胞发育领域测试了他们的工具：

可以看到不同的造血干/祖细胞（Hematopoietic stem/progenitor cells，HSPCs）发育成为了不同的血液细胞体系，包括、Erythroid/Megakaryocyte Progenitor(红系/巨核)、Myeloid(髓系)和Lymphoid(淋巴系)

敢于挑战难题，真棒！

当然了，看不懂原理也不会影响大家使用它，其GitHub有比较好的安装和使用介绍：https://github.com/rcannood/SCORPIUS

代码比较简单，测试代码如下所示：

# 如果还没有安装，就使用下面的代码安装它 
# install.packages("SCORPIUS")
library(SCORPIUS)
data(ginhoux)
ginhoux

expression <- ginhoux$expression
group_name <- ginhoux$sample_info$group_name
table(group_name)
dim(expression)
expression[1:4,1:4]
#

可以看到示例数据其实就 245个细胞，就2000个基因，然后细胞分成了3组

> table(group_name)
group_name
  MDP   CDP PreDC 
   57    94    94 
> dim(expression)
[1]  245 2000
> expression[1:4,1:4]
                 Mpo  DQ688647      Ly6d   Snora31
SRR1558744 1.6168843  9.041883  1.616884 10.587718
SRR1558745 8.6334859 11.435542  0.000000  7.931509
SRR1558746 0.0000000 11.056615  0.000000  0.000000
SRR1558747 0.9744123 11.962396 12.902384  0.000000

继续运行：


space <- reduce_dimensionality(expression, "spearman")
draw_trajectory_plot(space, group_name, contour = TRUE)
space <- reduce_dimensionality(expression, "spearman")
draw_trajectory_plot(space, group_name, contour = TRUE)
traj <- infer_trajectory(space)
draw_trajectory_plot(space, group_name, traj$path, contour = TRUE)

# warning: setting num_permutations to 10 requires a long time (~30min) to run!
# set it to 0 and define a manual cutoff for the genes (e.g. top 200) for a much shorter execution time.
gimp <- gene_importances(
  expression, 
  traj$time, 
  num_permutations = 10, 
  num_threads = 8, 
  ntree = 10000,
  ntree_perm = 1000
) 
gimp$qvalue <- p.adjust(gimp$pvalue, "BH", length(gimp$pvalue))
gene_sel <- gimp$gene[gimp$qvalue < .05]
expr_sel <- scale_quantile(expression[,gene_sel])

# Draw a time series heatmap
time <- traj$time
draw_trajectory_heatmap(expr_sel, time)

## Also show the progression groupings
draw_trajectory_heatmap(expr_sel, time, 
                        progression_group=group_name)

## 换一个配色
table(group_name)
draw_trajectory_heatmap(
  expr_sel, time, progression_group=group_name,
  progression_group_palette = setNames(RColorBrewer::brewer.pal(3, "Set2"),
                                      levels(group_name))
)

## 基因也可以分组
modules <- extract_modules(scale_quantile(expr_sel))
draw_trajectory_heatmap(expr_sel, time,
                        progression_group=group_name, modules=modules)

可以看到代码很简单，结果也很直接，真的是超级好用啊！！！有需要的小伙伴可以尝试下。

我们现在把SCORPIUS跟前面的教程：拟时序分析就是差异分析的细节剖析提到的monocle2对比一下，全部的代码如下所示：

# 如果还没有安装，就使用下面的代码安装它 
# install.packages("SCORPIUS")
rm(list = ls())
library(SCORPIUS)
library(SeuratData) #加载seurat数据集  
getOption('timeout')
options(timeout=10000)
#InstallData("pbmc3k")  
data("pbmc3k")  
sce <- pbmc3k.final  
library(Seurat)
table(Idents(sce))

sce$celltype = Idents(sce)
sce= sce[,sce$celltype  %in% c('CD14+ Mono','FCGR3A+ Mono')]
table(Idents(sce))
sce
pbmc = CreateSeuratObject(
  counts = sce@assays$RNA@counts,
  meta.data = sce@meta.data
)
pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", 
                      scale.factor = 10000) 
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000) 
pbmc <- ScaleData(pbmc )  

expression <- t(as.matrix(pbmc@assays$RNA@scale.data))
group_name =  as.factor(as.character(pbmc$celltype))
table(group_name)
dim(expression)
expression[1:4,1:4]

space <- reduce_dimensionality(expression, "spearman")
draw_trajectory_plot(space, group_name, contour = TRUE,)
space <- reduce_dimensionality(expression, "spearman")
draw_trajectory_plot(space, group_name, contour = TRUE)
traj <- infer_trajectory(space)
draw_trajectory_plot(space, group_name, traj$path, contour = TRUE)

# warning: setting num_permutations to 10 requires a long time (~30min) to run!
# set it to 0 and define a manual cutoff for the genes (e.g. top 200) for a much shorter execution time.
gimp <- gene_importances(
  expression, 
  traj$time, 
  num_permutations = 10, 
  num_threads = 8, 
  ntree = 10000,
  ntree_perm = 1000
) 
gimp$qvalue <- p.adjust(gimp$pvalue, "BH", length(gimp$pvalue))
gene_sel <- gimp$gene[gimp$qvalue < .05]
expr_sel <- scale_quantile(expression[,gene_sel])

# Draw a time series heatmap
time <- traj$time
draw_trajectory_heatmap(expr_sel, time)

## Also show the progression groupings
draw_trajectory_heatmap(expr_sel, time, 
                        progression_group=group_name)

## 换一个配色
table(group_name)
draw_trajectory_heatmap(
  expr_sel, time, progression_group=group_name,
  progression_group_palette = setNames(RColorBrewer::brewer.pal(3, "Set2"),
                                      levels(group_name))
)

## 基因也可以分组
modules <- extract_modules(scale_quantile(expr_sel))
draw_trajectory_heatmap(expr_sel, time,
                        progression_group=group_name, modules=modules)

效果也很好：

有一个配色的小bug，天色已晚，让读者们自己去修改吧！

写在文末

我在《生信技能树》，《生信菜鸟团》，《单细胞天地》的大量推文教程里面共享的代码都是复制粘贴即可使用的，有任何疑问欢迎留言讨论，也可以发邮件给我，详细描述你遇到的困难的前因后果给我，我的邮箱地址是 jmzeng1314@163.com

如果你确实觉得我的教程对你的科研课题有帮助，让你茅塞顿开，或者说你的课题大量使用我的技能，烦请日后在发表自己的成果的时候，加上一个简短的致谢，如下所示：

We thank Dr.Jianming Zeng(University of Macau), and all the members of his bioinformatics team, biotrainee, for generously sharing their experience and codes.

十年后我环游世界各地的高校以及科研院所（当然包括中国大陆）的时候，如果有这样的情谊，我会优先见你。