【原】cellphonedb 及其可视化

健明 2021-07-15

展开全文

男，

一个长大了才会遇到的帅哥，

稳健，潇洒，大方，靠谱。

一段生信缘，一棵技能树。

生信技能树核心成员，单细胞天地特约撰稿人，简书创作者，单细胞数据科学家。

正文

在单细胞数据分析过程中，上游的拆库定量，中游的分群注释，目前的门槛是很低的了。但是解析细胞类型异质性不应止于这些，还可以看细胞群之间的通讯。当然，这方面我们介绍过CellChat：细胞间相互作用分析利器。CellChat是以信号通路为单位来计算细胞间交流状态的，很多同学用cellphonedb来做基于配受体对的细胞间交流。今天，我们就来用经典的pbmc3k数据跑一下cellphonedb，并尝试可视化。

先放一张想象图：

来自文章：Single-cell transcriptomics reveals regulators underlying immune cell diversity and immune subtypes associated with prognosis in nasopharyngeal carcinoma

照例，请出我们的pbmc3k数据：

library(Seurat)
library(SeuratData)
pbmc3k

An object of class Seurat 
13714 features across 2700 samples within 1 assay 
Active assay: RNA (13714 features, 0 variable features)

head(pbmc3k@meta.data)
               orig.ident nCount_RNA nFeature_RNA seurat_annotations
AAACATACAACCAC     pbmc3k       2419          779       Memory CD4 T
AAACATTGAGCTAC     pbmc3k       4903         1352                  B
AAACATTGATCAGC     pbmc3k       3147         1129       Memory CD4 T
AAACCGTGCTTCCG     pbmc3k       2639          960         CD14+ Mono
AAACCGTGTATGCG     pbmc3k        980          521                 NK
AAACGCACTGGTAC     pbmc3k       2163          781       Memory CD4 T

可以看出已经注释好了的，接下来我们写出跑cellphonedb的文件：表达谱和细胞分组文件。

 write.table(as.matrix(pbmc3k@assays$RNA@data), 'cellphonedb_count.txt', sep='\t', quote=F)
 meta_data <- cbind(rownames(pbmc3k@meta.data), pbmc3k@meta.data[,'seurat_annotations', drop=F])  
meta_data <- as.matrix(meta_data)
  meta_dat[is.na(meta_data)] = "Unkown" #  细胞类型中不能有NA

 write.table(meta_data, 'cellphonedb_meta.txt', sep='\t', quote=F, row.names=F)

假设诸君已经安装好cellphonedb ，并加入环境变量了，我们可以这样跑：

cellphonedb method statistical_analysis  cellphonedb_meta.txt  cellphonedb_count.txt      --counts-data=gene_name  # 如果是直接写出基因名的加这个参数，转化为基因ID的话不用加。

#cellphonedb 自己的绘图
cellphonedb plot dot_plot 
cellphonedb plot heatmap_plot cellphonedb_meta.txt

cellphonedb 结果文件需要挨个打开看看，不要只看名字。更多细节可以看官网的介绍，记住，官网对你是开放的。

https://www./documentation

ree out/
out/
├── count_network.txt  # 绘制网络图文件
├── deconvoluted.txt
├── heatmap_count.pdf
├── heatmap_log_count.pdf
├── interaction_count.txt
├── means.txt # 绘制点图文件
├── plot.pdf
├── pvalues.txt  # 绘制点图文件 
└── significant_means.txt

下面在R中对结果进行可视化演练，主要是网络图和点图。点图我们知道ggplot的同学都不陌生了，就是网络图一看就让人头大，好在之前你们家运来小哥哥抄完了网络数据的统计分析：R语言实战一书，算是知道些皮毛，可以在这里用上啦。

pbmc='D:\\SingleCell\\out\\' ##  outs 文件放在这里了。

library(psych)
library(qgraph)
library(igraph)

netf<- "count_network.txt"
mynet <- read.delim(paste0(pbmc,"count_network.txt"), check.names = FALSE)
head(mynet)
        SOURCE       TARGET count
1 Memory CD4 T Memory CD4 T     3
2 Memory CD4 T            B    10
3 Memory CD4 T   CD14+ Mono    14
4 Memory CD4 T           NK    15
5 Memory CD4 T        CD8 T     6
6 Memory CD4 T  Naive CD4 T     3

net<- graph_from_data_frame(mynet)

plot(net)

读入网络数据，构建网络，并绘制最简单的一张网络图：

为了给这个网络图的边点mapping上不同的属性我们引入一串颜色。

allcolour=c("#DC143C","#0000FF","#20B2AA","#FFA500","#9370DB",
            "#98FB98","#F08080","#1E90FF","#7CFC00","#FFFF00",
            "#808000","#FF00FF","#FA8072","#7B68EE","#9400D3",
            "#800080","#A0522D","#D2B48C","#D2691E","#87CEEB",
            "#40E0D0","#5F9EA0","#FF1493",
            "#FFE4B5","#8A2BE2","#228B22","#E9967A","#4682B4",
            "#32CD32","#F0E68C","#FFFFE0","#EE82EE","#FF6347",
            "#6A5ACD","#9932CC","#8B008B","#8B4513","#DEB887")

karate_groups <- cluster_optimal(net)
coords <- layout_in_circle(net, order =
                             order(membership(karate_groups)))  # 设置网络布局

E(net)$width  <- E(net)$count/10  # 边点权重（粗细）
plot(net, edge.arrow.size=.1, 
     edge.curved=0,
     vertex.color=allcolour,
     vertex.frame.color="#555555",
     vertex.label.color="black",
     layout = coords,
     vertex.label.cex=.7)

经过以上一番修饰后，我们得到的网络图如下：

但是边的颜色和点的颜色还是对应不上，我们来修改一番边的属性。

net2 <- net  # 复制一份备用

for (i in 1: length(unique(mynet$SOURCE)) ){
  E(net)[map(unique(mynet$SOURCE),function(x) {
    get.edge.ids(net,vp = c(unique(mynet$SOURCE)[i],x))
  })%>% unlist()]$color <- allcolour[i]
}  # 这波操作谁有更好的解决方案？ 

plot(net, edge.arrow.size=.1, 
     edge.curved=0,
     vertex.color=allcolour,
     vertex.frame.color="#555555",
     vertex.label.color="black",
     layout = coords,
     vertex.label.cex=.7)

至此，文章开头的第一张图就复现出来了。我们观察到由于箭头是双向的，所以两个细胞类型之间的线条会被后来画上去的覆盖掉（开头那张网络图也有这个问题吗？），这里我们把线条做成曲线：

plot(net, edge.arrow.size=.1, 
     edge.curved=0.2, # 只是调了这个参数
     vertex.color=allcolour,
     vertex.frame.color="#555555",
     vertex.label.color="black",
     layout = coords,
     vertex.label.cex=.7)

这下清晰很多了。

接下来，我们来绘制第二组类型贝壳一样的网络图。

dev.off()
length(unique(mynet$SOURCE)) # 查看需要绘制多少张图，以方便布局
par(mfrow=c(2,5), mar=c(.3,.3,.3,.3))

for (i in 1: length(unique(mynet$SOURCE)) ){
  net1<-net2
  E(net1)[map(unique(mynet$SOURCE),function(x) {
    get.edge.ids(net,vp = c(unique(mynet$SOURCE)[i],x))
  })%>% unlist()]$color <- allcolour[i]

  plot(net1, edge.arrow.size=.1, 
       edge.curved=0.4,
       vertex.color=allcolour,
       vertex.frame.color="#555555",
       vertex.label.color="black",
       layout = coords,
       vertex.label.cex=1) 

}

这样是不是清晰了许多呢？

但是，细胞间通讯的频数（count）并没有绘制在图上，略显不足，这就补上吧。

dev.off()
length(unique(mynet$SOURCE))
par(mfrow=c(2,5), mar=c(.3,.3,.3,.3))

for (i in 1: length(unique(mynet$SOURCE)) ){
  net1<-net2

  E(net1)$count <- ""
  E(net1)[map(unique(mynet$SOURCE),function(x) {
    get.edge.ids(net,vp = c(unique(mynet$SOURCE)[i],x))
  })%>% unlist()]$count  <- E(net2)[map(unique(mynet$SOURCE),function(x) {
    get.edge.ids(net,vp = c(unique(mynet$SOURCE)[i],x))
  })%>% unlist()]$count  # 故技重施

  E(net1)[map(unique(mynet$SOURCE),function(x) {
    get.edge.ids(net,vp = c(unique(mynet$SOURCE)[i],x))
  })%>% unlist()]$color <- allcolour[i]

  plot(net1, edge.arrow.size=.1, 
       edge.curved=0.4,
       edge.label = E(net1)$count, # 绘制边的权重
       vertex.color=allcolour,
       vertex.frame.color="#555555",
       vertex.label.color="black",
       layout = coords,
       vertex.label.cex=1
  ) 

}

需要再做一些调整就可以啦。

找两条边验证一下对应关系。

mynet %>% filter(SOURCE  == 'B',TARGET == 'DC')
  SOURCE TARGET count
1      B     DC    18

 mynet %>% filter(SOURCE  == 'Memory CD4 T',TARGET == 'B')
        SOURCE TARGET count
1 Memory CD4 T      B    10

用ggplot2 绘制点图，关键是把两张表合并到一起。

mypvals <- read.delim(paste0(pbmc,"pvalues.txt"), check.names = FALSE)
mymeans <- read.delim(paste0(pbmc,"means.txt"), check.names = FALSE)

# 这些基因list很有意思啊，建议保存
chemokines <- grep("^CXC|CCL|CCR|CX3|XCL|XCR", mymeans$interacting_pair,value = T)
chemokines <- grep("^CXC|CCL|CCR|CX3|XCL|XCR", mymeans$interacting_pair,value = T)
th1 <- grep("IL2|IL12|IL18|IL27|IFNG|IL10|TNF$|TNF |LTA|LTB|STAT1|CCR5|CXCR3|IL12RB1|IFNGR1|TBX21|STAT4", 
            mymeans$interacting_pair,value = T)
th2 <- grep("IL4|IL5|IL25|IL10|IL13|AREG|STAT6|GATA3|IL4R", 
            mymeans$interacting_pair,value = T)
th17 <- grep("IL21|IL22|IL24|IL26|IL17A|IL17A|IL17F|IL17RA|IL10|RORC|RORA|STAT3|CCR4|CCR6|IL23RA|TGFB", 
             mymeans$interacting_pair,value = T)
treg <- grep("IL35|IL10|FOXP3|IL2RA|TGFB", mymeans$interacting_pair,value = T)
costimulatory <- grep("CD86|CD80|CD48|LILRB2|LILRB4|TNF|CD2|ICAM|SLAM|LT[AB]|NECTIN2|CD40|CD70|CD27|CD28|CD58|TSLP|PVR|CD44|CD55|CD[1-9]", 
                      mymeans$interacting_pair,value = T)
coinhibitory <- grep("SIRP|CD47|ICOS|TIGIT|CTLA4|PDCD1|CD274|LAG3|HAVCR|VSIR", 
                     mymeans$interacting_pair,value = T)
niche <- grep("CSF", mymeans$interacting_pair,value = T)

mymeans %>% dplyr::filter(interacting_pair %in% costimulatory)%>%
  dplyr::select("interacting_pair",starts_with("NK"),ends_with("NK"))  %>%  
  reshape2::melt() -> meansdf

colnames(meansdf)<- c("interacting_pair","CC","means")

mypvals %>% dplyr::filter(interacting_pair %in% costimulatory)%>%
  dplyr::select("interacting_pair",starts_with("NK"),ends_with("NK"))%>%  
  reshape2::melt()-> pvalsdf

colnames(pvalsdf)<- c("interacting_pair","CC","pvals")
pvalsdf$joinlab<- paste0(pvalsdf$interacting_pair,"_",pvalsdf$CC)
meansdf$joinlab<- paste0(meansdf$interacting_pair,"_",meansdf$CC)
pldf <- merge(pvalsdf,meansdf,by = "joinlab")

summary((filter(pldf,means >1))$means)

pldf%>% filter(means >1) %>% 
  ggplot(aes(CC.x,interacting_pair.x) )+ 
  geom_point(aes(color=means,size=-log10(pvals+0.0001)) ) +
  scale_size_continuous(range = c(1,3))+
  scale_color_gradient2(high="red",mid = "yellow",low ="darkblue",midpoint = 25  )+ theme_bw()+ 
  theme(axis.text.x = element_text(angle = -45,hjust = -0.1,vjust = 0.8))

这部分写的略显臃肿，勉强可用吧。

番外篇：在不能用--counts-data=gene_name时，我们可以如何转化基因ID呢。这里提供一种思路：用Seurat读两次数据，一次是基因SYMBOL，一次是ENTREZID（cellranger提供的便利）。方法是从10X的输出中直接读取带有ENTREZID的矩阵，如下从这个矩阵中匹配出目标矩阵。

读入数据的时候gene.column = 1默认为2（SYMBOL）。

Read10X(
  data.dir = NULL,
  gene.column = 1,
  unique.features = TRUE,
  strip.suffix = FALSE
)

seurat2cellphonedb<- function(seosym,seo,groups,group1,celltype){
  Idents(seo) <- groups
  myseogroup <- subset(seo,idents =group1)
  myseogroup1 <-  subset(seosym,cells=colnames(myseogroup ))
  count_raw <- myseogroup1@assays$RNA@counts
  count_norm <- apply(count_raw, 2, function(x) (x/sum(x))*10000)
  write.table(count_norm, paste0(group1,"_cellphonedb_count.txt"), sep='\t', quote=F)
  meta_data <- cbind(rownames(myseogroup@meta.data),myseogroup@meta.data[,celltype, drop=F])
  write.table(meta_data, paste0(group1,'_cellphonedb_meta.txt'), sep='\t', quote=F, row.names=F)
}

$cellphonedb method statistical_analysis   cseocellphonedb_meta.txt  C_cellphonedb_count.txt

$cellphonedb plot dot_plot
$cellphonedb plot heatmap_plot yourmeta.txt

References

[1] 细胞配受体通识以及常见细胞分泌信号通路: https://www.jianshu.com/p/df4721d29a91
[2] CellChat：细胞间相互作用分析利器: https://www.jianshu.com/p/da145cff3d41
[3] CellChat：细胞间相互作用在线内容: https://www.jianshu.com/p/e5ff8140f388
[4] Cell-Cell Interaction Database|| 单细胞配受体库你还在文章的附录里找吗？: https://www.jianshu.com/p/49613adce465
[5] celltalker：单细胞转录组配受体分析: https://www.jianshu.com/p/2c2f94b4f072
[6] Network在单细胞转录组数据分析中的应用: https://www.jianshu.com/p/23e46ad14a3e
[7] CellChat：细胞间相互作用分析利器: https://www.jianshu.com/p/da145cff3d41
[8] Single-cell transcriptomics reveals regulators underlying immune cell diversity and immune subtypes associated with prognosis in nasopharyngeal carcinoma:https://www./articles/s41422-020-0374-x#Abs1
[9] https://www./documentation
[10] 网络数据的统计分析：R语言实战: https://www.jianshu.com/nb/47898774
[11] https://www./faq-and-troubleshooting
[12] https://github.com/Teichlab/cellphonedb/blob/master/cellphonedb/src/plotters/R/plot_heatmaps.R
[13] https://github.com/zktuong/ktplots/