男,
一个长大了才会遇到的帅哥,
稳健,潇洒,大方,靠谱。
一段生信缘,一棵技能树。
生信技能树核心成员,单细胞天地特约撰稿人,简书创作者,单细胞数据科学家。
相关阅读 细胞配受体通识以及常见细胞分泌信号通路 CellChat:细胞间相互作用分析利器 CellChat:细胞间相互作用在线内容 Cell-Cell Interaction Database|| 单细胞配受体库你还在文章的附录里找吗? celltalker:单细胞转录组配受体分析 Network在单细胞转录组数据分析中的应用
正文 在单细胞数据分析过程中,上游的拆库定量,中游的分群注释,目前的门槛是很低的了。但是解析细胞类型异质性不应止于这些,还可以看细胞群之间的通讯。当然,这方面我们介绍过CellChat:细胞间相互作用分析利器。CellChat是以信号通路为单位来计算细胞间交流状态的,很多同学用cellphonedb来做基于配受体对的细胞间交流。今天,我们就来用经典的pbmc3k数据跑一下cellphonedb,并尝试可视化。
先放一张想象图:
来自文章:Single-cell transcriptomics reveals regulators underlying immune cell diversity and immune subtypes associated with prognosis in nasopharyngeal carcinoma
照例,请出我们的pbmc3k数据:
library(Seurat) library(SeuratData) pbmc3k An object of class Seurat 13714 features across 2700 samples within 1 assay Active assay: RNA (13714 features, 0 variable features) head(pbmc3k@meta.data) orig.ident nCount_RNA nFeature_RNA seurat_annotations AAACATACAACCAC pbmc3k 2419 779 Memory CD4 T AAACATTGAGCTAC pbmc3k 4903 1352 B AAACATTGATCAGC pbmc3k 3147 1129 Memory CD4 T AAACCGTGCTTCCG pbmc3k 2639 960 CD14+ Mono AAACCGTGTATGCG pbmc3k 980 521 NK AAACGCACTGGTAC pbmc3k 2163 781 Memory CD4 T
可以看出已经注释好了的,接下来我们写出跑cellphonedb的文件:表达谱和细胞分组文件。
write.table(as.matrix(pbmc3k@assays$RNA @data), 'cellphonedb_count.txt' , sep='\t' , quote=F) meta_data <- cbind(rownames(pbmc3k@meta.data), pbmc3k@meta.data[,'seurat_annotations' , drop=F]) meta_data <- as.matrix(meta_data) meta_dat[is.na(meta_data)] = "Unkown" # 细胞类型中不能有NA write.table(meta_data, 'cellphonedb_meta.txt' , sep='\t' , quote=F, row.names=F)
假设诸君已经安装好cellphonedb ,并加入环境变量了,我们可以这样跑:
cellphonedb method statistical_analysis cellphonedb_meta.txt cellphonedb_count.txt --counts-data=gene_name # 如果是直接写出基因名的加这个参数,转化为基因ID的话不用加。 #cellphonedb 自己的绘图 cellphonedb plot dot_plot cellphonedb plot heatmap_plot cellphonedb_meta.txt
cellphonedb 结果文件需要挨个打开看看,不要只看名字。更多细节可以看官网的介绍,记住,官网对你是开放的。
https://www./documentation
ree out/ out/ ├── count_network.txt # 绘制网络图文件 ├── deconvoluted.txt ├── heatmap_count.pdf ├── heatmap_log_count.pdf ├── interaction_count.txt ├── means.txt # 绘制点图文件 ├── plot.pdf ├── pvalues.txt # 绘制点图文件 └── significant_means.txt
下面在R中对结果进行可视化演练,主要是网络图和点图。点图我们知道ggplot的同学都不陌生了,就是网络图一看就让人头大,好在之前你们家运来小哥哥抄完了网络数据的统计分析:R语言实战一书,算是知道些皮毛,可以在这里用上啦。
pbmc='D:\\SingleCell\\out\\' ## outs 文件放在这里了。 library(psych) library(qgraph) library(igraph) netf<- "count_network.txt" mynet <- read.delim(paste0(pbmc,"count_network.txt" ), check.names = FALSE) head(mynet) SOURCE TARGET count 1 Memory CD4 T Memory CD4 T 3 2 Memory CD4 T B 10 3 Memory CD4 T CD14+ Mono 14 4 Memory CD4 T NK 15 5 Memory CD4 T CD8 T 6 6 Memory CD4 T Naive CD4 T 3 net<- graph_from_data_frame(mynet) plot(net)
读入网络数据,构建网络,并绘制最简单的一张网络图:
为了给这个网络图的边点mapping上不同的属性我们引入一串颜色。
allcolour=c("#DC143C" ,"#0000FF" ,"#20B2AA" ,"#FFA500" ,"#9370DB" , "#98FB98" ,"#F08080" ,"#1E90FF" ,"#7CFC00" ,"#FFFF00" , "#808000" ,"#FF00FF" ,"#FA8072" ,"#7B68EE" ,"#9400D3" , "#800080" ,"#A0522D" ,"#D2B48C" ,"#D2691E" ,"#87CEEB" , "#40E0D0" ,"#5F9EA0" ,"#FF1493" , "#FFE4B5" ,"#8A2BE2" ,"#228B22" ,"#E9967A" ,"#4682B4" , "#32CD32" ,"#F0E68C" ,"#FFFFE0" ,"#EE82EE" ,"#FF6347" , "#6A5ACD" ,"#9932CC" ,"#8B008B" ,"#8B4513" ,"#DEB887" ) karate_groups <- cluster_optimal(net) coords <- layout_in_circle(net, order = order(membership(karate_groups))) # 设置网络布局 E(net)$width <- E(net)$count /10 # 边点权重(粗细) plot(net, edge.arrow.size=.1, edge.curved=0, vertex.color=allcolour, vertex.frame.color="#555555" , vertex.label.color="black" , layout = coords, vertex.label.cex=.7)
经过以上一番修饰后,我们得到的网络图如下:
但是边的颜色和点的颜色还是对应不上,我们来修改一番边的属性。
net2 <- net # 复制一份备用 for (i in 1: length(unique(mynet$SOURCE )) ){ E(net)[map(unique(mynet$SOURCE ),function (x) { get.edge.ids(net,vp = c(unique(mynet$SOURCE )[i],x)) })%>% unlist()]$color <- allcolour[i] } # 这波操作谁有更好的解决方案? plot(net, edge.arrow.size=.1, edge.curved=0, vertex.color=allcolour, vertex.frame.color="#555555" , vertex.label.color="black" , layout = coords, vertex.label.cex=.7)
至此,文章开头的第一张图就复现出来了。我们观察到由于箭头是双向的,所以两个细胞类型之间的线条会被后来画上去的覆盖掉(开头那张网络图也有这个问题吗?),这里我们把线条做成曲线:
plot(net, edge.arrow.size=.1, edge.curved=0.2, # 只是调了这个参数 vertex.color=allcolour, vertex.frame.color="#555555" , vertex.label.color="black" , layout = coords, vertex.label.cex=.7)
这下清晰很多了。
接下来,我们来绘制第二组类型贝壳一样的网络图。
dev.off() length(unique(mynet$SOURCE )) # 查看需要绘制多少张图,以方便布局 par(mfrow=c(2,5), mar=c(.3,.3,.3,.3))for (i in 1: length(unique(mynet$SOURCE )) ){ net1<-net2 E(net1)[map(unique(mynet$SOURCE ),function (x) { get.edge.ids(net,vp = c(unique(mynet$SOURCE )[i],x)) })%>% unlist()]$color <- allcolour[i] plot(net1, edge.arrow.size=.1, edge.curved=0.4, vertex.color=allcolour, vertex.frame.color="#555555" , vertex.label.color="black" , layout = coords, vertex.label.cex=1) }
这样是不是清晰了许多呢?
但是,细胞间通讯的频数(count)并没有绘制在图上,略显不足,这就补上吧。
dev.off() length(unique(mynet$SOURCE )) par(mfrow=c(2,5), mar=c(.3,.3,.3,.3))for (i in 1: length(unique(mynet$SOURCE )) ){ net1<-net2 E(net1)$count <- "" E(net1)[map(unique(mynet$SOURCE ),function (x) { get.edge.ids(net,vp = c(unique(mynet$SOURCE )[i],x)) })%>% unlist()]$count <- E(net2)[map(unique(mynet$SOURCE ),function (x) { get.edge.ids(net,vp = c(unique(mynet$SOURCE )[i],x)) })%>% unlist()]$count # 故技重施 E(net1)[map(unique(mynet$SOURCE ),function (x) { get.edge.ids(net,vp = c(unique(mynet$SOURCE )[i],x)) })%>% unlist()]$color <- allcolour[i] plot(net1, edge.arrow.size=.1, edge.curved=0.4, edge.label = E(net1)$count , # 绘制边的权重 vertex.color=allcolour, vertex.frame.color="#555555" , vertex.label.color="black" , layout = coords, vertex.label.cex=1 ) }
需要再做一些调整就可以啦。
找两条边验证一下对应关系。
mynet %>% filter(SOURCE == 'B' ,TARGET == 'DC' ) SOURCE TARGET count 1 B DC 18 mynet %>% filter(SOURCE == 'Memory CD4 T' ,TARGET == 'B' ) SOURCE TARGET count 1 Memory CD4 T B 10
用ggplot2 绘制点图,关键是把两张表合并到一起。
mypvals <- read.delim(paste0(pbmc,"pvalues.txt" ), check.names = FALSE) mymeans <- read.delim(paste0(pbmc,"means.txt" ), check.names = FALSE)# 这些基因list很有意思啊,建议保存 chemokines <- grep("^CXC|CCL|CCR|CX3|XCL|XCR" , mymeans$interacting_pair ,value = T) chemokines <- grep("^CXC|CCL|CCR|CX3|XCL|XCR" , mymeans$interacting_pair ,value = T) th1 <- grep("IL2|IL12|IL18|IL27|IFNG|IL10|TNF$|TNF |LTA|LTB|STAT1|CCR5|CXCR3|IL12RB1|IFNGR1|TBX21|STAT4" , mymeans$interacting_pair ,value = T) th2 <- grep("IL4|IL5|IL25|IL10|IL13|AREG|STAT6|GATA3|IL4R" , mymeans$interacting_pair ,value = T) th17 <- grep("IL21|IL22|IL24|IL26|IL17A|IL17A|IL17F|IL17RA|IL10|RORC|RORA|STAT3|CCR4|CCR6|IL23RA|TGFB" , mymeans$interacting_pair ,value = T) treg <- grep("IL35|IL10|FOXP3|IL2RA|TGFB" , mymeans$interacting_pair ,value = T) costimulatory <- grep("CD86|CD80|CD48|LILRB2|LILRB4|TNF|CD2|ICAM|SLAM|LT[AB]|NECTIN2|CD40|CD70|CD27|CD28|CD58|TSLP|PVR|CD44|CD55|CD[1-9]" , mymeans$interacting_pair ,value = T) coinhibitory <- grep("SIRP|CD47|ICOS|TIGIT|CTLA4|PDCD1|CD274|LAG3|HAVCR|VSIR" , mymeans$interacting_pair ,value = T) niche <- grep("CSF" , mymeans$interacting_pair ,value = T) mymeans %>% dplyr::filter(interacting_pair %in % costimulatory)%>% dplyr::select("interacting_pair" ,starts_with("NK" ),ends_with("NK" )) %>% reshape2::melt() -> meansdf colnames(meansdf)<- c("interacting_pair" ,"CC" ,"means" ) mypvals %>% dplyr::filter(interacting_pair %in % costimulatory)%>% dplyr::select("interacting_pair" ,starts_with("NK" ),ends_with("NK" ))%>% reshape2::melt()-> pvalsdf colnames(pvalsdf)<- c("interacting_pair" ,"CC" ,"pvals" ) pvalsdf$joinlab <- paste0(pvalsdf$interacting_pair ,"_" ,pvalsdf$CC ) meansdf$joinlab <- paste0(meansdf$interacting_pair ,"_" ,meansdf$CC ) pldf <- merge(pvalsdf,meansdf,by = "joinlab" ) summary((filter(pldf,means >1))$means ) pldf%>% filter(means >1) %>% ggplot(aes(CC.x,interacting_pair.x) )+ geom_point(aes(color=means,size=-log10(pvals+0.0001)) ) + scale_size_continuous(range = c(1,3))+ scale_color_gradient2(high="red" ,mid = "yellow" ,low ="darkblue" ,midpoint = 25 )+ theme_bw()+ theme(axis.text.x = element_text(angle = -45,hjust = -0.1,vjust = 0.8))
这部分写的略显臃肿,勉强可用吧。
番外篇:在不能用--counts-data=gene_name
时,我们可以如何转化基因ID呢。这里提供一种思路:用Seurat读两次数据,一次是基因SYMBOL,一次是ENTREZID(cellranger提供的便利)。方法是从10X的输出中直接读取带有ENTREZID的矩阵,如下从这个矩阵中匹配出目标矩阵。
读入数据的时候gene.column = 1
默认为2(SYMBOL)。
Read10X( data.dir = NULL, gene.column = 1, unique.features = TRUE, strip.suffix = FALSE )
seurat2cellphonedb<- function (seosym,seo,groups,group1,celltype){ Idents(seo) <- groups myseogroup <- subset(seo,idents =group1) myseogroup1 <- subset(seosym,cells=colnames(myseogroup )) count_raw <- myseogroup1@assays$RNA @counts count_norm <- apply(count_raw, 2, function (x) (x/sum(x))*10000) write.table(count_norm, paste0(group1,"_cellphonedb_count.txt" ), sep='\t' , quote=F) meta_data <- cbind(rownames(myseogroup@meta.data),myseogroup@meta.data[,celltype, drop=F]) write.table(meta_data, paste0(group1,'_cellphonedb_meta.txt' ), sep='\t' , quote=F, row.names=F) }$cellphonedb method statistical_analysis cseocellphonedb_meta.txt C_cellphonedb_count.txt$cellphonedb plot dot_plot$cellphonedb plot heatmap_plot yourmeta.txt
References [1]
细胞配受体通识以及常见细胞分泌信号通路: https://www.jianshu.com/p/df4721d29a91 [2]
CellChat:细胞间相互作用分析利器: https://www.jianshu.com/p/da145cff3d41 [3]
CellChat:细胞间相互作用在线内容: https://www.jianshu.com/p/e5ff8140f388 [4]
Cell-Cell Interaction Database|| 单细胞配受体库你还在文章的附录里找吗?: https://www.jianshu.com/p/49613adce465 [5]
celltalker:单细胞转录组配受体分析: https://www.jianshu.com/p/2c2f94b4f072 [6]
Network在单细胞转录组数据分析中的应用: https://www.jianshu.com/p/23e46ad14a3e [7]
CellChat:细胞间相互作用分析利器: https://www.jianshu.com/p/da145cff3d41 [8]
Single-cell transcriptomics reveals regulators underlying immune cell diversity and immune subtypes associated with prognosis in nasopharyngeal carcinoma:https://www./articles/s41422-020-0374-x#Abs1 [9]
https://www./documentation [10]
网络数据的统计分析:R语言实战: https://www.jianshu.com/nb/47898774 [11]
https://www./faq-and-troubleshooting [12]
https://github.com/Teichlab/cellphonedb/blob/master/cellphonedb/src/plotters/R/plot_heatmaps.R [13]
https://github.com/zktuong/ktplots/
看完记得顺手点个 “在看” 哦!