本文简单介绍下在只有空转数据时候的几种常见spot注释方式(附带文献参考): (1)使用类bulk的解卷积方式注释spot的细胞类型; (2)基于基因集打分的方式注释spot的细胞类型 (3)基于cluster的marker gene ,结合生物学背景注释spot细胞类型。 使用上篇推文中T0的数据,首先快速进行空间转录组的标准分析 library(Seurat) library(data.table) library(tidyverse) library(MCPcounter) source("Load10X_Spatial_change.R")
ST_object <- Load10X_Spatial_change(data.dir = ".\\outs\\", filename = "filtered_feature_bc_matrix.h5") for (i in colnames((ST_object@images$slice1@coordinates))) { ST_object@images$slice1@coordinates[[i]] <- as.integer(ST_object@images$slice1@coordinates[[i]]) } SpatialDimPlot(ST_object,alpha = 0) #标准分析 ST_object <- SCTransform(ST_object, assay = "Spatial", verbose = FALSE) ST_object <- RunPCA(ST_object, assay = "SCT", verbose = FALSE) ST_object <- FindNeighbors(ST_object, reduction = "pca", dims = 1:30) ST_object <- FindClusters(ST_object, verbose = FALSE) ST_object <- RunUMAP(ST_object, reduction = "pca", dims = 1:30 )
head(ST_object)
得到了聚类分群结果后,就可以进行空间细胞类型注释了。 使用bulk数据中的常见的免疫浸润解卷积分析软件得到每个spot的细胞类型占比,这里以MCP counter [1]和 Xcell为示例,其他bulk解卷积软件自行发挥。 2.1 MCPcounter 使用MCPcounter包主函数MCPcounter.estimate进行注释,参考文献[1] 。 其中featuresType 默认为"affy133P2_probesets" ,可选 "HUGO_symbols" (Official gene symbols), "ENTREZ_ID" (Entrez Gene ID) 或者"ENSEMBL_ID" (ENSEMBL Gene ID) #compute mcp scores计算MCP的score,使用SCT后的数据 mcp_scores <- MCPcounter.estimate(ST_object[["SCT"]]@data,featuresType = "HUGO_symbols") rownames(mcp_scores) <- make.names(rownames(mcp_scores))#把MCPcounter的type名中间的空格去掉 cell_types <- rownames(mcp_scores) #就是各种T Bcell for(cell_type in cell_types){ #for循环写入meta信息 ST_object <- AddMetaData(object= ST_object,metadata = mcp_scores[cell_type,],col.name = cell_type) } head(ST_object)
使用AddMetaData 函数循环添加meta信息,后面经常会用到,建议熟练掌握。 2.2 Xcell 更多详见RNAseq|免疫浸润也杀疯了,cibersoert?xCELL?ESTIMATE?你常用哪一个 library(xCell) xCell = xCellAnalysis(ST_object[["SCT"]]@data, rnaseq = TRUE, # RNA-seq parallel.sz = 1, #线程数 ,window建议改为1 parallel.type = "SOCK") xCell[1:4,1:4] rownames(xCell) <- make.names(rownames(xCell)) cell_types <- rownames(xCell) for(cell_type in cell_types){ ST_object <- AddMetaData(object= ST_object,metadata = xCell[cell_type,],col.name = cell_type) } head(ST_object)
Xcell解析出来的结果会更多。 上述使用R包的方法会解析出来指定celltype的占比,也许没有自己重点关注的细胞类型 ,比如TLS[1] ,Tex 等 。 这时可以用自定义的细胞类型基因集对每个spot进行打分代替细胞类型注释。 (1)简单粗暴的方法是可以直接计算基因集的均值 (2)使用AUCell 或者 Seurat的AddModuleScore等方法计算基因集评分 markerList 一般是自定义,这里为了省事直接使用CARD[2]方法中内置的markerList ,选择其中几种celltype作为示例 。 同时完成计算均值 和 AddModuleScore分析。 load("./markerList.RData") markerList markerList2 <- markerList[c("T_cells_&_NK_cells" ,"Fibroblasts" , "Cancer_clone_A","Macrophages_A", "Monocytes" )] names(markerList2) <- c("T_NK" ,"Fibroblasts" , "Cancer","Macrophages", "Monocytes")
for (i in names(markerList2)) { intersect_sign <- intersect(markerList2[[i]],rownames(ST_object)) cell_types <- make.names(paste(i ,"GeneMean",sep = "_")) cell_types_Seurat <- make.names(paste(i ,"Seurat",sep = "_")) #计算均值 ST_object <- AddMetaData(ST_object,apply(as.matrix(ST_object[["SCT"]]@data[intersect_sign,]),2,mean), col.name = cell_types ) # AddModuleScore ST_object <- AddModuleScore(object = ST_object,features = list(intersect_sign),name = cell_types_Seurat) }
head(ST_object)
(1)建议循环中加入intersect 获取共有基因,防止空转数据没有指定markerList 中的基因而报错; (2)可以通过paste 加后缀的方法,区分AddModuleScore 和 均值 的差别; 除上述外,可以使用类似单细胞注释的“金标准”方法(要求较高),使用findmarker计算每个cluster 的top marker 基因,辅助绘制一些marker gene的点图或者小提琴图,根据背景知识对cluster进行注释[3]。 table(ST_object$seurat_clusters) p1 <- DimPlot(ST_object, reduction = "umap", label = TRUE) p2 <- SpatialDimPlot(ST_object, label = TRUE, label.size = 3,pt.size.factor = 3) p1 + p2
1,FindAllMarkers 计算每个cluster的top marker
de_markers <- FindAllMarkers(ST_object)
top.marker <- de_markers %>% group_by(cluster) %>% top_n(n = 30, wt = avg_log2FC)
2 重点marker 可视化 借助一些经典marker的点图 以及 小提琴图辅助注释 ,以下仅为示例。
Marker = list(Epi=c("EPCAM"), Endo=c("PECAM1","PLVAP"), Fibroblast=c("COL1A1","COL1A2"), B=c("CD79A","CD79B","CD19"), T=c("CD3D","CD3E","CD8A","CD4"), Myeloid=c("C1QA","C1QB","CD163","CD1C") )
Marker2 = c("EPCAM", "PECAM1","PLVAP", "COL3A1","COL1A1","COL1A2", "CD79A","CD79B","CD19", "CD3D","CD3E","CD8A","CD4", "C1QA","C1QB","CD163","CD1C" )
#点图 可以接受list DotPlot(ST_object,features=Marker)
VlnPlot(ST_object,features = Marker2,pt.size = 0,ncol = 5)
SpatialFeaturePlot(ST_object, features = c("EPCAM","PECAM1","COL1A1", "CD79A","CD19","CD3D","CD3E","C1QA","CD1C") )
解卷积方式或者基因集集合得到的 每个spot 的细胞类型占比结果,可以类似于基因的表达量进行可视化,这里展示不同方法得到的Fibroblasts的结果。 library(gridExtra) sign_celltype <- c("Fibroblasts","Fibroblasts_Seurat1","Fibroblasts_GeneMean") plot_sign_celltype <- lapply(sign_celltype,function(x){ plot_sign_celltype <- SpatialPlot(ST_object, features = x, image.alpha=0, pt.size.factor = 5) + #scale_fill_viridis_c(option="C") + theme(text=element_text(size=10))+ theme(text=element_text(face = "bold"))+ theme(legend.text=element_text(size=8)) })
plot_sign_celltype[[4]] <- SpatialPlot(ST_object,repel = F,label = F,image.alpha=1,alpha = c(0,0), pt.size.factor =0.00000) + NoLegend() # lay 设置布局 lay <- rbind(c(4,1), c(2,3)) grid.arrange(grobs = plot_sign_celltype, layout_matrix = lay)
可以看到两种基因集方法(seurat 和 均值)差异很小,而与解卷积方式差异较大,注释后一般还会结合 染色结果 对 重点关注的 celltype (CAF ,tumor ,TLS等)继续人工确定。
本文介绍的都是在只有空转数据的情况下进行空间细胞类型的注释,下一篇会分享结合 单细胞数据 进行注释的方法。 参考文献: [1]Tertiary lymphoid structures generate and propagate anti-tu[]mor antibody-producing plasma cells in renal cell cancer [2]Spatially Informed Cell Type Deconvolution for Spatial Transcriptomics [3]Comprehensive analysis of spatial architecture in primary liver cancer 精心整理(含图PLUS版)|R语言生信分析,可视化(R统计,ggplot2绘图,生信图形可视化汇总) RNAseq纯生信挖掘思路分享?不,主要是送你代码!(建议收藏)
|