Seurat V5 | One function covers several batch-correction methods: try them as needed

生信补给站 · 2023-12-19 · Posted from Beijing


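Seurat is one of the most widely used R packages for single-cell RNA data analysis. Upgrading to the current V5 release brings some inconveniences, but also a number of functional improvements, so decide whether to upgrade based on your own situation and analysis needs.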

The V5 upgrade falls mainly into four areas (https:///seurat/articles/get_started_v5_new). This post covers the first one: the built-in batch-correction (integration) methods in Seurat V5.

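Seurat v5 introduces a more flexible and streamlined infrastructure in which each integration (batch-correction) algorithm can be run with a single line of code. This greatly reduces the time spent on environment setup and data wrangling for each method, so you can focus on which method works best. It also makes it easier to explore the results of different integration methods and to compare them with a workflow that skips the integration step.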

This post uses the ifnb dataset as an example to walk through the batch-correction workflow and methods.

I. R packages and data preparation

1 Load the R packages

Install the required R packages. Note that install.packages('Seurat') now installs the V5 release by default.

library(Seurat)
library(SeuratData)
# remotes::install_github("satijalab/seurat-wrappers")
remotes::install_local("./seurat-wrappers-master.zip", upgrade = FALSE, dependencies = FALSE)
library(SeuratWrappers)
library(ggplot2)
library(patchwork)
options(future.globals.maxSize = 1e9)

Many of the R packages used in this series are hosted on GitHub, and installing them directly may fail.

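Taking satijalab/seurat-wrappers as an example: when a GitHub package cannot be downloaded from R, open its GitHub page, click Code, download the zip file, and then install it locally with remotes::install_local.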
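The "try GitHub first, fall back to the local zip" approach can be sketched as a small wrapper. This is only an illustrative sketch: install_with_fallback is a hypothetical helper name that does not belong to any package, and the zip path simply reuses the one from the code above.

```r
# Hypothetical helper: run a primary installer and, if it throws an error,
# run a fallback installer instead. Returns which route was taken.
install_with_fallback <- function(primary, fallback) {
  tryCatch(
    {
      primary()
      "primary"
    },
    error = function(e) {
      message("Primary install failed: ", conditionMessage(e))
      fallback()
      "fallback"
    }
  )
}

# Example usage (not run here; needs the remotes package and the downloaded zip):
# install_with_fallback(
#   primary  = function() remotes::install_github("satijalab/seurat-wrappers"),
#   fallback = function() remotes::install_local("./seurat-wrappers-master.zip",
#                                                upgrade = FALSE,
#                                                dependencies = FALSE)
# )
```

Passing the installers in as functions keeps the helper independent of any particular package, so the same pattern can be reused for the other GitHub-hosted packages in this series.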

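2 Download the example data

The test dataset is also hosted abroad, so depending on your network access and speed the download is likely to fail.

If it cannot be downloaded from R, try downloading the file first and installing it locally (http://seurat./src/contrib/ifnb.SeuratData_3.1.0.tar.gz). For the names and download links of more datasets, see https://zhuanlan.zhihu.com/p/661800023 .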

# Download the test dataset
# InstallData("ifnb")
install.packages('./ifnb.SeuratData_3.1.0.tar.gz', repos = NULL, type = "source")

After installing, load the data and check the batches to be corrected (the stim column).

# load the ifnb dataset
obj <- LoadData("ifnb")
obj <- UpdateSeuratObject(obj)
obj <- subset(obj, nFeature_RNA > 1000)
obj

An object of class Seurat
14053 features across 1254 samples within 1 assay
Active assay: RNA (14053 features, 0 variable features)
 2 layers present: counts, data

As the output shows, one major change in Seurat V5 is the introduction of layers.

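II. Data integration (batch correction)

1 Split the data

The example Seurat object contains data from two different treatments (the stim column of the metadata). For integration in Seurat v5, the data is split into separate layers rather than into multiple objects. After splitting there are four layers: each batch in the stim column has its own counts and data matrix.

Seurat V4, by contrast, requires splitting the data into two separate Seurat objects.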

obj[["RNA"]] <- split(obj[["RNA"]], f = obj$stim)
obj

An object of class Seurat
14053 features across 1254 samples within 1 assay
Active assay: RNA (14053 features, 0 variable features)
 4 layers present: counts.CTRL, counts.STIM, data.CTRL, data.STIM

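Note that because the data is split into layers, normalization and highly variable gene (HVG) selection are performed independently for each batch, and a consensus set of variable features is identified automatically.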

obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj)
obj <- ScaleData(obj)
obj <- RunPCA(obj)

Here NormalizeData and FindVariableFeatures are run separately on each batch.
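To make "split into layers, then process each layer on its own" more concrete, here is a minimal base-R sketch on a toy count matrix. It does not call Seurat. It assumes the default log-normalization (each cell scaled to 10,000 total counts, then log1p), and the matrix values and the helper name log_normalize are made up for illustration.

```r
# Toy count matrix: 4 genes x 6 cells, with one batch label per cell (made-up values).
set.seed(1)
counts <- matrix(sample(1:20, 24, replace = TRUE), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:6)))
batch <- c("CTRL", "CTRL", "CTRL", "STIM", "STIM", "STIM")

# Split the columns (cells) by batch, mimicking the counts.CTRL / counts.STIM layers.
layers <- lapply(split(seq_len(ncol(counts)), batch),
                 function(idx) counts[, idx, drop = FALSE])
names(layers) <- paste0("counts.", names(layers))

# Hypothetical helper: scale each cell to `scale_factor` total counts, then log1p.
log_normalize <- function(m, scale_factor = 1e4) {
  log1p(sweep(m, 2, colSums(m), "/") * scale_factor)
}

# Normalize each layer independently and name the results data.<batch>.
normalized <- lapply(layers, log_normalize)
names(normalized) <- sub("^counts", "data", names(normalized))

str(normalized)
```

Because this normalization works cell by cell, splitting does not change the normalized values themselves. The split matters more for steps that pool information across cells, such as variable-feature selection.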

2 Direct merge (no batch correction)

First try analysing the data without any integration, to see what the batch effect looks like.

# Direct merge (no integration)
obj <- FindNeighbors(obj, dims = 1:30, reduction = "pca")
obj <- FindClusters(obj, resolution = 2, cluster.name = "unintegrated_clusters")

obj <- RunUMAP(obj, dims = 1:30,
               reduction = "pca",
               reduction.name = "umap.unintegrated")

DimPlot(obj, reduction = "umap.unintegrated",
        group.by = c("stim", "seurat_annotations"))

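3 Batch correction in one line of code

The IntegrateLayers function in Seurat v5 runs an integration (batch-correction) analysis in a single line of code. It currently supports the following five mainstream single-cell integration methods.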

Anchor-based CCA integration (method=CCAIntegration)
Anchor-based RPCA integration (method=RPCAIntegration)
Harmony (method=HarmonyIntegration)
FastMNN (method=FastMNNIntegration)
scVI (method=scVIIntegration)

# CCA
obj <- IntegrateLayers(
  object = obj, method = CCAIntegration,
  orig.reduction = "pca", new.reduction = "integrated.cca",
  verbose = FALSE
)

# RPCA
obj <- IntegrateLayers(
  object = obj, method = RPCAIntegration,
  orig.reduction = "pca", new.reduction = "integrated.rpca",
  verbose = FALSE
)

# Harmony
obj <- IntegrateLayers(
  object = obj, method = HarmonyIntegration,
  orig.reduction = "pca", new.reduction = "harmony",
  verbose = FALSE
)

# FastMNN
obj <- IntegrateLayers(
  object = obj, method = FastMNNIntegration,
  new.reduction = "integrated.mnn",
  verbose = FALSE
)
obj

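The object now contains the results of four batch-correction methods. The next step is to visualize each of them in turn and then choose one method for the downstream analysis.

Also be sure to give each new.reduction a distinct name, otherwise an earlier result will be overwritten.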

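4 Choose a batch-correction method

4.1 UMAP visualization

CCA and RPCA are used as examples here. The other two methods work the same way; just remember to change reduction.name.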

##### CCA #####
obj <- FindNeighbors(obj, reduction = "integrated.cca", dims = 1:30)
obj <- FindClusters(obj, resolution = 2, cluster.name = "cca_clusters")

obj <- RunUMAP(obj, reduction = "integrated.cca",
               dims = 1:30,
               reduction.name = "umap.cca")
p1 <- DimPlot(
  obj,
  reduction = "umap.cca",
  group.by = c("stim", "seurat_annotations", "cca_clusters"),
  combine = FALSE, label.size = 2
)

##### RPCA #####
obj <- FindNeighbors(obj, reduction = "integrated.rpca", dims = 1:30)
obj <- FindClusters(obj, resolution = 2, cluster.name = "rpca_clusters")

obj <- RunUMAP(obj, reduction = "integrated.rpca",
               dims = 1:30,
               reduction.name = "umap.rpca")
p2 <- DimPlot(
  obj,
  reduction = "umap.rpca",
  group.by = c("stim", "seurat_annotations", "rpca_clusters"),
  combine = FALSE, label.size = 2
)

wrap_plots(c(p1, p2), ncol = 2, byrow = FALSE)

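Compared with the direct merge, the batch effect between the stim conditions has been integrated away. You can add the other two methods to display all four side by side, and then pick one to continue with the downstream analysis.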

4.2 Marker visualization

Classic marker genes can also be used to compare how the different batch-correction methods perform.

(1) VlnPlot

p1 <- VlnPlot(
  obj,
  features = "rna_CD8A", group.by = "unintegrated_clusters"
) + NoLegend() + ggtitle("CD8A - Unintegrated Clusters")
p2 <- VlnPlot(
  obj, "rna_CD8A",
  group.by = "cca_clusters"
) + NoLegend() + ggtitle("CD8A - CCA Clusters")
p3 <- VlnPlot(
  obj, "rna_CD8A",
  group.by = "rpca_clusters"
) + NoLegend() + ggtitle("CD8A - RPCA Clusters")

p1 | p2 | p3

(2) DimPlot

p4 <- DimPlot(obj, reduction = "umap.unintegrated", group.by = c("cca_clusters"))
p5 <- DimPlot(obj, reduction = "umap.rpca", group.by = c("cca_clusters"))
p6 <- DimPlot(obj, reduction = "umap.cca", group.by = c("cca_clusters"))
p4 | p5 | p6

Use the information above to decide which batch-correction method to adopt.
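Besides inspecting the plots, one simple way to summarize batch mixing is to tabulate the share of each batch within each cluster. The sketch below uses made-up label vectors. With the Seurat object from this post, the two vectors would presumably be obj$cca_clusters and obj$stim, which is an assumption about how you would plug it in rather than part of the original workflow.

```r
# Hypothetical labels for 8 cells; with a Seurat object these would be
# metadata columns such as obj$cca_clusters and obj$stim.
clusters <- c("0", "0", "0", "0", "1", "1", "1", "1")
stim     <- c("CTRL", "STIM", "CTRL", "STIM", "CTRL", "CTRL", "CTRL", "STIM")

# Row-wise proportions: the share of each batch inside each cluster.
mixing <- prop.table(table(cluster = clusters, stim = stim), margin = 1)
print(round(mixing, 2))
```

A cluster dominated by a single batch may point to residual batch effect, although it can also reflect real biology, so this table is best read alongside the UMAP and marker plots.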

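III. FindMarker analysis

Once the batch-correction method is chosen, you can move on to marker detection and annotation.

1 Rejoin the layers

Note that the layers are currently split by stim batch. Before running any differential expression analysis, rejoin the layers with the JoinLayers function.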

obj
obj2 <- JoinLayers(obj)  # obj2 is used only to tell the two apart; in practice just reuse obj
obj2

The next step is the DEG analysis: find the marker genes of each cluster and annotate the clusters manually, or use an automatic annotation tool such as SingleR.
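For manual annotation it is often handy to reduce the marker table to a few top genes per cluster. The following base-R sketch works on a toy data frame shaped like FindAllMarkers() output (columns cluster, gene, avg_log2FC, p_val_adj). The values are made up and top_markers is a hypothetical helper, not a Seurat function.

```r
# Toy table shaped like FindAllMarkers() output (values are made up).
# With real data this could come from: markers <- FindAllMarkers(obj2, only.pos = TRUE)
markers <- data.frame(
  cluster    = c("0", "0", "0", "1", "1", "1"),
  gene       = c("CD3D", "IL7R", "CCR7", "CD14", "LYZ", "S100A9"),
  avg_log2FC = c(2.1, 1.4, 1.8, 3.0, 2.5, 2.7),
  p_val_adj  = c(1e-50, 1e-20, 1e-30, 1e-80, 1e-60, 1e-70),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the n genes with the largest log fold change per cluster.
top_markers <- function(df, n = 2) {
  per_cluster <- lapply(split(df, df$cluster), function(d) {
    head(d[order(-d$avg_log2FC), ], n)
  })
  out <- do.call(rbind, per_cluster)
  rownames(out) <- NULL
  out
}

top2 <- top_markers(markers, n = 2)
print(top2)
```

Ranking by avg_log2FC is just one reasonable choice here; ranking by adjusted p-value or combining both filters would work with the same structure.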

References:

https:///seurat/articles/seurat5_integration

https:///seurat/articles/integration_introduction
