```r
# First install Bioconductor and Monocle
if (!requireNamespace("BiocManager"))
    install.packages("BiocManager")

BiocManager::install()
BiocManager::install(c("monocle"))

# Next install a few more dependencies
BiocManager::install(c('DelayedArray', 'DelayedMatrixStats', 'org.Hs.eg.db', 'org.Mm.eg.db'))
```

```r
install.packages("devtools")
devtools::install_github("cole-trapnell-lab/garnett")

library(garnett)
```
The Garnett workflow has two main parts, each described in detail below:
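Train/obtain the classifier: Either download an existing classifier or train your own. To train one, Garnett parses a marker file, selects a set of training cells, and then trains a multinomial classifier to distinguish between cell types.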
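For the training route, the steps named above (parse a marker file, pick training cells, fit a multinomial classifier) are handled by Garnett's `train_cell_classifier()`, and `check_markers()` can be run first to get diagnostics on each marker. The sketch below is an illustrative example, not the marker file behind the PBMC results: the gene lists and the temporary file name are assumptions, and the Garnett calls only run when the package, `org.Hs.eg.db` and a CDS object named `pbmc_cds` (with gene symbols as IDs) are already available.

```r
# Illustrative marker file in Garnett's marker-file syntax.
# The genes chosen here are assumptions for the example.
marker_lines <- c(
  ">B cells",
  "expressed: CD19, MS4A1, CD79A",
  "",
  ">T cells",
  "expressed: CD3D",
  "",
  ">CD4 T cells",
  "expressed: CD4",
  "subtype of: T cells"
)
marker_file_path <- file.path(tempdir(), "pbmc_markers_example.txt")
writeLines(marker_lines, marker_file_path)

# Only run the Garnett steps when the packages and a CDS object are available
if (requireNamespace("garnett", quietly = TRUE) &&
    requireNamespace("org.Hs.eg.db", quietly = TRUE) &&
    exists("pbmc_cds")) {
  library(garnett)
  library(org.Hs.eg.db)

  # Diagnostics for each marker gene in the file
  marker_check <- check_markers(pbmc_cds, marker_file_path,
                                db = org.Hs.eg.db,
                                cds_gene_id_type = "SYMBOL",
                                marker_file_gene_id_type = "SYMBOL")
  plot_markers(marker_check)

  # Parse the marker file, choose training cells and fit the classifier
  pbmc_classifier <- train_cell_classifier(cds = pbmc_cds,
                                           marker_file = marker_file_path,
                                           db = org.Hs.eg.db,
                                           cds_gene_id_type = "SYMBOL",
                                           num_unknown = 50,
                                           marker_file_gene_id_type = "SYMBOL")
}
```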
```r
# hsPBMC: pre-trained human PBMC classifier, assumed to be downloaded and loaded already
pbmc_classifier <- hsPBMC
library(org.Hs.eg.db)
pbmc_cds <- classify_cells(pbmc_cds, pbmc_classifier,
                           db = org.Hs.eg.db,
                           cluster_extend = TRUE,
                           cds_gene_id_type = "SYMBOL")

head(pData(pbmc_cds))
```

```
DataFrame with 6 rows and 7 columns
                           tsne_1           tsne_2
                        <numeric>        <numeric>
AAGCACTGCACACA-1  3.8403149909359 12.0841914129204
GGCTCACTGGTCTA-1 9.97096226657347 3.50539308651821
AGCACTGATATCTC-1 3.45952940410281 4.93527280576176
ACACGTGATATTCC-1 1.74394947394641 7.78267061846286
ATATGCCTCTGCAA-1 5.78344829514223 8.55889827553495
TGACGAACCTATTC-1 10.7928530485958 10.5852739146963
                       Size_Factor   FACS_type garnett_cluster
                         <numeric> <character>       <logical>
AAGCACTGCACACA-1 0.559181445161514     B cells              NA
GGCTCACTGGTCTA-1 0.515934033527584     B cells              NA
AGCACTGATATCTC-1 0.698028398302026     B cells              NA
ACACGTGATATTCC-1 0.815631008885519     B cells              NA
ATATGCCTCTGCAA-1  1.11532798424345     B cells              NA
TGACGAACCTATTC-1 0.649469901028841     B cells              NA
                   cell_type cluster_ext_type
                 <character>      <character>
AAGCACTGCACACA-1     B cells          B cells
GGCTCACTGGTCTA-1     B cells          B cells
AGCACTGATATCTC-1     B cells          B cells
ACACGTGATATTCC-1     B cells          B cells
ATATGCCTCTGCAA-1     B cells          B cells
TGACGAACCTATTC-1     Unknown          Unknown
```

```r
table(pData(pbmc_cds)$cell_type)
```

```
        B cells           CD34+     CD4 T cells     CD8 T cells
            321               1              89              52
Dendritic cells         T cells         Unknown
             12             160             165
```

```r
table(pData(pbmc_cds)$cluster_ext_type)
```

```
        B cells     CD4 T cells Dendritic cells         T cells
            373             200               3             200
        Unknown
             24
```

```r
library(ggplot2)
qplot(tsne_1, tsne_2, color = cell_type, data = as.data.frame(pData(pbmc_cds))) + theme_bw()
```

```r
qplot(tsne_1, tsne_2, color = cluster_ext_type, data = as.data.frame(pData(pbmc_cds))) + theme_bw()
```
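To see how Garnett's calls line up with the FACS labels, a cross-tabulation is a quick check. The sketch below rebuilds the six cells printed by `head(pData(pbmc_cds))` so it runs without the CDS; on the full object the same lines would use `pData(pbmc_cds)` directly. The helper `agreement_rate` is hypothetical, it treats `Unknown` calls as unlabeled rather than wrong, and it compares label strings exactly, so FACS categories whose names differ from Garnett's cell type names would need to be mapped first.

```r
# The six cells shown by head(pData(pbmc_cds))
calls <- data.frame(
  FACS_type = rep("B cells", 6),
  cell_type = c(rep("B cells", 5), "Unknown"),
  cluster_ext_type = c(rep("B cells", 5), "Unknown"),
  row.names = c("AAGCACTGCACACA-1", "GGCTCACTGGTCTA-1", "AGCACTGATATCTC-1",
                "ACACGTGATATTCC-1", "ATATGCCTCTGCAA-1", "TGACGAACCTATTC-1"),
  stringsAsFactors = FALSE
)

# Cross-tabulate FACS labels against Garnett's cluster-independent calls
print(table(FACS = calls$FACS_type, Garnett = calls$cell_type))

# Hypothetical helper: fraction of labeled (non-"Unknown") cells whose
# Garnett call matches the FACS label string exactly
agreement_rate <- function(truth, predicted) {
  labeled <- predicted != "Unknown"
  mean(truth[labeled] == predicted[labeled])
}

agreement_rate(calls$FACS_type, calls$cell_type)  # 1 (5 of 5 labeled cells)
mean(calls$cell_type == "Unknown")                # fraction left unlabeled
```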
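Here we give some examples of common marker file errors and their potential consequences for Garnett classification. For all panels, the classifier was trained on 10x PBMC version 2 (V2) data and then used to classify the 10x PBMC version 1 (V1) data shown above. The first panel is colored by the FACS-based 10x cell type assignments. The remaining panels are colored by Garnett's cluster-independent cell type assignments.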
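A cell type is missing from the marker file. Here, we left the T cell definitions out of the PBMC marker file (panel 2). One exception, discussed in the manuscript, is when the missing cell type carries features that describe an existing cell type (for example, NKT cells, which express the NK marker FCGR3A).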
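A cell type is defined but includes no good specific markers. Here, T cells were defined in the PBMC marker file using only CD4 instead of CD3 (panel 3). In this case, we find that Garnett labels only a subset of the T cells and leaves the rest unlabeled.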
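A gene that is widely expressed and not specific is used to define a cell type. If we add MALAT1 (the most highly expressed transcript in the PBMC dataset) to the T cell definition (panel 4), we find that every cell type ends up with a mixture of assignments to its true cell type and to T cells. In other cases, including a widely expressed, non-specific gene can prevent Garnett from finding enough training examples at all, because it considers every cell ambiguous (that is, the cells express other markers plus the non-specific one).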
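To catch a broadly expressed marker such as MALAT1 before training, one option is to measure the fraction of cells in which each marker gene is detected. The function `flag_broad_markers`, the 0.9 cutoff and the toy counts below are hypothetical choices for illustration and are not part of Garnett, whose `check_markers()` provides its own marker diagnostics.

```r
# Hypothetical helper: flag marker genes detected (count > 0) in more than
# `max_frac` of all cells, since such genes cannot separate cell types
flag_broad_markers <- function(counts, markers, max_frac = 0.9) {
  markers <- intersect(markers, rownames(counts))
  detected_frac <- rowMeans(counts[markers, , drop = FALSE] > 0)
  names(detected_frac)[detected_frac > max_frac]
}

# Toy genes x cells count matrix (made-up numbers, for illustration only)
counts <- rbind(
  MALAT1 = c(40, 35, 52, 38, 44, 61),
  CD3D   = c(3, 2, 0, 0, 0, 4),
  CD79A  = c(0, 0, 5, 6, 0, 0)
)
colnames(counts) <- paste0("cell", 1:6)

flag_broad_markers(counts, c("CD3D", "MALAT1"))
# [1] "MALAT1"
```

CD3D is detected in half of the toy cells and passes, while MALAT1 is detected in all of them and is flagged.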
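A cell type definition includes genes that are specific to another cell type. This is a case where the definition is truly "wrong", for example when the best B cell marker (CD79A) is added to the T cell definition (panel 5). We find that the B cell cluster receives a mixture of B cell and T cell assignments, while the labels of the remaining cell types are largely unchanged.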
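A related sanity check is to list genes that appear in the `expressed:` lines of more than one cell type, which would flag CD79A in the panel 5 scenario. The parser below is a hypothetical, minimal reader that only understands `>cell type` headers and `expressed:` lines; it is not Garnett's own marker-file parser. A shared gene may be intentional in some marker files, so the output is best read as a list of definitions to review.

```r
# Hypothetical helper: read the "expressed:" genes for each ">cell type"
# block of a Garnett-style marker file (all other line types are ignored)
parse_expressed_markers <- function(lines) {
  markers <- list()
  current <- NULL
  for (line in trimws(lines)) {
    if (startsWith(line, ">")) {
      current <- trimws(sub("^>", "", line))
      markers[[current]] <- character(0)
    } else if (!is.null(current) && startsWith(line, "expressed:")) {
      genes <- trimws(strsplit(sub("^expressed:", "", line), ",")[[1]])
      markers[[current]] <- c(markers[[current]], genes[genes != ""])
    }
  }
  markers
}

# Hypothetical helper: genes listed under more than one cell type
shared_markers <- function(markers) {
  all_genes <- unlist(lapply(markers, unique), use.names = FALSE)
  sort(unique(all_genes[duplicated(all_genes)]))
}

marker_lines <- c(
  ">B cells",
  "expressed: CD19, MS4A1, CD79A",
  ">T cells",
  "expressed: CD3D, CD79A"   # CD79A added to the T cell definition, as in panel 5
)
shared_markers(parse_expressed_markers(marker_lines))
# [1] "CD79A"
```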
My species doesn't have an AnnotationDbi-class database. If your species doesn't have an available AnnotationDbi-class database, then Garnett won't be able to convert among gene ID types. However, you can still use Garnett for classification. Set db = 'none' and then be sure that you use the same gene ID type in your marker file as in your CDS object. When db = 'none', Garnett ignores the arguments for gene ID type.
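With `db = 'none'`, a mismatch between marker-file IDs and CDS IDs is easy to miss, so it can help to compare the two sets directly before training. The helper `missing_marker_ids` and the gene IDs below are made up for illustration; with a real CDS the IDs would come from `rownames(fData(cds))`, and the commented call only indicates where `db = "none"` would be passed.

```r
# Hypothetical helper: marker IDs that do not occur among the CDS gene IDs.
# With db = "none" Garnett cannot translate IDs, so these must match exactly.
missing_marker_ids <- function(marker_ids, cds_ids) {
  setdiff(unique(marker_ids), cds_ids)
}

# Made-up IDs for a species without an AnnotationDbi database
cds_ids    <- c("gene0001", "gene0002", "gene0003")
marker_ids <- c("gene0001", "gene0003", "GENE0002")

missing_marker_ids(marker_ids, cds_ids)
# [1] "GENE0002"   (the comparison is case-sensitive)

# With a real CDS, training would then pass db = "none", e.g.:
# classifier <- train_cell_classifier(cds = cds, marker_file = marker_file_path,
#                                     db = "none", num_unknown = 50)
```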
```r
citation("garnett")

# Hannah A. Pliner, Jay Shendure & Cole Trapnell (2019). Supervised classification enables rapid annotation of cell atlases. Nature Methods
#
# A BibTeX entry for LaTeX users is
#
# @Article{,
#   title = {Supervised classification enables rapid annotation of cell atlases},
#   journal = {Nature Methods},
#   year = {2019},
#   author = {Hannah A. Pliner and Jay Shendure and Cole Trapnell},
# }
#
```