出错的原因在于,我根本就没给GSEA分析的结果写barplot
方法,所以默认去调用graphics::barplot.default
了,于是出错。
但这样的图,对于我们来说,简直简单的不要不要的。
首先跑一下GSEA的分析
这里用ReactomePA
来跑一下通路的GSEA分析:
library(ReactomePA)
data(geneList)
x <- gsePathway(geneList)
结果中有308个通路是显著性的。
> x
#
# Gene Set Enrichment Analysis
#
#...@organism human
#...@setType Reactome
#...@keytype ENTREZID
#...@geneList Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
- attr(*, 'names')= chr [1:12495] '4312' '8318' '10874' '55143' ...
#...nPerm 10000
#...pvalues adjusted by 'BH' with cutoff <0.05
#...308 enriched terms found
'data.frame': 308 obs. of 11 variables:
$ ID : chr 'R-HSA-1474244' 'R-HSA-216083' 'R-HSA-3000178' 'R-HSA-3000171' ...
$ Description : chr 'Extracellular matrix organization' 'Integrin cell surface interactions' 'ECM proteoglycans' 'Non-integrin membrane-ECM interactions' ...
$ setSize : int 266 80 74 56 53 41 38 36 35 33 ...
$ enrichmentScore: num -0.458 -0.512 -0.626 -0.586 -0.592 ...
$ NES : num -1.94 -1.86 -2.24 -1.99 -1.99 ...
$ pvalue : num 0.000133 0.000152 0.000154 0.000158 0.000159 ...
$ p.adjust : num 0.00295 0.00295 0.00295 0.00295 0.00295 ...
$ qvalues : num 0.00215 0.00215 0.00215 0.00215 0.00215 ...
$ rank : num 1943 1890 1890 2538 1890 ...
$ leading_edge : chr 'tags=33%, list=16%, signal=29%' 'tags=39%, list=15%, signal=33%' 'tags=46%, list=15%, signal=39%' 'tags=45%, list=20%, signal=36%' ...
$ core_enrichment: chr '8038/11132/4017/1288/4811/3910/3371/1291/3791/831/1301/4238/7450/3685/80781/1280/1306/4314/3675/8425/977/4054/7'| __truncated__ '3371/1291/3791/7450/3685/80781/1280/3675/1278/4060/1277/1293/1295/58494/1281/83700/50509/1290/3693/1294/3339/12'| __truncated__ '3910/3371/1291/3685/1280/7042/3912/1278/4060/1277/1293/1281/50509/1290/3693/3339/1462/1289/1292/3908/3909/6678/'| __truncated__ '3915/6385/4921/1288/3910/3371/1301/3685/1280/3912/1278/1277/2247/1281/50509/1290/3693/3339/1289/3908/3909/3913/1300/1287' ...
#...Citation
Guangchuang Yu, Qing-Yu He. ReactomePA: an R/Bioconductor package for
reactome pathway analysis and visualization. Molecular BioSystems
2016, 12(2):477-479
clusterProfiler.dplyr神器
用这个包,你可以对输出像data frame一样使用dplyr
的函数去操作。
于是这里使用arrange
去给NES的绝对值从大到小排序,NES是我们用于计算p值的统计量,正数越大则通路越有可能是激活的,而负数越小则通路越有可能是被抑制的,所以我们也经常看到使用这一统计量进行画图。
然后我们使用group_by
把数据分组了,按照NES正负数来分,最后我们把每一组切出来前5个。也就是说分别取激活或抑制最大的5个通路。
library(clusterProfiler.dplyr)
y <- arrange(x, abs(NES)) %>%
group_by(sign(NES)) %>%
slice(1:5)
最后如我们所愿,取出10个通路。
> y
#
# Gene Set Enrichment Analysis
#
#...@organism human
#...@setType Reactome
#...@keytype ENTREZID
#...@geneList Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
- attr(*, 'names')= chr [1:12495] '4312' '8318' '10874' '55143' ...
#...nPerm 10000
#...pvalues adjusted by 'BH' with cutoff <0.05
#...10 enriched terms found
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 10 obs. of 12 variables:
$ ID : chr 'R-HSA-397014' 'R-HSA-8978868' 'R-HSA-9006934' 'R-HSA-211859' ...
$ Description : chr 'Muscle contraction' 'Fatty acid metabolism' 'Signaling by Receptor Tyrosine Kinases' 'Biological oxidations' ...
$ setSize : int 183 141 418 165 117 308 282 169 162 155
$ enrichmentScore: num -0.357 -0.37 -0.338 -0.373 -0.402 ...
$ NES : num -1.45 -1.46 -1.49 -1.5 -1.54 ...
$ pvalue : num 0.004193 0.006929 0.000503 0.001696 0.003381 ...
$ p.adjust : num 0.0216 0.03289 0.00381 0.01039 0.01827 ...
$ qvalues : num 0.01572 0.02394 0.00277 0.00756 0.0133 ...
$ rank : num 2990 1951 2788 2129 2829 ...
$ leading_edge : chr 'tags=34%, list=24%, signal=26%' 'tags=26%, list=16%, signal=22%' 'tags=27%, list=22%, signal=21%' 'tags=25%, list=17%, signal=21%' ...
$ core_enrichment: chr '59/9992/800/6543/4638/3750/2982/11280/3709/2977/6546/3672/7169/6335/783/10174/4604/6588/7139/27091/3784/1760/72'| __truncated__ '5563/284541/37/224/3248/1543/79966/2181/8644/5095/27306/80724/11001/33/6785/7923/6584/57834/1384/2166/60481/831'| __truncated__ '26052/534/2065/3709/7423/1215/2263/83464/26469/7057/4670/7072/1298/3915/5921/5441/200734/9846/9611/3315/466/111'| __truncated__ '4837/6799/1553/8459/284541/7366/8648/1543/2098/29104/1548/54996/27306/196/2232/2954/57834/6817/22977/1581/7358/'| __truncated__ ...
$ sign(NES) : num -1 -1 -1 -1 -1 1 1 1 1 1
- attr(*, 'groups')=Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 2 variables:
..$ sign(NES): num -1 1
..$ .rows :List of 2
.. ..$ : int 1 2 3 4 5
.. ..$ : int 6 7 8 9 10
..- attr(*, '.drop')= logi FALSE
#...Citation
Guangchuang Yu, Qing-Yu He. ReactomePA: an R/Bioconductor package for
reactome pathway analysis and visualization. Molecular BioSystems
2016, 12(2):477-479
画图
然后你直接就可以对上面这样的对象使用ggplot
进行画图,你没看错,你不需要转为data frame,而是直接对这个S4对象进行ggplot
,魔法就在enrichplot
包。然后一切都是ggplot
语法了。
library(forcats)
library(ggplot2)
library(ggstance)
library(enrichplot)
ggplot(y, aes(NES, fct_reorder(Description, NES), fill=qvalues), showCategory=10) +
geom_barh(stat='identity') +
scale_fill_continuous(low='red', high='blue') +
theme_minimal() + ylab(NULL)