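A volcano plot is commonly used to show the distribution of differential gene expression. The x-axis is usually the fold change (here the logFC column): the further a point lies from the center, the larger the change. The y-axis is the P value on a -log10 scale (here -log10 of the adjusted P value), so larger values mean more significant differences. The name probably comes from the plot looking like an erupting volcano!

What if you only want to label specific genes on a volcano plot? Most of us have run into this need.

1 Load the R packages and data

    library(ggplot2)
    library(openxlsx)
    library(dplyr)

    # Data for the volcano plot
    data <- read.xlsx("火山图.xlsx", sheet = 1)
    head(data)  # Inspect the data; it mainly needs the P value, fold change, and gene ID

2 Draw the volcano plot (labelling the most significant genes)

2.1 First split the genes into up- and down-regulated by threshold

    data$change <- as.factor(ifelse(data$adj.P.Val < 0.01 & abs(data$logFC) > 1,
                                    ifelse(data$logFC > 1, 'UP', 'DOWN'),
                                    'NOT'))

2.2 Flag the most significantly different genes

    # Keep the gene symbol only for genes passing the stricter thresholds; NA otherwise
    data$sign <- ifelse(data$adj.P.Val < 0.001 & abs(data$logFC) > 2.5,
                        data$GENE_SYMBOL, NA)
    head(data)

2.3 Draw the volcano plot

    ggplot(data = data, aes(x = logFC, y = -log10(adj.P.Val), color = change)) +
      geom_point(alpha = 0.8, size = 1) +
      theme_bw(base_size = 15) +
      theme(panel.grid.minor = element_blank(),
            panel.grid.major = element_blank()) +
      # y = 2 is -log10(0.01), the adj.P.Val cutoff used in 2.1
      geom_hline(yintercept = 2, linetype = 4) +
      geom_vline(xintercept = c(-1, 1), linetype = 4) +
      scale_color_manual(name = "",
                         values = c("red", "green", "black"),
                         limits = c("UP", "DOWN", "NOT")) +
      geom_text(aes(label = sign), size = 3)

Given how ggplot2 builds a plot, a gene label is simply the text attached to each point, so all we need is a way to map that text to an aesthetic (aes).

3 Label specified genes

This works like the example above: just add the specified genes to the plotting data.

3.1 Read the file containing the gene list

    gene <- read.xlsx("火山图.xlsx", sheet = 2)
    gene$geneList <- gene$gene

The duplicate column is created so that one column remains after the merge (the join key column gene is dropped); geneList is the column used to label the genes. (The approach is a bit clumsy.)

3.2 Merge with the volcano plot data

    data2 <- data %>%
      left_join(gene, by = c("GENE_SYMBOL" = "gene"))
    head(data2)

The data now has a geneList column, which is used below to add the gene names as text.

3.3 Label the genes specified in the file

    ggplot(data = data2, aes(x = logFC, y = -log10(adj.P.Val), color = change)) +
      geom_point(alpha = 0.8, size = 1) +
      theme_bw(base_size = 15) +
      theme(panel.grid.minor = element_blank(),
            panel.grid.major = element_blank()) +
      geom_hline(yintercept = 2, linetype = 4) +
      geom_vline(xintercept = c(-1, 1), linetype = 4) +
      scale_color_manual(name = "",
                         values = c("red", "green", "black"),
                         limits = c("UP", "DOWN", "NOT")) +
      geom_text(aes(label = geneList), size = 5, color = "blue")

3.4 Use ggrepel to fix overlapping labels

If there are too many genes to label, the labels will overlap. The ggrepel package can be used to avoid this.

    library(ggrepel)

    ggplot(data = data2, aes(x = logFC, y = -log10(adj.P.Val), color = change)) +
      geom_point(alpha = 0.8, size = 1) +
      theme_bw(base_size = 15) +
      theme(panel.grid.minor = element_blank(),
            panel.grid.major = element_blank()) +
      geom_hline(yintercept = 2, linetype = 4) +
      geom_vline(xintercept = c(-1, 1), linetype = 4) +
      scale_color_manual(name = "",
                         values = c("red", "green", "black"),
                         limits = c("UP", "DOWN", "NOT")) +
      geom_label_repel(aes(label = geneList),
                       fontface = "bold", color = "grey50",
                       box.padding = unit(0.35, "lines"),
                       point.padding = unit(0.5, "lines"),
                       segment.colour = "grey50")

And there you go: you can now label any genes of interest.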
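Section 3.1 admits that duplicating the column before the join is a bit clumsy. One possible alternative, shown here only as a minimal sketch and not as the post's own method, is to build the label column directly with `%in%` and skip the join. The data frame and the `genes_of_interest` vector below are invented for illustration; only the column names (`GENE_SYMBOL`, `logFC`, `adj.P.Val`, `change`, `geneList`) and the thresholds follow the post.

```r
# Toy data, made up purely for illustration; column names follow the post.
data <- data.frame(
  GENE_SYMBOL = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE"),
  logFC       = c(3.1, -2.8, 0.4, 1.6, -1.2),
  adj.P.Val   = c(1e-5, 2e-4, 0.50, 0.005, 0.20),
  stringsAsFactors = FALSE
)

# Hypothetical genes of interest (stands in for sheet 2 of the workbook).
genes_of_interest <- c("GeneB", "GeneD")

# Same classification thresholds as in section 2.1.
data$change <- factor(
  ifelse(data$adj.P.Val < 0.01 & abs(data$logFC) > 1,
         ifelse(data$logFC > 1, "UP", "DOWN"),
         "NOT"),
  levels = c("UP", "DOWN", "NOT")
)

# Label column: the gene symbol for listed genes, NA for all others.
data$geneList <- ifelse(data$GENE_SYMBOL %in% genes_of_interest,
                        data$GENE_SYMBOL, NA)

print(data[, c("GENE_SYMBOL", "change", "geneList")])
```

The resulting `data` can be passed to the same ggplot2 calls as `data2` in sections 3.3 and 3.4, since it carries the same `change` and `geneList` columns.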