To run the R commands in this post, you should first work through Analysing microarray data in BioConductor.

A volcano plot is a scatter plot that is often used when analysing microarray data sets to give an overview of interesting genes. The log fold change is plotted on the x-axis and the negative log10 p-value is plotted on the y-axis. Using the GSE20986 data set, the x-axis shows the log fold change between the HUVEC control cell line and one of the three treatment endothelial cell lines.

After running the preliminary analysis, we construct a table containing the fold change and p-values for all genes:
The relevant columns can be obtained via the “$” operator, viz:
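As a minimal sketch of this step, the block below builds a small simulated table and pulls out two columns with "$". The data frame `gene_list`, its size, and its values are assumptions made for illustration; the column names `logFC` and `P.Value` mimic those produced by limma's `topTable`, which may or may not match the table used in this post.

```r
# Simulated stand-in for the results table (hypothetical data, not GSE20986).
set.seed(1)
n_genes <- 1000
gene_list <- data.frame(
  ID      = paste0("gene_", seq_len(n_genes)),
  logFC   = rnorm(n_genes, mean = 0, sd = 1.5),
  P.Value = runif(n_genes)
)

# The "$" operator extracts a single column of the data frame as a vector.
log_fc  <- gene_list$logFC
p_value <- gene_list$P.Value

# The two quantities shown in a volcano plot:
# x = log fold change, y = negative log10 p-value.
neg_log_p <- -log10(p_value)
head(data.frame(log_fc, neg_log_p))
```

Small p-values map to large values of `neg_log_p`, which is why the most significant genes sit at the top of the plot.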
The volcano plot can then be generated using the standard plot command:
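A rough sketch of a base-graphics volcano plot follows. It uses simulated data, and the cut-offs `p_cut` and `fc_cut` are hypothetical values picked only to show how points can be highlighted; they are not taken from the analysis in this post.

```r
# Simulated results table (hypothetical data).
set.seed(1)
n_genes <- 1000
gene_list <- data.frame(
  logFC   = rnorm(n_genes, sd = 1.5),
  P.Value = rbeta(n_genes, 0.5, 1)
)

# Hypothetical cut-offs, chosen for illustration only.
p_cut  <- 0.001
fc_cut <- 2
interesting <- gene_list$P.Value < p_cut & abs(gene_list$logFC) > fc_cut

# Log fold change on the x-axis, negative log10 p-value on the y-axis.
plot(gene_list$logFC, -log10(gene_list$P.Value),
     pch = 16, cex = 0.6,
     col = ifelse(interesting, "red", "grey40"),
     xlab = "log fold change", ylab = "-log10 p-value",
     main = "Volcano plot (simulated data)")

# Dashed guide lines at the cut-offs.
abline(h = -log10(p_cut), v = c(-fc_cut, fc_cut), lty = 2)
```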
[Figure: volcano plot of negative log10 p-value against log fold change]
A more principled way of choosing a p-value cut-off is to use a multiple testing rule. Common rules are the Bonferroni correction and the false discovery rate (FDR). When we carry out a large number of statistical tests, the Bonferroni cut-off is 0.05 divided by the number of tests. With around 27,000 tests, one per gene, that is roughly 1.9e-06.
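The Bonferroni rule can be sketched in a few lines of base R. The p-values here are simulated, and the figure of 27,000 tests is simply the approximate number of genes mentioned in this post, so the counts printed are illustrative only.

```r
# Simulated p-values (hypothetical data), one per test.
set.seed(1)
n_tests  <- 27000
p_values <- runif(n_tests)
p_values[1:50] <- 1e-8   # a handful of simulated strong effects

# Bonferroni cut-off: the nominal level divided by the number of tests.
bonferroni_cutoff <- 0.05 / n_tests
bonferroni_cutoff

# Number of tests passing the cut-off.
n_direct <- sum(p_values < bonferroni_cutoff)

# The same rule expressed through adjusted p-values:
# p.adjust multiplies each p-value by the number of tests (capped at 1).
n_adjusted <- sum(p.adjust(p_values, method = "bonferroni") < 0.05)

c(direct = n_direct, adjusted = n_adjusted)
```

Comparing raw p-values with the reduced cut-off and comparing Bonferroni-adjusted p-values with 0.05 are two ways of writing the same rule.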
If we wanted to use the false discovery rate (FDR) as a cut-off, then we would use the adjusted p-value:
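A minimal sketch of an FDR cut-off, assuming simulated p-values and the Benjamini-Hochberg adjustment from base R's `p.adjust`. The column name `adj.P.Val` follows limma's `topTable` output; whether the table in this post was built that way is an assumption.

```r
# Simulated results: 500 genes with small p-values, the rest uniform (hypothetical data).
set.seed(1)
n_genes <- 5000
gene_list <- data.frame(
  logFC   = rnorm(n_genes),
  P.Value = c(rbeta(500, 0.1, 5), runif(n_genes - 500))
)

# Benjamini-Hochberg adjusted p-values.
gene_list$adj.P.Val <- p.adjust(gene_list$P.Value, method = "BH")

# Genes passing a 5% FDR cut-off versus the Bonferroni cut-off.
fdr_hits  <- gene_list$adj.P.Val < 0.05
bonf_hits <- gene_list$P.Value < 0.05 / n_genes

c(FDR = sum(fdr_hits), Bonferroni = sum(bonf_hits))
```

The FDR rule never selects fewer genes than Bonferroni at the same nominal level, which is why Bonferroni is described as the more conservative of the two.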
In many microarray studies, the FDR cut-off is used. However, when we have a large number of statistically significant genes, as in this example, a more conservative rule can be useful.

Using ggplot2 for volcano plots

The problem with the above volcano plot is that we are plotting around 27,000 points, one per gene, so the points overlap and the colour becomes dense. An alternative to base graphics is the ggplot2 package, which can be installed directly from CRAN in the usual manner:
Then to construct the volcano plot, we use the following commands:
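One way such a plot might look in ggplot2 is sketched below. The data are simulated, the `threshold` column and its cut-offs (Bonferroni and an absolute log fold change above 2) are hypothetical, and the `alpha` and `size` values are arbitrary starting points rather than settings taken from this post.

```r
# install.packages("ggplot2")  # one-off installation from CRAN
library(ggplot2)

# Simulated results table (hypothetical data).
set.seed(1)
n_genes <- 5000
gene_list <- data.frame(
  logFC   = rnorm(n_genes, sd = 1.5),
  P.Value = c(rbeta(500, 0.1, 5), runif(n_genes - 500))
)

# Flag genes passing hypothetical cut-offs so they can be coloured differently.
gene_list$threshold <- gene_list$P.Value < 0.05 / n_genes &
  abs(gene_list$logFC) > 2

# Semi-transparent, small points reduce the effect of overplotting.
volcano <- ggplot(gene_list,
                  aes(x = logFC, y = -log10(P.Value), colour = threshold)) +
  geom_point(alpha = 0.4, size = 1.75) +
  xlab("log fold change") +
  ylab("-log10 p-value") +
  theme(legend.position = "none")

print(volcano)
```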
By using the “alpha” and “size” options in ggplot2, we can control the transparency and size of the points. To add text to the plot, we use geom_text. For example,
You can also pass vectors of text labels.
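As an illustrative sketch of labelling, the block below adds a vector of labels with `geom_text` and a single fixed label with `annotate`. The data, the `ID` column, the choice of the ten smallest p-values, and the label position are all assumptions made for the example.

```r
library(ggplot2)

# Simulated results table with gene identifiers (hypothetical data).
set.seed(1)
n_genes <- 2000
gene_list <- data.frame(
  ID      = paste0("gene_", seq_len(n_genes)),
  logFC   = rnorm(n_genes, sd = 1.5),
  P.Value = c(rbeta(200, 0.1, 5), runif(n_genes - 200))
)

# Label only the ten genes with the smallest p-values.
top_genes <- gene_list[order(gene_list$P.Value)[1:10], ]

labelled <- ggplot(gene_list, aes(x = logFC, y = -log10(P.Value))) +
  geom_point(alpha = 0.4, size = 1.75) +
  # A vector of labels: one per row of top_genes, taken from the ID column.
  geom_text(data = top_genes, aes(label = ID), size = 3, vjust = -0.8) +
  # A single piece of text at fixed, hypothetical coordinates.
  annotate("text", x = 0, y = 0.5, label = "simulated data", size = 3) +
  xlab("log fold change") +
  ylab("-log10 p-value")

print(labelled)
```

Restricting the labels to a short list of genes keeps the text readable; labelling every point would recreate the overplotting problem.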