介绍看到Y叔为ggord做的添加置信椭圆的geom_ord_ellipse.R(用法见上一篇文章《画个小圈圈》),决定学习一点ggplot图形的语言,对于初学者最好的方法就是照葫芦画瓢,而Y叔的代码自然是最好的模板。我对Y叔的代码进行了逐行的分析,希望以后有需要可以套用。 以下为geom_ord_ellipse.R 代码。这个图层的代码其实很短,很简洁,但是如果想要透彻理解还是需要下些功夫的。 ##' add confidence ellipse to ordinary plot produced by ggord##'##' ##' @title geom_ord_ellipse ##' @param mapping aes mapping ##' @param ellipse_pro confidence value for the ellipse##' @param fill color to fill the ellipse, NA by default##' @param ... additional parameters##' @return ggplot layer##' @importFrom ggplot2 aes_##' @importFrom ggplot2 layer##' @importFrom utils modifyList##' @export##' @author Guangchuang Yu##' @references \url{http://lchblogs./post/2017-12-22-r-addconfellipselda/}geom_ord_ellipse <- function(mapping="NULL," ellipse_pro="0.97," fill="NA," ...)="" {="" ="" default_aes=""><- aes_(color="~Groups," group="~Groups)" ="" if="" (is.null(mapping))="" {="" ="" ="" ="" mapping=""><- default_aes="" ="" }="" else="" {="" ="" ="" ="" mapping=""><- modifylist(default_aes,="" mapping)="" ="" }="" ="" layer(="" ="" ="" ="" geom='polygon' ,="" ="" ="" ="" stat="StatOrdEllipse," ="" ="" ="" mapping="mapping," ="" ="" ="" position='identity' ,="" ="" ="" ="" data="NULL," ="" ="" ="" params="list(" ="" ="" ="" ="" ="" ellipse_pro="ellipse_pro," ="" ="" ="" ="" ="" fill="fill," ="" ="" ="" ="" ="" ...="" ="" ="" ="" )="" ="" )}##'="" @importfrom="" ggplot2="" ggproto##'="" @importfrom="" ggplot2="" stat##'="" @importfrom="" plyr="" ddply##'="" @importfrom="" grdevices="" chullstatordellipse=""><- ggproto('statordellipse',="" stat,="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" compute_group="function(self," data,="" scales,="" params,="" ellipse_pro)="" {="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" names(data)[1:2]=""><- c('one',="" 'two')="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" theta=""><- c(seq(-pi,="" pi,="" length="50)," seq(pi,="" -pi,="" length="50))" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" circle=""><- cbind(cos(theta),="" sin(theta))="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ell=""><- ddply(data,="" .(group),="" function(x)="" {="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" if(nrow(x)=""><= 2)="" {="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" return(null)="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" }="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" sigma=""><- var(cbind(x$one,="" x$two))="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" mu=""><- c(mean(x$one),="" mean(x$two))="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ed=""><- sqrt(qchisq(ellipse_pro,="" df="2))" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" data.frame(sweep(circle="" %*%="" chol(sigma)="" *="" ed,="" 2,="" mu,="" fun='+' ))="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" })="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" names(ell)[2:3]=""><- c('one',="" 'two')="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ell=""><- ddply(ell,="" .(group),="" function(x)="" x[chull(x$one,="" x$two),="" ])="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" names(ell)=""><- c('groups',="" 'x',="" 'y')="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" return(ell)="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" },="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" required_aes="c('x'," 'y',="" 'group')="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" )##="" .="" function="" was="" from="" plyr="" package.=""><- function="" (...,="" .env="parent.frame())" {="" ="" structure(as.list(match.call()[-1]),="" env=".env," class='quoted'>
roxygen 文档注释
##' add confidence ellipse to ordinary plot produced by ggord##'##' ##' @title geom_ord_ellipse ##' @param mapping aes mapping ...##' @return ggplot layer##' @importFrom ggplot2 aes_...##' @export##' @author Guangchuang Yu##' @references \url{http://lchblogs./post/2017-12-22-r-addconfellipselda/}
roxygen会根据这一部分对单一函数生成帮助文档,也就是我们在R命令行中输入?FunctionName 看到的帮助信息。 
其基础格式是(Y叔使用了##' ,我觉得好像跟#' 没有什么区别?): #' @param 函数参数(对应Arguments) 函数的介绍(对应Description)
上面代码的注释大多可以顾名思义。比较有意思的是#' @export 这行了,roxygen会把这个函数放在NAMESPACE 文件中,这样用户便可以调用这个函数。我是可以调用yyplot::geom_ord_ellipse() 这个函数的。但yyplot:StatOrdEllipse() 这个函数是yyplot的内部函数(注意代码中此函数没有#' @export 注释)。如果我调用就会报错: yyplot::StatOrdEllipse()
更多关于roxygen的介绍可以参考这篇文章或官方文档。 ggproto —ggplot2 的语言
ggproto是ggplot2模块化、面向对象(Object Oriented)化的核心部分。基础的格式是: ggproto(`_class` = NULL, `_inherit` = NULL, ...)
ggproto 是一个很庞大的系统,我目前理解还不是很深入,提供一些参考资料:
在理解Y叔这个脚本中我们需要使用最基础的两个模块Geom (创建图层),Stat (数据处理)。 StatOrdEllipse 内部函数—ggplot2 中的数据处理
我们在作图之前基本都是要对输入数据进行一些数据预处理,比如在做线箱图的时候需要计算中位数、IQR等。在这个脚本中,我们需要做的是计算置信区间椭圆,这一步是通过ggplot::Stat 实现的。 StatOrdEllipse <- ggproto('statordellipse',="" stat,="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" compute_group="function(self," data,="" scales,="" params,="" ellipse_pro)="" {="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ##="" 此处省略...="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ##="" 解析见后文="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" return(ell)="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" },="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" required_aes="c('x'," 'y',="" 'group')="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="" ="">
_class :这个类的名字为StatOrdEllipse
_inherit :继承Stat 类
compute_group :核心处理数据部分,对每一组进行处理,模板为compute_group(self, data, scales, ...) ,在这里,它主要接受置信区间(ellipse_pro )参数,返回值为计算好的置信区间轮廓上的点坐标。另外,我认为这里params 并不必要。
required_aes :创建图层所需要的mapping参数
geom_ord_ellipse 函数—创建ggplot2 图层
下面便是重头戏,使用上面的Stat 来创建一个Geom 图层。其实这就是一个普通的函数,只是为了返回一个图层layer 。 geom_ord_ellipse <- function(mapping="NULL," ellipse_pro="0.97," fill="NA," ...)="" {="" ="" default_aes=""><- aes_(color="~Groups," group="~Groups)" ="" if="" (is.null(mapping))="" {="" ="" ="" ="" mapping=""><- default_aes="" ="" }="" else="" {="" ="" ="" ="" mapping=""><- modifylist(default_aes,="" mapping)="" ="" }="" ="" layer(="" ="" ="" ="" geom='polygon' ,="" ="" ="" ="" stat="StatOrdEllipse," ="" ="" ="" mapping="mapping," ="" ="" ="" position='identity' ,="" ="" ="" ="" data="NULL," ="" ="" ="" params="list(" ="" ="" ="" ="" ="" ellipse_pro="ellipse_pro," ="" ="" ="" ="" ="" fill="fill," ="" ="" ="" ="" ="" ...="" ="" ="" ="" )="" ="">
函数的输入值:输入预处理:这里对输入的mapping 做了判断,如果没有输入,就会使用默认值,如果有输入,则替换默认值。 default_aes <- aes_(color="~Groups," group=""> :定义了默认的aesthetics,使用aes_ 时,变量要用双引号引用或使用~ (~Groups )
modifyList :根据一个list 修改另一个list 中的值
返回的ggplot::layer :这个函数的模板是: layer(geom = NULL, stat = NULL, data = NULL, mapping = NULL, position = NULL, params = list(), inherit.aes = TRUE, check.aes = TRUE, check.param = TRUE, subset = NULL, show.legend = NA)
对应到Y叔的函数: geom = 'polygon' :一个多边形(椭圆)
stat = StatOrdEllipse :使用我们定义的Stat
mapping = mapping :数据与图形的对应,如x=, y=
position = 'identity' :位置的定义
data = NULL :从上一图层继承
params = list(...) :geom 和stat 的参数
到这里,对于一些基本的图层,我觉得完全可以套用Y叔的这个模板。在宏基因组公众号中曾经有人问,这个ggord 包中能不能把那些向量去掉,或者加入少部分变量的向量,我想用这个模板完全可以实现,只需要用GeomCurve 来做个图层就可以了。 计算置信区间椭圆names(data)[1:2] <- c('one',="" 'two')theta=""><- c(seq(-pi,="" pi,="" length="50)," seq(pi,="" -pi,="" length="50))circle"><- cbind(cos(theta),="" sin(theta))ell=""><- ddply(data,="" .(group),="" function(x)="" {="" ="" if(nrow(x)=""><= 2)="" {="" ="" ="" ="" return(null)="" ="" }="" ="" sigma=""><- var(cbind(x$one,="" x$two))="" ="" mu=""><- c(mean(x$one),="" mean(x$two))="" ="" ed=""><- sqrt(qchisq(ellipse_pro,="" df="2))" ="" data.frame(sweep(circle="" %*%="" chol(sigma)="" *="" ed,="" 2,="" mu,="" fun='+' ))="" ="" })names(ell)[2:3]=""><- c('one',="" 'two')ell=""><- ddply(ell,="" .(group),="" function(x)="" x[chull(x$one,="" x$two),="" ])="" names(ell)=""><- c('groups',="" 'x',="">
这个代码是从ggord 的源码改过来的。对于排序图来说,基本最后都会降维到一个低维的空间(2维),方便展示。所以,我们最后需要处理的问题就是根据2维上的散点,计算出这些点分布的可能范围(在2维正态分布的假设下,对协方差使用卡方检验)。其中涉及到我们要把数据分成组(不同椭圆,不同颜色标记),然后对每一组求出上述的范围。这个操作使用ddply实现的: 分而治之ell <- ddply(data,="" .(group),="" function(x)="" {="" ="" ##="" ...="" ="" ##="" 解析见后文="" ="">
在此,把data 根据group 拆分成组,然后每一组套用function ,最后再把结果组合(rbind )在一起。以下我们介绍每一组是如何处理的。 计算置信椭圆从统计上来讲这个置信椭圆是这样做的: 做出一个单位圆(半径为单位1) theta <- c(seq(-pi,="" pi,="" length="50)," seq(pi,="" -pi,="" length="50))##" should="" be:##="" theta=""><- seq(-pi,="" pi,="" length="50)circle"><- cbind(cos(theta),="">
我们知道单位圆的参数方程为,在这里ggord的作者其实重复了两圈,具体的原因我也没有搞清楚,我认为没有必要。 套用公式把这个圆转换成椭圆
sigma <- var(cbind(x$one,="" x$two))mu=""><- c(mean(x$one),="" mean(x$two))ed=""><- sqrt(qchisq(ellipse_pro,="" df="2))data.frame(sweep(circle" %*%="" chol(sigma)="" *="" ed,="" 2,="" mu,="" fun='+'>
这里的转化实际是:
 其中 ,也就是数据的中心,k根据概率控制椭圆的大小(因为是针对2维正态分布的协方差,所以使用了卡方检验), 代表协方差矩阵的Cholesky分解,X和 分别为圆和椭圆上对应点的坐标。 至于为什么这么做,涉及一些线代推导,可以参见下面两篇文章(特别是第一篇): https://jellymatter./2011/03/31/drawing-confidence-ellipses-and-ellipsoids/#more-507 http://www./2014/04/draw-error-ellipse-representing-covariance-matrix/
有点多余?names(ell)[2:3] <- c('one',="" 'two')ell=""><- ddply(ell,="" .(group),="" function(x)="" x[chull(x$one,="" x$two),="">
这部分的意思是对之前算出的那些椭圆上的点找出对应的凸多边形。我认为对于上述方法得到的点已经在一个凸多边形上了(椭圆)。我没有太明白这一步的必要性,去掉之后并没有发现影响。
作者李陈浩,没错就是那个写了个原型,请我来封装成图层的人,我在他代码(其实来源于ggord包)的基础上,封装了图层,并且修改增强为适应其它ordination plot的通用图层。他对我的代码进行了逐行解析,是学习写ggplot2的好材料,当然这里其实只是写了Stat而已,也就是我之前说的数据处理,画图的Geom使用的是内置的polygon,写图层的能力还应该包括写Geom,以后有机会再跟大家介绍。
|