诹图系列(4): 堆积条形图

生物_医药_科研 2019-04-04

展开全文

作者

厚缊，中观经济咨询助理研究员，业余数据科学爱好者。

绘制图形时，有时需要根据某个因子（factor）进行分面（facet），即根据因子水平拆分原数据，再分别绘制图形。ggplot2包可以通过facet_wrap()或facet_grid()函数简单解决分面问题，但高级的绘图函数往往以损失灵活性为代价，自动分面难以控制每个分面图形的绘图细节。例如，ggplot2通过labeller()控制每个分面标签,当需要扩展时也会显得比较麻烦，更为严重的是，当向每个分面添加不同文本时，需要借助grid包从底层上来寻找解决方案。因此，绘制高度定制化的分面图时形基础绘图系统并不显得劣势。

由于原作者没有提供此图的调查数据，这里利用iris数据集手动生成了虚拟的ggplot2测试数据，说明利用高级绘图系统生成高度定制化的分面图形并不一定更简洁。


library(ggplot2)library(ggthemes)set.seed('20190402')df <- dplyr::sample_n(iris, size = 500, replace = TRUE)df[['group']] <- sample(letters[1:11], 500, replace = TRUE)df[['cat']] <- sample(LETTERS[1:4], 500, replace = TRUE)cols <- c('A' = 'cornsilk4', 'B' = 'cornsilk1',           'C' = 'lightpink', 'D' = 'palevioletred4')ggplot(df, aes(Species, fill = cat) ) +   geom_bar( position = position_fill(),             color = 'black', size = 0.2) +   scale_fill_manual( values = cols) +  facet_grid(group~.) +   coord_flip(ylim = c(0, 1), expand = FALSE) +  labs(title = 'Reading attitude',       subtitle = 'How much do you disagree or agree with these statements about reading?') +  theme( plot.title = element_text(family = 'Arial Black',                               size = 16, lineheight = 10),         plot.subtitle = element_text(size = 12,                                       lineheight = 8,                                       colour = 'grey40'),        axis.title.y = element_blank(),        axis.title.x = element_blank(),        plot.background = element_rect(fill = 'grey95'),        legend.position = 'bottom',        legend.direction = 'horizontal',        legend.title = element_blank(),        legend.background = element_rect(fill = 'grey95'),        legend.spacing.x = grid::unit(0.8, units = 'cm')        )

分面条形图

老规矩，先上效果图。图中是一份针对阅读态度的调查，共11个问题，每个问题是一个分面，其中每个分面中包含美国、墨西哥和加拿大三个国家的调查结果。每个分面的左上角标注调查问题，左侧标注国家名称、持Strongly disagree和Disagree态度的人员所占比例，右侧标注持Agree和Strongly agree态度的人员所占比例。

绘图代码

整幅图的实现思路比较简单，大概可以分为以下五个步骤：

设置par()参数mfcol = c(12, 1)，将绘图区等分为12行、1列，行数比主图11个分面多一个，主要是预留一个绘图空间添加坐标轴和图例。
通过for循环函数分别绘制11个分面主图，其中每次循环提取data_source中的三行数据，然后调用barplot()函数绘制条形图，然后用paste0()函数生成主图的左（llab）右（rlab）标签，并通过text()手动添加标签，注意，添加标签时会溢出绘图区，需要设置xpd = TRUE。最后通过mtext()函数添加每个分面的调查问题名称。
在第十二个预留绘图区内添加坐标轴和坐标轴标签，这里使用了最原始的方法，先用barplot()函数绘制了一个空图，注意X轴范围须设置的与主图相同，然后用segments()、text()函数分别添加坐标轴和坐标轴标签（当然，可以直接调用axis()函数添加坐标轴及标签）。
图例仍然使用points()函数手动添加，主要是方便控制摆放位置。图例摆放位置的坐标根据全图的位置大致判断，出图后位置有偏移或溢出，可以进行适当修正，完全是凭经验判断。
最后是使用mtext()函数添加标题、副标题、数据来源等信息，当这些辅助信息绘制在外边距（outer margin）上时，要设置outer = TRUE。

barchart09 <- 'your/figure/path/barchart09.pdf'cairo_pdf(barchart09, bg = 'grey95', width = 12, height = 19)data_source <- matrix(  c(18, 32, 34, 16, 22, 37, 33, 8, 25, 36, 24, 15,    32, 38, 21, 9, 17, 35, 37, 11, 26, 36, 26, 12,    25, 34, 33, 8, 20, 33, 38, 9, 26, 31, 33, 10,    26, 43, 23, 8, 22, 39, 31, 8, 29, 43, 19, 9,    27, 35, 31, 7, 14, 29, 43, 14, 24, 26, 37, 13,    35, 39, 14, 12, 45, 43, 7, 5, 39, 37, 13, 11,    19, 28, 35, 18, 17, 37, 38, 8, 21, 27, 33, 19,    17, 36, 34, 13, 12, 32, 40, 16, 22, 39, 27, 12,    32, 39, 16, 13, 31, 46, 18, 5, 39, 37, 13, 11,    21, 28, 40, 11, 10, 27, 46, 17, 21, 26, 41, 12,    27, 39, 25, 9, 21, 31, 35, 13, 28, 32, 29, 11),  ncol = 4, byrow = TRUE)question <- c('I read only if I have to.',              'Reading is one of my favorite hobbies.',              'I like talking about books with other people.',              'I find it hard to finish books.',              'I feel happy if I receive a book as a present.',              'For me, reading is a waste of time.',              'I enjoy going to a bookstore or a library.',              'I read only to get information that I need.',              'I cannot sit still and read for more than a few minutes.',              'I like to express my opinions about books I have read.',              'I like to exchange books with my friends.') opar <- par(no.readonly = TRUE)par(  omi = c(1, 0.5, 1.8, 0.5),  mai = c(0.1, 1.45, 0.35, 0.6),  las = 1,  family = 'Arial',  mfcol = c(12, 1))col_name <- c('palevioletred4', 'lightpink', 'cornsilk1', 'cornsilk4')for (i in seq_len(11)) {    idx <- 1:3 + 3 * (i - 1)    data <- t(data_source[idx, ])    llab <- paste0(c('USA', 'Mexico', 'Canada'), ' - ',                    paste0(data[1, ] + data[2, ], '%'))    rlab <- paste0(data[3, ] + data[4, ], '%')    y <- barplot(data, horiz = TRUE, cex.names = 2, xlim = c(0, 100),                 col = col_name, axes = FALSE)    text(rep(-2, 3), y, llab, adj = 1, cex = 1.6,          col = 'grey40', xpd = TRUE)    text(rep(102, 3), y, rlab, adj = 0, cex = 1.6,          col = 'grey40', xpd = TRUE)    mtext(question[i], side = 3, adj = 0.01, line = 0.05,           cex = 1.6, col = 'grey40')}    par(mai = c(0.1, 1.45, 0.1, 0.6))    y <- barplot(data, horiz = TRUE, border = NA, cex.names = 2,                 col = '#00000000', xlim = c(0, 100), axes = FALSE)    segments(0, 3.6, 100, 3.6, col = 'grey40', lty = 1.2, xpd = TRUE)    segments(seq(0, 100, length.out = 5), rep(3.6, 5),             seq(0, 100, length.out = 5), rep(3.45, 5),              col = 'grey40', lty = 1.2, xpd = TRUE)    text(seq(0, 100, length.out = 5), rep(3.2, 5),          labels = seq(0, 100, length.out = 5),         cex = 1.6, col = 'grey40', xpd = TRUE)    points(seq(0, 75, length.out = 4) + 1, rep(1.6, 4), pch = 15,           col = col_name, cex = 4)    legend_lab <- c('Strongly disagree','Disagree','Agree','Strongly agree')    text(seq(3, 79, length.out = 4), rep(1.6, 4), labels = legend_lab,          adj = c(0, 0.5), cex = 1.6, col = 'grey40')

mtext('Reading attitude', side = 3, line = 7, adj = 0, cex = 3,       family = 'Arial Black', outer = TRUE)mtext('How much do you disagree or agree with these statements about reading?',      side = 3, line = 3.5, adj = 0, cex = 1.8,       col = 'grey40', outer = TRUE)mtext('Source: PISA 2009 Assessment Framework – Key Competencies in Reading, Mathematics, and Science',      side = 1, line = 1, adj = 1, cex = 1.1, col = 'grey40',       outer = TRUE)mtext('© OECD 2009, Data: bryer.org', side = 1, line = 3.5, adj = 1,      cex = 1.1, col = 'grey40', outer = TRUE)dev.off()par(opar)