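"No one knows everything, and you don't have to." --free傻孩子

In this issue I would like to recommend a self-written function, group_data, for handling the grouping problems that come up often in daily life and work. It can be useful in situations such as: evaluating employees by a composite score across their different business skills; ranking students by their scores across several subjects; or classifying microorganisms as rare or abundant taxa according to their abundance.

The function expects a data frame as input, and the data must have both row names and column names: one row per item to be grouped and one numeric column per measure.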
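As a rough illustration of that input shape, here is a small made-up data frame. The names `toy`, `metric_a`, `metric_b` and all values are hypothetical and not taken from the article; the checks only confirm the properties described above.

```r
# Hypothetical toy input: rows are the items to group, columns are numeric measures
toy <- data.frame(
  metric_a = c(1.5, 2.0, 3.5),
  metric_b = c(10, 40, 20)
)
rownames(toy) <- c("item1", "item2", "item3")

# Confirm the shape described in the text: a data frame with only numeric columns
stopifnot(is.data.frame(toy))
stopifnot(all(sapply(toy, is.numeric)))

str(toy)  # prints the structure: 3 observations of 2 numeric variables
```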
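    group_data <- function(data, by, group) {
      library(tidyverse)  # load the needed package (provides the %>% pipe)

      # Standardize every column, then average across each row to get one
      # composite score per row; the original columns are kept next to it
      data[, 1:length(data)] %>%
        scale() %>%
        apply(1, mean) %>%
        data.frame(data) -> df2
      colnames(df2)[1] <- "scores"  # name the score column

      # Score cutoffs at the requested quantiles; `by` must be in descending
      # order and `group` must hold length(by) + 1 labels
      cutoffs <- quantile(df2$scores, probs = by)

      df2$group <- NA_character_
      df2$group[df2$scores >= cutoffs[1]] <- group[1]                        # top group
      df2$group[df2$scores < cutoffs[length(by)]] <- group[length(by) + 1]  # bottom group

      i <- 2
      while (i <= length(by)) {  # middle groups, between consecutive cutoffs
        df2$group[df2$scores < cutoffs[i - 1] & df2$scores >= cutoffs[i]] <- group[i]
        i <- i + 1
      }
      print(df2)
    }

Note: this function requires the "tidyverse" package. If it is not installed, install it before running the function above.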
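The grouping rule inside group_data can also be written without the explicit loop. The following is a sketch in base R, not the author's implementation: `assign_groups` is a hypothetical helper name and the toy scores are made up. It assumes `by` is given in descending order and `group` has one more label than `by`, as in the function above.

```r
# Hypothetical helper: same cutoff rule as group_data, using findInterval()
assign_groups <- function(scores, by, group) {
  stopifnot(length(group) == length(by) + 1)
  # Cutoffs sorted in ascending order, as findInterval() requires
  cutoffs <- sort(unname(quantile(scores, probs = by)))
  # For each score, count how many cutoffs it reaches (0 = below all of them)
  reached <- findInterval(scores, cutoffs)
  # Reaching every cutoff maps to group[1]; reaching none maps to the last label
  group[length(by) + 1 - reached]
}

scores <- c(-1.2, -0.4, 0.1, 0.6, 1.5)  # made-up composite scores
assign_groups(scores, by = c(0.9, 0.7), group = c("top", "middle", "bottom"))
# "bottom" "bottom" "bottom" "middle" "top"
```

Because `findInterval()` uses intervals that are closed on the left, a score exactly equal to a cutoff falls into the higher group, which matches the `>=` comparisons in group_data.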
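Goal: evaluate a company's employees by their composite score across all business lines.

Requirements: divide the employees into three classes. Employees whose score is at or above the 0.9 quantile of all scores are rated excellent (优秀); those whose score is below the 0.9 quantile but at or above the 0.7 quantile are rated good (良好); those whose score is below the 0.7 quantile are rated poor (不良).

The data is as follows: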
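    df <- data.frame(
      业务1 = c(12, 15, 18, 20, 10, 14),         # business line 1
      业务2 = c(20, 25, 21, 28, 29, 21),         # business line 2
      业务3 = c(40, 60, 70, 90, 100, 20),        # business line 3
      业务4 = c(100, 200, 300, 90, 400, 230)     # business line 4
    )
    rownames(df) <- c("小吴", "小刘", "王吴",
                      "小天", "小明", "赵四")     # employee names
    group <- c("优秀", "良好", "不良")            # excellent, good, poor
    by <- c(0.9, 0.7)                            # quantile cutoffs, highest first
    group_data(data = df, by = by, group = group)

Running the call prints the composite score and the assigned group for each employee.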
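To see where the groups in this example come from, the intermediate values can be computed directly. This is a sketch in base R rather than a call to group_data: the column names `b1` to `b4` are ASCII stand-ins for the four business columns, and rows are identified by position, in the same order as the employees above (row 4 is 小天, row 5 is 小明). With six rows and R's default quantile definition, the 0.9 cutoff falls between the two highest scores and the 0.7 cutoff between the second and third highest, so barring ties exactly one row lands in each of the top two groups.

```r
# Same values as the employee example; b1..b4 are stand-in column names
m <- cbind(
  b1 = c(12, 15, 18, 20, 10, 14),
  b2 = c(20, 25, 21, 28, 29, 21),
  b3 = c(40, 60, 70, 90, 100, 20),
  b4 = c(100, 200, 300, 90, 400, 230)
)

# Composite score: standardize each column, then average across each row
scores <- rowMeans(scale(m))

# Cutoffs at the 0.9 and 0.7 quantiles of the composite scores
cutoffs <- unname(quantile(scores, probs = c(0.9, 0.7)))

round(scores, 3)  # one composite score per row
cutoffs           # higher cutoff first, then the lower one

which(scores >= cutoffs[1])                        # 5 (top group)
which(scores < cutoffs[1] & scores >= cutoffs[2])  # 4 (middle group)
which(scores < cutoffs[2])                         # 1 2 3 6 (last group)
```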
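    # Goal: rank the students of a class by their composite score across all subjects
    # Requirements: divide the students into four classes:
    #   A: score at or above the 0.9 quantile
    #   B: score below the 0.9 quantile but at or above the 0.8 quantile
    #   C: score below the 0.8 quantile but at or above the 0.7 quantile
    #   D: score below the 0.7 quantile
    # The data is as follows:
    df2 <- data.frame(
      物理 = sample(0:50, 9, replace = FALSE),    # physics
      生物 = sample(0:50, 9, replace = FALSE),    # biology
      政治 = sample(0:100, 9, replace = FALSE),   # politics
      英语 = sample(0:100, 9, replace = FALSE),   # English
      语文 = sample(0:150, 9, replace = FALSE),   # Chinese
      数学 = sample(0:150, 9, replace = FALSE)    # mathematics
    )
    rownames(df2) <- c("小吴", "小刘", "王吴",
                       "小天", "小明", "赵四",
                       "王五", "刘三", "李一")    # student names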
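    by2 <- c(0.9, 0.8, 0.7)            # quantile cutoffs, highest first
    group2 <- c("A", "B", "C", "D")    # one more label than cutoffs
    group_data(data = df2, by = by2, group = group2)

The call prints the composite score and group for each student. Because the data is drawn with sample(), the values and the resulting groups differ from run to run.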
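Since the student example builds its data with `sample()`, one way to make it repeatable is to fix the random seed before drawing. The following is a minimal sketch under that assumption; the helper name `make_scores`, the seed value 42, and the English column names are illustrative and not from the original.

```r
# Hypothetical helper: builds the random score table from a fixed seed
make_scores <- function(seed) {
  set.seed(seed)  # fixes the random number stream so sample() repeats
  data.frame(
    physics   = sample(0:50, 9, replace = FALSE),
    biology   = sample(0:50, 9, replace = FALSE),
    politics  = sample(0:100, 9, replace = FALSE),
    english   = sample(0:100, 9, replace = FALSE),
    chinese   = sample(0:150, 9, replace = FALSE),
    math      = sample(0:150, 9, replace = FALSE)
  )
}

a <- make_scores(42)
b <- make_scores(42)
identical(a, b)  # TRUE: same seed, same table
```

Calling `set.seed()` once at the top of a script, before the data frame is built, has the same effect for a single run.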