R data science. An expert's blog: https://craig./

    ####---- An introduction to dplyr, following the R in Action example ----####
    rm(list = ls())
    manager <- c(1, 2, 3, 4, 5)
    date <- c('10/24/08', '10/28/08', '10/1/08', '10/12/08', '5/1/09')
    country <- c('US', 'US', 'UK', 'UK', 'UK')
    gender <- c('M', 'F', 'F', 'M', 'F')
    age <- c(32, 45, 25, 39, 99)
    q1 <- c(5, 3, 3, 3, 2)
    q2 <- c(4, 5, 5, 3, 2)
    q3 <- c(5, 2, 5, 4, 1)
    q4 <- c(5, 5, 5, NA, 2)
    q5 <- c(5, 5, 2, NA, 1)
    leadership <- data.frame(manager, date, country, gender, age,
                             q1, q2, q3, q4, q5, stringsAsFactors = FALSE)

Next, we operate on leadership with functions from the dplyr package.
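
    high_potentials <- filter(leadership, total_score > 10)

The filter function keeps the candidates whose total score is greater than 10. Note that total_score is not a column of leadership as defined above, so it has to be added first (for example with mutate); that step is not shown in the text.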
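
The select function picks the manager, country and mean_score columns. The arrange function then sorts the rows by country and, within each country, by mean_score.

    high_potentials <- arrange(high_potentials, country, mean_score)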
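
The steps described above can be put together as one runnable sketch. The text does not show how total_score and mean_score are computed, so the mutate call here is an assumption (the sum of q1 to q5, and that sum divided by 5); only the columns needed for the example are recreated.

```r
library(dplyr)

# Trimmed copy of the survey data from the text (only the columns used below)
leadership <- data.frame(
  manager = c(1, 2, 3, 4, 5),
  country = c("US", "US", "UK", "UK", "UK"),
  q1 = c(5, 3, 3, 3, 2),
  q2 = c(4, 5, 5, 3, 2),
  q3 = c(5, 2, 5, 4, 1),
  q4 = c(5, 5, 5, NA, 2),
  q5 = c(5, 5, 2, NA, 1),
  stringsAsFactors = FALSE
)

# ASSUMPTION: these score definitions are not given in the text
leadership <- mutate(leadership,
                     total_score = q1 + q2 + q3 + q4 + q5,
                     mean_score  = total_score / 5)

# Keep rows whose total score exceeds 10 (rows with an NA score are dropped)
high_potentials <- filter(leadership, total_score > 10)

# Keep only the three columns of interest
high_potentials <- select(high_potentials, manager, country, mean_score)

# Sort by country, then by mean_score within each country
high_potentials <- arrange(high_potentials, country, mean_score)

print(high_potentials)
```

Manager 4 has missing answers for q4 and q5, so the total is NA and filter silently drops that row. If such rows should be kept, the NA handling needs to be decided explicitly before filtering.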
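
We again use the dataset GSE66957 as the example. Loading the GSE expression matrix, checking whether the data need a log2 transformation, and normalizing the data are all done as in the earlier post (GEO data in practice, the discussion of merge). The difference lies in the probe ID conversion step.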
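We could use the select and rename functions to pick and rename the information in the GSE66957 dataset, but that is not actually necessary. In this exercise we mainly use mutate and select to obtain clean data. Next, we use mutate to combine the expression matrix with the probe matrix.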
Then, with the select function, we obtain a matrix that has GeneSymbol in the first column and no ID column.
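
The original code for these two steps was lost with the screenshots, so the following is only a minimal sketch on toy data. The object names exprset, probe, exprset2 and exprset3, the sample columns GSM_a and GSM_b, and the gene symbols are all hypothetical; looking the symbol up with match inside mutate is one possible way to do the merge, not necessarily the one used in the post.

```r
library(dplyr)

# Hypothetical expression table: three probes measured in two samples
exprset <- data.frame(
  ID    = c("p1", "p2", "p3"),
  GSM_a = c(1, 3, 5),
  GSM_b = c(2, 4, 6),
  stringsAsFactors = FALSE
)

# Hypothetical probe annotation, deliberately in a different row order
probe <- data.frame(
  ID         = c("p3", "p1", "p2"),
  GeneSymbol = c("GENE_B", "GENE_A", "GENE_A"),
  stringsAsFactors = FALSE
)

# mutate(): look up each probe's symbol by ID and add it as a new column
exprset2 <- mutate(exprset,
                   GeneSymbol = probe$GeneSymbol[match(ID, probe$ID)])

# select(): put GeneSymbol first, keep the remaining columns, drop ID
exprset3 <- select(exprset2, GeneSymbol, everything(), -ID)

print(exprset3)
```

Matching by ID rather than by row position keeps the result correct even when the two tables are sorted differently. A join such as left_join(exprset, probe, by = "ID") would be an alternative to the mutate lookup.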
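Then rows with duplicated probe names are averaged, and the result is stored as a data frame.

    # keep the first 12 columns plus the listed sample columns
    exprset4 <- select(exprset3, 1:12,
                       GSM1634937, GSM1634938, GSM1634939, GSM1634940,
                       GSM1634941, GSM1634942, GSM1634943, GSM1634944,
                       GSM1634945, GSM1634946, GSM1634950, GSM1634955)
    # drop the first row
    exprset4 <- exprset4[-1, ]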
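
The averaging step itself is not shown in the text. One way it might be done with dplyr is to group by the name column and take column means, sketched here on hypothetical toy data (the table, its column names and the result name exprset_avg are illustrative only; across requires dplyr 1.0.0 or later).

```r
library(dplyr)

# Hypothetical table in which two rows share the same gene symbol
exprset3 <- data.frame(
  GeneSymbol = c("GENE_A", "GENE_A", "GENE_B"),
  GSM_a = c(1, 3, 5),
  GSM_b = c(2, 4, 6),
  stringsAsFactors = FALSE
)

exprset_avg <- exprset3 %>%
  group_by(GeneSymbol) %>%                  # one group per gene symbol
  summarise(across(everything(), mean)) %>% # mean of every sample column
  as.data.frame()                           # plain data frame, not a tibble

print(exprset_avg)
```

Inside summarise, across(everything(), ...) skips the grouping column, so only the sample columns are averaged.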

    install.packages('tidyr')
    library(tidyr)
    library(dplyr, warn.conflicts = FALSE)

    ## Sample data
    vt_census <- tidycensus::get_decennial(
      geography = 'block',
      state = 'VT',
      county = 'Washington',
      variables = 'P1_001N',
      year = 2020
    )
    #> Getting data from the 2020 decennial Census
    #> Using the PL 94-171 Redistricting Data summary file
    #> Note: 2020 decennial Census data use differential privacy, a technique that
    #> introduces errors into data to preserve respondent confidentiality.
    #> ℹ Small counts should be interpreted with caution.
    #> ℹ See https://www.census.gov/library/fact-sheets/2021/protecting-the-confidentiality-of-the-2020-census-redistricting-data.html for additional guidance.
    #> This message is displayed once per session.

    vt_census
    #> # A tibble: 2,150 × 4
    #>    GEOID           NAME                                           variable value
    #>    <chr>           <chr>                                          <chr>    <dbl>
    #>  1 500239555021014 Block 1014, Block Group 1, Census Tract 9555.… P1_001N     21
    #>  2 500239555021015 Block 1015, Block Group 1, Census Tract 9555.… P1_001N     19
    #>  3 500239555021016 Block 1016, Block Group 1, Census Tract 9555.… P1_001N      0
    #>  4 500239555021017 Block 1017, Block Group 1, Census Tract 9555.… P1_001N      0
    #>  5 500239555021018 Block 1018, Block Group 1, Census Tract 9555.… P1_001N     43
    #>  6 500239555021019 Block 1019, Block Group 1, Census Tract 9555.… P1_001N     68
    #>  7 500239555021020 Block 1020, Block Group 1, Census Tract 9555.… P1_001N     30
    #>  8 500239555021021 Block 1021, Block Group 1, Census Tract 9555.… P1_001N      0
    #>  9 500239555021022 Block 1022, Block Group 1, Census Tract 9555.… P1_001N     18
    #> 10 500239555021023 Block 1023, Block Group 1, Census Tract 9555.… P1_001N     93
    #> # … with 2,140 more rows

    vt_census |>
      select(NAME) |>
      separate_wider_regex(
        NAME,
        patterns = c(
          'Block ', block = '\\d+', ', ',
          'Block Group ', block_group = '\\d+', ', ',
          'Census Tract ', tract = '\\d+.\\d+', ', ',
          county = '[^,]+', ', ',
          state = '.*'
        )
      )
    #> # A tibble: 2,150 × 5
    #>    block block_group tract   county            state
    #>    <chr> <chr>       <chr>   <chr>             <chr>
    #>  1 1014  1           9555.02 Washington County Vermont
    #>  2 1015  1           9555.02 Washington County Vermont
    #>  3 1016  1           9555.02 Washington County Vermont
    #>  4 1017  1           9555.02 Washington County Vermont
    #>  5 1018  1           9555.02 Washington County Vermont
    #>  6 1019  1           9555.02 Washington County Vermont
    #>  7 1020  1           9555.02 Washington County Vermont
    #>  8 1021  1           9555.02 Washington County Vermont
    #>  9 1022  1           9555.02 Washington County Vermont
    #> 10 1023  1           9555.02 Washington County Vermont
    #> # … with 2,140 more rows
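
Downloading the census data requires network access (and typically a Census API key), so here is a small offline sketch of the same separate_wider_regex call. The two-row tibble blocks is made up for illustration and only mimics the NAME format shown above; the function needs tidyr 1.3.0 or later. The dot in the tract pattern is escaped here so that it matches a literal period.

```r
library(tidyr)

# Hypothetical stand-in for vt_census: only the NAME column, two rows
blocks <- tibble::tibble(
  NAME = c(
    "Block 1014, Block Group 1, Census Tract 9555.02, Washington County, Vermont",
    "Block 1015, Block Group 1, Census Tract 9555.02, Washington County, Vermont"
  )
)

# Named patterns become new columns; unnamed patterns are matched and discarded
parsed <- separate_wider_regex(
  blocks,
  NAME,
  patterns = c(
    "Block ", block = "\\d+", ", ",
    "Block Group ", block_group = "\\d+", ", ",
    "Census Tract ", tract = "\\d+\\.\\d+", ", ",
    county = "[^,]+", ", ",
    state = ".*"
  )
)

print(parsed)
```

All extracted columns are character vectors, so numeric fields such as block would still need a conversion (for example with as.integer) before arithmetic.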