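skimr offers a frictionless approach to summary statistics. One line of code prints summary statistics that you can skim quickly to understand your data. It handles different data types and returns a skim_df object, which fits naturally into pipelines and is displayed in a human-friendly way.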
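skimr is part of rOpenSci, an organization that hosts many useful R packages. If you are interested, explore its website [1].

Installation

skimr is available on CRAN, and the development version can be installed from GitHub:

```r
install.packages("skimr")

# or the development version:
# install.packages("devtools")
devtools::install_github("ropensci/skimr")
```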
Usage

```r
library(skimr)
```
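Usage is very simple: there is essentially one function, skim(), and it reports more descriptive statistics than summary(). The output is organized by data type, and you can also customize which statistics are shown and how they are formatted.

```r
skim(iris)
```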
Table: Data summary

| | |
|:---|:---|
| Name | iris |
| Number of rows | 150 |
| Number of columns | 5 |
| Column type frequency: | |
| factor | 1 |
| numeric | 4 |
| Group variables | None |

Variable type: factor

| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|:---|---:|---:|:---|---:|:---|
| Species | 0 | 1 | FALSE | 3 | set: 50, ver: 50, vir: 50 |

Variable type: numeric

| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|:---|
| Sepal.Length | 0 | 1 | 5.84 | 0.83 | 4.3 | 5.1 | 5.80 | 6.4 | 7.9 | ▆▇▇▅▂ |
| Sepal.Width | 0 | 1 | 3.06 | 0.44 | 2.0 | 2.8 | 3.00 | 3.3 | 4.4 | ▁▆▇▂▁ |
| Petal.Length | 0 | 1 | 3.76 | 1.77 | 1.0 | 1.6 | 4.35 | 5.1 | 6.9 | ▇▁▆▇▂ |
| Petal.Width | 0 | 1 | 1.20 | 0.76 | 0.1 | 0.3 | 1.30 | 1.8 | 2.5 | ▇▁▇▅▃ |
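Because skim() returns a skim_df (a tibble subclass) rather than just printing, its result can be manipulated further. The sketch below assumes skimr 2.x and dplyr are installed; it filters the summary with an ordinary dplyr verb and uses skimr's yank() to pull out the numeric sub-table. The variable names iris_skim, wide_spread and numeric_stats are just illustrative.

```r
library(skimr)
library(dplyr)

iris_skim <- skim(iris)

# skim_df is a tibble subclass, so ordinary dplyr verbs apply.
# Statistics are stored in type-prefixed columns such as numeric.sd;
# here we keep numeric variables whose standard deviation exceeds 1.
wide_spread <- iris_skim %>%
  filter(skim_type == "numeric", numeric.sd > 1)
wide_spread

# yank() extracts the sub-table for one column type,
# with the "numeric." prefix dropped from the statistic names
numeric_stats <- yank(iris_skim, "numeric")
numeric_stats
```

In the iris table above, only Petal.Length has a standard deviation above 1, so that is the only row the filter should keep.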
skim() also handles character and list columns, for example in dplyr's starwars data:

```r
skim(dplyr::starwars)
```
Table: Data summary

| | |
|:---|:---|
| Name | dplyr::starwars |
| Number of rows | 87 |
| Number of columns | 14 |
| Column type frequency: | |
| character | 8 |
| list | 3 |
| numeric | 3 |
| Group variables | None |

Variable type: character

| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|:---|---:|---:|---:|---:|---:|---:|---:|
| name | 0 | 1.00 | 3 | 21 | 0 | 87 | 0 |
| hair_color | 5 | 0.94 | 4 | 13 | 0 | 12 | 0 |
| skin_color | 0 | 1.00 | 3 | 19 | 0 | 31 | 0 |
| eye_color | 0 | 1.00 | 3 | 13 | 0 | 15 | 0 |
| sex | 4 | 0.95 | 4 | 14 | 0 | 4 | 0 |
| gender | 4 | 0.95 | 8 | 9 | 0 | 2 | 0 |
| homeworld | 10 | 0.89 | 4 | 14 | 0 | 48 | 0 |
| species | 4 | 0.95 | 3 | 14 | 0 | 37 | 0 |

Variable type: list

| skim_variable | n_missing | complete_rate | n_unique | min_length | max_length |
|:---|---:|---:|---:|---:|---:|
| films | 0 | 1 | 24 | 1 | 7 |
| vehicles | 0 | 1 | 11 | 0 | 2 |
| starships | 0 | 1 | 17 | 0 | 5 |

Variable type: numeric

| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|:---|
| height | 6 | 0.93 | 174.36 | 34.77 | 66 | 167.0 | 180 | 191.0 | 264 | ▁▁▇▅▁ |
| mass | 28 | 0.68 | 97.31 | 169.46 | 15 | 55.6 | 79 | 84.5 | 1358 | ▇▁▁▁▁ |
| birth_year | 44 | 0.49 | 87.57 | 154.69 | 8 | 35.0 | 52 | 72.0 | 896 | ▇▁▁▁▁ |
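The n_missing and complete_rate columns make it easy to spot incomplete variables. As a minimal sketch, assuming skimr 2.x and dplyr, the code below keeps only the summary columns of interest with skimr's focus() and then filters for variables that are less than 90% complete; the 0.9 threshold and the name sparse_cols are arbitrary choices for illustration.

```r
library(skimr)
library(dplyr)

# focus() works like select() but keeps skim_type and skim_variable,
# so the result is still a skim_df
sparse_cols <- skim(dplyr::starwars) %>%
  focus(n_missing, complete_rate) %>%
  filter(complete_rate < 0.9)   # arbitrary cut-off for "mostly missing"

sparse_cols
```

The exact rows depend on the version of the starwars data shipped with your dplyr; in the table above, homeworld, mass and birth_year fall below the cut-off.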
Pipes are supported as well, and a grouped data frame is summarized group by group:

```r
library(dplyr)
iris %>% group_by(Species) %>% skim()
```
Table: Data summary

| | |
|:---|:---|
| Name | Piped data |
| Number of rows | 150 |
| Number of columns | 5 |
| Column type frequency: | |
| numeric | 4 |
| Group variables | Species |

Variable type: numeric

| skim_variable | Species | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|:---|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|:---|
| Sepal.Length | setosa | 0 | 1 | 5.01 | 0.35 | 4.3 | 4.80 | 5.00 | 5.20 | 5.8 | ▃▃▇▅▁ |
| Sepal.Length | versicolor | 0 | 1 | 5.94 | 0.52 | 4.9 | 5.60 | 5.90 | 6.30 | 7.0 | ▂▇▆▃▃ |
| Sepal.Length | virginica | 0 | 1 | 6.59 | 0.64 | 4.9 | 6.23 | 6.50 | 6.90 | 7.9 | ▁▃▇▃▂ |
| Sepal.Width | setosa | 0 | 1 | 3.43 | 0.38 | 2.3 | 3.20 | 3.40 | 3.68 | 4.4 | ▁▃▇▅▂ |
| Sepal.Width | versicolor | 0 | 1 | 2.77 | 0.31 | 2.0 | 2.52 | 2.80 | 3.00 | 3.4 | ▁▅▆▇▂ |
| Sepal.Width | virginica | 0 | 1 | 2.97 | 0.32 | 2.2 | 2.80 | 3.00 | 3.18 | 3.8 | ▂▆▇▅▁ |
| Petal.Length | setosa | 0 | 1 | 1.46 | 0.17 | 1.0 | 1.40 | 1.50 | 1.58 | 1.9 | ▁▃▇▃▁ |
| Petal.Length | versicolor | 0 | 1 | 4.26 | 0.47 | 3.0 | 4.00 | 4.35 | 4.60 | 5.1 | ▂▂▇▇▆ |
| Petal.Length | virginica | 0 | 1 | 5.55 | 0.55 | 4.5 | 5.10 | 5.55 | 5.88 | 6.9 | ▃▇▇▃▂ |
| Petal.Width | setosa | 0 | 1 | 0.25 | 0.11 | 0.1 | 0.20 | 0.20 | 0.30 | 0.6 | ▇▂▂▁▁ |
| Petal.Width | versicolor | 0 | 1 | 1.33 | 0.20 | 1.0 | 1.20 | 1.30 | 1.50 | 1.8 | ▅▇▃▆▁ |
| Petal.Width | virginica | 0 | 1 | 2.03 | 0.27 | 1.4 | 1.80 | 2.00 | 2.30 | 2.5 | ▂▇▆▅▇ |
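When a data set has many columns, you may only want to summarize some of them. A short sketch, assuming skimr 2.x: skim() accepts tidyselect-style column selections after the data argument, both on plain and on grouped data, and skim_without_charts() drops the inline histograms, which may help in consoles where the sparkline characters do not render well. The name petal_by_species is illustrative only.

```r
library(skimr)
library(dplyr)

# Summarize only the Petal.* columns, separately for each species
petal_by_species <- iris %>%
  group_by(Species) %>%
  skim(starts_with("Petal"))
petal_by_species

# Same kind of summary for two named columns, without the hist column
skim_without_charts(iris, Sepal.Length, Sepal.Width)
```

With two selected columns and three species, the grouped summary has one row per column per group, mirroring the layout of the full grouped table above.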
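Customization

Although skimr ships with its own defaults, it is highly customizable: you can specify your own summary statistics, change how results are formatted, define statistics for new classes, and more. Have a look at skimr's GitHub repository [2] for details, although there is not much else there, since the main entry point is the skim() function.

That's all for today. I hope it helps!

References

[1] rOpenSci website: https:///
[2] skimr GitHub: https://github.com/ropensci/skimr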
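Customization example: skimr builds custom summary functions with skim_with(), and sfl() lists the statistics per column type. The sketch below assumes skimr 2.x. It adds two extra numeric statistics and removes the histogram; the labels iqr and p99 and the name my_skim are our own choices, not skimr defaults. Passing na.rm = TRUE keeps the functions from failing on columns with missing values.

```r
library(skimr)

# skim_with() returns a new skim-like function.
# With the default append = TRUE, these statistics are added to the defaults;
# setting a statistic to NULL removes it.
my_skim <- skim_with(
  numeric = sfl(
    iqr  = ~ IQR(., na.rm = TRUE),
    p99  = ~ quantile(., probs = 0.99, na.rm = TRUE, names = FALSE),
    hist = NULL   # drop the inline histogram
  )
)

res <- my_skim(iris)
res
```

The new statistics appear as numeric.iqr and numeric.p99 columns in the resulting skim_df, next to the default ones.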