Descriptive statistics for multiple variables for all grouping variable levels

grouped_summary(
  data,
  grouping.vars,
  measures = NULL,
  measures.type = "numeric",
  topcount.long = FALSE,
  k = 2L,
  ...
)

Arguments

data

Dataframe from which variables need to be taken.

grouping.vars

A list of grouping variables. Please use unquoted arguments (i.e., use x and not "x").

measures

Variables for which the summary needs to be computed. If not specified, all variables of the type specified in the argument measures.type will be used to calculate summaries. Don't explicitly set measures.type = NULL in the function call; doing so produces an error because the function will try to find a column named "NULL" in the dataframe.

measures.type

A character string indicating whether a summary for numeric ("numeric") or factor/character ("factor") variables is expected (Default: measures.type = "numeric"). This function can't be used for both numeric and factor variables simultaneously.

topcount.long

If measures.type = "factor", you can get the top counts in long format for plotting purposes (Default: topcount.long = FALSE).
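To make the idea of "top counts in long format" concrete, here is a minimal base R sketch that produces one row per group and factor level. The helper top_counts_long() and its arguments are hypothetical and are not part of groupedstats; unlike the package output, level names are not abbreviated, and the built-in mtcars data is used only for illustration.

```r
# Hypothetical helper (not part of groupedstats): per-group top counts,
# returned in long format with one row per group/level combination.
top_counts_long <- function(data, group, measure, top = 4L) {
  # drop = TRUE skips unused grouping levels; rows with NA in `group` are dropped
  pieces <- lapply(split(data, data[[group]], drop = TRUE), function(chunk) {
    counts <- sort(table(chunk[[measure]]), decreasing = TRUE)
    counts <- counts[seq_len(min(top, length(counts)))]
    data.frame(
      group = as.character(chunk[[group]][1]),
      factor.level = names(counts),
      count = as.integer(counts),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# most frequent `gear` values within each level of `cyl`
gear_counts <- top_counts_long(mtcars, group = "cyl", measure = "gear")
print(gear_counts)
```

A long layout like this can be passed directly to a plotting function, which is the use case the argument description mentions.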

k

Number of digits after the decimal point (should be an integer) (Default: k = 2L).

...

Currently ignored.

Value

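Dataframe with descriptive statistics for each selected variable at each level of the grouping variables. For numeric variables these include n, mean, sd, median, min, and max; for factor variables they include the number of unique levels and the top counts.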
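As a rough illustration of what such a grouped summary contains, the following base R sketch computes n, mean, sd, median, min, and max for each variable within each group. The helper describe_by_group() is hypothetical and only approximates the idea; it is not the package's implementation, and it takes quoted column names rather than the unquoted arguments that grouped_summary() expects.

```r
# Hypothetical helper (not part of groupedstats): grouped descriptive
# statistics for numeric columns, rounded to k digits.
describe_by_group <- function(data, group, measures, k = 2L) {
  # drop = TRUE skips unused grouping levels; rows with NA in `group` are dropped
  pieces <- lapply(split(data, data[[group]], drop = TRUE), function(chunk) {
    rows <- lapply(measures, function(v) {
      x <- chunk[[v]]
      data.frame(
        group = as.character(chunk[[group]][1]),
        variable = v,
        n = sum(!is.na(x)),
        mean = round(mean(x, na.rm = TRUE), k),
        sd = round(stats::sd(x, na.rm = TRUE), k),
        median = round(stats::median(x, na.rm = TRUE), k),
        min = min(x, na.rm = TRUE),
        max = max(x, na.rm = TRUE),
        stringsAsFactors = FALSE
      )
    })
    do.call(rbind, rows)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

iris_summary <- describe_by_group(
  data = iris,
  group = "Species",
  measures = c("Sepal.Length", "Petal.Width")
)
print(iris_summary)
```

The result has one row per combination of grouping level and variable, which is the same overall shape as the tibbles shown in the examples.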

Examples

# for reproducibility
set.seed(123)

# another possibility
groupedstats::grouped_summary(
  data = iris,
  grouping.vars = Species,
  measures = Sepal.Length:Petal.Width,
  measures.type = "numeric"
)
#> # A tibble: 12 x 16
#>    Species    skim_type skim_variable missing complete  mean    sd   min   p25
#>    <fct>      <chr>     <chr>           <int>    <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 setosa     numeric   Sepal.Length        0        1  5.01 0.352   4.3   4.8
#>  2 setosa     numeric   Sepal.Width         0        1  3.43 0.379   2.3   3.2
#>  3 setosa     numeric   Petal.Length        0        1  1.46 0.174     1   1.4
#>  4 setosa     numeric   Petal.Width         0        1 0.246 0.105   0.1   0.2
#>  5 versicolor numeric   Sepal.Length        0        1  5.94 0.516   4.9   5.6
#>  6 versicolor numeric   Sepal.Width         0        1  2.77 0.314     2  2.52
#>  7 versicolor numeric   Petal.Length        0        1  4.26 0.470     3     4
#>  8 versicolor numeric   Petal.Width         0        1  1.33 0.198     1   1.2
#>  9 virginica  numeric   Sepal.Length        0        1  6.59 0.636   4.9  6.22
#> 10 virginica  numeric   Sepal.Width         0        1  2.97 0.322   2.2   2.8
#> 11 virginica  numeric   Petal.Length        0        1  5.55 0.552   4.5   5.1
#> 12 virginica  numeric   Petal.Width         0        1  2.03 0.275   1.4   1.8
#>    median   p75   max     n std.error mean.conf.low mean.conf.high
#>     <dbl> <dbl> <dbl> <int>     <dbl>         <dbl>          <dbl>
#>  1      5   5.2   5.8    50    0.0498          4.91           5.11
#>  2    3.4  3.68   4.4    50    0.0536          3.32           3.54
#>  3    1.5  1.58   1.9    50    0.0246          1.41           1.51
#>  4    0.2   0.3   0.6    50    0.0149         0.216          0.276
#>  5    5.9   6.3     7    50    0.0730          5.79           6.08
#>  6    2.8     3   3.4    50    0.0444          2.68           2.86
#>  7   4.35   4.6   5.1    50    0.0665          4.13           4.39
#>  8    1.3   1.5   1.8    50    0.0280          1.27           1.38
#>  9    6.5   6.9   7.9    50    0.0899          6.41           6.77
#> 10      3  3.18   3.8    50    0.0456          2.88           3.07
#> 11   5.55  5.88   6.9    50    0.0780          5.40           5.71
#> 12      2   2.3   2.5    50    0.0388          1.95           2.10

# if no measures are chosen, all relevant columns will be summarized
groupedstats::grouped_summary(
  data = ggplot2::msleep,
  grouping.vars = vore,
  measures.type = "factor"
)
#> # A tibble: 20 x 9
#>    vore    skim_type skim_variable missing complete ordered n_unique
#>    <fct>   <chr>     <chr>           <int>    <dbl> <lgl>      <int>
#>  1 carni   factor    name                0        1 FALSE         19
#>  2 carni   factor    genus               0        1 FALSE         16
#>  3 carni   factor    order               0        1 FALSE          6
#>  4 carni   factor    conservation        5    0.737 FALSE          6
#>  5 herbi   factor    name                0        1 FALSE         32
#>  6 herbi   factor    genus               0        1 FALSE         29
#>  7 herbi   factor    order               0        1 FALSE          9
#>  8 herbi   factor    conservation        6    0.812 FALSE          6
#>  9 insecti factor    name                0        1 FALSE          5
#> 10 insecti factor    genus               0        1 FALSE          5
#> 11 insecti factor    order               0        1 FALSE          4
#> 12 insecti factor    conservation        2      0.6 FALSE          2
#> 13 omni    factor    name                0        1 FALSE         20
#> 14 omni    factor    genus               0        1 FALSE         20
#> 15 omni    factor    order               0        1 FALSE          8
#> 16 omni    factor    conservation       11    0.450 FALSE          2
#> 17 NA      factor    name                0        1 FALSE          7
#> 18 NA      factor    genus               0        1 FALSE          7
#> 19 NA      factor    order               0        1 FALSE          5
#> 20 NA      factor    conservation        5    0.286 FALSE          1
#>    top_counts                          n
#>    <chr>                           <int>
#>  1 Arc: 1, Bot: 1, Cas: 1, Che: 1     19
#>  2 Pan: 3, Vul: 2, Aci: 1, Cal: 1     19
#>  3 Car: 12, Cet: 3, Cin: 1, Did: 1    19
#>  4 lc: 5, vu: 4, dom: 2, cd: 1        14
#>  5 Afr: 1, Arc: 1, Asi: 1, Bra: 1     32
#>  6 Spe: 3, Equ: 2, Apl: 1, Bos: 1     32
#>  7 Rod: 16, Art: 5, Per: 3, Hyr: 2    32
#>  8 lc: 10, dom: 7, nt: 3, vu: 3       26
#>  9 Big: 1, Eas: 1, Gia: 1, Lit: 1      5
#> 10 Ept: 1, Myo: 1, Pri: 1, Sca: 1      5
#> 11 Chi: 2, Cin: 1, Mon: 1, Sor: 1      5
#> 12 lc: 2, en: 1, cd: 0, dom: 0         3
#> 13 Afr: 1, Afr: 1, Bab: 1, Chi: 1     20
#> 14 Aot: 1, Bla: 1, Cer: 1, Con: 1     20
#> 15 Pri: 10, Sor: 3, Rod: 2, Afr: 1    20
#> 16 lc: 8, dom: 1, cd: 0, en: 0         9
#> 17 Dee: 1, Des: 1, Mol: 1, Mus: 1      7
#> 18 Cal: 1, Par: 1, Per: 1, Pha: 1      7
#> 19 Rod: 3, Dip: 1, Eri: 1, Hyr: 1      7
#> 20 lc: 2, cd: 0, dom: 0, en: 0         2

# for factors, you can also convert the dataframe to a long format with counts
groupedstats::grouped_summary(
  data = ggplot2::msleep,
  grouping.vars = c(vore),
  measures = c(genus:order),
  measures.type = "factor",
  topcount.long = TRUE
)
#> # A tibble: 40 x 3
#>    vore  factor.level count
#>    <fct> <chr>        <int>
#>  1 carni Pan              3
#>  2 carni Vul              2
#>  3 carni Aci              1
#>  4 carni Cal              1
#>  5 carni Car             12
#>  6 carni Cet              3
#>  7 carni Cin              1
#>  8 carni Did              1
#>  9 herbi Spe              3
#> 10 herbi Equ              2
#> # ... with 30 more rows