Skip to contents

  Patil, I., (2021). statsExpressions: R Package for Tidy Dataframes
  and Expressions with Statistical Details. Journal of Open Source
  Software, 6(61), 3236, https://doi.org/10.21105/joss.03236

A BibTeX entry for LaTeX users is

  @Article{,
    doi = {10.21105/joss.03236},
    url = {https://doi.org/10.21105/joss.03236},
    year = {2021},
    publisher = {{The Open Journal}},
    volume = {6},
    number = {61},
    pages = {3236},
    author = {Indrajeet Patil},
    title = {{statsExpressions: {R} Package for Tidy Dataframes and Expressions with Statistical Details}},
    journal = {{Journal of Open Source Software}},
  }

Introduction

The statsExpressions package has two key aims: to provide a consistent syntax to do statistical analysis with tidy data, and to provide statistical expressions (i.e., a pre-formatted in-text statistical result) for plotting functions. Currently, it supports common types of statistical approaches and tests: parametric, nonparametric, robust, and Bayesian t-test, one-way ANOVA, correlation analyses, contingency table analyses, and meta-analyses. The functions are pipe-friendly and compatible with tidy data.

Statistical packages exhibit substantial diversity in terms of their syntax and expected input type. This can make it difficult to switch from one statistical approach to another. For example, some functions expect vectors as inputs, while others expect dataframes. Depending on whether it is a repeated measures design or not, different functions might expect data to be in wide or long format. Some functions can internally omit missing values, while other functions error in their presence. Furthermore, if someone wishes to utilize the objects returned by these packages downstream in their workflow, this is not straightforward either because even functions from the same package can return a list, a matrix, an array, a dataframe, etc., depending on the function.

This is where statsExpressions comes in: It can be thought of as a unified portal through which most of the functionality in these underlying packages can be accessed, with a simpler interface and no requirement to change data format.

Dataframes

The dataframe will contain the following columns (the exact columns will depend on the test and the statistical approach):

  • statistic: the numeric value of a statistic.

  • df: the numeric value of a parameter being modeled (often degrees of freedom for the test).

  • df.error and df: relevant only if the statistic in question has two degrees of freedom (e.g., anova).

  • p.value: the two-sided p-value associated with the observed statistic.

  • method: the details of the statistical test carried out.

  • estimate: estimated value of the effect size.

  • conf.low: lower bound for the effect size estimate.

  • conf.high: upper bound for the effect size estimate.

  • conf.level: width of the confidence interval.

  • effectsize: the type of the effect size.

For details about the tests and types of effect size estimates they correspond to, see: https://indrajeetpatil.github.io/statsExpressions/articles/stats_details.html

Note that all examples are preceded by set.seed() calls for reproducibility.

One-sample tests

# setup
library(statsExpressions)

# ----------------------- parametric ---------------------------------------

set.seed(123)
one_sample_test(
  data       = ggplot2::msleep,
  x          = brainwt,
  test.value = 0.275,
  type       = "parametric"
)
#> # A tibble: 1 × 15
#>      mu statistic df.error p.value method            alternative effectsize
#>   <dbl>     <dbl>    <dbl>   <dbl> <chr>             <chr>       <chr>     
#> 1 0.275    0.0504       55   0.960 One Sample t-test two.sided   Hedges' g 
#>   estimate conf.level conf.low conf.high conf.method conf.distribution n.obs
#>      <dbl>      <dbl>    <dbl>     <dbl> <chr>       <chr>             <int>
#> 1  0.00665       0.95   -0.254     0.267 ncp         t                    56
#>   expression  
#>   <list>      
#> 1 <expression>

# ----------------------- non-parametric -----------------------------------

set.seed(123)
one_sample_test(
  data       = ggplot2::msleep,
  x          = brainwt,
  test.value = 0.275,
  type       = "nonparametric"
)
#> # A tibble: 1 × 12
#>   statistic    p.value method                    alternative effectsize       
#>       <dbl>      <dbl> <chr>                     <chr>       <chr>            
#> 1       248 0.00000738 Wilcoxon signed rank test two.sided   r (rank biserial)
#>   estimate conf.level conf.low conf.high conf.method n.obs expression  
#>      <dbl>      <dbl>    <dbl>     <dbl> <chr>       <int> <list>      
#> 1   -0.689       0.95   -0.817    -0.497 normal         56 <expression>

# ----------------------- robust --------------------------------------------

set.seed(123)
one_sample_test(
  data       = ggplot2::msleep,
  x          = brainwt,
  test.value = 0.275,
  type       = "robust"
)
#> # A tibble: 1 × 10
#>   statistic p.value n.obs method                                 effectsize  
#>       <dbl>   <dbl> <int> <chr>                                  <chr>       
#> 1     -14.7       0    56 Bootstrap-t method for one-sample test Trimmed mean
#>   estimate conf.level conf.low conf.high expression  
#>      <dbl>      <dbl>    <dbl>     <dbl> <list>      
#> 1   0.0390       0.95  -0.0132    0.0911 <expression>

# ----------------------- Bayesian ---------------------------------------

set.seed(123)
one_sample_test(
  data       = ggplot2::msleep,
  x          = brainwt,
  test.value = 0.275,
  type       = "bayes",
  bf.prior   = 0.8
)
#> # A tibble: 1 × 16
#>   term       effectsize      estimate conf.level conf.low conf.high    pd
#>   <chr>      <chr>              <dbl>      <dbl>    <dbl>     <dbl> <dbl>
#> 1 Difference Bayesian t-test -0.00952       0.95   -0.255     0.250 0.532
#>   rope.percentage prior.distribution prior.location prior.scale  bf10
#>             <dbl> <chr>                       <dbl>       <dbl> <dbl>
#> 1           0.580 cauchy                          0         0.8 0.130
#>   method          log_e_bf10 n.obs expression  
#>   <chr>                <dbl> <int> <list>      
#> 1 Bayesian t-test      -2.04    56 <expression>

Two-sample tests

within-subjects design

# setup
library(statsExpressions)

# data
df <- dplyr::filter(bugs_long, condition %in% c("LDLF", "LDHF"))

# ----------------------- parametric ---------------------------------------

set.seed(123)
two_sample_test(
  data       = df,
  x          = condition,
  y          = desire,
  paired     = TRUE,
  subject.id = subject,
  type       = "p"
)
#> # A tibble: 1 × 16
#>   term   group     statistic df.error       p.value method        alternative
#>   <chr>  <chr>         <dbl>    <dbl>         <dbl> <chr>         <chr>      
#> 1 desire condition      6.65       90 0.00000000222 Paired t-test two.sided  
#>   effectsize estimate conf.level conf.low conf.high conf.method
#>   <chr>         <dbl>      <dbl>    <dbl>     <dbl> <chr>      
#> 1 Hedges' g     0.691       0.95    0.465     0.922 ncp        
#>   conf.distribution n.obs expression  
#>   <chr>             <int> <list>      
#> 1 t                    91 <expression>

# ----------------------- non-parametric -----------------------------------

set.seed(123)
two_sample_test(
  data       = df,
  x          = condition,
  y          = desire,
  paired     = TRUE,
  subject.id = subject,
  type       = "np"
)
#> # A tibble: 1 × 14
#>   parameter1 parameter2 statistic      p.value method                   
#>   <chr>      <chr>          <dbl>        <dbl> <chr>                    
#> 1 desire     condition      2250. 0.0000000241 Wilcoxon signed rank test
#>   alternative effectsize        estimate conf.level conf.low conf.high
#>   <chr>       <chr>                <dbl>      <dbl>    <dbl>     <dbl>
#> 1 two.sided   r (rank biserial)    0.761       0.95    0.642     0.844
#>   conf.method n.obs expression  
#>   <chr>       <int> <list>      
#> 1 normal         91 <expression>

# ----------------------- robust --------------------------------------------

set.seed(123)
two_sample_test(
  data       = df,
  x          = condition,
  y          = desire,
  paired     = TRUE,
  subject.id = subject,
  type       = "r"
)
#> # A tibble: 1 × 15
#>   statistic df.error      p.value
#>       <dbl>    <dbl>        <dbl>
#> 1      6.46       54 0.0000000313
#>   method                                            
#>   <chr>                                             
#> 1 Yuen's test on trimmed means for dependent samples
#>   effectsize                                              estimate conf.level
#>   <chr>                                                      <dbl>      <dbl>
#> 1 Algina-Keselman-Penfield robust standardized difference    0.533       0.95
#>   conf.low conf.high    mu small medium large n.obs expression  
#>      <dbl>     <dbl> <dbl> <dbl>  <dbl> <dbl> <int> <list>      
#> 1    0.369     0.707     0   0.1    0.3   0.5    91 <expression>

# ----------------------- Bayesian ---------------------------------------

set.seed(123)
two_sample_test(
  data       = df,
  x          = condition,
  y          = desire,
  paired     = TRUE,
  subject.id = subject,
  type       = "bayes"
)
#> # A tibble: 1 × 16
#>   term       effectsize      estimate conf.level conf.low conf.high    pd
#>   <chr>      <chr>              <dbl>      <dbl>    <dbl>     <dbl> <dbl>
#> 1 Difference Bayesian t-test     1.62       0.95     1.14      2.12     1
#>   rope.percentage prior.distribution prior.location prior.scale     bf10
#>             <dbl> <chr>                       <dbl>       <dbl>    <dbl>
#> 1               0 cauchy                          0       0.707 4762370.
#>   method          log_e_bf10 n.obs expression  
#>   <chr>                <dbl> <int> <list>      
#> 1 Bayesian t-test       15.4    91 <expression>

between-subjects design

# setup
set.seed(123)
library(statsExpressions)

# ----------------------- parametric ---------------------------------------

# unequal variance
set.seed(123)
two_sample_test(
  data      = ToothGrowth,
  x         = supp,
  y         = len,
  type      = "p"
)
#> # A tibble: 1 × 18
#>   term  group mean.group1 mean.group2 statistic df.error p.value
#>   <chr> <chr>       <dbl>       <dbl>     <dbl>    <dbl>   <dbl>
#> 1 len   supp         20.7        17.0      1.92     55.3  0.0606
#>   method                  alternative effectsize estimate conf.level conf.low
#>   <chr>                   <chr>       <chr>         <dbl>      <dbl>    <dbl>
#> 1 Welch Two Sample t-test two.sided   Hedges' g     0.488       0.95  -0.0217
#>   conf.high conf.method conf.distribution n.obs expression  
#>       <dbl> <chr>       <chr>             <int> <list>      
#> 1     0.993 ncp         t                    60 <expression>

# equal variance
set.seed(123)
two_sample_test(
  data      = ToothGrowth,
  x         = supp,
  y         = len,
  var.equal = TRUE,
  type      = "p"
)
#> # A tibble: 1 × 18
#>   term  group mean.group1 mean.group2 statistic df.error p.value
#>   <chr> <chr>       <dbl>       <dbl>     <dbl>    <dbl>   <dbl>
#> 1 len   supp         20.7        17.0      1.92       58  0.0604
#>   method            alternative effectsize estimate conf.level conf.low
#>   <chr>             <chr>       <chr>         <dbl>      <dbl>    <dbl>
#> 1 Two Sample t-test two.sided   Hedges' g     0.488       0.95  -0.0217
#>   conf.high conf.method conf.distribution n.obs expression  
#>       <dbl> <chr>       <chr>             <int> <list>      
#> 1     0.993 ncp         t                    60 <expression>

# ----------------------- non-parametric -----------------------------------

set.seed(123)
two_sample_test(
  data      = ToothGrowth,
  x         = supp,
  y         = len,
  type      = "np"
)
#> # A tibble: 1 × 14
#>   parameter1 parameter2 statistic p.value method                 alternative
#>   <chr>      <chr>          <dbl>   <dbl> <chr>                  <chr>      
#> 1 len        supp            576.  0.0645 Wilcoxon rank sum test two.sided  
#>   effectsize        estimate conf.level conf.low conf.high conf.method n.obs
#>   <chr>                <dbl>      <dbl>    <dbl>     <dbl> <chr>       <int>
#> 1 r (rank biserial)    0.279       0.95 -0.00812     0.523 normal         60
#>   expression  
#>   <list>      
#> 1 <expression>

# ----------------------- robust --------------------------------------------

set.seed(123)
two_sample_test(
  data      = ToothGrowth,
  x         = supp,
  y         = len,
  type      = "r"
)
#> # A tibble: 1 × 11
#>   statistic df.error p.value
#>       <dbl>    <dbl>   <dbl>
#> 1      2.29     33.5  0.0286
#>   method                                              
#>   <chr>                                               
#> 1 Yuen's test on trimmed means for independent samples
#>   effectsize                                              estimate conf.level
#>   <chr>                                                      <dbl>      <dbl>
#> 1 Algina-Keselman-Penfield robust standardized difference    0.683       0.95
#>   conf.low conf.high n.obs expression  
#>      <dbl>     <dbl> <int> <list>      
#> 1 -0.00736      2.36    60 <expression>

# ----------------------- Bayesian ---------------------------------------

set.seed(123)
two_sample_test(
  data      = ToothGrowth,
  x         = supp,
  y         = len,
  type      = "bayes"
)
#> # A tibble: 1 × 16
#>   term       effectsize      estimate conf.level conf.low conf.high    pd
#>   <chr>      <chr>              <dbl>      <dbl>    <dbl>     <dbl> <dbl>
#> 1 Difference Bayesian t-test     3.16       0.95   -0.528      6.66 0.960
#>   rope.percentage prior.distribution prior.location prior.scale  bf10
#>             <dbl> <chr>                       <dbl>       <dbl> <dbl>
#> 1          0.0821 cauchy                          0       0.707  1.20
#>   method          log_e_bf10 n.obs expression  
#>   <chr>                <dbl> <int> <list>      
#> 1 Bayesian t-test      0.181    60 <expression>

One-way ANOVAs

within-subjects design

# setup
library(statsExpressions)

# ----------------------- parametric ---------------------------------------

if (require("afex", quietly = TRUE)) {
  set.seed(123)
  oneway_anova(
    data       = bugs_long,
    x          = condition,
    y          = desire,
    paired     = TRUE,
    subject.id = subject,
    type       = "p"
  )
}
#> # A tibble: 1 × 18
#>   term      sumsq sum.squares.error    df df.error meansq statistic  p.value
#>   <chr>     <dbl>             <dbl> <dbl>    <dbl>  <dbl>     <dbl>    <dbl>
#> 1 condition  233.              984.  2.63     229.   4.30      20.6 8.27e-11
#>   method                                              effectsize       estimate
#>   <chr>                                               <chr>               <dbl>
#> 1 ANOVA estimation for factorial designs using 'afex' Omega2 (partial)   0.0783
#>   conf.level conf.low conf.high conf.method conf.distribution n.obs expression  
#>        <dbl>    <dbl>     <dbl> <chr>       <chr>             <int> <list>      
#> 1       0.95   0.0280         1 ncp         F                    88 <expression>

# ----------------------- non-parametric -----------------------------------

set.seed(123)
oneway_anova(
  data       = bugs_long,
  x          = condition,
  y          = desire,
  paired     = TRUE,
  subject.id = subject,
  type       = "np"
)
#> # A tibble: 1 × 15
#>   parameter1 parameter2 statistic df.error  p.value method                
#>   <chr>      <chr>          <dbl>    <dbl>    <dbl> <chr>                 
#> 1 desire     condition       55.8        3 4.56e-12 Friedman rank sum test
#>   effectsize  estimate conf.level conf.low conf.high conf.method         
#>   <chr>          <dbl>      <dbl>    <dbl>     <dbl> <chr>               
#> 1 Kendall's W    0.175       0.95    0.125         1 percentile bootstrap
#>   conf.iterations n.obs expression  
#>             <int> <int> <list>      
#> 1             100    88 <expression>

# ----------------------- robust --------------------------------------------

set.seed(123)
oneway_anova(
  data       = bugs_long,
  x          = condition,
  y          = desire,
  paired     = TRUE,
  subject.id = subject,
  type       = "r"
)
#> # A tibble: 1 × 12
#>   statistic    df df.error  p.value
#>       <dbl> <dbl>    <dbl>    <dbl>
#> 1      21.0  2.73     145. 1.15e-10
#>   method                                                             
#>   <chr>                                                              
#> 1 A heteroscedastic one-way repeated measures ANOVA for trimmed means
#>   effectsize                                                      estimate
#>   <chr>                                                              <dbl>
#> 1 Algina-Keselman-Penfield robust standardized difference average    0.664
#>   conf.level conf.low conf.high n.obs expression  
#>        <dbl>    <dbl>     <dbl> <int> <list>      
#> 1       0.95    0.466     0.971    88 <expression>

# ----------------------- Bayesian ---------------------------------------

set.seed(123)
oneway_anova(
  data       = bugs_long,
  x          = condition,
  y          = desire,
  paired     = TRUE,
  subject.id = subject,
  type       = "bayes"
)
#> # A tibble: 8 × 19
#>   term              pd rope.percentage prior.distribution prior.location
#>   <chr>          <dbl>           <dbl> <chr>                       <dbl>
#> 1 mu             1               0     cauchy                          0
#> 2 condition-HDHF 1               0     cauchy                          0
#> 3 condition-HDLF 0.862           0.715 cauchy                          0
#> 4 condition-LDHF 0.995           0.139 cauchy                          0
#> 5 condition-LDLF 1               0     cauchy                          0
#> 6 sig2           1               0     cauchy                          0
#> 7 g_condition    1               0.401 cauchy                          0
#> 8 g_rowid        1               0     cauchy                          0
#>   prior.scale effect        bf10 method                          log_e_bf10
#>         <dbl> <chr>        <dbl> <chr>                                <dbl>
#> 1       0.707 fixed  1372773377. Bayes factors for linear models       21.0
#> 2       0.707 fixed  1372773377. Bayes factors for linear models       21.0
#> 3       0.707 fixed  1372773377. Bayes factors for linear models       21.0
#> 4       0.707 fixed  1372773377. Bayes factors for linear models       21.0
#> 5       0.707 fixed  1372773377. Bayes factors for linear models       21.0
#> 6       1     fixed  1372773377. Bayes factors for linear models       21.0
#> 7       1     fixed  1372773377. Bayes factors for linear models       21.0
#> 8       1     fixed  1372773377. Bayes factors for linear models       21.0
#>   effectsize         estimate std.dev conf.level conf.low conf.high component  
#>   <chr>                 <dbl>   <dbl>      <dbl>    <dbl>     <dbl> <chr>      
#> 1 Bayesian R-squared    0.529  0.0330       0.95    0.460     0.586 conditional
#> 2 Bayesian R-squared    0.529  0.0330       0.95    0.460     0.586 conditional
#> 3 Bayesian R-squared    0.529  0.0330       0.95    0.460     0.586 conditional
#> 4 Bayesian R-squared    0.529  0.0330       0.95    0.460     0.586 conditional
#> 5 Bayesian R-squared    0.529  0.0330       0.95    0.460     0.586 conditional
#> 6 Bayesian R-squared    0.529  0.0330       0.95    0.460     0.586 conditional
#> 7 Bayesian R-squared    0.529  0.0330       0.95    0.460     0.586 conditional
#> 8 Bayesian R-squared    0.529  0.0330       0.95    0.460     0.586 conditional
#>   n.obs expression  
#>   <int> <list>      
#> 1    88 <expression>
#> 2    88 <expression>
#> 3    88 <expression>
#> 4    88 <expression>
#> 5    88 <expression>
#> 6    88 <expression>
#> 7    88 <expression>
#> 8    88 <expression>

between-subjects design

## setup
library(statsExpressions)

# ----------------------- parametric ---------------------------------------

# unequal variance
set.seed(123)
oneway_anova(
  data      = iris,
  x         = Species,
  y         = Sepal.Length,
  type      = "p"
)
#> # A tibble: 1 × 14
#>   statistic    df df.error  p.value
#>       <dbl> <dbl>    <dbl>    <dbl>
#> 1      139.     2     92.2 1.51e-28
#>   method                                                   effectsize estimate
#>   <chr>                                                    <chr>         <dbl>
#> 1 One-way analysis of means (not assuming equal variances) Omega2        0.743
#>   conf.level conf.low conf.high conf.method conf.distribution n.obs expression  
#>        <dbl>    <dbl>     <dbl> <chr>       <chr>             <int> <list>      
#> 1       0.95    0.671         1 ncp         F                   150 <expression>

# equal variance
set.seed(123)
oneway_anova(
  data      = iris,
  x         = Species,
  y         = Sepal.Length,
  var.equal = TRUE,
  type      = "p"
)
#> # A tibble: 1 × 14
#>   statistic    df df.error  p.value method                    effectsize
#>       <dbl> <dbl>    <dbl>    <dbl> <chr>                     <chr>     
#> 1      119.     2      147 1.67e-31 One-way analysis of means Omega2    
#>   estimate conf.level conf.low conf.high conf.method conf.distribution n.obs
#>      <dbl>      <dbl>    <dbl>     <dbl> <chr>       <chr>             <int>
#> 1    0.612       0.95    0.534         1 ncp         F                   150
#>   expression  
#>   <list>      
#> 1 <expression>

# ----------------------- non-parametric -----------------------------------

set.seed(123)
oneway_anova(
  data      = iris,
  x         = Species,
  y         = Sepal.Length,
  type      = "np"
)
#> # A tibble: 1 × 15
#>   parameter1   parameter2 statistic df.error  p.value
#>   <chr>        <chr>          <dbl>    <int>    <dbl>
#> 1 Sepal.Length Species         96.9        2 8.92e-22
#>   method                       effectsize      estimate conf.level conf.low
#>   <chr>                        <chr>              <dbl>      <dbl>    <dbl>
#> 1 Kruskal-Wallis rank sum test Epsilon2 (rank)    0.651       0.95    0.595
#>   conf.high conf.method          conf.iterations n.obs expression  
#>       <dbl> <chr>                          <int> <int> <list>      
#> 1         1 percentile bootstrap             100   150 <expression>

# ----------------------- robust --------------------------------------------

set.seed(123)
oneway_anova(
  data      = iris,
  x         = Species,
  y         = Sepal.Length,
  type      = "r"
)
#> # A tibble: 1 × 12
#>   statistic    df df.error p.value
#>       <dbl> <dbl>    <dbl>   <dbl>
#> 1      112.     2     53.8       0
#>   method                                           
#>   <chr>                                            
#> 1 A heteroscedastic one-way ANOVA for trimmed means
#>   effectsize                         estimate conf.level conf.low conf.high
#>   <chr>                                 <dbl>      <dbl>    <dbl>     <dbl>
#> 1 Explanatory measure of effect size    0.846       0.95    0.759     0.933
#>   n.obs expression  
#>   <int> <list>      
#> 1   150 <expression>

# ----------------------- Bayesian ---------------------------------------

set.seed(123)
oneway_anova(
  data      = iris,
  x         = Species,
  y         = Sepal.Length,
  type      = "bayes"
)
#> # A tibble: 6 × 17
#>   term                  pd rope.percentage prior.distribution prior.location
#>   <chr>              <dbl>           <dbl> <chr>                       <dbl>
#> 1 mu                 1               0     cauchy                          0
#> 2 Species-setosa     1               0     cauchy                          0
#> 3 Species-versicolor 0.936           0.435 cauchy                          0
#> 4 Species-virginica  1               0     cauchy                          0
#> 5 sig2               1               0     cauchy                          0
#> 6 g_Species          1               0     cauchy                          0
#>   prior.scale    bf10 method                          log_e_bf10
#>         <dbl>   <dbl> <chr>                                <dbl>
#> 1       0.707 1.87e28 Bayes factors for linear models       65.1
#> 2       0.707 1.87e28 Bayes factors for linear models       65.1
#> 3       0.707 1.87e28 Bayes factors for linear models       65.1
#> 4       0.707 1.87e28 Bayes factors for linear models       65.1
#> 5       0.707 1.87e28 Bayes factors for linear models       65.1
#> 6       0.707 1.87e28 Bayes factors for linear models       65.1
#>   effectsize         estimate std.dev conf.level conf.low conf.high n.obs
#>   <chr>                 <dbl>   <dbl>      <dbl>    <dbl>     <dbl> <int>
#> 1 Bayesian R-squared    0.612  0.0311       0.95    0.544     0.667   150
#> 2 Bayesian R-squared    0.612  0.0311       0.95    0.544     0.667   150
#> 3 Bayesian R-squared    0.612  0.0311       0.95    0.544     0.667   150
#> 4 Bayesian R-squared    0.612  0.0311       0.95    0.544     0.667   150
#> 5 Bayesian R-squared    0.612  0.0311       0.95    0.544     0.667   150
#> 6 Bayesian R-squared    0.612  0.0311       0.95    0.544     0.667   150
#>   expression  
#>   <list>      
#> 1 <expression>
#> 2 <expression>
#> 3 <expression>
#> 4 <expression>
#> 5 <expression>
#> 6 <expression>

Contingency table analyses

association test

set.seed(123)
library(statsExpressions)

# ------------------------ frequentist -----------------------------

# unpaired
set.seed(123)
contingency_table(
  data   = mtcars,
  x      = am,
  y      = cyl,
  paired = FALSE
)
#> # A tibble: 1 × 13
#>   statistic    df p.value method                     effectsize        estimate
#>       <dbl> <int>   <dbl> <chr>                      <chr>                <dbl>
#> 1      8.74     2  0.0126 Pearson's Chi-squared test Cramer's V (adj.)    0.464
#>   conf.level conf.low conf.high conf.method conf.distribution n.obs expression  
#>        <dbl>    <dbl>     <dbl> <chr>       <chr>             <int> <list>      
#> 1       0.95        0         1 ncp         chisq                32 <expression>

# paired
## create data structure
paired_data <-
  structure(
    list(
      response_before =
        structure(
          c(1L, 2L, 1L, 2L),
          .Label = c("no", "yes"),
          class = "factor"
        ),
      response_after = structure(
        c(1L, 1L, 2L, 2L),
        .Label = c("no", "yes"),
        class = "factor"
      ),
      Freq = c(65L, 25L, 5L, 5L)
    ),
    class = "data.frame",
    row.names = c(NA, -4L)
  )


set.seed(123)
contingency_table(
  data   = paired_data,
  x      = response_before,
  y      = response_after,
  paired = TRUE,
  counts = "Freq"
)
#> # A tibble: 1 × 12
#>   statistic    df  p.value method                     effectsize estimate
#>       <dbl> <dbl>    <dbl> <chr>                      <chr>         <dbl>
#> 1      13.3     1 0.000261 McNemar's Chi-squared test Cohen's g     0.333
#>   conf.level conf.low conf.high conf.method n.obs expression  
#>        <dbl>    <dbl>     <dbl> <chr>       <int> <list>      
#> 1       0.95    0.164     0.427 binomial      100 <expression>

# ------------------------ Bayesian -----------------------------

# unpaired
set.seed(123)
contingency_table(
  data   = mtcars,
  x      = am,
  y      = cyl,
  paired = FALSE,
  type   = "bayes"
)
#> # A tibble: 1 × 14
#>   term  conf.level effectsize estimate conf.low conf.high
#>   <chr>      <dbl> <chr>         <dbl>    <dbl>     <dbl>
#> 1 Ratio       0.95 Cramers_v     0.478    0.221     0.725
#>   prior.distribution      prior.location prior.scale  bf10
#>   <chr>                            <dbl>       <dbl> <dbl>
#> 1 independent multinomial              0           1  16.8
#>   method                              log_e_bf10 n.obs expression  
#>   <chr>                                    <dbl> <int> <list>      
#> 1 Bayesian contingency table analysis       2.82    32 <expression>

goodness-of-fit tests

# ------------------------ frequentist -----------------------------

# with counts
set.seed(123)
contingency_table(
  data   = as.data.frame(HairEyeColor),
  x      = Eye,
  counts = Freq,
)
#> # A tibble: 1 × 13
#>   statistic    df  p.value method                                   effectsize 
#>       <dbl> <dbl>    <dbl> <chr>                                    <chr>      
#> 1      133.     3 9.65e-29 Chi-squared test for given probabilities Pearson's C
#>   estimate conf.level conf.low conf.high conf.method conf.distribution n.obs
#>      <dbl>      <dbl>    <dbl>     <dbl> <chr>       <chr>             <int>
#> 1    0.429       0.95    0.374         1 ncp         chisq               592
#>   expression  
#>   <list>      
#> 1 <expression>

# ------------------------ Bayesian -----------------------------

set.seed(123)
contingency_table(
  data   = as.data.frame(HairEyeColor),
  x      = Eye,
  counts = Freq,
  ratio  = c(0.2, 0.2, 0.3, 0.3),
  type   = "bayes"
)
#> # A tibble: 1 × 4
#>      bf10 prior.scale method                                      expression  
#>     <dbl>       <dbl> <chr>                                       <list>      
#> 1 4.17e55           1 Bayesian one-way contingency table analysis <expression>

Correlation analyses

set.seed(123)
library(statsExpressions)

# ----------------------- parametric ---------------------------------------

set.seed(123)
corr_test(
  data = mtcars,
  x    = wt,
  y    = mpg,
  type = "parametric"
)
#> # A tibble: 1 × 13
#>   parameter1 parameter2 effectsize          estimate conf.level conf.low
#>   <chr>      <chr>      <chr>                  <dbl>      <dbl>    <dbl>
#> 1 wt         mpg        Pearson correlation   -0.868       0.95   -0.934
#>   conf.high statistic df.error  p.value method              n.obs expression  
#>       <dbl>     <dbl>    <int>    <dbl> <chr>               <int> <list>      
#> 1    -0.744     -9.56       30 1.29e-10 Pearson correlation    32 <expression>

# ----------------------- non-parametric -----------------------------------

set.seed(123)
corr_test(
  data = mtcars,
  x    = wt,
  y    = mpg,
  type = "nonparametric"
)
#> # A tibble: 1 × 12
#>   parameter1 parameter2 effectsize           estimate conf.level conf.low
#>   <chr>      <chr>      <chr>                   <dbl>      <dbl>    <dbl>
#> 1 wt         mpg        Spearman correlation   -0.886       0.95   -0.945
#>   conf.high statistic  p.value method               n.obs expression  
#>       <dbl>     <dbl>    <dbl> <chr>                <int> <list>      
#> 1    -0.774    10292. 1.49e-11 Spearman correlation    32 <expression>

# ----------------------- robust --------------------------------------------

set.seed(123)
corr_test(
  data = mtcars,
  x    = wt,
  y    = mpg,
  type = "robust"
)
#> # A tibble: 1 × 13
#>   parameter1 parameter2 effectsize                     estimate conf.level
#>   <chr>      <chr>      <chr>                             <dbl>      <dbl>
#> 1 wt         mpg        Winsorized Pearson correlation   -0.864       0.95
#>   conf.low conf.high statistic df.error  p.value method                        
#>      <dbl>     <dbl>     <dbl>    <int>    <dbl> <chr>                         
#> 1   -0.932    -0.738     -9.41       30 1.84e-10 Winsorized Pearson correlation
#>   n.obs expression  
#>   <int> <list>      
#> 1    32 <expression>

# ----------------------- Bayesian ---------------------------------------

set.seed(123)
corr_test(
  data = mtcars,
  x    = wt,
  y    = mpg,
  type = "bayes"
)
#> # A tibble: 1 × 16
#>   parameter1 parameter2 effectsize                   estimate conf.level
#>   <chr>      <chr>      <chr>                           <dbl>      <dbl>
#> 1 wt         mpg        Bayesian Pearson correlation   -0.843       0.95
#>   conf.low conf.high    pd rope.percentage prior.distribution prior.location
#>      <dbl>     <dbl> <dbl>           <dbl> <chr>                       <dbl>
#> 1   -0.934    -0.734     1               0 beta                         1.41
#>   prior.scale      bf10 method                       n.obs expression  
#>         <dbl>     <dbl> <chr>                        <int> <list>      
#> 1        1.41 56223033. Bayesian Pearson correlation    32 <expression>

Meta-analysis

set.seed(123)
library(statsExpressions)
library(metaplus)

# renaming to what `{statsExpressions}` expects
df <- dplyr::rename(mag, estimate = yi, std.error = sei)

# ----------------------- parametric ---------------------------------------

set.seed(123)
meta_analysis(df, type = "parametric")
#> # A tibble: 1 × 12
#>   term    effectsize                     estimate std.error conf.level conf.low
#>   <chr>   <chr>                             <dbl>     <dbl>      <dbl>    <dbl>
#> 1 Overall meta-analytic summary estimate   -0.767     0.212       0.95    -1.18
#>   conf.high statistic  p.value method                        n.obs expression  
#>       <dbl>     <dbl>    <dbl> <chr>                         <int> <list>      
#> 1    -0.351     -3.62 0.000295 Meta-analysis using 'metafor'    16 <expression>

# ----------------------- robust --------------------------------------------

set.seed(123)
meta_analysis(df, type = "robust")
#> # A tibble: 1 × 12
#>   term    effectsize                     estimate std.error conf.low conf.high
#>   <chr>   <chr>                             <dbl>     <dbl>    <dbl>     <dbl>
#> 1 Overall meta-analytic summary estimate   -0.746     0.233    -1.26    -0.344
#>   statistic  p.value conf.level method                                n.obs
#>       <dbl>    <dbl>      <dbl> <chr>                                 <int>
#> 1     -3.20 0.000501       0.95 Robust meta-analysis using 'metaplus'    16
#>   expression  
#>   <list>      
#> 1 <expression>

# ----------------------- Bayesian ---------------------------------------

set.seed(123)
meta_analysis(df, type = "bayes")
#> # A tibble: 2 × 18
#>   term    effectsize                       estimate std.error conf.level
#>   <chr>   <chr>                               <dbl>     <dbl>      <dbl>
#> 1 Overall meta-analytic posterior estimate   -0.649     0.224       0.95
#> 2 tau     meta-analytic posterior estimate    0.487     0.182       0.95
#>   conf.low conf.high  bf10  rhat   ess component prior.distribution
#>      <dbl>     <dbl> <dbl> <dbl> <dbl> <chr>     <chr>             
#> 1   -1.09     -0.215  53.0  1.00 3771. meta      Student's t       
#> 2    0.185     0.862  53.0  1.00 3692. meta      Inverse gamma     
#>   prior.location prior.scale method                                 log_e_bf10
#>            <dbl>       <dbl> <chr>                                       <dbl>
#> 1              0       0.707 Bayesian meta-analysis using 'metaBMA'       3.97
#> 2              1       0.15  Bayesian meta-analysis using 'metaBMA'       3.97
#>   n.obs expression  
#>   <int> <list>      
#> 1    16 <expression>
#> 2    16 <expression>

Centrality description

# ----------------------- parametric -----------------------

set.seed(123)
centrality_description(iris, Species, Sepal.Length)
#> # A tibble: 3 × 15
#>   Species    Sepal.Length n_obs variable     std.dev   iqr conf.low conf.high
#>   <fct>             <dbl> <int> <chr>          <dbl> <dbl>    <dbl>     <dbl>
#> 1 setosa             5.01    50 Sepal.Length   0.352 0.400     4.90      5.10
#> 2 versicolor         5.94    50 Sepal.Length   0.516 0.7       5.80      6.07
#> 3 virginica          6.59    50 Sepal.Length   0.636 0.750     6.39      6.79
#>     min   max skewness kurtosis n.missing expression               
#>   <dbl> <dbl>    <dbl>    <dbl>     <int> <glue>                   
#> 1   4.3   5.8    0.120  -0.253          0 widehat(mu)[mean]=='5.01'
#> 2   4.9   7      0.105  -0.533          0 widehat(mu)[mean]=='5.94'
#> 3   4.9   7.9    0.118   0.0329         0 widehat(mu)[mean]=='6.59'
#>   n_label               
#>   <chr>                 
#> 1 "setosa\n(n = 50)"    
#> 2 "versicolor\n(n = 50)"
#> 3 "virginica\n(n = 50)"

# ----------------------- non-parametric -------------------

set.seed(123)
centrality_description(mtcars, am, wt, type = "n")
#> # A tibble: 2 × 15
#>   am       wt n_obs variable   mad   iqr conf.low conf.high   min   max skewness
#>   <fct> <dbl> <int> <chr>    <dbl> <dbl>    <dbl>     <dbl> <dbl> <dbl>    <dbl>
#> 1 0      3.52    19 wt       0.452 0.41      3.44      3.84  2.46  5.42    1.15 
#> 2 1      2.32    13 wt       0.682 0.942     1.94      2.78  1.51  3.57    0.269
#>   kurtosis n.missing expression                  n_label      
#>      <dbl>     <int> <glue>                      <chr>        
#> 1    1.06          0 widehat(mu)[median]=='3.52' "0\n(n = 19)"
#> 2   -0.654         0 widehat(mu)[median]=='2.32' "1\n(n = 13)"

# ----------------------- robust ---------------------------

set.seed(123)
centrality_description(ToothGrowth, supp, len, type = "r")
#> # A tibble: 2 × 15
#>   supp    len n_obs variable std.dev   iqr conf.low conf.high   min   max
#>   <fct> <dbl> <int> <chr>      <dbl> <dbl>    <dbl>     <dbl> <dbl> <dbl>
#> 1 OJ     21.7    30 len         6.61  10.9     18.1      23.6   8.2  30.9
#> 2 VC     16.6    30 len         8.27  12.5     13.6      19.9   4.2  33.9
#>   skewness kurtosis n.missing expression                    n_label       
#>      <dbl>    <dbl>     <int> <glue>                        <chr>         
#> 1   -0.580   -0.831         0 widehat(mu)[trimmed]=='21.71' "OJ\n(n = 30)"
#> 2    0.306   -0.700         0 widehat(mu)[trimmed]=='16.58' "VC\n(n = 30)"

# ----------------------- Bayesian -------------------------

set.seed(123)
centrality_description(sleep, group, extra, type = "b")
#> # A tibble: 2 × 14
#>   group  extra n_obs variable   iqr conf.low conf.high   min   max skewness
#>   <fct>  <dbl> <int> <chr>    <dbl>    <dbl>     <dbl> <dbl> <dbl>    <dbl>
#> 1 1     0.0579    10 extra     2.8   -1.29        3.55  -1.6   3.7    0.581
#> 2 2     0.973     10 extra     3.82   0.0527      4.79  -0.1   5.5    0.386
#>   kurtosis n.missing expression               n_label      
#>      <dbl>     <int> <glue>                   <chr>        
#> 1   -0.630         0 widehat(mu)[MAP]=='0.06' "1\n(n = 10)"
#> 2   -1.42          0 widehat(mu)[MAP]=='0.97' "2\n(n = 10)"

Suggestions

If you find any bugs or have any suggestions/remarks, please file an issue on GitHub: https://github.com/IndrajeetPatil/statsExpressions/issues