The function ggstatsplot::ggbetweenstats is designed to facilitate data exploration and to make highly customizable, publication-ready plots, with relevant statistical details included in the plot itself if desired. We will see examples of how to use this function in this vignette.

To begin with, here are some instances where you would want to use ggbetweenstats:

  • to check if a continuous variable differs across multiple groups/conditions

  • to compare distributions visually and check for outliers

Note: This vignette uses the pipe operator (%>%). If you are not familiar with this operator, here is a good explanation: http://r4ds.had.co.nz/pipes.html

Comparisons between groups with ggbetweenstats

To illustrate how this function can be used, we will use the gapminder dataset throughout this vignette. This dataset provides values for life expectancy, GDP per capita, and population at 5-year intervals from 1952 to 2007 for each of 142 countries (courtesy of the Gapminder Foundation). Let's have a look at the data:
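A quick look at the data might go as follows. This sketch uses only base R and the gapminder package; the object name df_2007 is our own choice. Counting the countries per continent in a single year also shows why Oceania is set aside below.

```r
library(gapminder)

# structure of the full dataset: one row per country and year
str(gapminder::gapminder)

# number of countries per continent in a single year (2007)
df_2007 <- subset(gapminder::gapminder, year == 2007)
table(df_2007$continent)
```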

Note: for the remainder of the vignette, we exclude Oceania from the analysis simply because it has so few observations (only two countries, Australia and New Zealand).

Suppose the first thing we want to inspect is the distribution of life expectancy across the countries of each continent in 2007. We also want to know whether the mean differences in life expectancy between the continents are statistically significant.

The simplest form of the function call is:
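A minimal sketch of such a call, assuming we first keep only the 2007 data and drop Oceania as described in the note above. Only data, x, and y are supplied; everything else is left at its default.

```r
library(ggstatsplot)
library(gapminder)

# 2007 data without Oceania, with the empty factor level removed
df_2007 <- subset(gapminder::gapminder, year == 2007 & continent != "Oceania")
df_2007$continent <- droplevels(df_2007$continent)

# for reproducibility
set.seed(123)

# minimal call: the data, the grouping variable, and the outcome
p_simple <- ggstatsplot::ggbetweenstats(
  data = df_2007,
  x = continent,
  y = lifeExp
)
```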

Note:

  • The function automatically decides whether to run an independent-samples t-test (for 2 groups) or a one-way ANOVA (for 3 or more groups), based on the number of levels in the grouping variable.

  • The output of the function is a ggplot object, which means that it can be further modified.
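The decision rule in the first point can be illustrated with base R. The helper choose_parametric_test below is hypothetical and not part of ggstatsplot; it only mirrors the rule as stated, and whether ggbetweenstats calls exactly these stats functions internally is an assumption.

```r
# Hypothetical helper (NOT part of ggstatsplot) mirroring the stated rule:
# two groups -> t-test, three or more groups -> one-way ANOVA.
choose_parametric_test <- function(data, x, y) {
  # treat the grouping column as a factor and drop unused levels
  groups <- droplevels(as.factor(data[[x]]))
  n_levels <- nlevels(groups)
  if (n_levels < 2) {
    stop("The grouping variable needs at least two levels.")
  }
  # build the formula y ~ x from the column names
  fml <- stats::reformulate(x, response = y)
  if (n_levels == 2) {
    # Welch's t-test (stats::t.test defaults to var.equal = FALSE)
    list(name = "t-test", result = stats::t.test(fml, data = data))
  } else {
    # Welch's one-way ANOVA (stats::oneway.test defaults to var.equal = FALSE)
    list(name = "one-way ANOVA", result = stats::oneway.test(fml, data = data))
  }
}

# two groups (built-in sleep data) and three groups (built-in iris data)
two_groups <- choose_parametric_test(sleep, "group", "extra")
three_groups <- choose_parametric_test(iris, "Species", "Sepal.Length")
two_groups$name    # "t-test"
three_groups$name  # "one-way ANOVA"
```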

We can make the output much more aesthetically pleasing as well as informative by making use of the many optional parameters in ggbetweenstats. We'll add a title and caption, better x and y axis labels, and tag and label the outliers in the data. We will also change the overall theme and the color palette in use.
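A sketch of a customized call, using the argument names that appear in the code later in this vignette (xlab, ylab, title, package, palette, messages) plus caption, which we assume is available in your installed version. Outlier tagging is left out rather than guessing at argument names that may differ between releases; see ?ggbetweenstats. Because the output is a ggplot object, the theme is tweaked here with an ordinary ggplot2 layer.

```r
library(ggstatsplot)
library(gapminder)

# 2007 data without Oceania
df_2007 <- subset(gapminder::gapminder, year == 2007 & continent != "Oceania")
df_2007$continent <- droplevels(df_2007$continent)

set.seed(123)

p_custom <- ggstatsplot::ggbetweenstats(
  data = df_2007,
  x = continent,
  y = lifeExp,
  xlab = "Continent",
  ylab = "Life expectancy",
  title = "Distribution of life expectancy across continents (2007)",
  caption = "Source: Gapminder Foundation", # assumed argument
  package = "ggsci",
  palette = "nrc_npg",
  messages = FALSE
) +
  # the result is a ggplot object, so regular ggplot2 layers can be added
  ggplot2::theme(plot.title = ggplot2::element_text(size = 14))
```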

As the effect size (partial eta squared) of 0.635 shows, there are large differences in mean life expectancy across continents. Importantly, this plot also helps us appreciate the distributions within each continent. For example, although Asian countries are on average doing much better than African countries, Afghanistan has a particularly low life expectancy for an Asian country, possibly reflecting war and political turmoil.
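To see what the effect size measures, eta squared can be computed by hand as SS_effect / (SS_effect + SS_error); with a single grouping factor this equals partial eta squared. This sketch fits a classic equal-variance ANOVA with stats::aov, so it illustrates the formula only. The value printed in the plot may differ if ggbetweenstats uses a Welch correction or a different estimator.

```r
library(gapminder)

# 2007 data without Oceania
df_2007 <- subset(gapminder::gapminder, year == 2007 & continent != "Oceania")
df_2007$continent <- droplevels(df_2007$continent)

# classic one-way ANOVA (equal variances assumed)
fit <- stats::aov(lifeExp ~ continent, data = df_2007)

# sums of squares: first row is continent, second row is residuals
ss <- summary(fit)[[1]][["Sum Sq"]]

# eta squared = SS_effect / (SS_effect + SS_error)
eta_sq <- ss[1] / (ss[1] + ss[2])
eta_sq
```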

So far we have only used a classic parametric test and a boxviolin plot, but we can also use other available options:

  • The type (of test) argument also accepts the following abbreviations:
    "p" (for parametric), "np" (for nonparametric), "r" (for robust), and "bf" (for Bayes factor).

  • The type of plot to be displayed can also be modified ("box", "violin", or "boxviolin").

  • The color palettes can be modified (via the package and palette arguments).
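For three or more groups, the "p" and "np" options roughly correspond to the base R tests below. This is a sketch: that ggbetweenstats uses exactly these functions and defaults is an assumption. Robust (trimmed-means) and Bayesian versions need add-on packages such as WRS2 and BayesFactor and are omitted here.

```r
library(gapminder)

# 2007 data without Oceania
df_2007 <- subset(gapminder::gapminder, year == 2007 & continent != "Oceania")
df_2007$continent <- droplevels(df_2007$continent)

# "p": Welch's one-way ANOVA (does not assume equal variances)
p_test <- stats::oneway.test(lifeExp ~ continent, data = df_2007, var.equal = FALSE)

# "np": Kruskal-Wallis rank-sum test
np_test <- stats::kruskal.test(lifeExp ~ continent, data = df_2007)

c(parametric = p_test$p.value, nonparametric = np_test$p.value)
```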

Let's use the combine_plots function to merge four separate plots that together demonstrate all of these options. We will compare life expectancy for all countries between 1957 and 2007. We will generate the plots one by one and then use combine_plots to merge them into a single plot with some common labeling. It is possible, but not necessarily recommended, to give each plot different colors or themes.

For example,

library(ggstatsplot)
library(gapminder)

# selecting subset of the data: 1957 and 2007 only
df_year <- dplyr::filter(
  .data = gapminder::gapminder,
  year == 2007 | year == 1957
)

# for reproducibility
set.seed(123)

# parametric test (a t-test, since there are two groups) and box plot
p1 <- ggstatsplot::ggbetweenstats(
  data = df_year,
  x = year,
  y = lifeExp,
  xlab = "Year",
  ylab = "Life expectancy",
  plot.type = "box",
  type = "p",
  effsize.type = "d",
  title = "Parametric Test (box plot)",
  package = "ggsci",
  palette = "nrc_npg",
  k = 2,
  messages = FALSE
)

# nonparametric test (Mann-Whitney U test for two groups) and violin plot
p2 <- ggstatsplot::ggbetweenstats(
  data = df_year,
  x = year,
  y = lifeExp,
  xlab = "Year",
  ylab = "Life expectancy",
  plot.type = "violin",
  type = "np",
  title = "Nonparametric Test (violin plot)",
  package = "ggsci",
  palette = "uniform_startrek",
  k = 2,
  messages = FALSE
)

# robust test on trimmed means (Yuen's test for two groups) and boxviolin plot
p3 <- ggstatsplot::ggbetweenstats(
  data = df_year,
  x = year,
  y = lifeExp,
  xlab = "Year",
  ylab = "Life expectancy",
  plot.type = "boxviolin",
  type = "r",
  title = "Robust Test (box & violin plot)",
  tr = 0.005, # proportion of trimming for the trimmed means
  package = "wesanderson",
  palette = "Royal2",
  nboot = 15, # number of bootstrap samples (kept small for speed)
  k = 2,
  messages = FALSE
)

# Bayesian test (Bayes factor) and box plot
p4 <- ggstatsplot::ggbetweenstats(
  data = df_year,
  x = year,
  y = lifeExp,
  xlab = "Year",
  ylab = "Life expectancy",
  type = "bf",
  plot.type = "box",
  title = "Bayesian Test (box plot)",
  package = "ggsci",
  palette = "nrc_npg",
  k = 2,
  messages = FALSE
)

# combining the individual plots into a single plot
ggstatsplot::combine_plots(
  p1, p2, p3, p4,
  nrow = 2,
  ncol = 2,
  labels = c("(a)", "(b)", "(c)", "(d)"),
  title.text = "Comparison of life expectancy between 1957 and 2007",
  caption.text = "Source: Gapminder Foundation",
  title.size = 14,
  caption.size = 12
)

Grouped analysis with grouped_ggbetweenstats

What if we want to analyze both by continent and across years, combining our two previous efforts? In that case, we could write a for loop or use purrr, both of which can be time-consuming and a bit of a struggle.

ggstatsplot provides a special helper function for such instances: grouped_ggbetweenstats. This is merely a wrapper function around ggstatsplot::combine_plots. It applies ggbetweenstats across all levels of a specified grouping variable and then combines the list of individual plots into a single plot. Note that the grouping variable can be anything: conditions in a given study, groups in a study sample, different studies, etc.

Let's focus on the same four continents for the years 1967, 1987, and 2007.
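A minimal sketch of the grouped call, assuming the same subsetting as before (Oceania dropped) and only the core arguments data, x, y, and grouping.var; further customization arguments vary between ggstatsplot releases, so check ?grouped_ggbetweenstats.

```r
library(ggstatsplot)
library(gapminder)

# four continents (Oceania dropped), three years
df_grouped <- subset(
  gapminder::gapminder,
  year %in% c(1967, 1987, 2007) & continent != "Oceania"
)
df_grouped$continent <- droplevels(df_grouped$continent)

set.seed(123)

# one ggbetweenstats plot per year, combined into a single plot
p_grouped <- ggstatsplot::grouped_ggbetweenstats(
  data = df_grouped,
  x = continent,
  y = lifeExp,
  grouping.var = year
)
```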

As seen from the plot, although life expectancy has been improving steadily across all continents from 1967 to 2007, this improvement has not happened at the same rate for all continents. Additionally, irrespective of which year we look at, we still find significant differences in life expectancy across continents, and, based on the observed effect sizes, these differences have been surprisingly consistent across four decades.

Grouped analysis with ggbetweenstats + purrr

Although this grouping function provides a quick way to explore the data, it leaves much to be desired. For example, the same type of plot and test is applied for all years, but maybe we want to change this for different years, or maybe we want to have different effect sizes for different years. This type of customization for different levels of a grouping variable is not possible with grouped_ggbetweenstats, but it can be easily achieved using the purrr package.
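A short sketch of the purrr approach, assuming we split the data by year and pick a different test type per year; the per-year choices in test_types are arbitrary and purely illustrative. The resulting list of plots can then be merged with ggstatsplot::combine_plots (check ?combine_plots for how your installed version accepts a list of plots).

```r
library(ggstatsplot)
library(gapminder)

# four continents (Oceania dropped), three years
df_sub <- subset(
  gapminder::gapminder,
  year %in% c(1967, 1987, 2007) & continent != "Oceania"
)
df_sub$continent <- droplevels(df_sub$continent)

# one data frame per year, named "1967", "1987", "2007"
df_list <- split(df_sub, df_sub$year)

# illustrative per-year choice of test: parametric, nonparametric, robust
test_types <- c("p", "np", "r")

set.seed(123)

# build one differently configured plot per year
plot_list <- purrr::map2(
  .x = df_list,
  .y = test_types,
  .f = ~ ggstatsplot::ggbetweenstats(
    data = .x,
    x = continent,
    y = lifeExp,
    type = .y,
    title = paste("Year:", unique(.x$year)),
    messages = FALSE
  )
)
```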

See the associated vignette here: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/purrr_examples.html

Within-subjects designs

A variant of this function for within-subjects (repeated measures) designs, to be called ggwithinstats, is currently under development. Until then, you can still use ggbetweenstats to prepare plots for exploratory data analysis, but the statistical details displayed in the subtitle will be incorrect, because the tests treat the groups as independent samples. You can remove them by adding + ggplot2::labs(subtitle = NULL).
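To see how much the wrong assumption can matter, this base R sketch analyses the built-in sleep data (10 participants, each measured under two drugs) twice: once as independent samples and once respecting the pairing. It does not use ggstatsplot; it only illustrates why between-subjects statistics are misleading for repeated measures.

```r
# order by drug group, then participant ID, so the two vectors are aligned
sleep_sorted <- sleep[order(sleep$group, sleep$ID), ]
drug1 <- sleep_sorted$extra[sleep_sorted$group == "1"]
drug2 <- sleep_sorted$extra[sleep_sorted$group == "2"]

# treats the two sets of scores as independent samples (Welch's t-test)
indep <- stats::t.test(drug1, drug2)

# respects the pairing of the scores within each participant
paired <- stats::t.test(drug1, drug2, paired = TRUE)

c(independent = indep$p.value, paired = paired$p.value)
```

With these data, the independent-samples test gives p of about 0.079, whereas the paired test gives p of about 0.0028.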

Suggestions

If you find any bugs or have any suggestions/remarks, please file an issue on GitHub: https://github.com/IndrajeetPatil/ggstatsplot/issues

Session Information

Summarizing session information for reproducibility.

options(width = 200)
devtools::session_info()
#> - Session info ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.5.1 (2018-07-02)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  English_United States.1252  
#>  ctype    English_United States.1252  
#>  tz       America/New_York            
#>  date     2018-10-21                  
#> 
#> - Packages -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
#>  package      * version     date       lib source                                      
#>  abind          1.4-5       2016-07-21 [1] CRAN (R 3.5.0)                              
#>  assertthat     0.2.0       2017-04-11 [1] CRAN (R 3.5.0)                              
#>  backports      1.1.2       2017-12-13 [1] CRAN (R 3.5.0)                              
#>  base64enc      0.1-3       2015-07-28 [1] CRAN (R 3.5.0)                              
#>  BayesFactor    0.9.12-4.2  2018-05-19 [1] CRAN (R 3.5.0)                              
#>  bayesplot      1.6.0       2018-08-02 [1] CRAN (R 3.5.1)                              
#>  bindr          0.1.1       2018-03-13 [1] CRAN (R 3.5.0)                              
#>  bindrcpp     * 0.2.2       2018-03-29 [1] CRAN (R 3.5.0)                              
#>  boot           1.3-20      2017-08-06 [2] CRAN (R 3.5.1)                              
#>  broom          0.5.0.9001  2018-10-03 [1] Github (tidymodels/broom@140eb58)           
#>  broom.mixed    0.2.2.9000  2018-10-09 [1] Github (bbolker/broom.mixed@fb9163a)        
#>  callr          3.0.0       2018-08-24 [1] CRAN (R 3.5.1)                              
#>  car            3.0-2       2018-08-23 [1] CRAN (R 3.5.1)                              
#>  carData        3.0-2       2018-09-30 [1] CRAN (R 3.5.1)                              
#>  cellranger     1.1.0       2016-07-27 [1] CRAN (R 3.5.0)                              
#>  cli            1.0.1       2018-09-25 [1] CRAN (R 3.5.1)                              
#>  cluster        2.0.7-1     2018-04-13 [2] CRAN (R 3.5.1)                              
#>  coda           0.19-2      2018-10-08 [1] CRAN (R 3.5.1)                              
#>  codetools      0.2-15      2016-10-05 [2] CRAN (R 3.5.1)                              
#>  coin           1.2-2       2017-11-28 [1] CRAN (R 3.5.0)                              
#>  colorspace     1.3-2       2016-12-14 [1] CRAN (R 3.5.0)                              
#>  commonmark     1.6         2018-09-30 [1] CRAN (R 3.5.1)                              
#>  cowplot        0.9.99      2018-08-23 [1] Github (wilkelab/cowplot@374c3e9)           
#>  crayon         1.3.4       2018-09-26 [1] Github (r-lib/crayon@3e751fb)               
#>  curl           3.2         2018-03-28 [1] CRAN (R 3.5.0)                              
#>  data.table     1.11.8      2018-09-30 [1] CRAN (R 3.5.1)                              
#>  debugme        1.1.0       2017-10-22 [1] CRAN (R 3.5.0)                              
#>  DEoptimR       1.0-8       2016-11-19 [1] CRAN (R 3.5.0)                              
#>  desc           1.2.0       2018-05-01 [1] CRAN (R 3.5.0)                              
#>  devtools       2.0.0       2018-10-19 [1] CRAN (R 3.5.1)                              
#>  digest         0.6.18      2018-10-10 [1] CRAN (R 3.5.1)                              
#>  dplyr        * 0.7.7       2018-10-16 [1] CRAN (R 3.5.1)                              
#>  effsize        0.7.1       2017-03-21 [1] CRAN (R 3.5.0)                              
#>  emmeans        1.2.4       2018-09-22 [1] CRAN (R 3.5.1)                              
#>  estimability   1.3         2018-02-11 [1] CRAN (R 3.5.0)                              
#>  evaluate       0.12        2018-10-09 [1] CRAN (R 3.5.1)                              
#>  exact2x2       1.6.3       2018-07-27 [1] CRAN (R 3.5.1)                              
#>  exactci        1.3-3       2017-10-02 [1] CRAN (R 3.5.0)                              
#>  fit.models     0.5-14      2017-04-06 [1] CRAN (R 3.5.0)                              
#>  forcats        0.3.0       2018-02-19 [1] CRAN (R 3.5.0)                              
#>  foreign        0.8-70      2017-11-28 [2] CRAN (R 3.5.1)                              
#>  fs             1.2.6       2018-08-23 [1] CRAN (R 3.5.1)                              
#>  gapminder    * 0.3.0       2017-10-31 [1] CRAN (R 3.5.0)                              
#>  generics       0.0.1.9000  2018-10-03 [1] Github (r-lib/generics@24ba515)             
#>  ggcorrplot     0.1.2       2018-09-11 [1] CRAN (R 3.5.1)                              
#>  ggExtra        0.8         2018-08-14 [1] Github (daattali/ggExtra@76d1618)           
#>  ggplot2        3.0.0.9000  2018-09-26 [1] Github (tidyverse/ggplot2@e9f7ded)          
#>  ggrepel        0.8.0.9000  2018-09-09 [1] Github (slowkow/ggrepel@91877ca)            
#>  ggridges       0.5.1       2018-10-04 [1] Github (clauswilke/ggridges@1d5131f)        
#>  ggstatsplot  * 0.0.6.9000  2018-10-21 [1] local                                       
#>  ggthemes       4.0.1       2018-08-24 [1] CRAN (R 3.5.1)                              
#>  glmmTMB        0.2.2.0     2018-07-03 [1] CRAN (R 3.5.1)                              
#>  glue           1.3.0       2018-09-17 [1] Github (tidyverse/glue@4e74901)             
#>  groupedstats   0.0.3.9000  2018-10-12 [1] local                                       
#>  gtable         0.2.0       2016-02-26 [1] CRAN (R 3.5.0)                              
#>  gtools         3.8.1       2018-06-26 [1] CRAN (R 3.5.0)                              
#>  haven          1.1.2       2018-06-27 [1] CRAN (R 3.5.0)                              
#>  hms            0.4.2       2018-03-10 [1] CRAN (R 3.5.0)                              
#>  htmldeps       0.1.1       2018-09-17 [1] Github (rstudio/htmldeps@c1023e0)           
#>  htmltools      0.3.6       2017-04-28 [1] CRAN (R 3.5.0)                              
#>  httpuv         1.4.5       2018-07-19 [1] CRAN (R 3.5.1)                              
#>  jmv            0.9.4       2018-09-29 [1] Github (jamovi/jmv@7dac133)                 
#>  jmvcore        0.9.4       2018-09-17 [1] CRAN (R 3.5.1)                              
#>  knitr          1.20.19     2018-10-10 [1] Github (yihui/knitr@3abb642)                
#>  labeling       0.3         2014-08-23 [1] CRAN (R 3.5.0)                              
#>  later          0.7.5       2018-09-18 [1] CRAN (R 3.5.1)                              
#>  lattice        0.20-35     2017-03-25 [2] CRAN (R 3.5.1)                              
#>  lazyeval       0.2.1       2017-10-29 [1] CRAN (R 3.5.0)                              
#>  lme4           1.1-18-1    2018-08-17 [1] CRAN (R 3.5.1)                              
#>  magrittr       1.5         2014-11-22 [1] CRAN (R 3.5.0)                              
#>  MASS           7.3-50      2018-04-30 [2] CRAN (R 3.5.1)                              
#>  Matrix         1.2-14      2018-04-13 [2] CRAN (R 3.5.1)                              
#>  MatrixModels   0.4-1       2015-08-22 [1] CRAN (R 3.5.0)                              
#>  mc2d           0.1-18      2017-03-06 [1] CRAN (R 3.5.0)                              
#>  memoise        1.1.0       2017-04-21 [1] CRAN (R 3.5.0)                              
#>  mime           0.6         2018-10-05 [1] CRAN (R 3.5.1)                              
#>  miniUI         0.1.1.1     2018-05-18 [1] CRAN (R 3.5.0)                              
#>  minqa          1.2.4       2014-10-09 [1] CRAN (R 3.5.0)                              
#>  mnormt         1.5-5       2016-10-15 [1] CRAN (R 3.5.0)                              
#>  modelr         0.1.2       2018-05-11 [1] CRAN (R 3.5.0)                              
#>  modeltools     0.2-22      2018-07-16 [1] CRAN (R 3.5.1)                              
#>  multcomp       1.4-8       2017-11-08 [1] CRAN (R 3.5.0)                              
#>  munsell        0.5.0       2018-06-12 [1] CRAN (R 3.5.0)                              
#>  mvtnorm        1.0-8       2018-05-31 [1] CRAN (R 3.5.0)                              
#>  nlme           3.1-137     2018-04-07 [2] CRAN (R 3.5.1)                              
#>  nloptr         1.2.1       2018-10-03 [1] CRAN (R 3.5.1)                              
#>  openxlsx       4.1.0       2018-05-26 [1] CRAN (R 3.5.0)                              
#>  paletteer      0.1.0       2018-07-10 [1] CRAN (R 3.5.1)                              
#>  pbapply        1.3-4       2018-01-10 [1] CRAN (R 3.5.0)                              
#>  pcaPP          1.9-73      2018-01-14 [1] CRAN (R 3.5.0)                              
#>  pillar         1.3.0.9000  2018-10-01 [1] Github (r-lib/pillar@7be5b8a)               
#>  pkgbuild       1.0.2       2018-10-16 [1] CRAN (R 3.5.1)                              
#>  pkgconfig      2.0.2       2018-08-16 [1] CRAN (R 3.5.1)                              
#>  pkgdown        1.1.0.9000  2018-10-21 [1] Github (metrumresearchgroup/pkgdown@3836984)
#>  pkgload        1.0.1       2018-10-11 [1] CRAN (R 3.5.1)                              
#>  plyr           1.8.4       2016-06-08 [1] CRAN (R 3.5.0)                              
#>  prediction     0.3.6       2018-05-22 [1] CRAN (R 3.5.0)                              
#>  prettyunits    1.0.2       2015-07-13 [1] CRAN (R 3.5.0)                              
#>  processx       3.2.0       2018-08-16 [1] CRAN (R 3.5.1)                              
#>  promises       1.0.1       2018-04-13 [1] CRAN (R 3.5.0)                              
#>  ps             1.2.0       2018-10-16 [1] CRAN (R 3.5.1)                              
#>  psych          1.8.9       2018-10-08 [1] local                                       
#>  purrr          0.2.5       2018-05-29 [1] CRAN (R 3.5.0)                              
#>  purrrlyr       0.0.3       2018-05-29 [1] CRAN (R 3.5.0)                              
#>  pwr            1.2-2       2018-03-03 [1] CRAN (R 3.5.0)                              
#>  R6             2.3.0       2018-10-04 [1] CRAN (R 3.5.1)                              
#>  Rcpp           0.12.19     2018-10-01 [1] CRAN (R 3.5.1)                              
#>  readxl         1.1.0       2018-04-20 [1] CRAN (R 3.5.0)                              
#>  remotes        2.0.1       2018-10-19 [1] CRAN (R 3.5.1)                              
#>  reshape        0.8.7       2017-08-06 [1] CRAN (R 3.5.0)                              
#>  reshape2       1.4.3       2017-12-11 [1] CRAN (R 3.5.0)                              
#>  rio            0.5.10      2018-03-29 [1] CRAN (R 3.5.0)                              
#>  rjson          0.2.20      2018-06-08 [1] CRAN (R 3.5.0)                              
#>  rlang          0.2.99.0000 2018-10-10 [1] Github (r-lib/rlang@fb09ff3)                
#>  rmarkdown      1.10.14     2018-10-10 [1] Github (rstudio/rmarkdown@dcc7d37)          
#>  robust         0.4-18      2017-04-27 [1] CRAN (R 3.5.0)                              
#>  robustbase     0.93-3      2018-09-21 [1] CRAN (R 3.5.1)                              
#>  roxygen2       6.1.0.9000  2018-09-13 [1] Github (klutometis/roxygen@cc34200)         
#>  rprojroot      1.3-2       2018-01-03 [1] CRAN (R 3.5.0)                              
#>  rrcov          1.4-4       2018-05-24 [1] CRAN (R 3.5.0)                              
#>  rstudioapi     0.8         2018-10-02 [1] CRAN (R 3.5.1)                              
#>  sandwich       2.5-0       2018-08-17 [1] CRAN (R 3.5.1)                              
#>  scales         1.0.0       2018-08-09 [1] CRAN (R 3.5.1)                              
#>  sessioninfo    1.1.0       2018-09-25 [1] CRAN (R 3.5.1)                              
#>  shiny          1.1.0       2018-05-17 [1] CRAN (R 3.5.0)                              
#>  sjlabelled     1.0.14      2018-09-12 [1] CRAN (R 3.5.1)                              
#>  sjmisc         2.7.5       2018-09-13 [1] CRAN (R 3.5.1)                              
#>  sjstats        0.17.1      2018-10-04 [1] Github (strengejacke/sjstats@9eafc5e)       
#>  skimr          1.0.3       2018-07-06 [1] Github (ropenscilabs/skimr@c67559a)         
#>  snakecase      0.9.2       2018-08-14 [1] CRAN (R 3.5.1)                              
#>  ssanv          1.1         2015-06-23 [1] CRAN (R 3.5.0)                              
#>  stringdist     0.9.5.1     2018-06-08 [1] CRAN (R 3.5.0)                              
#>  stringi        1.2.4       2018-07-20 [1] CRAN (R 3.5.1)                              
#>  stringr        1.3.1       2018-05-10 [1] CRAN (R 3.5.0)                              
#>  survival       2.42-3      2018-04-16 [2] CRAN (R 3.5.1)                              
#>  testthat       2.0.1       2018-10-13 [1] CRAN (R 3.5.1)                              
#>  TH.data        1.0-9       2018-07-10 [1] CRAN (R 3.5.1)                              
#>  tibble         1.4.2       2018-01-22 [1] CRAN (R 3.5.1)                              
#>  tidyr          0.8.1.9000  2018-10-06 [1] Github (tidyverse/tidyr@3b62137)            
#>  tidyselect     0.2.5       2018-10-11 [1] CRAN (R 3.5.1)                              
#>  TMB            1.7.14      2018-06-23 [1] CRAN (R 3.5.0)                              
#>  usethis        1.4.0.9000  2018-09-23 [1] Github (r-lib/usethis@1e3c6a6)              
#>  withr          2.1.2       2018-03-15 [1] CRAN (R 3.5.0)                              
#>  WRS2           0.10-0      2018-06-15 [1] CRAN (R 3.5.0)                              
#>  xfun           0.3         2018-07-06 [1] CRAN (R 3.5.1)                              
#>  xml2           1.2.0       2018-01-24 [1] CRAN (R 3.5.0)                              
#>  xtable         1.8-3       2018-08-29 [1] CRAN (R 3.5.1)                              
#>  yaml           2.2.0       2018-07-25 [1] CRAN (R 3.5.1)                              
#>  zip            1.0.0       2017-04-25 [1] CRAN (R 3.5.0)                              
#>  zoo            1.8-4       2018-09-19 [1] CRAN (R 3.5.1)                              
#> 
#> [1] C:/Users/inp099/Documents/R/win-library/3.5
#> [2] C:/Program Files/R/R-3.5.1/library