You can cite this package/vignette as:


  Patil, I. (2018). Visualizations with statistical details: The
  'ggstatsplot' approach. PsyArxiv. doi:10.31234/osf.io/p7mku

A BibTeX entry for LaTeX users is

  @Article{,
    title = {Visualizations with statistical details: The 'ggstatsplot' approach},
    author = {Indrajeet Patil},
    year = {2021},
    journal = {PsyArxiv},
    url = {https://psyarxiv.com/p7mku/},
    doi = {10.31234/osf.io/p7mku},
  }

The following are a few of the common questions asked in GitHub issues and on social media platforms.

1. I just want the plot, not the statistical details. How can I turn them off?

All functions in ggstatsplot that display results from a statistical analysis in a subtitle have the argument results.subtitle. Setting it to FALSE will return only the plot.
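As a minimal sketch of this, using the built-in mtcars data and the same ggbetweenstats call pattern that appears elsewhere in this FAQ (the choice of variables is only for illustration):

```r
library(ggstatsplot)

# same kind of plot as usual, but without the statistical details in the subtitle
p <- ggbetweenstats(
  data = mtcars,
  x = cyl,
  y = mpg,
  results.subtitle = FALSE
)

p
```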

2. How can I customize the details contained in the subtitle?

Sometimes you may not wish to include so many details in the subtitle. In that case, you can extract the expression and copy-paste only the part you wish to include. For example, here only the statistic and the p-value are included:

# setup
set.seed(123)
library(ggstatsplot)
library(ggplot2)
library(statsExpressions)

# extracting detailed expression
(res_expr <- oneway_anova(iris, Species, Sepal.Length, var.equal = TRUE))
#> # A tibble: 1 x 11
#>   statistic    df df.error  p.value method                    estimate
#>       <dbl> <dbl>    <dbl>    <dbl> <chr>                        <dbl>
#> 1      119.     2      147 1.67e-31 One-way analysis of means    0.612
#>   conf.level conf.low conf.high effectsize expression
#>        <dbl>    <dbl>     <dbl> <chr>      <list>    
#> 1       0.95    0.517     0.683 Omega2     <language>

# adapting the details to your liking
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  labs(subtitle = ggplot2::expr(paste(
    NULL, italic("F"), "(", "2",
    ",", "147", ") = ", "119.26", ", ",
    italic("p"), " = ", "1.67e-31"
  )))
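Instead of typing the numbers by hand, the same kind of subtitle can be assembled programmatically. The following is a sketch using base R's bquote(); the variable names (f_value, df_num, df_den, p_value, custom_subtitle) are hypothetical, and the values are simply copied from the output shown above rather than extracted from res_expr.

```r
library(ggplot2)

# values copied from the `oneway_anova()` output shown above (hypothetical names)
f_value <- 119.26
df_num <- 2
df_den <- 147
p_value <- 1.67e-31

# `bquote()` substitutes everything wrapped in `.()` into the plotmath expression
custom_subtitle <- bquote(paste(
  italic("F"), "(", .(df_num), ", ", .(df_den), ") = ",
  .(format(f_value, nsmall = 2)), ", ",
  italic("p"), " = ", .(format(p_value, digits = 3))
))

ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  labs(subtitle = custom_subtitle)
```

This keeps the formatting decisions (number of digits, which details to show) in one place, which can be convenient when the same subtitle layout is reused across plots.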

3. I am getting Error in grid.Call error

Sometimes, if you are working in RStudio, you might see the following error:

Error in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
polygon edge not found

There is no single solution for this. Search for this error online and see if any of the suggested solutions solves your problem.

If not, pray to the old and the new gods, and try again. It might just solve your problem. 🤷

4. Why do I get only the plot in return but not the subtitle/caption?

To prevent a function from failing when the statistical analysis fails, functions in ggstatsplot first attempt to run the analysis and, if it fails, return an empty (NULL) subtitle/caption. In such cases, if you wish to diagnose why the analysis is failing, you will have to run the underlying function used to carry out the statistical analysis.

For example, the following returns only the plot but not the statistical details in a subtitle.

set.seed(123)
df <- data.frame(x = c("a", "b"), y = c(1, 2))

ggbetweenstats(data = df, x = x, y = y)

To see why the statistical analysis failed, you can look at the error from the underlying function:

library(statsExpressions)

df <- data.frame(x = c("a", "b"), y = c(1, 2))
two_sample_test(data = df, x = x, y = y)
#> Error in t.test.default(x = 1, y = 2, paired = FALSE, var.equal = FALSE, : not enough 'x' observations
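If you would rather capture the failure message than have the script stop, base R's tryCatch() can be wrapped around the underlying call. The sketch below calls stats::t.test() directly with the same inputs that appear in the error above; the variable name msg is hypothetical.

```r
# capture the error message instead of stopping the script
msg <- tryCatch(
  {
    t.test(x = 1, y = 2)
    NA_character_ # reached only if the test succeeds
  },
  error = function(e) conditionMessage(e)
)

# the captured message explains why the analysis could not be run
msg
```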

5. What statistical test was carried out?

If you are not sure which statistical test produced the results shown in the subtitle of the plot, the best way to find out is to either look at the documentation for the function used or check out the associated vignette.

A summary of all analyses is available in the README: https://github.com/IndrajeetPatil/ggstatsplot/blob/master/README.md

6. How can I use ggstatsplot functions in a for loop?

Given that all functions in ggstatsplot use tidy evaluation, running these functions in a for loop requires a minor adjustment to how inputs are entered:

# setup
data(mtcars)
library(ggstatsplot)
col.name <- colnames(mtcars)

# executing the function in a `for` loop
# (plots are not auto-printed inside a loop, so `print()` is called explicitly)
for (i in 3:length(col.name)) {
  print(ggbetweenstats(
    data = mtcars,
    x = cyl,
    y = !!col.name[i]
  ))
}

That said, if repeating function execution across multiple columns in a dataframe is what you want to do, I recommend having a look at the purrr-based solution:

https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/purrr_examples.html#repeating-function-execution-across-multiple-columns-in-a-dataframe-1
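A purrr-based version might look roughly like the following. This is only a sketch: it assumes, as in the purrr::pmap() example for question 9 in this FAQ, that column names can be supplied as strings, and the two columns chosen here (wt and qsec) are arbitrary.

```r
library(ggstatsplot)

# column names are passed as strings; length-1 inputs are recycled by `pmap()`
plot_list <- purrr::pmap(
  .l = list(
    data = list(mtcars), # wrapped in a list so the whole dataframe is one input
    x = "cyl",
    y = c("wt", "qsec"),
    results.subtitle = FALSE
  ),
  .f = ggbetweenstats
)

# one plot per column supplied to `y`
length(plot_list)
```

Unlike the for loop, this returns the plots as a list, so they can be printed individually or combined later.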

7. How can I have uniform Y-axis ranges in grouped_ functions?

Across the individual plots of a grouped_ plot, the axis ranges might sometimes differ. You can use the ggplot.component argument (present in all functions) to have the same scale across the individual plots:

# setup
set.seed(123)
library(ggstatsplot)

# provide a list of further `ggplot2` modifications using `ggplot.component`
grouped_ggscatterstats(
  mtcars,
  disp,
  hp,
  grouping.var = am,
  results.subtitle = FALSE,
  ggplot.component = list(
    ggplot2::scale_y_continuous(
      breaks = seq(50, 350, 50),
      limits = c(50, 350)
    )
  )
)

8. Does ggstatsplot work with plotly?

The plotly R graphing library makes it easy to produce interactive web graphics via plotly.js.

The ggstatsplot functions are compatible with plotly.

# for reproducibility
set.seed(123)
library(ggstatsplot)
library(plotly)

# creating ggplot object with `ggstatsplot`
p <-
  ggstatsplot::ggbetweenstats(
    data = mtcars,
    x = cyl,
    y = mpg
  )

# converting to plotly object
plotly::ggplotly(p, width = 480, height = 480)

9. How can I use grouped_ functions with more than one group?

Currently, the grouped_ variants of functions only support repeating the analysis across a single grouping variable. Often, you have to run the same analysis across a combination of two or more grouping variables. This can be easily achieved using the purrr package.

Here is an example:

# setup
set.seed(123)
library(ggstatsplot)
library(ggplot2) # provides the `mpg` dataset

# creating a list by splitting dataframe by combination of two different
# grouping variables
df_list <-
  mpg %>%
  dplyr::filter(drv %in% c("4", "f"), fl %in% c("p", "r")) %>%
  split(x = ., f = list(.$drv, .$fl), drop = TRUE)

# checking if the length of the list is 4
length(df_list)
#> [1] 4

# running correlation analyses between `displ` and `hwy` for each combination
# this will return a *list* of plots
plot_list <-
  purrr::pmap(
    .l = list(
      data = df_list,
      x = "displ",
      y = "hwy",
      results.subtitle = FALSE,
      marginal.type = "densigram"
    ),
    .f = ggstatsplot::ggscatterstats
  )

# arrange the list of plots in a single plot
ggstatsplot::combine_plots(
  plotlist = plot_list,
  plotgrid.args = list(nrow = 2),
  annotation.args = list(tag_levels = "i")
)
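The splitting step in this example can be tried out with base R alone. The sketch below uses mtcars with am and vs as two stand-in grouping variables (an assumption made only so that no extra packages are needed); the names of the resulting list show which combination each element corresponds to.

```r
# split by every combination of two grouping variables
df_list <- split(mtcars, list(mtcars$am, mtcars$vs), drop = TRUE)

# each name joins the two grouping levels with a "."
names(df_list)

# number of rows in each combination
vapply(df_list, nrow, integer(1))
```

These names can be handy as plot titles when the list is passed on to a plotting function.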

10. How can I include statistical expressions in facet labels?

set.seed(123)
library(ggplot2)
library(ggstatsplot)

# data
mtcars1 <- mtcars
statistics <-
  grouped_ggbetweenstats(
    data = mtcars1,
    x = cyl,
    y = mpg,
    grouping.var = am,
    output = "subtitle"
  )
mtcars1$am <- factor(mtcars1$am, levels = c(0, 1), labels = statistics)

# plot
mtcars1 %>%
  ggplot(aes(x = cyl, y = mpg)) +
  geom_jitter() +
  facet_wrap(
    vars(am),
    ncol = 1,
    strip.position = "top",
    labeller = ggplot2::label_parsed
  )

11. Can you customize which pairs are shown in pairwise comparisons?

Currently, for ggbetweenstats and ggwithinstats, you can either display all significant comparisons, all non-significant comparisons, or all comparisons. But what if I am interested in just one particular comparison?

Here is a workaround using ggsignif:

set.seed(123)
library(ggstatsplot)
library(ggsignif)

# displaying only one comparison
ggbetweenstats(mtcars, cyl, wt, pairwise.comparisons = FALSE) +
  geom_signif(comparisons = list(c("4", "6")))

12. How can I access the dataframe with results from pairwise comparisons?

library(ggstatsplot)
library(ggplot2)

# way-1

p <- ggbetweenstats(mtcars, cyl, wt)

pb <- ggplot_build(p)

pb$plot$plot_env$df_pairwise
#> # A tibble: 3 x 11
#>   group1 group2 statistic   p.value alternative method            distribution
#>   <chr>  <chr>      <dbl>     <dbl> <chr>       <chr>             <chr>       
#> 1 4      6           5.39 0.00831   two.sided   Games-Howell test q           
#> 2 4      8           9.11 0.0000124 two.sided   Games-Howell test q           
#> 3 6      8           5.12 0.00831   two.sided   Games-Howell test q           
#>   p.adjustment test.details      p.value.adjustment
#>   <chr>        <chr>             <chr>             
#> 1 none         Games-Howell test Holm              
#> 2 none         Games-Howell test Holm              
#> 3 none         Games-Howell test Holm              
#>   label                                     
#>   <chr>                                     
#> 1 list(~italic(p)[Holm-corrected]==0.008)   
#> 2 list(~italic(p)[Holm-corrected]==1.24e-05)
#> 3 list(~italic(p)[Holm-corrected]==0.008)

# way-2

library(pairwiseComparisons)

pairwise_comparisons(mtcars, cyl, wt)
#> # A tibble: 3 x 11
#>   group1 group2 statistic   p.value alternative method            distribution
#>   <chr>  <chr>      <dbl>     <dbl> <chr>       <chr>             <chr>       
#> 1 4      6           5.39 0.00831   two.sided   Games-Howell test q           
#> 2 4      8           9.11 0.0000124 two.sided   Games-Howell test q           
#> 3 6      8           5.12 0.00831   two.sided   Games-Howell test q           
#>   p.adjustment test.details      p.value.adjustment
#>   <chr>        <chr>             <chr>             
#> 1 none         Games-Howell test Holm              
#> 2 none         Games-Howell test Holm              
#> 3 none         Games-Howell test Holm              
#>   label                                     
#>   <chr>                                     
#> 1 list(~italic(p)[Holm-corrected]==0.008)   
#> 2 list(~italic(p)[Holm-corrected]==1.24e-05)
#> 3 list(~italic(p)[Holm-corrected]==0.008)

13. How can I change annotation in pairwise comparisons?

ggstatsplot defaults to displaying exact p-values or logged Bayes Factor values for pairwise comparisons. But what if you wish to adopt different annotation labels?

You will have to customize them using pairwiseComparisons and ggsignif:

# needed libraries
set.seed(123)
library(ggplot2)
library(pairwiseComparisons)
library(ggsignif)
library(ggstatsplot)

# converting to factor
mtcars$cyl <- as.factor(mtcars$cyl)

# creating a basic plot
p <- ggbetweenstats(mtcars, cyl, wt, pairwise.comparisons = FALSE)

# using `pairwiseComparisons` package to create a dataframe with results
set.seed(123)
(df <-
  pairwise_comparisons(mtcars, cyl, wt) %>%
  dplyr::mutate(groups = purrr::pmap(.l = list(group1, group2), .f = c)) %>%
  dplyr::arrange(group1) %>%
  dplyr::mutate(asterisk_label = c("**", "***", "**")))
#> # A tibble: 3 x 13
#>   group1 group2 statistic   p.value alternative method            distribution
#>   <chr>  <chr>      <dbl>     <dbl> <chr>       <chr>             <chr>       
#> 1 4      6           5.39 0.00831   two.sided   Games-Howell test q           
#> 2 4      8           9.11 0.0000124 two.sided   Games-Howell test q           
#> 3 6      8           5.12 0.00831   two.sided   Games-Howell test q           
#>   p.adjustment test.details      p.value.adjustment
#>   <chr>        <chr>             <chr>             
#> 1 none         Games-Howell test Holm              
#> 2 none         Games-Howell test Holm              
#> 3 none         Games-Howell test Holm              
#>   label                                      groups    asterisk_label
#>   <chr>                                      <list>    <chr>         
#> 1 list(~italic(p)[Holm-corrected]==0.008)    <chr [2]> **            
#> 2 list(~italic(p)[Holm-corrected]==1.24e-05) <chr [2]> ***           
#> 3 list(~italic(p)[Holm-corrected]==0.008)    <chr [2]> **

# adding pairwise comparisons using `ggsignif`
p +
  ggsignif::geom_signif(
    comparisons = df$groups,
    map_signif_level = TRUE,
    annotations = df$asterisk_label,
    y_position = c(5.5, 5.75, 6.0),
    test = NULL,
    na.rm = TRUE
  )
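In the example above the asterisk labels are typed in by hand. As a sketch, they could instead be derived from the p-values with base R's cut(). The cutoffs used here (0.001, 0.01, 0.05) are a common convention and an assumption on my part, not something ggstatsplot prescribes, and the p-values are copied from the output above.

```r
# p-values copied from the pairwise comparisons output shown above
p_values <- c(0.00831, 0.0000124, 0.00831)

# map each p-value to a label; the cutoffs are a conventional choice
asterisk_label <- as.character(cut(
  p_values,
  breaks = c(0, 0.001, 0.01, 0.05, 1),
  labels = c("***", "**", "*", "ns"),
  include.lowest = TRUE
))

asterisk_label
```

With these cutoffs the result matches the hand-written labels used above, so it could be passed to the annotations argument of geom_signif in the same way.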

14. How can I access the dataframe with results from ggpiestats and ggbarstats?

# setup
set.seed(123)
library(ggstatsplot)
library(ggplot2)

# plot
p <- ggpiestats(mtcars, am, cyl)

# build ggplot object
pb <- ggplot2::ggplot_build(p)

# dataframe with proportion test results
pb$plot$plot_env$df_proptest
#> # A tibble: 3 x 10
#>   cyl   counts  perc N        statistic    df p.value
#>   <fct>  <int> <dbl> <chr>        <dbl> <dbl>   <dbl>
#> 1 8         14  43.8 (n = 14)     7.14      1 0.00753
#> 2 6          7  21.9 (n = 7)      0.143     1 0.705  
#> 3 4         11  34.4 (n = 11)     2.27      1 0.132  
#>   method                                  
#>   <chr>                                   
#> 1 Chi-squared test for given probabilities
#> 2 Chi-squared test for given probabilities
#> 3 Chi-squared test for given probabilities
#>   .label                                                            
#>   <chr>                                                             
#> 1 list(~chi['gof']^2~(1)==7.14, ~italic(p)=='0.008', ~italic(n)==14)
#> 2 list(~chi['gof']^2~(1)==0.14, ~italic(p)=='0.705', ~italic(n)==7) 
#> 3 list(~chi['gof']^2~(1)==2.27, ~italic(p)=='0.132', ~italic(n)==11)
#>   .p.label                 
#>   <chr>                    
#> 1 list(~italic(p)=='0.008')
#> 2 list(~italic(p)=='0.705')
#> 3 list(~italic(p)=='0.132')

# dataframe with counts and proportions details
pb$plot$plot_env$df_descriptive
#> # A tibble: 6 x 5
#>   cyl   am    counts  perc .label
#>   <fct> <fct>  <int> <dbl> <chr> 
#> 1 4     1          8  72.7 73%   
#> 2 6     1          3  42.9 43%   
#> 3 8     1          2  14.3 14%   
#> 4 4     0          3  27.3 27%   
#> 5 6     0          4  57.1 57%   
#> 6 8     0         12  85.7 86%

15. How can I remove a particular geom layer from the plot?

Sometimes you may not want a particular geom layer to be displayed. You can remove it using gginnards.

For example, let’s say we want to remove the violin layer (geom_violin()) from the default ggwithinstats plot.

# needed libraries
library(ggstatsplot)
library(gginnards)

# plot with all geoms
p <-
  ggwithinstats(
    data = bugs_long,
    x = condition,
    y = desire,
    results.subtitle = FALSE,
    pairwise.comparisons = FALSE
  )

# delete `geom` corresponding to violin
gginnards::delete_layers(x = p, match_type = "GeomViolin")

This can be helpful to add a new layer with aesthetic specifications of your liking.

# needed libraries
set.seed(123)
library(ggstatsplot)
library(gginnards)
library(ggplot2)

# basic plot without mean tagging
p <-
  ggbetweenstats(
    data = mtcars,
    x = am,
    y = wt,
    centrality.plotting = FALSE
  )

# delete the geom_point layer
p <- gginnards::delete_layers(x = p, match_type = "GeomPoint")

# add a new layer for points with a different shape
p + geom_point(shape = 23, aes(color = am))
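To find out which class name to pass to match_type, it can help to list the geoms a plot contains. The sketch below does this for a plain ggplot2 plot that stands in for a ggstatsplot plot; the exact layers in a ggstatsplot plot may differ, and layer_geoms is a hypothetical variable name.

```r
library(ggplot2)

# a plain ggplot2 plot standing in for a ggstatsplot plot
p <- ggplot(mtcars, aes(x = factor(am), y = wt)) +
  geom_violin() +
  geom_boxplot(width = 0.2) +
  geom_point()

# first class of each layer's geom, in the order the layers were added
layer_geoms <- vapply(
  p$layers,
  function(layer) class(layer$geom)[1],
  character(1)
)

layer_geoms
```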

16. How can I modify the fill colors with custom values?

Sometimes you may not be satisfied with the available color palettes. In that case, you can change the colors by specifying the values manually.

# needed libraries
set.seed(123)
library(ggstatsplot)
library(ggplot2)

ggbarstats(mtcars, am, cyl, results.subtitle = FALSE) +
  scale_fill_manual(values = c("#E7298A", "#66A61E"))

The same can also be done for grouped_ functions:

ggstatsplot::grouped_ggpiestats(
  data = mtcars,
  grouping.var = am,
  x = cyl,
  ggplot.component = ggplot2::scale_fill_grey()
)

17. How can I modify grouped_ outputs using ggplot2 functions?

All ggstatsplot plots are ggplot objects, which can be further modified just like any other ggplot object. The exception is the plots returned by grouped_ functions, but there is a way to tackle this: pass the modifications via the ggplot.component argument.

# needed libraries
set.seed(123)
library(ggstatsplot)
library(paletteer)
library(ggplot2)
library(palmerpenguins)

# plot
grouped_ggbetweenstats(
  penguins,
  species,
  body_mass_g,
  grouping.var = sex,
  type = "np",
  # modify further with `ggplot2` functions
  ggplot.component = list(
    scale_color_manual(values = paletteer::paletteer_c("viridis::viridis", 3)),
    theme(axis.text.x = element_text(angle = 90))
  )
)

18. How can I extract a dataframe containing results from ggstatsplot?

ggstatsplot can return expressions in the subtitle and caption, but what if you want to actually get back a dataframe containing the results?

This is possible via statsExpressions: https://indrajeetpatil.github.io/statsExpressions/articles/dataframe_outputs.html

19. How can I remove sample size labels for ggbarstats?

library(ggstatsplot)
library(gginnards)

# create a plot
p <- ggbarstats(mtcars, am, cyl)

# remove layer corresponding to sample size
delete_layers(p, "GeomText")

20. The test I need is not available. What can I do?

Since ggstatsplot allows just one type of test per statistical approach, your favorite test might sometimes not be available. For example, as a non-parametric correlation test, ggstatsplot provides only Spearman’s \(\rho\), but not Kendall’s \(\tau\).

In such cases, you can override the defaults and use statsExpressions to create custom expressions to display in the plot. But be forewarned that the expression building functions in statsExpressions are not stable yet.

# setup
set.seed(123)
library(ggstatsplot)
library(correlation)
library(statsExpressions)
library(ggplot2)

# data with two variables of interest
df <- dplyr::select(mtcars, wt, mpg)

# correlation results
results <-
  correlation(df, method = "kendall") %>%
  parameters::standardize_names(style = "broom")

# creating expression out of these results
expr_results <-
  statsExpressions::expr_template(
    data = results,
    no.parameters = 0L,
    statistic.text = quote(italic("T")),
    effsize.text = quote(widehat(italic(tau))["Kendall"]),
    n = results$n.obs[[1]]
  )

# plot (overriding defaults and using custom expression)
ggscatterstats(
  df, wt, mpg,
  results.subtitle = FALSE,
  ggplot.component = list(labs(subtitle = expr_results))
)
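If you prefer not to rely on the expression-building functions, a similar subtitle can be put together with base R. The following is a sketch under that assumption: it uses stats::cor.test() for Kendall's correlation and bquote() for the expression, shows only the estimate and the p-value, and the names res and custom_subtitle are hypothetical.

```r
library(ggplot2)

# Kendall's tau from base R; `exact = FALSE` avoids the warning about ties
res <- cor.test(mtcars$wt, mtcars$mpg, method = "kendall", exact = FALSE)

# plotmath expression with the estimate and p-value substituted in
custom_subtitle <- bquote(paste(
  italic(tau)["Kendall"], " = ", .(format(unname(res$estimate), digits = 2)),
  ", ", italic("p"), " = ", .(format(res$p.value, digits = 3))
))

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(subtitle = custom_subtitle)
```

The resulting expression could also be supplied through ggplot.component = list(labs(subtitle = custom_subtitle)), in the same way as expr_results above.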

Suggestions

If you find any bugs or have any suggestions/remarks, please file an issue on GitHub: https://github.com/IndrajeetPatil/ggstatsplot/issues