You can cite this package/vignette as:


Patil, I. (2018). Visualizations with statistical details: The
'ggstatsplot' approach. PsyArxiv. doi:10.31234/osf.io/p7mku

A BibTeX entry for LaTeX users is

@Article{,
title = {Visualizations with statistical details: The 'ggstatsplot' approach},
author = {Indrajeet Patil},
year = {2018},
journal = {PsyArxiv},
url = {https://psyarxiv.com/p7mku/},
doi = {10.31234/osf.io/p7mku},
}


# Combining plots with combine_plots

ggstatsplot contains a “helper” function named combine_plots to help you combine several plots into one plot or add a combination of title, caption, and annotation texts with suitable default parameters. It is a wrapper around patchwork::wrap_plots(). Some examples below.
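To make the wrapper relationship concrete, here is a minimal sketch of the kind of patchwork call that combine_plots builds on. It assumes that plotgrid.args is forwarded to patchwork::wrap_plots() and annotation.args to patchwork::plot_annotation(); the two ggplot2 scatterplots and their labels are placeholders, not ggstatsplot output.

```r
library(ggplot2)
library(patchwork)

# two plain ggplot2 plots standing in for ggstatsplot output (placeholders)
p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(hp, mpg)) + geom_point()

# arrange the plots in a grid, then attach a shared title and caption
# (assumed to mirror what plotgrid.args and annotation.args control)
combined <- wrap_plots(list(p1, p2), nrow = 1) +
  plot_annotation(
    title = "Fuel economy in mtcars",
    caption = "Source: 1974 Motor Trend US magazine"
  )

combined
```

Calling patchwork directly like this is always possible; combine_plots mainly saves typing when the plots already live in a list.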

Note: If you have just one grouping variable and you’d like a plot for each level of this variable, the grouped_ variants (https://indrajeetpatil.github.io/ggstatsplot/reference/index.html) of all ggstatsplot functions will allow you to do this. They use the combine_plots function under the hood.
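As an illustration of the grouped_ route, the following sketch produces one ggbetweenstats panel per level of cyl in a single call. It assumes an installed ggstatsplot version in which grouped_ggbetweenstats accepts an unquoted grouping.var; the object name p_grouped is only illustrative.

```r
library(ggstatsplot)

# for reproducibility
set.seed(123)

# one panel per level of `cyl`, combined into a single plot
p_grouped <- grouped_ggbetweenstats(
  data = dplyr::filter(mtcars, cyl != 4),
  x = am,
  y = wt,
  grouping.var = cyl
)

p_grouped
```

This covers the single-grouping-variable case; the examples that follow are for situations where more control over the individual plots is needed.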

# Example-1 using dplyr::group_map

The easiest way to run the same ggstatsplot operation for every level of a grouping variable is to use dplyr::group_map. One would then, of course, like to combine the resulting plots into a single plot.

# libraries
set.seed(123)
library(tidyverse)

# creating a list of plots
p_list <-
  mtcars %>%
  dplyr::filter(cyl != 4) %>%
  dplyr::group_by(cyl) %>%
  dplyr::group_map(.f = ~ ggstatsplot::ggbetweenstats(data = ., x = am, y = wt))

# combining plots
ggstatsplot::combine_plots(
  plotlist = p_list,
  # tag each panel with its cyl value (cyl == 4 was filtered out above)
  annotation.args = list(tag_levels = list(c("6", "8")))
)

# Example-2 using purrr

The full power of ggstatsplot can be leveraged with a functional programming package like purrr, which can replace many for loops and is more succinct and easier to read. Consider purrr as your first choice for combining multiple plots.

An example using the iris dataset is provided below. Imagine that we want to plot the linear relationship between sepal length and sepal width separately for each of the three species, but combine the results into one plot with common labeling. Rather than calling ggscatterstats three times and gluing the results together, or using patchwork directly, we’ll create a tibble called plots using purrr::map and then feed it to combine_plots to get our combined plot.

# for reproducibility
set.seed(123)
library(ggstatsplot)

# creating a list column with ggstatsplot plots
plots <- iris %>%
  dplyr::mutate(Species2 = Species) %>% # just creates a copy of this variable
  dplyr::group_by(Species) %>%
  tidyr::nest(.) %>% # a nested dataframe with list column called data
  dplyr::mutate(
    .data = .,
    plot = data %>%
      purrr::map(
        .x = .,
        .f = ~ ggstatsplot::ggscatterstats(
          data = .,
          x = Sepal.Length,
          y = Sepal.Width,
          xfill = "#0072B2",
          yfill = "#009E73",
          ggtheme = ggthemes::theme_fivethirtyeight(),
          ggstatsplot.layer = FALSE,
          messages = FALSE, # turns off warnings and notes messages
          marginal.type = "boxplot",
          title = glue::glue("Species: {.$Species2} (n = {length(.$Sepal.Length)})")
        )
      )
  )

# display the new object (notice that the class of the plot list column is
# S3: ggExtraPlot)
plots
#> # A tibble: 3 x 3
#> # Groups:   Species [3]
#>   Species    data                  plot
#>   <fct>      <list>                <list>
#> 1 setosa     <tibble[,5] [50 x 5]> <ggExtrPl[,10]>
#> 2 versicolor <tibble[,5] [50 x 5]> <ggExtrPl[,10]>
#> 3 virginica  <tibble[,5] [50 x 5]> <ggExtrPl[,10]>

# creating a grid with patchwork
ggstatsplot::combine_plots(
  plotlist = plots$plot,
  plotgrid.args = list(nrow = 3, ncol = 1),
  annotation.args = list(
    title = "Relationship between sepal length and width for each Iris species",
    caption = expression(
      paste(
        italic("Note"),
        ": Iris flower dataset was collected by Edgar Anderson.",
        sep = ""
      )
    )
  )
)
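For comparison, the same idea can be sketched without a nested tibble by splitting the data into a named list and mapping over it. This is only an illustrative variant, not the vignette's own approach; the names iris_by_species and plots_simple are hypothetical, and all other ggscatterstats arguments are left at whatever defaults the installed version provides.

```r
library(ggstatsplot)

# for reproducibility
set.seed(123)

# split() returns a named list with one data frame per species
iris_by_species <- split(iris, iris$Species)

# imap() passes each data frame as .x and its list name as .y
plots_simple <- purrr::imap(
  iris_by_species,
  ~ ggscatterstats(
    data = .x,
    x = Sepal.Length,
    y = Sepal.Width,
    title = paste0("Species: ", .y, " (n = ", nrow(.x), ")")
  )
)

# stack the three plots vertically under a common title
combine_plots(
  plotlist = plots_simple,
  plotgrid.args = list(nrow = 3, ncol = 1),
  annotation.args = list(
    title = "Relationship between sepal length and width for each Iris species"
  )
)
```

Because the list is named by species, the title can be built from .y directly and no copy of the grouping column is needed.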

# Example-3 with plyr

Another popular package for handling big datasets is plyr, which allows us to repeatedly apply a common function on smaller pieces and then combine the results into a larger whole.

In this example we’ll start with the gapminder dataset. We’re interested in the linear relationship between Gross Domestic Product (per capita) and life expectancy in the year 2007, for all the continents except Oceania. We’ll use dplyr to filter to the right rows and then use plyr to repeat the ggscatterstats function across each of the four remaining continents. The result is a list of plots called plots. We then feed plots to the combine_plots function to merge them into one plot. We will call attention to the countries which have very low life expectancy (< 45 years) by labeling those countries when they occur.

library(plyr)
library(gapminder)

# for reproducibility
set.seed(123)

# let's have a look at the structure of the data
dplyr::glimpse(gapminder::gapminder)
#> Rows: 1,704
#> Columns: 6
#> $ country   <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", ~
#> $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, ~
#> $ year      <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, ~
#> $ lifeExp   <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 40.8~
#> $ pop       <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372, 12~
#> $ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134, ~

# creating a list of plots
plots <- plyr::dlply(
  .data = dplyr::filter(
    gapminder::gapminder,
    year == 2007, continent != "Oceania"
  ),
  .variables = .(continent),
  .fun = function(data) {
    ggstatsplot::ggscatterstats(
      data = data,
      x = gdpPercap,
      y = lifeExp,
      xfill = "#0072B2",
      yfill = "#009E73",
      label.var = "country",
      label.expression = "lifeExp < 45",
      title = glue::glue("Continent: {data$continent}"),
      marginal = FALSE
    ) +
      ggplot2::scale_x_continuous(labels = scales::comma)
  }
)

# combining individual plots
ggstatsplot::combine_plots(
  plotlist = plots,
  annotation.args = list(title = "Relationship between GDP (per capita) and life expectancy"),
  plotgrid.args = list(nrow = 2)
)
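Before plotting, it can help to check which rows the labeling condition will pick up and how many panels to expect. The sketch below repeats the same filter on the gapminder data; the object names gap_2007, n_continents, and labelled are illustrative only.

```r
library(gapminder)

# same subset that is passed to plyr::dlply() in this example
gap_2007 <- dplyr::filter(gapminder, year == 2007, continent != "Oceania")

# number of plots to expect: one per remaining continent
n_continents <- nlevels(droplevels(gap_2007$continent))
n_continents

# rows that satisfy the labeling condition lifeExp < 45
labelled <- dplyr::filter(gap_2007, lifeExp < 45)
labelled[, c("country", "continent", "lifeExp")]
```

Continents without any row in labelled will simply produce a scatterplot with no country labels.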

The combine_plots function can also be useful for adding textual information that cannot be added in a single call to a ggstatsplot function via the title, subtitle, or caption options. For this example let’s assume we want to assess the relationship between a movie’s rating and its budget from the Internet Movie Database using polynomial regression.

ggcoefstats will do most of the work, including the title and subtitle. But we also want to add an annotation at the bottom that shows the formula we are using for our regression. combine_plots allows us to do that by passing a caption through annotation.args, as shown in the resulting plot.

library(ggstatsplot)

# for reproducibility (sample_frac draws a random subset of rows)
set.seed(123)

ggstatsplot::combine_plots(
  plotlist = list(
    ggstatsplot::ggcoefstats(
      x = stats::lm(
        formula = rating ~ stats::poly(budget, degree = 3),
        data = dplyr::sample_frac(tbl = ggstatsplot::movies_long, size = 0.2),
        na.action = na.omit
      ),
      exclude.intercept = FALSE,
      title = "Relationship between movie budget and IMDB rating",
      subtitle = "Source: Internet Movie Database",
      ggtheme = ggplot2::theme_gray(),
      stats.label.color = c("#CC79A7", "darkgreen", "#0072B2", "red")
    ) +
      # modifying the plot outside of ggstatsplot using ggplot2 functions
      ggplot2::scale_y_discrete(
        labels = c(
          "Intercept (c)",
          "1st degree (b1)",
          "2nd degree (b2)",
          "3rd degree (b3)"
        )
      ) +
      ggplot2::labs(y = "term (polynomial regression)")
  ),
  annotation.args = list(
    caption = expression(
      paste(
        "linear model: ", bolditalic(y),
        " ~ ",
        italic(c) + italic(b)[1] * bolditalic(x) + italic(b)[2] * bolditalic(x)^2 +
          italic(b)[3] * bolditalic(x)^3,
        sep = ""
      )
    )
  )
)
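One detail worth keeping in mind when reading the caption formula: stats::poly() uses orthogonal polynomials by default, so the plotted coefficients are not literally the multipliers of x, x^2, and x^3. The following sketch, which assumes the movies_long data shipped with the installed ggstatsplot version and uses the hypothetical names movies, mod_orth, and mod_raw, fits both parameterizations on the full dataset.

```r
# assumes ggstatsplot is installed, for the movies_long data
movies <- ggstatsplot::movies_long

# poly() does not allow missing values, so drop incomplete rows defensively
movies <- movies[!is.na(movies$budget) & !is.na(movies$rating), ]

# orthogonal polynomial terms (the default of stats::poly)
mod_orth <- stats::lm(rating ~ stats::poly(budget, degree = 3), data = movies)

# raw polynomial terms: budget, budget^2, budget^3
mod_raw <- stats::lm(
  rating ~ stats::poly(budget, degree = 3, raw = TRUE),
  data = movies
)

# each model has an intercept plus three polynomial terms
length(stats::coef(mod_orth))
length(stats::coef(mod_raw))

# the two fits should describe the same curve even though the
# individual coefficients differ
all.equal(
  unname(stats::fitted(mod_orth)),
  unname(stats::fitted(mod_raw)),
  tolerance = 1e-6
)
```

If coefficients that map directly onto the caption formula are wanted, the raw = TRUE variant is the one to pass to ggcoefstats; the orthogonal default is usually preferred for numerical stability.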

# Suggestions

If you find any bugs or have any suggestions/remarks, please file an issue on GitHub: https://github.com/IndrajeetPatil/ggstatsplot/issues