class: center, middle, inverse, title-slide

# ggstatsplot: ggplot2 Based Plots with Statistical Details

## An Introductory Tutorial (version: 0.6.0)

### Indrajeet Patil

### 2020-10-14
Follow along:
https://tinyurl.com/y376yz6o
---

<style type="text/css">
body, td {
  font-size: 16px;
}
code.r {
  font-size: 14px;
}
</style>

---
layout: true

**Plan**

---

- Why *ggstatsplot*?

- Current functions

- Benefits and scalability

- Limitations

---
layout: false
class: inverse, center, middle

# Why *ggstatsplot*?

---
layout: true

# Raison d'être

---

--

.right-column[.font150[
Current count of packages on the Comprehensive R Archive Network (**CRAN**)<br>
**<font color="red"> > 14,000</font>**
]

.footnote[<https://cran.r-project.org/web/packages/>]
]

--

.right-column[.font150[
Short answer: `ggstatsplot` provides a collection of <font color="blue">*information-rich*</font> plots with <font color="blue">*statistical details*</font> that are suitable both for scholarly publications and for quick (exploratory) statistical analysis.
]
]

---
layout: true

# 1. Simpler data analysis workflow

---

--

.footnote[[(Grolemund & Wickham, *R for Data Science*, 2017)](https://r4ds.had.co.nz/)]

--

In a typical *exploratory* data analysis workflow, <font color="blue">*data visualization*</font> and <font color="blue">*statistical modeling*</font> are two distinct phases: visualization informs modeling, modeling in turn can suggest a different visualization, and so on.

--

The central idea of **ggstatsplot** is simple: combine these two phases into one, in the form of graphics with statistical details, which makes data exploration simpler and faster.

---
layout: true

# 2. An information-rich graphic is worth a thousand words

---

--

.footnote[[(Matejka & Fitzmaurice, *Autodesk Research*, 2017)](https://www.autodeskresearch.com/publications/samestats)]

--

.font150[“I plotted my data and what I found will surprise me!” - BuzzFeed]

---
layout: false

# 3. Ready-made plot = no customization

--

The **grammar of graphics** (implemented in `ggplot2`) is an incredibly powerful framework for preparing graphics: it lets you make an infinite number of graphics, each tailored to your specific data visualization problem! But...

---
layout: false

# 4. Consistent API = no cognitive fatigue

--

```r
stats::lm(formula = wt ~ mpg, data = mtcars)
```

--

```r
stats::cor(x = mtcars$wt, y = mtcars$mpg)
```

--

```r
stats::cor.test(formula = ~ wt + mpg, data = mtcars)
```

--

**All** functions in `ggstatsplot`:

1. consistently rely on a dataframe (e.g., `data, x, y, ...`)
2. expect the data to be in tidy format (Wickham, [2014](https://vita.had.co.nz/papers/tidy-data.pdf))
3. accept both `"quoted"` and `unquoted` arguments

---
layout: false

# 5. Follows best practices for statistical reporting

--

For all statistical tests reported in the plots, the default template abides by the [APA](https://my.ilstu.edu/~jhkahn/apastats.html) gold standard for statistical reporting. For example, here are results from a robust *t*-test:

---
layout: false
class: inverse, center, middle

# Get Started

---
layout: false

# Installation

--

Install the **ggstatsplot** package from [CRAN](https://cran.r-project.org/web/packages/ggstatsplot/index.html):

```r
install.packages("ggstatsplot")
```

--

You can get the development version of the package from [GitHub](https://github.com/IndrajeetPatil/ggstatsplot):

```r
# install.packages("remotes") # if not already installed
remotes::install_github("IndrajeetPatil/ggstatsplot", dependencies = FALSE)
```

--

Load the needed packages:

```r
library(ggstatsplot)
library(ggplot2)
```

--

Using the [RStudio IDE](https://www.rstudio.com/products/rstudio/) is recommended, but not required.
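---
layout: false

# Sketch: quoted and unquoted arguments

To make the third API convention listed earlier concrete (accepting both `"quoted"` and `unquoted` column names), here is a minimal base-R sketch. It is an illustration under stated assumptions, not `ggstatsplot`'s own implementation: `cor_df()` is a hypothetical helper that takes the dataframe first, uses `substitute()` to turn either form of `x` and `y` into a column name, and then calls `stats::cor.test()`.

```r
# Hypothetical helper, NOT part of ggstatsplot: a dataframe-first wrapper
# around stats::cor.test() that accepts quoted or unquoted column names.
cor_df <- function(data, x, y, method = "pearson") {
  # substitute() returns each argument as written: a character string
  # for x = "wt", or a symbol for x = wt.
  x_expr <- substitute(x)
  y_expr <- substitute(y)
  x_name <- if (is.character(x_expr)) x_expr else deparse(x_expr)
  y_name <- if (is.character(y_expr)) y_expr else deparse(y_expr)

  res <- stats::cor.test(x = data[[x_name]], y = data[[y_name]], method = method)

  # Return a one-row dataframe of results.
  data.frame(
    x = x_name,
    y = y_name,
    estimate = unname(res$estimate),
    p.value = res$p.value
  )
}

# Both calling styles return identical results.
cor_df(data = mtcars, x = wt, y = mpg)
cor_df(data = mtcars, x = "wt", y = "mpg")
```

Resolving both forms to strings at the top of the function means the rest of the code never has to care which calling style was used.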
---
layout: false
class: inverse, center, middle

# ggbetweenstats

For between group/condition comparisons

---
layout: true

# ggbetweenstats - defaults

---

.left-code[
```r
ggbetweenstats(
  data = movies_long,
*  x = mpaa, # > 2 groups
  y = rating,
*  type = "p", # default
  messages = FALSE
)
```

.font80[
Changing the type of test

- `"p"` → **parametric**
- `"np"` → **non-parametric**
- `"r"` → **robust**
- `"bf"` → **Bayes factor**
]
]

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/ggbetweenstats_1-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: true

# ggbetweenstats - little code, rich details

---

.left-code[
```r
ggbetweenstats(
  data = movies_long,
  x = mpaa,
  y = rating
)
```

.font80[
Default information:

- <font color="orange">statistical details</font>
- <font color="blue">Bayes Factor</font>
- <font color="green">sample sizes</font>
- <font color="red">distribution summary</font>
]
]

---
layout: true

# ggbetweenstats - pairwise comparisons

---

.left-code[
```r
ggbetweenstats(
  data = movies_long,
  x = mpaa,
  y = rating,
  type = "np", # <<
  mean.ci = TRUE, # <<
  pairwise.display = "ns", # <<
  p.adjust.method = "fdr", # <<
  messages = FALSE
)
```

.font80[
Changing which pairwise comparisons are displayed

- `"ns"` → only **non-significant**
- `"s"` → only **significant**
- `"all"` → **everything**
]
]

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/ggbetweenstats_2-1.png" width="100%" style="display: block; margin: auto;" />
]

---
exclude: false
layout: true

# ggbetweenstats - (aesthetic) changes & outlier tagging

---
exclude: false

.left-code[
```r
ggbetweenstats(
  data = movies_long,
  x = mpaa,
  y = rating,
  type = "r", # <<
  outlier.tagging = TRUE, # <<
  outlier.label = title, # <<
  outlier.coef = 2, # <<
  ggtheme = hrbrthemes::theme_ipsum_tw(), # <<
  palette = "Darjeeling2", # <<
  package = "wesanderson", # <<
  messages = FALSE
)
```

.font80[
Aesthetic preferences are no excuse for not using `ggstatsplot` 😻 The default palette is **colorblind-friendly**.
]
]

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/ggbetweenstats_3-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: true

# ggbetweenstats - modification with *ggplot2*

---

.left-code[
```r
ggbetweenstats(
  data = movies_long,
  x = mpaa,
  y = rating,
  type = "bf", # <<
  messages = FALSE
) +
  scale_y_continuous(sec.axis = dup_axis()) # <<
```

.font80[
**Note**: You can further modify all `ggstatsplot` plots using `ggplot2` functions. Yaay!
]
]

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/ggbetweenstats_4-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: true

# ggbetweenstats - 2 groups

---

.left-code[
```r
ggbetweenstats(
  data = mtcars,
*  x = am, # 2 groups
  y = wt,
*  type = "p", # default
  messages = FALSE
)
```

.font80[
Changing the type of test

- `"p"` → **parametric**
- `"np"` → **non-parametric**
- `"r"` → **robust**
- `"bf"` → **Bayes factor**
]
]

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/ggbetweenstats_5-1.png" width="100%" style="display: block; margin: auto;" />
]

---
exclude: false
layout: true

# Summary of tests - *ggbetweenstats*

---
exclude: false

.font100[
Type | No. of groups | Test
----------- | -- | --------------------------
**<font color="blue">Parametric</font>** | > 2 | Fisher's or Welch's one-way ANOVA
**<font color="blue">Parametric</font>** | 2 | Student's or Welch's *t*-test
**<font color="#ff6600">Non-parametric</font>** | > 2 | Kruskal–Wallis one-way ANOVA
**<font color="#ff6600">Non-parametric</font>** | 2 | Mann–Whitney *U* test
**<font color="#ff00ff">Robust</font>** | > 2 | Heteroscedastic one-way ANOVA for trimmed means
**<font color="#ff00ff">Robust</font>** | 2 | Yuen's test for trimmed means
**<font color="#009933">Bayes Factor</font>** | > 2 | Student's ANOVA
**<font color="#009933">Bayes Factor</font>** | 2 | Student's *t*-test
]

---
exclude: false
layout: true

# Effect sizes + CI - *ggbetweenstats*

---
exclude: false

Type | No. of groups | Effect size | CI?
------------ | -- | -------------------------- | ----
**<font color="blue">Parametric</font>** | > 2 | `\(\eta_{p}^2\)`, `\(\eta^2\)`, `\(\omega_{p}^2\)`, `\(\omega^2\)` | <font color="green">Yes</font>
**<font color="blue">Parametric</font>** | 2 | Cohen's *d*, Hedges' *g* (central- and noncentral-*t* distribution based) | <font color="green">Yes</font>
**<font color="#ff6600">Non-parametric</font>** | > 2 | `\(\epsilon^2\)` | <font color="green">Yes</font>
**<font color="#ff6600">Non-parametric</font>** | 2 | *r* (computed as `\(Z/\sqrt{N_{obs}}\)`) | <font color="green">Yes</font>
**<font color="#ff00ff">Robust</font>** | > 2 | `\(\xi\)` (explanatory measure of effect size) | <font color="green">Yes</font>
**<font color="#ff00ff">Robust</font>** | 2 | `\(\xi\)` (explanatory measure of effect size) | <font color="green">Yes</font>
**<font color="#009933">Bayes Factor</font>** | > 2 | <font color="red">No</font> | <font color="red">No</font>
**<font color="#009933">Bayes Factor</font>** | 2 | Posterior estimate (difference) | <font color="green">Yes</font>

---
exclude: false
layout: true

# Pairwise comparison tests - *ggbetweenstats*

---
exclude: false

Type | Equal variance? | Test | *p*-value adjustment?
----------- | --- | ------------------------- | ---
**<font color="blue">Parametric</font>** | No | Games-Howell test | <font color="green">Yes</font>
**<font color="blue">Parametric</font>** | Yes | Student's *t*-test | <font color="green">Yes</font>
**<font color="#ff6600">Non-parametric</font>** | No | Dwass-Steel-Critchlow-Fligner test | <font color="green">Yes</font>
**<font color="#ff00ff">Robust</font>** | No | Yuen's trimmed means test | <font color="green">Yes</font>
**<font color="#009933">Bayes Factor</font>** | `NA` | Student's *t*-test | <font color="green">Yes</font>

---
layout: false
class: inverse, center, middle

# ggwithinstats

For within group/condition comparisons

---
layout: true

# ggwithinstats - repeated measures equivalent

---

.left-code[
```r
ggwithinstats(
  data = WRS2::WineTasting,
  x = Wine,
  y = Taste,
  ggtheme = hrbrthemes::theme_ipsum_tw(), # <<
  ggstatsplot.layer = FALSE,
  messages = FALSE
)
```

.font70[
Changing the type of test

- `"p"` → **parametric**
- `"np"` → **non-parametric**
- `"r"` → **robust**
- `"bf"` → **Bayes factor**
]
]

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/ggwithinstats_1-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: true

# ggwithinstats - little code, rich details

---

.left-code[
```r
ggwithinstats(
  data = WRS2::WineTasting,
*  x = Wine, # > 2 groups
  y = Taste,
  ggtheme = hrbrthemes::theme_ipsum_tw(), # <<
  ggstatsplot.layer = FALSE,
  messages = FALSE
)
```

.font80[
Default information:

- <font color="orange">statistical details</font>
- <font color="blue">pairwise comparisons</font>
- <font color="green">sample sizes</font>
- <font color="red">distribution summary</font>
]
]

---
exclude: false
layout: true

# ggwithinstats - two groups

---
exclude: false

.left-code[
```r
ggwithinstats(
  data = iris_long,
*  x = attribute, # 2 groups
  y = value,
  type = "r", # <<
  messages = FALSE
)
```

.font80[
Changing the type of test

- `"p"` → **parametric**
- `"np"` → **non-parametric**
- `"r"` → **robust**
- `"bf"` → **Bayes factor**
]
]

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/ggwithinstats_2-1.png" width="100%" style="display: block; margin: auto;" />
]

---
exclude: false
layout: true

# Summary of tests - *ggwithinstats*

---
exclude: false

Type | No. of groups | Test
----------- | --- | -------------------------
**<font color="blue">Parametric</font>** | > 2 | One-way repeated measures ANOVA
**<font color="blue">Parametric</font>** | 2 | Student's *t*-test
**<font color="#ff6600">Non-parametric</font>** | > 2 | Friedman rank sum test
**<font color="#ff6600">Non-parametric</font>** | 2 | Wilcoxon signed-rank test
**<font color="#ff00ff">Robust</font>** | > 2 | Heteroscedastic one-way repeated measures ANOVA for trimmed means
**<font color="#ff00ff">Robust</font>** | 2 | Yuen's test on trimmed means for dependent samples
**<font color="#009933">Bayes Factor</font>** | > 2 | One-way repeated measures ANOVA
**<font color="#009933">Bayes Factor</font>** | 2 | Student's *t*-test

---
exclude: false
layout: true

# Effect sizes + CI - *ggwithinstats*

---
exclude: false

Type | No. of groups | Effect size | CI?
----------- | --- | ------------------------- | ---
**<font color="blue">Parametric</font>** | > 2 | `\(\eta_{p}^2\)`, `\(\omega^2\)` | <font color="green">Yes</font>
**<font color="blue">Parametric</font>** | 2 | Cohen's *d*, Hedges' *g* (central- and noncentral-*t* distribution based) | <font color="green">Yes</font>
**<font color="#ff6600">Non-parametric</font>** | > 2 | `\(W_{Kendall}\)` (Kendall's coefficient of concordance) | <font color="green">Yes</font>
**<font color="#ff6600">Non-parametric</font>** | 2 | *r* (computed as `\(Z/\sqrt{N_{pairs}}\)`) | <font color="green">Yes</font>
**<font color="#ff00ff">Robust</font>** | > 2 | <font color="red">No</font> | <font color="red">No</font>
**<font color="#ff00ff">Robust</font>** | 2 | `\(\xi\)` (explanatory measure of effect size) | <font color="green">Yes</font>
**<font color="#009933">Bayes Factor</font>** | > 2 | <font color="red">No</font> | <font color="red">No</font>
**<font color="#009933">Bayes Factor</font>** | 2 | Posterior estimate (difference) | <font color="green">Yes</font>

---
exclude: false
layout: true

# Pairwise comparison tests - *ggwithinstats*

---
exclude: false

Type | Test | *p*-value adjustment?
----------- | ---------------------------- | ---
**<font color="blue">Parametric</font>** | Student's *t*-test | <font color="green">Yes</font>
**<font color="#ff6600">Non-parametric</font>** | Durbin-Conover test | <font color="green">Yes</font>
**<font color="#ff00ff">Robust</font>** | Yuen's trimmed means test | <font color="green">Yes</font>
**<font color="#009933">Bayes Factor</font>** | Student's *t*-test | <font color="green">Yes</font>

---
layout: false
class: inverse, center, middle

# ggscatterstats

Association between two numeric variables

---
layout: true

# ggscatterstats - defaults

---

.left-code[
```r
ggscatterstats(
  data = movies_long,
  x = budget,
  y = rating,
  type = "p", # default # <<
  messages = FALSE
)
```

.font80[
Changing the type of test

- `"p"` → **parametric**
- `"np"` → **non-parametric**
- `"r"` → **robust**
- `"bf"` → **Bayes factor**
]
]

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/ggscatterstats_1-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: true

# ggscatterstats - little code, rich details

---

.left-code[
```r
ggscatterstats(
  data = movies_long,
  x = budget,
  y = rating
)
```

.font80[
Default information:

- <font color="red">distribution</font>
- <font color="blue">Bayes Factor</font>
- <font color="orange">statistical details</font>
]
]

---
layout: true

# ggscatterstats - conditional point tagging

---

.left-code[
```r
ggscatterstats(
  data = movies_long,
  x = budget,
  y = rating,
  type = "r",
  label.var = title, # <<
  label.expression = budget > 150 & rating > 7.5, # <<
  marginal.type = "density", # <<
  messages = FALSE
)
```

.font70[
Changing the marginal type

- **histogram**
- **boxplot**
- **density**
- **violin**
- **densigram**
]
]

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/ggscatterstats_2-1.png" width="100%" style="display: block; margin: auto;" />
]

---
exclude: false
layout: true

# Summary of tests - *ggscatterstats*

---
exclude: false

Type | Test | CI?
----------- | ------------------------- | ---
**<font color="blue">Parametric</font>** | Pearson's correlation coefficient | <font color="green">Yes</font>
**<font color="#ff6600">Non-parametric</font>** | Spearman's rank correlation coefficient | <font color="green">Yes</font>
**<font color="#ff00ff">Robust</font>** | Percentage bend correlation coefficient | <font color="green">Yes</font>
**<font color="#009933">Bayes Factor</font>** | Pearson's correlation coefficient | <font color="red">No</font>

---
exclude: false
layout: true

# ggscatterstats - changing smoothing functions

---
exclude: false

.left-code[
```r
ggscatterstats(
  data = movies_long,
  x = budget,
  y = rating,
  marginal = FALSE,
  method = "gam", # <<
  formula = y ~ s(x, k = 3), # <<
  centrality.para = "mean", # <<
  messages = FALSE
)
```

.font80[
Available centrality parameters

- **mean**
- **median**
]
]

---
layout: false
class: inverse, center, middle

# ggcorrmat

Association between multiple numeric variables

---
layout: true

# ggcorrmat - defaults

---

.left-code[
```r
ggcorrmat(dplyr::starwars)
```

.font80[
Changing the type of test

- `"p"` → **parametric**
- `"np"` → **non-parametric**
- `"r"` → **robust**
- `"bf"` → <font color="red">not implemented</font>
]
]

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/ggcorrmat_1-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: true

# ggcorrmat - little code, rich details

---

.left-code[
```r
ggcorrmat(dplyr::starwars)
```

.font80[
Default information:

- <font color="red">statistical details</font>
- <font color="blue">sample sizes</font>
- <font color="green">details about test</font>

<br>
<br>

**Note**: An informative label about sample sizes is shown when `NA`s are present.
]
]

---
layout: true

# ggcorrmat - changing defaults

---

.left-code[
```r
ggcorrmat(
  data = ggplot2::msleep,
  cor.vars = sleep_cycle:bodywt,
  type = "r",
  matrix.type = "upper", # <<
  p.adjust.method = "holm"
)
```

.font80[
In addition to `output = "plot"`, this function can also return a <font color="blue">dataframe</font> of results:

- `"r"` → **correlation**
- `"p"` → **p-values**
- `"n"` → **sample sizes**
- `"ci"` → **confidence intervals**
]
]

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/ggcorrmat_2-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: false
class: inverse, center, middle

# gghistostats

Distribution of a numeric variable

---
layout: true

# gghistostats - defaults

---

.left-code[
```r
gghistostats(
  data = movies_long,
  x = budget,
  test.value = 50, # <<
  messages = FALSE
)
```

.font80[
Changing the type of test

- `"p"` → **parametric**
- `"np"` → **non-parametric**
- `"r"` → **robust**
- `"bf"` → **Bayes factor**
]
]

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/gghistostats_1-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: true

# gghistostats - further customization

---

.left-code[
```r
gghistostats(
  data = movies_long,
  x = budget,
  effsize.type = "d",
  test.value = 50,
  bar.measure = "mix", # <<
  centrality.para = "median",
  test.value.line = TRUE, # <<
  normal.curve = TRUE, # <<
  ggtheme = hrbrthemes::theme_ipsum_tw(),
  ggstatsplot.layer = FALSE,
  messages = FALSE
)
```

.font80[
Available bar measures

- **count**
- **proportion**
- **mix** (both of the above)
- **density**
]
]

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/gghistostats_2-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: true

# gghistostats - little code, rich details

---

.left-code[
```r
gghistostats(
  data = movies_long,
  x = budget,
  effsize.type = "d",
  test.value = 50,
  test.value.size = TRUE,
  bar.measure = "mix",
  centrality.para = "median",
  test.value.line = TRUE,
  normal.curve = TRUE
)
```

.font80[
Default information:

- <font color="orange">statistical details</font>
- <font color="blue">Bayes Factor</font>
- <font color="green">frequency</font>
- <font color="black">distribution summary</font>
]
]

---
layout: false
class: inverse, center, middle

# ggdotplotstats

Distribution of a numeric variable with labels

---
layout: true

# ggdotplotstats - defaults

---

.left-code[
```r
ggdotplotstats(
  data = movies_long,
  x = budget,
  y = genre,
  effsize.type = "d",
  test.value = 52, # <<
  centrality.para = "median",
  test.value.line = TRUE, # <<
  ggtheme = ggthemes::theme_par(),
  messages = FALSE
)
```

.font80[
Changing the type of test

- `"p"` → **parametric**
- `"np"` → **non-parametric**
- `"r"` → **robust**
- `"bf"` → **Bayes factor**
]
]

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/ggdotplotstats_1-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: true

# ggdotplotstats - little code, rich details

---

.left-code[
```r
ggdotplotstats(
  data = movies_long,
  x = budget,
  y = genre,
  effsize.type = "d",
  test.value = 52,
  centrality.para = "median",
  test.value.line = TRUE
)
```

.font80[
Default information:

- <font color="orange">statistical details</font>
- <font color="blue">Bayes Factor</font>
- <font color="green">distribution summary</font>
]
]

---
exclude: false
layout: true

# Summary of tests - *gghistostats*/*ggdotplotstats*

---
exclude: false

Type | Test
------------------ | -------------------------
**<font color="blue">Parametric</font>** | One-sample Student's *t*-test
**<font color="#ff6600">Non-parametric</font>** | One-sample Wilcoxon test
**<font color="#ff00ff">Robust</font>** | One-sample percentile bootstrap
**<font color="#009933">Bayes Factor</font>** | One-sample Student's *t*-test

<br>

Type | Effect size | CI?
----------- | ------------------------- | ---
**<font color="blue">Parametric</font>** | Cohen's *d*, Hedges' *g* (central- and noncentral-*t* distribution based) | <font color="green">Yes</font>
**<font color="#ff6600">Non-parametric</font>** | *r* (computed as `\(Z/\sqrt{N_{obs}}\)`) | <font color="green">Yes</font>
**<font color="#ff00ff">Robust</font>** | `\(M_{robust}\)` (a robust location measure) | <font color="green">Yes</font>
**<font color="#009933">Bayes Factor</font>** | <font color="red">No</font> | <font color="red">No</font>

---
layout: false
class: inverse, center, middle

# ggpiestats

For composition of categorical variables

---
layout: true

# ggpiestats - defaults

---

.left-code[
```r
# let's use a subset of the data
ggpiestats(
  data = dplyr::filter(
    .data = movies_long,
    genre %in% c("Drama", "Comedy", "Animated")
  ),
  x = genre,
  y = mpaa,
  messages = FALSE
)
```

.font80[
Test by design

- `paired = FALSE` → Pearson's `\(\chi^2\)`
- `paired = TRUE` → McNemar
]
]

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/ggpiestats_1-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: true

# ggpiestats - little code, rich details

---

.left-code[
```r
# let's use a subset of the data
ggpiestats(
  data = dplyr::filter(
    movies_long,
    genre %in% c("Drama", "Comedy", "Animated")
  ),
  x = genre,
  y = mpaa
)
```

.font80[
Default information:

- <font color="orange">statistical details</font>
- <font color="blue">Bayes Factor</font>
- <font color="green">sample sizes</font>
- <font color="red">proportion test results</font>
]
]

---
layout: true

# ggpiestats - proportion test

---

.left-code[
```r
ggpiestats(
  data = as.data.frame(Titanic),
  x = Survived, # <<
  counts = Freq, # <<
  slice.label = "both", # <<
  messages = FALSE
)
```

.font70[
**Note**: If the data are in *tabled* format (one row per combination of levels, plus a column of counts), use the `counts` argument.

Test by analysis

- `y` supplied → contingency table
- `y = NULL` → goodness of fit
]
]

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/ggpiestats_2-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: false
class: inverse, center, middle

# ggbarstats

For composition of categorical variables

---
layout: true

# ggbarstats - defaults

---

.left-code[
```r
ggbarstats(
  data = movies_long,
  x = genre,
  y = mpaa,
  package = "ggsci",
  palette = "default_igv",
  messages = FALSE
)
```

.font70[
**Note**: Even if the Bayes factor message is displayed in the caption, you can still use the `caption` argument.

Label information

- **percentage** (default)
- **counts**
- **both** (of the above)
]
]

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/ggbarstats_1-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: true

# ggbarstats - little code, rich details

---

.left-code[
```r
ggbarstats(
  data = movies_long,
  x = genre,
  y = mpaa,
  package = "ggsci",
  palette = "default_igv",
  messages = FALSE
)
```

.font80[
Default information:

- <font color="orange">statistical details</font>
- <font color="blue">Bayes Factor</font>
- <font color="green">sample sizes</font>
- <font color="red">proportion test results</font>
]
]

---
exclude: false
layout: true

# Test summary - *ggpiestats*/*ggbarstats*

---
exclude: false

**Tests**

Data | Design | Test
----------- | ------------ | -------------------------
Unpaired | `\(n \times p\)` contingency table | Pearson's `\(\chi^2\)` test
Paired | `\(n \times p\)` contingency table | McNemar's test
Frequency | `\(n \times 1\)` contingency table | Goodness of fit

**Effect sizes + CI**

Test | Effect size | CI?
----------------------- | ----------------------------- | -----
Pearson's `\(\chi^2\)` test | Cramér's `\(V\)` | <font color="green">Yes</font>
McNemar's `\(\chi^2\)` test | Cohen's `\(g\)` | <font color="green">Yes</font>
Goodness of fit `\(\chi^2\)` test | Cramér's `\(V\)` | <font color="green">Yes</font>

---
layout: false
class: inverse, center, middle

# ggcoefstats

Displaying results from regression analyses

---
layout: true

# ggcoefstats - defaults

---

.left-code[
```r
# model
mod <- stats::aov(
  formula = rating ~ mpaa * genre,
  data = movies_long
)

# plot
ggcoefstats(mod)
```

.font80[
In addition to `output = "plot"`, this function can also return a <font color="blue">dataframe</font> of results:

- `"tidy"` → **estimates**
- `"glance"` → **model summary**
- `"augment"` → **predictions**
]
]

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/ggcoefstats_1-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: true

# ggcoefstats - little code, rich details

---

.left-code[
```r
# model
mod <- stats::aov(
  formula = rating ~ mpaa * genre,
  data = movies_long
)

# plot
ggcoefstats(mod)
```

.font80[
Default information:

- <font color="red">estimate + 95% CI</font>
- <font color="blue">model summary</font>
- <font color="green">statistical details</font>
]
]

---
layout: true

# *ggcoefstats*: Supported models

---

.font80[
.pull-left[
.pull-left[
- `aareg`
- `anova`
- `aov`
- `aovlist`
- `Arima`
- `bglmerMod`
- `biglm`
- `blmerMod`
- `brmsfit`
- `btergm`
- `cch`
- `clm`
- `clmm`
- `confusionMatrix`
- `coxph`
- `data.table`
- `drc`
- `epi.2by2`
- `ergm`
]
.pull-right[
- `felm`
- `fitdistr`
- `glmerMod`
- `glmmPQL`
- `glmmTMB`
- `gls`
- `gam`
- `Gam`
- `gamlss`
- `garch`
- `glm`
- `glmmadmb`
- `glmrob`
- `glmRob`
- `gmm`
- `ivreg`
]
]
.pull-right[
.pull-left[
- `lm`
- `lm.beta`
- `lme`
- `lmerMod`
- `lmodel2`
- `lmrob`
- `lmRob`
- `mclogit`
- `mcmc`
- `MCMCglmm`
- `mediate`
- `mjoint`
- `mle2`
- `mlm`
- `mmclogit`
- `multinom`
- `negbin`
- `nlmerMod`
- `nlrq`
- `nls`
]
.pull-right[
- `orcutt`
- `plm`
- `polr`
- `ridgelm`
- `rjags`
- `rlm`
- `rlmerMod`
- `rq`
- `speedglm`
- `speedlm`
- `stanreg`
- `survreg`
- `svyglm`
- `svyolr`
- `tobit`
- `wblm`
]
]
]

---
layout: true

# *ggcoefstats*: If not implemented, use a dataframe

---

.left-code[
```r
# dataframe with results
df <- tibble::tribble(
  ~term, ~estimate, ~std.error, ~statistic, ~p.value,
  "(Intercept)", 3.77, 0.165, 22.9, 1.49e-20,
  "x", -1.36, 0.258, -5.26, 1.13e-5
)

# plot
# `statistic` argument decides label format
ggcoefstats(
  x = df,
  statistic = "z", # <<
  exclude.intercept = FALSE
)
```

.font70[
Supported `statistic` values (for <font color="blue">dataframe</font> objects):

- *t*
- *z*
- *F*

At minimum, two columns are needed:<br>
<font color="blue">term</font> and <font color="blue">estimate</font>.
]
]

---
layout: true

# *ggcoefstats*: You can also do meta-analysis!

---

.pull-left[
```r
# made-up data
meta_df <- tibble::tribble(
  ~term, ~estimate, ~std.error,
  "study_1", 0.111, 0.065,
  "study_2", -0.003, 0.258,
  "study_3", 0.001, 0.120,
  "study_4", 0.032, 0.022,
  "study_5", -0.765, 0.650,
  "study_6", -0.032, 0.058
)

# plot
ggcoefstats(
  x = meta_df,
  meta.analytic.effect = TRUE, # <<
  bf.message = TRUE, # <<
  xlab = "estimate"
)
```

.font80[
- Frequentist random-effects meta-analysis from `metafor`
- Bayesian random-effects meta-analysis from `metaBMA`
]
]

---
layout: false
class: inverse, center, middle

# *grouped_* variants of all functions

Running the same function for all levels of a single grouping variable

---
layout: true

# *grouped_* functions

You can repeat the same analysis for all levels of a single grouping variable.
---

.left-code[
```r
# only one additional argument
grouped_ggpiestats(
  data = mtcars,
  x = cyl,
  grouping.var = am, # <<
  results.subtitle = FALSE, # <<
  messages = FALSE
)
```

.font70[
Available `grouped_` variants

- `grouped_ggdotplotstats`
- `grouped_ggbarstats`
- `grouped_ggscatterstats`
- `grouped_gghistostats`
- `grouped_ggpiestats`
- `grouped_ggbetweenstats`
- `grouped_ggwithinstats`
- `grouped_ggcorrmat`
]
]

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/grouped_1-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: false
class: inverse, center, middle

# Utility beyond *ggstatsplot*

What if I don't like the default plots but still want to display statistical results?

---
layout: true

# Using as helper functions

---

`ggstatsplot` can also be used just to get the statistical details.

--

.left-code[
```r
# using `ggstatsplot` for stats
results <- ggstatsplot::ggpiestats(
  data = Titanic_full,
  x = Survived,
  y = Sex,
  output = "subtitle", # <<
  messages = FALSE
)

# using `ggiraphExtra` for plot
ggiraphExtra::ggSpine( # <<
  data = Titanic_full,
  aes(x = Sex, fill = Survived),
  addlabel = TRUE,
  interactive = FALSE
) +
  labs(subtitle = results) # <<
```

.font70[
<br>

**All** included analyses have corresponding [helper functions](https://indrajeetpatil.github.io/ggstatsplot/reference/index.html#section-helper-functions-for-preparing-statistics-subtitles) for preparing subtitles with statistical details.
]
]

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/subtitle_1-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: false
class: inverse, center, middle

# Glossary

Statistical reporting in *ggstatsplot*

---
layout: false

# Best practices in reporting statistical details

--

- As discussed before, the statistical details follow the APA gold standard.

- The default tests follow best practices. For example, the `ggbetweenstats` function by default runs <font color="blue">Welch's *t*-test</font> and <font color="blue">Welch's ANOVA</font> (rather than Student's *t*-test and Fisher's ANOVA), based on recent work (Delacre et al., [2017](https://www.rips-irsp.com/article/10.5334/irsp.82/), [2018](https://psyarxiv.com/wnezg)).

- No *p*-value errors<br>
(Lilienfeld et al., [2015](https://www.frontiersin.org/articles/10.3389/fpsyg.2015.01100/full))

---
layout: false

# Avoiding errors

--

Since the plot and the statistical analysis are yoked together, the chances of making an error in reporting the results are minimized: you never have to write the results manually or copy-paste them from someplace else.

.footnote[[(Nuijten et al., *Behavior Research Methods*, 2016)](https://link.springer.com/article/10.3758/s13428-015-0664-2)]

---
layout: false

# Making sense of null results

--

Each analysis combines frequentist and Bayesian statistics, so that null results can be interpreted properly.
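As a rough illustration of how the two kinds of evidence complement each other, the sketch below pairs a classical *p*-value with the BIC approximation to the Bayes factor in favour of the null, `\(BF_{01} \approx \exp[(\mathrm{BIC}_1 - \mathrm{BIC}_0)/2]\)`. This approximation is swapped in for illustration only (it is not how `ggstatsplot` computes its Bayes factors), and the toy data are made up so that both groups have exactly the same mean.

```r
# Toy data (made up): two groups of five with identical means (both 4),
# so the observed group difference is exactly zero.
y <- c(3, 5, 4, 6, 2, 4, 3, 5, 2, 6)
group <- factor(rep(c("a", "b"), each = 5))

# Null model (common mean) vs. alternative model (group-specific means).
fit0 <- lm(y ~ 1)
fit1 <- lm(y ~ group)

# Frequentist answer: F-test comparing the nested models
# (equivalent to Student's t-test when there are two groups).
p_value <- anova(fit0, fit1)[["Pr(>F)"]][2]

# Bayesian answer: BIC approximation to the Bayes factor favouring the null.
bf01 <- exp((BIC(fit1) - BIC(fit0)) / 2)

c(p_value = p_value, BF01 = bf01)
```

Because the two models fit equally well and differ by one parameter, `\(\mathrm{BIC}_1 - \mathrm{BIC}_0 = \ln 10\)` for these 10 observations, so `\(BF_{01} \approx \sqrt{10} \approx 3.16\)`. The *p*-value (essentially 1) only says there is no evidence *against* the null; the Bayes factor additionally quantifies evidence *for* it.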
.footnote[[(Aczel et al., *AMPPS*, 2018)](https://journals.sagepub.com/doi/pdf/10.1177/2515245918773742)]

---
layout: false

# Toggling between types of statistics

--

.pull-left[
**<font color="blue">Parametric</font>**

```r
# one-way ANOVA
ggbetweenstats(
  data = mtcars,
  x = cyl,
  y = wt,
  type = "p" # <<
)

# correlation analysis
ggscatterstats(
  data = mtcars,
  x = wt,
  y = mpg,
  type = "p" # <<
)

# one-sample t-test
gghistostats(
  data = mtcars,
  x = wt,
  test.value = 2,
  type = "p" # <<
)
```
]

--

.pull-right[
**<font color="#ff6600">Non-parametric</font>**

```r
# Kruskal-Wallis test
ggbetweenstats(
  data = mtcars,
  x = cyl,
  y = wt,
  type = "np" # <<
)

# Spearman's correlation
ggscatterstats(
  data = mtcars,
  x = wt,
  y = mpg,
  type = "np" # <<
)

# one-sample Wilcoxon test
gghistostats(
  data = mtcars,
  x = wt,
  test.value = 2,
  type = "np" # <<
)
```
]

---
layout: false
class: inverse, center, middle

# Glossary

Summary of statistical tests included

---
layout: false

# Types of statistical analyses supported

<br>

--

Functions | Description | **<font color="blue">Parametric</font>** | **<font color="#ff6600">Non-parametric</font>** | **<font color="#ff00ff">Robust</font>** | **<font color="#009933">Bayes Factor</font>**
------- | ------------------ | ---- | ----- | ----| -----
`ggbetweenstats` | Between group/condition comparisons | <font color="green">Yes</font> | <font color="green">Yes</font> | <font color="green">Yes</font> | <font color="green">Yes</font>
`ggwithinstats` | Within group/condition comparisons | <font color="green">Yes</font> | <font color="green">Yes</font> | <font color="green">Yes</font> | <font color="green">Yes</font>
`gghistostats`, `ggdotplotstats` | Distribution of a numeric variable | <font color="green">Yes</font> | <font color="green">Yes</font> | <font color="green">Yes</font> | <font color="green">Yes</font>
`ggcorrmat` | Correlation matrix | <font color="green">Yes</font> | <font color="green">Yes</font> | <font color="green">Yes</font> | <font color="green">Yes</font>
`ggscatterstats` | Correlation between two variables | <font color="green">Yes</font> | <font color="green">Yes</font> | <font color="green">Yes</font> | <font color="green">Yes</font>
`ggpiestats`, `ggbarstats` | Association between categorical variables | <font color="green">Yes</font> | `NA` | `NA` | <font color="green">Yes</font>
`ggcoefstats` | Regression model coefficients | <font color="green">Yes</font> | <font color="green">Yes</font> | <font color="green">Yes</font> | <font color="green">Yes</font>

---
layout: false

# Effect sizes + CI available?

<br>

--

Test | **<font color="blue">Parametric</font>** | **<font color="#ff6600">Non-parametric</font>** | **<font color="#ff00ff">Robust</font>** | **<font color="#009933">Bayes Factor</font>**
---------------------------- | ------ | ------ | ------ | ------
One-sample *t*-test | <font color="green">Yes</font> | <font color="green">Yes</font> | <font color="green">Yes</font> | <font color="red">No</font>
Two-sample *t*-test (between) | <font color="green">Yes</font> | <font color="green">Yes</font> | <font color="green">Yes</font> | <font color="red">No</font>
Two-sample *t*-test (within) | <font color="green">Yes</font> | <font color="green">Yes</font> | <font color="green">Yes</font> | <font color="red">No</font>
One-way ANOVA (between) | <font color="green">Yes</font> | <font color="green">Yes</font> | <font color="green">Yes</font> | <font color="red">No</font>
One-way ANOVA (within) | <font color="green">Yes</font> | <font color="green">Yes</font> | <font color="red">No</font> | <font color="red">No</font>
Correlations | <font color="green">Yes</font> | <font color="green">Yes</font> | <font color="green">Yes</font> | <font color="red">No</font>
Contingency table | <font color="green">Yes</font> | `NA` | `NA` | <font color="red">No</font>
Goodness of fit | <font color="green">Yes</font> | `NA` | `NA` | <font color="red">No</font>
Regression | <font color="green">Yes</font> | <font color="green">Yes</font> | <font color="green">Yes</font> | <font color="green">Yes</font>

---
layout: false
class: inverse, center, middle

# Why use *ggstatsplot*?

Summary of benefits

---
layout: true

# Benefits of using *ggstatsplot*

---

--

- Truly makes your figures worth a thousand words.

--

- No need to copy-paste results into a text editor (e.g., MS Word).

--

- Disembodied figures stand on their own and are easy for the reader to evaluate.

--

- More breathing room for theoretical discussion and other text.

--

- No need to worry about updating figures and statistical details separately if something about the data changes.

--

- Minimal amount of code needed for all functions (typically only `data`, `x`, and `y`), which minimizes the chances of error.

---
layout: true

# Example - before and after *ggstatsplot*

---

--

.pull-left[
**Before <font color="blue">ggstatsplot</font>**
]

--

.pull-right[
**After <font color="blue">ggstatsplot</font>**
]

---
layout: true

# Quality assurance

---

--

`ggstatsplot` has 100% code coverage. If you use `ggstatsplot` to report results in your publication and find an error that traces back to code in this package, I will personally write an apology letter to you and to the editor of the journal in question.

.footnote[*Reference*: <https://codecov.io/gh/IndrajeetPatil/ggstatsplot>]

---
exclude: false
layout: true

# Informative plot designs

---
exclude: false

.pull-left[
.font150[
The default plots in *<font color="blue">ggstatsplot</font>* are **opinionated**, yes, but an attempt has been made to ensure that they follow the best practices outlined in data visualization research.
]
]

---
layout: true

# Documentation

---

Exhaustive documentation is available at the dedicated website:<br>
<https://indrajeetpatil.github.io/ggstatsplot/>

---
layout: false
class: inverse, center, middle

# Limitations

---
layout: true

# Limitations

---

--

- Limited kinds of <font color="blue">plots</font> available.

--

- Limited number of statistical <font color="blue">tests</font> (and effect sizes) available.
--

- <font color="blue">Faceting</font> (or small multiples) not implemented.

--

- Default plots can be too complicated for effectively communicating results in time-constrained presentation settings (e.g., conference talks).

--

- <font color="blue">Bulky API</font> (in terms of the number of function arguments to keep in mind).<br>
(Saving grace: the defaults are sufficient most of the time.)

---
layout: true

# Overcoming these limitations

---

--

.pull-left[
.font90[
Contributions (big or small) are welcome!
]

]

--

.pull-right[
.font90[
Ways in which you can contribute:

- Read and correct any inconsistencies in the [documentation](https://indrajeetpatil.github.io/ggstatsplot/)
- Raise issues about bugs or requested features
- Review code
- Add new functionality (in the form of new plotting functions or helpers for preparing subtitles)
]
]

---
layout: false
class: inverse, center, middle

# Acknowledgments

Contributors to *ggstatsplot*

Advisors: [Mina Cikara](http://www.intergroupneurosciencelaboratory.com/), [Fiery Cushman](http://cushmanlab.fas.harvard.edu/index.php), [Iyad Rahwan](https://rahwan.me/)

Slides created via the R package [xaringan](https://github.com/yihui/xaringan).

The CSS template comes from [Garrick Aden-Buie](https://github.com/gadenbuie/gentle-ggplot2).

---
layout: false
class: inverse, center, middle

# Thanks!

.font150.text-white[
[@patilindrajeets](https://twitter.com/patilindrajeets) <br>
[github.com/IndrajeetPatil](https://github.com/IndrajeetPatil) <br>
patilindrajeet.science@gmail.com
]
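
---
layout: false

# Backup: robust and Bayesian variants

The table of supported analyses lists robust and Bayes factor approaches alongside the parametric and non-parametric ones. A minimal sketch, assuming the `type` argument also accepts the shorthands `"r"` (robust) and `"bf"` (Bayes factor): the same calls used for toggling between parametric and non-parametric statistics should switch approach by changing only `type`.

```r
library(ggstatsplot)

# robust one-way ANOVA (assumption: "r" is shorthand for "robust")
p_robust <- ggbetweenstats(
  data = mtcars,
  x = cyl,
  y = wt,
  type = "r" # <<
)

# Bayesian correlation analysis (assumption: "bf" is shorthand for the Bayes factor approach)
p_bayes_corr <- ggscatterstats(
  data = mtcars,
  x = wt,
  y = mpg,
  type = "bf" # <<
)

# Bayesian one-sample t-test against a test value of 2
p_bayes_t <- gghistostats(
  data = mtcars,
  x = wt,
  test.value = 2,
  type = "bf" # <<
)
```

The robust and Bayesian analyses may require suggested packages (e.g., *WRS2*, *BayesFactor*) to be installed.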