class: center, middle, inverse, title-slide

.title[
# Introduction to {ggstatsplot}: {ggplot2} Plots with Statistics
]
.author[
### Indrajeet Patil
]

---

<style type="text/css">
body, td {
  font-size: 16px;
}
code.r{
  font-size: 14px;
}
</style>

---
layout: true

# Plan

---

- Why `ggstatsplot`?

- Primary functions

- Customizability

- Benefits

- Misconceptions

- Limitations

---
layout: false
class: inverse, center, middle

# Why *ggstatsplot*?

---
layout: true
class: center

# Raison d'être

---

--

.right-column[
.font120[
Current count of packages on the Comprehensive R Archive Network (`CRAN`) **> 21,000**
]
]

--

.left-column[
![](images/y_tho.jpg)
]

--

.right-column[.font110[
.content-box-yellow[
In short, `ggstatsplot` returns
<br>
<br>
.blue[information-rich] plots with .blue[statistical details], which are
<br>
suitable for .blue[faster] (exploratory) data analysis and scholarly reports
]
]
]

---
layout: true
class: center

# Simpler/faster data analysis workflow

---

--

.img-center[
![](images/ds_workflow.png)
]

.footnote[[(Grolemund & Wickham, *R for Data Science*, 2017)](https://r4ds.had.co.nz/)]

--

<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>

In a typical *exploratory* data analysis workflow, .blue[data visualization] and .blue[statistical modeling] are two different phases: visualization informs modeling, and modeling can suggest a different visualization, and so on and so forth.

--

💡 The central idea of `ggstatsplot` is simple: combine these two phases into one!

---
layout: true
class: center

# An information-rich graphic is worth a thousand words

---

.img-center[
![](images/datasaurus.gif)
]

.footnote[[(Matejka & Fitzmaurice, *Autodesk Research*, 2017)](https://www.autodeskresearch.com/publications/samestats)]

<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>

.blue[Graphical] summaries can reveal problems not visible from .blue[numerical] statistics.
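
The same point can be checked in a few lines of base R. This is only a sketch, and it swaps in Anscombe's quartet (the `anscombe` data frame that ships with R) for the Datasaurus data shown above, purely because it needs no extra packages; the object name `summaries` is made up for this example.

``` r
# Anscombe's quartet: columns x1..x4 and y1..y4 hold four small x-y datasets.
summaries <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y), r = cor(x, y))
})
colnames(summaries) <- paste0("set", 1:4)

# The numerical summaries are nearly identical across the four sets...
print(round(summaries, 2))

# ...but the four scatterplots look nothing alike.
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(
    anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
    xlab = paste0("x", i), ylab = paste0("y", i)
  )
}
par(op)
```

Looking only at the printed table, one would probably treat the four sets as interchangeable; the plots show why the statistics and the raw data are worth seeing together.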
---
layout: false
class: center

# Ready-made plot = no customization

--

The .blue[grammar of graphics] is a powerful framework [(Wilkinson, 2011)](https://www.google.com/books/edition/_/iI1kcgAACAAJ?hl=en&sa=X&ved=2ahUKEwiGl8rJ2KztAhWyElkFHa8NAvkQre8FMBR6BAgMEAc) and can help you make *any* graphics fitting your specific data visualization needs!

But...

--

.pull-left[
![](images/power.jpg)
]

.pull-right[
![](images/cat_trademill.gif)
]

---
layout: false
class: inverse, center, middle

# And a LOT more!

...but we will come back to that later

Let's get started first!

---
layout: false

# Installation

--

Install the stable version of `ggstatsplot` from [CRAN](https://cran.r-project.org/web/packages/ggstatsplot/index.html):

``` r
install.packages("ggstatsplot")
```

--

You can get the development version of the package from [GitHub](https://github.com/IndrajeetPatil/ggstatsplot):

``` r
remotes::install_github("IndrajeetPatil/ggstatsplot")
```

--

Load the needed packages:

``` r
library(ggstatsplot)
library(ggplot2)
```

---
layout: false
class: inverse, center, middle

# Primary functions

---
layout: false
class: inverse, center, middle

# Hypothesis about group differences

---
layout: true

# ggbetweenstats - For between group comparisons

---

--

.left-code[

``` r
ggbetweenstats(
  data = movies_long,
  x = mpaa,
  y = rating
)
```

.font70[

Function internally decides tests
  - *t*-test if `2` groups
  - ANOVA if `> 2` groups

.blue[Defaults] return <br>
✅ raw data + distributions <br>
✅ descriptive statistics <br>
✅ inferential statistics <br>
✅ effect size + CIs <br>
✅ pairwise comparisons <br>
✅ Bayesian hypothesis-testing <br>
✅ Bayesian estimation
]
]

--

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/ggbetweenstats_1-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: true

# ggwithinstats - repeated measures equivalent

---

--

.left-code[

``` r
ggwithinstats(
  data = WRS2::WineTasting,
  x = Wine,
  y = Taste
)
```

.font70[
.blue[Defaults] return <br>
✅ raw data + distributions <br>
✅ descriptive statistics <br>
✅ inferential statistics <br>
✅ effect size + CIs <br>
✅ pairwise comparisons <br>
✅ Bayesian hypothesis-testing <br>
✅ Bayesian estimation <br>

Changing the `type` of test
✅ `"p"` → **parametric** <br>
✅ `"np"` → **non-parametric** <br>
✅ `"r"` → **robust** <br>
✅ `"bf"` → **Bayesian**
]
]

--

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/ggwithinstats_1-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: true

# gghistostats - Distribution of a numeric variable

---

--

.left-code[

``` r
gghistostats(
  data = movies_long,
  x = budget,
* test.value = 30
)
```

.font70[
.blue[Defaults] return <br>
✅ counts + proportion for bins <br>
✅ descriptive statistics <br>
✅ inferential statistics <br>
✅ effect size + CIs <br>
✅ Bayesian hypothesis-testing <br>
✅ Bayesian estimation <br>

Changing the `type` of test
✅ `"p"` → **parametric** <br>
✅ `"np"` → **non-parametric** <br>
✅ `"r"` → **robust** <br>
✅ `"bf"` → **Bayesian**
]
]

--

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/gghistostats_1-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: true

# ggdotplotstats - Labeled numeric variable

---

--

.left-code[

``` r
ggdotplotstats(
  data = movies_long,
  x = budget,
  y = genre,
* test.value = 30
)
```

.font70[
.blue[Defaults] return <br>
✅ descriptive statistics <br>
✅ inferential statistics <br>
✅ effect size + CIs <br>
✅ Bayesian hypothesis-testing <br>
✅ Bayesian estimation <br>

Changing the `type` of test
✅ `"p"` → **parametric** <br>
✅ `"np"` → **non-parametric** <br>
✅ `"r"` → **robust** <br>
✅ `"bf"` → **Bayesian**
]
]

--

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/ggdotplotstats_1-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: false
class: inverse, center, middle

# Hypothesis about correlation

---
layout: true

# ggscatterstats - Two numeric variables

---

--

.left-code[

``` r
ggscatterstats(
  data = movies_long,
  x = budget,
  y = rating
)
```

.font70[
.blue[Defaults] return <br>
✅ joint distribution <br>
✅ marginal distributions <br>
✅ inferential statistics <br>
✅ effect size + CIs <br>
✅ Bayesian hypothesis-testing <br>
✅ Bayesian estimation <br>

Changing the `type` of test
✅ `"p"` → **parametric** <br>
✅ `"np"` → **non-parametric** <br>
✅ `"r"` → **robust** <br>
✅ `"bf"` → **Bayesian**
]
]

--

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/ggscatterstats_1-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: true

# ggscatterstats - conditional point tagging

---

--

.left-code[

``` r
ggscatterstats(
  data = movies_long,
  x = budget,
  y = rating,
  type = "r",
* label.var = title,
* label.expression = budget > 150
*   & rating > 7.5
)
```
]

--

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/ggscatterstats_2-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: true

# ggcorrmat - multiple numeric variables

---

--

.left-code[

``` r
ggcorrmat(dplyr::starwars)
```

.font70[
.blue[Defaults] return <br>
✅ effect size + significance <br>
✅ careful handling of `NA`s

Changing the `type` of test
✅ `"p"` → **parametric** <br>
✅ `"np"` → **non-parametric** <br>
✅ `"r"` → **robust** <br>
✅ `"bf"` → **Bayesian**

Partial correlations are also supported! Just set `partial=TRUE`.
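
A minimal sketch of a partial-correlation matrix, assuming `ggstatsplot` is installed. The `mtcars` data and the four columns passed to `cor.vars` are picked only for illustration and are not part of the original example.

``` r
library(ggstatsplot)

# restrict the matrix to a few numeric columns and
# request partial (rather than zero-order) correlations
p <- ggcorrmat(
  data = mtcars,
  cor.vars = c(mpg, wt, hp, qsec),
  partial = TRUE
)

p
```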
]
]

--

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/ggcorrmat_1-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: false
class: inverse, center, middle

# Hypothesis about composition of categorical variables

---
layout: true

# ggpiestats - association between categorical variables

---

--

.left-code[

``` r
ggpiestats(
  data = dplyr::filter(
    movies_long,
    genre %in% c("Drama", "Comedy")
  ),
  x = mpaa,
  y = genre
)
```

.font70[
.blue[Defaults] return <br>
✅ descriptive statistics <br>
✅ inferential statistics <br>
✅ effect size + CIs <br>
✅ Goodness-of-fit tests <br>
✅ Bayesian hypothesis-testing <br>
✅ Bayesian estimation <br>
]
]

--

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/ggpiestats_2-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: true

# ggbarstats - association between categorical variables

---

--

.left-code[

``` r
ggbarstats(
  data = dplyr::filter(
    movies_long,
    genre %in% c("Drama", "Comedy")
  ),
  x = mpaa,
  y = genre,
* label = "both"
)
```

.font70[
.blue[Defaults] return <br>
✅ descriptive statistics <br>
✅ inferential statistics <br>
✅ effect size + CIs <br>
✅ Goodness-of-fit tests <br>
✅ Bayesian hypothesis-testing <br>
✅ Bayesian estimation <br>
]
]

--

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/ggbarstats_1-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: false
class: inverse, center, middle

# Hypothesis about regression coefficients

---
layout: true

# ggcoefstats

---

--

.left-code[

``` r
# model
mod <- lm(
  formula = rating ~ mpaa,
  data = movies_long
)

# plot
ggcoefstats(mod)
```

.font70[
.blue[Defaults] return <br>
✅ estimate + CIs <br>
✅ inferential statistics (`\(t\)`, `\(z\)`, `\(F\)`, `\(\chi^2\)`) <br>
✅ model fit indices (AIC + BIC)

Supports all regression models supported in the [`{easystats}`](https://easystats.github.io/insight/reference/is_model_supported.html) ecosystem.
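
A quick way to check a model before plotting it, sketched with the `is_model_supported()` helper from the `{insight}` package that the link above points to. The model below is a throwaway `lm` fit on `mtcars`, chosen only so that the snippet runs on its own.

``` r
# any fitted model object can be checked; a small `lm` keeps this self-contained
mod <- lm(wt ~ mpg, data = mtcars)

# TRUE means the model class is recognized by {insight}
supported <- insight::is_model_supported(mod)
supported
```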
]
]

--

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/ggcoefstats_1-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: false
class: inverse, center, middle

# *grouped_* variants of all functions

Running the same function for all levels of a single grouping variable

---
layout: true

# *grouped_* functions

---

--

.left-code[

``` r
grouped_ggpiestats(
  data = mtcars,
  x = cyl,
* grouping.var = am
)
```

.font70[

Available `grouped_` variants

- `grouped_ggbetweenstats`
- `grouped_ggwithinstats`
- `grouped_gghistostats`
- `grouped_ggdotplotstats`
- `grouped_ggscatterstats`
- `grouped_ggcorrmat`
- `grouped_ggpiestats`
- `grouped_ggbarstats`
]
]

--

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/grouped_1-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: false
class: inverse, center, middle

# Customizability of *ggstatsplot*

"What if I don't like the default plots?" 🤔

---
layout: true

# Changing aesthetics (themes + palettes) 🎨

---

Aesthetic preferences are not an excuse to avoid `ggstatsplot`!

.left-code[

``` r
ggbetweenstats(
  data = movies_long,
  x = mpaa,
  y = rating,
* ggtheme = ggthemes::theme_economist(),
* palette = "Darjeeling2",
* package = "wesanderson"
)
```

.font70[
The default palette is .blue[colorblind-friendly].
]
]

--

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/ggbetweenstats_4-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: true

# Further modification with *ggplot2*

---

You can modify `ggstatsplot` plots further using `ggplot2` functions.

--

.left-code[

``` r
ggbetweenstats(
  data = mtcars,
  x = am,
  y = wt,
  type = "bayes"
) +
* scale_y_continuous(sec.axis = dup_axis())
```

.img-left-small[
![](images/happy_cat.gif)
]
]

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/ggbetweenstats_5-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: true

# Too much information

---

`ggstatsplot` can be used to get .blue[only plots].

--

.left-code[

``` r
ggbetweenstats(
  data = iris,
  x = Species,
  y = Sepal.Length,
  # turn off centrality measure
* centrality.plotting = FALSE,
  # turn off statistical analysis
* results.subtitle = FALSE,
  # turn off Bayesian message
* bf.message = FALSE,
  # turn off pairwise comparisons
* pairwise.display = "none"
)
```
]

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/only_plot-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: true

# Expressions for custom plots

---

`ggstatsplot` can be used to get .blue[only expressions].

--

.left-code[

``` r
results <- ggpiestats(
  data = Titanic_full,
  x = Survived,
  y = Sex,
* output = "subtitle"
)

*ggiraphExtra::ggSpine(
  data = Titanic_full,
  aes(x = Sex, fill = Survived),
  addlabel = TRUE,
  interactive = FALSE
) +
* labs(subtitle = results)
```
]

.right-plot[
<img src="ggstatsplot_presentation_files/figure-html/subtitle_1-1.png" width="100%" style="display: block; margin: auto;" />
]

---
layout: true

# Dataframes

---

[`statsExpressions`](https://indrajeetpatil.github.io/statsExpressions/), the statistical processing backend for `ggstatsplot`, can provide .blue[dataframes].

--

.pull-left[
<img src="images/statsExpressions.png" alt="drawing" style="width:350px;"/>
]

.pull-right[

``` r
library(statsExpressions)

# for example
one_sample_test(
  data = mtcars,
  x = wt,
  test.value = 3
)
```
]

---
layout: false
class: inverse, center, middle

# Why use *ggstatsplot*?

---
layout: false

# Supports different statistical approaches

--

Functions | Description | Parametric | Non-parametric | Robust | Bayesian
------- | ------------------ | ---- | ----- | ----| -----
`ggbetweenstats` | Between group comparisons | ✅ | ✅ | ✅ | ✅
`ggwithinstats` | Within group comparisons | ✅ | ✅ | ✅ | ✅
`gghistostats`, `ggdotplotstats` | Distribution of a numeric variable | ✅ | ✅ | ✅ | ✅
`ggcorrmat` | Correlation matrix | ✅ | ✅ | ✅ | ✅
`ggscatterstats` | Correlation between two variables | ✅ | ✅ | ✅ | ✅
`ggpiestats`, `ggbarstats` | Association between categorical variables | ✅ | `NA` | `NA` | ✅
`ggpiestats`, `ggbarstats` | Equal proportions for categorical variable levels | ✅ | `NA` | `NA` | ✅
`ggcoefstats` | Regression modeling | ✅ | ✅ | ✅ | ✅
`ggcoefstats` | Random-effects meta-analysis | ✅ | `NA` | ✅ | ✅

---
layout: false

# Toggling between statistical approaches

--

.pull-left[

**.blue[Parametric]**

``` r
# anova
ggbetweenstats(
  data = mtcars,
  x = cyl,
  y = wt,
* type = "p"
)

# correlation analysis
ggscatterstats(
  data = mtcars,
  x = wt,
  y = mpg,
* type = "p"
)

# t-test
gghistostats(
  data = mtcars,
  x = wt,
  test.value = 2,
* type = "p"
)
```
]

--

.pull-right[

**.orange[Non-parametric]**

``` r
# anova equivalent (Kruskal-Wallis test)
ggbetweenstats(
  data = mtcars,
  x = cyl,
  y = wt,
* type = "np"
)

# correlation analysis (Spearman)
ggscatterstats(
  data = mtcars,
  x = wt,
  y = mpg,
* type = "np"
)

# t-test equivalent (one-sample Wilcoxon test)
gghistostats(
  data = mtcars,
  x = wt,
  test.value = 2,
* type = "np"
)
```
]

---
layout: false

# Alternative workflow to the following

--

.pull-left[
.font90[

.blue[Load 'em up!]

📦 for inferential statistics (e.g. `stats`) <br>
📦 computing effect size + CIs (e.g. `effectsize`) <br>
📦 for descriptives (e.g. `skimr`) <br>
📦 pairwise comparisons (e.g. `multcomp`) <br>
📦 Bayesian hypothesis testing (e.g. `BayesFactor`) <br>
📦 Bayesian estimation (e.g. `bayestestR`) <br>
📦 .
<br>
]

.img-left-small[
![](images/packages.gif)
]
]

--

.pull-right[
.font90[

.blue[Things to worry about] 🤔

🤔 accepts dataframe, vectors, matrix? <br>
🤔 long/wide format data? <br>
🤔 works with `NA`s? <br>
🤔 returns list, dataframe, arrays? <br>
🤔 works with tibbles? <br>
🤔 has all necessary details? <br>
🤔 . <br>

.img-right-small[
![](images/monkey.gif)
]
]
]

---
layout: false

# Results *in context* of the underlying data 🕵️

--

.pull-left[

**Standard approach**

Pearson's correlation test revealed that, across 142 participants, variable `x` was negatively correlated with variable `y`: `\(t(140)=-0.76, p=.446\)`. The effect size `\((r=-0.06, 95\% CI [-.23,.10])\)` was small, as per Cohen's (1988) conventions. The Bayes Factor for the same analysis revealed that the data were 5.81 times more probable under the null hypothesis as compared to the alternative hypothesis. This can be considered moderate evidence (Jeffreys, 1961) in favor of the null hypothesis (absence of any correlation between `x` and `y`).
]

--

.pull-right[

**`ggstatsplot` approach**

![](images/after_ggstats.PNG)
]

---
layout: false

# Best practices in statistical reporting

--

![](images/stats_reporting_format.png)

---
layout: false

# Avoiding reporting errors

--

.content-box-green[
"half of all published psychology papers that use NHST contained at least one *p*-value that was inconsistent with its test statistic and degrees of freedom. One in eight papers contained a grossly inconsistent *p*-value that may have affected the statistical conclusion"

[(Nuijten et al., *Behavior Research Methods*, 2016)](https://link.springer.com/article/10.3758/s13428-015-0664-2)
]

--

Since the plot and the statistical analysis are yoked together, the chances of making an error in reporting the results are minimized.

--

No need to worry about updating figures and statistical details **separately**.

---
layout: false

# Making sense of null results

--

`\(p > 0.05\)`: The null hypothesis (`H0`) can't be rejected

But can it be **accepted**?!

Null Hypothesis Significance Testing 🤫

--

.content-box-green[
"In 72% of cases, nonsignificant results were misinterpreted, in that the authors inferred that the effect was absent. A Bayesian reanalysis revealed that fewer than 5% of the nonsignificant findings provided strong evidence (i.e., `\(BF_{01} > 10\)`) in favor of the null hypothesis over the alternative hypothesis."

[(Aczel et al., *AMPPS*, 2018)](https://journals.sagepub.com/doi/pdf/10.1177/2515245918773742)
]

--

Juxtaposing frequentist and Bayesian statistics for the same analysis helps to properly interpret the null results.

---
layout: true

# A few other benefits

---

--

.content-box-green[
Minimal code needed (`data`, `x`, `y`): minimizes chances of error + tidy scripts.
]

--

.content-box-green[
Disembodied figures stand on their own and are easy to evaluate. 🧐
]

--

.content-box-green[
More breathing room for theoretical discussion and other text.
]

---
layout: true
class: center

# No more excuses not to explore data!

---

.content-box-yellow[
In summary, the `ggstatsplot` approach:
<br>
<br>
(*a*) avoids errors in statistical reporting,
<br>
<br>
(*b*) highlights the importance of the effect by providing effect size measures by default,
<br>
<br>
(*c*) provides an easy way to evaluate the *absence* of an effect using a Bayesian framework,
<br>
<br>
(*d*) requires the statistical analysis to be evaluated in the context of the underlying data,
<br>
<br>
and is (*e*) easy and (*f*) simple enough that somebody with little coding experience can use it without making an error.
]

---
layout: false
class: inverse, center, middle

# Misconceptions and limitations

---
layout: true

# Misconceptions: This package is...
---

--

❌ an alternative to learning `ggplot2` <br>

--

✅ (the more you know `ggplot2`, the better you can modify the defaults to your liking)

--

❌ meant to be used in talks/presentations <br>

--

✅ (defaults too complicated for effectively communicating results in time-constrained presentation settings, e.g. conference talks)

--

❌ only relevant when used in publications <br>

--

✅ not necessary; can also be useful *only* during the exploratory phase

--

❌ the only game in town <br>

--

✅ (excellent open-source GUI software: [JASP](https://jasp-stats.org/) and [jamovi](https://www.jamovi.org/))

---
layout: true

# Limitations of *ggstatsplot*

---

--

.content-box-red[
Limited number of **plots** and **statistical tests** available. This will **always** be the case. 🤷
]

--

.content-box-red[
Expects a non-trivial level of statistical proficiency (but plots without statistics can still be useful).
]

--

.content-box-red[
**Faceting** does not work (since there are no corresponding `geom_`s). For the same reason, plots are not `{gganimate}`-friendly.
]

---
layout: true

# Overcoming these limitations

---

--

.pull-left[

Contributions (big or small) welcome!

![](images/needs_you.jpg)
]

--

.pull-right[

Ways in which you can [contribute](https://github.com/IndrajeetPatil/ggstatsplot)

.content-box-purple[
- Star on GitHub (increases visibility) ⭐
- Cite if used in a publication
- Proof-read the documentation
- Raise issues about bugs/features
- Review code 🕵️
- Add new functionality 👨‍💻
]
]

---
layout: false
class: inverse, center, middle

# Acknowledgments

--

Developer friends

[Daniel Lüdecke](https://github.com/strengejacke), [Dominique Makowski](https://github.com/DominiqueMakowski), [Mattan S. Ben-Shachar](https://github.com/mattansb), [Brenton M.
Wiernik](https://github.com/bwiernik)

--

Support 💰

[Mina Cikara](http://www.intergroupneurosciencelaboratory.com/), [Fiery Cushman](http://cushmanlab.fas.harvard.edu/index.php), [Iyad Rahwan](https://rahwan.me/)

--

Community

Contributors to *ggstatsplot* & *rstats* users and developers

---
layout: false
class: inverse, center, middle

# More documentation

.font100[

[Publication](https://joss.theoj.org/papers/10.21105/joss.03167)

[Website](https://indrajeetpatil.github.io/ggstatsplot/)

[Yury Zablotski's YouTube playlist on *ggstatsplot*](https://www.youtube.com/playlist?list=PLPWcjtBkAf6kI13vCpRm08zRarRwIiZ9U)

]

---
layout: false
class: inverse, center, middle

# For more

.font100[
If you are interested in good programming and software development practices, check out my other [slide decks](https://sites.google.com/site/indrajeetspatilmorality/presentations).
]

---
layout: false
class: inverse, center, middle

# Find me at...

.font100[

[🐦 @patilindrajeets](http://twitter.com/patilindrajeets)

[💻 @IndrajeetPatil](http://github.com/IndrajeetPatil)

[https://sites.google.com/site/indrajeetspatilmorality/](https://sites.google.com/site/indrajeetspatilmorality/)

[📧 patilindrajeet.science@gmail.com](mailto:patilindrajeet.science@gmail.com)

]

---
layout: false
class: inverse, center, middle

# The End

To access code for these slides, see:

<https://github.com/IndrajeetPatil/ggstatsplot_slides/>