Indrajeet Patil
Why new software?
External Stimulus
“half of all published psychology papers contained at least one p-value that was inconsistent”1
“in 72% of cases, nonsignificant results were misinterpreted [to mean] that effect was absent”2
“39% of effects were subjectively rated to have replicated the original result”3
and more…
Internal Response
How to:
Information-rich, ready-made statistical visualizations
(minimal effort and maximum transparency)
💡 Visualizations reveal problems not discernible from model summaries!
The grammar of graphics framework can prepare any visualization! But building plots from scratch can be time-consuming.
💡 Using ready-made plots lowers the effort needed for visualizing data!
{ggstatsplot} was born!
(open-sourced on GitHub in 2017; still actively developed)
E.g., for hypotheses about differences between groups
Important
Information-rich defaults
Statistical approaches available
Appendix provides more details.
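As a minimal sketch of the group-comparison case, assuming {ggstatsplot} is installed: the built-in `iris` data and the chosen variables are illustrative assumptions, not the example shown on the slide. A single call draws the plot and reports the test in the subtitle, and the `type` argument switches the statistical approach.

```r
# Sketch only: iris is an illustrative dataset, not the one used in the slides.
if (requireNamespace("ggstatsplot", quietly = TRUE)) {
  # Default: parametric test reported in the plot subtitle
  p_parametric <- ggstatsplot::ggbetweenstats(
    data = iris,
    x = Species,
    y = Sepal.Length
  )

  # Same plot, but with a non-parametric test instead
  p_nonparametric <- ggstatsplot::ggbetweenstats(
    data = iris,
    x = Species,
    y = Sepal.Length,
    type = "nonparametric"
  )

  print(p_parametric)
}
```

Other accepted values for `type` are `"robust"` and `"bayes"`.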
Does it deliver?
Without {ggstatsplot}
Pearson’s correlation test revealed that, across 142 participants, variable x was negatively correlated with variable y: \(t(140)=-0.76, p=0.446\). The effect size \((r=-0.06, 95\% CI [-0.23,0.10])\) was small, as per Cohen’s (1988) conventions. The Bayes Factor for the same analysis revealed that the data were 5.81 times more probable under the null hypothesis as compared to the alternative hypothesis. This can be considered moderate evidence (Jeffreys, 1961) in favor of the null hypothesis (absence of any correlation between x and y).
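To illustrate how much manual assembly such a report involves, here is a rough base-R sketch. It uses the built-in `mtcars` data as a stand-in (an assumption; the slide's data with 142 participants is not available here), so the numbers will differ from those quoted above. The Bayes Factor is left out because it would need an additional package.

```r
# Manual workflow: run the test, then copy each number into the report by hand.
# mtcars is a stand-in dataset, not the data behind the slide.
res <- cor.test(mtcars$wt, mtcars$mpg, method = "pearson")

report <- sprintf(
  "t(%d) = %.2f, p = %s, r = %.2f, 95%% CI [%.2f, %.2f]",
  as.integer(res$parameter),            # degrees of freedom (n - 2)
  unname(res$statistic),                # t statistic
  format.pval(res$p.value, digits = 3), # p-value
  unname(res$estimate),                 # Pearson's r
  res$conf.int[1],                      # lower CI bound
  res$conf.int[2]                       # upper CI bound
)

print(report)
```

Every value is formatted and placed manually, which is where transcription and interpretation errors tend to enter.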
With {ggstatsplot}
✅ No need to worry about reporting or interpretation errors!
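For comparison, a hedged sketch of the same kind of analysis with {ggstatsplot}, assuming the package is installed and again using `mtcars` as an illustrative stand-in for the slide's data. The scatterplot and the statistical report (test statistic, p-value, effect size with CI, and Bayes Factor) come from one call.

```r
# Sketch only: mtcars and the chosen columns are illustrative assumptions.
if (requireNamespace("ggstatsplot", quietly = TRUE)) {
  p_scatter <- ggstatsplot::ggscatterstats(
    data = mtcars,
    x = wt,
    y = mpg
  )

  print(p_scatter)
}
```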
Data Visualization
Statistical Reporting
✅ Follows best practices in data visualization and statistical reporting!
I can haz users?!
Total downloads > 500K (97th percentile)
Second most starred {ggplot2} extension!
Total citations > 1000
From publications across a wide range of fields:
biology, medicine, psychology, economics, etc.
Maybe the real treasure was the skills we acquired along the way!
Breaking down the monolith: \(20K_{(2017)} \rightarrow 1K_{(2024)}\) lines of code
While re-architecting {ggstatsplot}, I started contributing upstream.
As part of {easystats} core team
{ggsignif}
{WRS2}, {ggcorrplot}
{lintr} (linter for R)
{styler} (code formatter)
“The only way to go fast, is to go well.” - Robert C. Martin
CI Checks (GitHub Actions)
Healthy and active code base
Training material on best practices in software/package development to support community contributions, keeping in mind the diverse backgrounds of contributors.
(Or how developing {ggstatsplot} continues to help me grow as a software developer)
graph LR
  Project[ggstatsplot]

  %% Technical Skills Branch
  Project --> TechSkills[Technical Skills]
  TechSkills --> CodeQuality[Code Quality]
  TechSkills --> ArchDesign[Architecture Design]
  TechSkills --> TechDebt[Technical Debt]

  %% Soft Skills Branch
  Project --> SoftSkills[Soft Skills]
  SoftSkills --> Collab[Collaboration]
  SoftSkills --> Leadership[Leadership]
  SoftSkills --> Communication[Communication]

  %% Styling using colorblind-friendly palette
  classDef mainNode fill:#FCF596,stroke:#000000,stroke-width:3px
  classDef broadSkillNode fill:#D0E8C5,stroke:#333,stroke-width:1px
  classDef skillNode fill:#ffffff,stroke:#333,stroke-width:1px,stroke-dasharray: 5 5
  class Project mainNode
  class TechSkills,SoftSkills broadSkillNode
  class CodeQuality,ArchDesign,TechDebt,Collab,Leadership,Communication skillNode
{ggstatsplot} offers an intuitive interface for creating detailed statistical visualizations, enabling users to adopt rigorous, reliable, and robust workflows for data exploration and reporting across various academic and industrial disciplines. It is a well-maintained tool with high-quality infrastructure and widespread adoption.
Source code for these slides can be found on GitHub.
If you are interested in good programming and software development practices, check out my other slide decks.
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.4.2 (2024-10-31)
os Ubuntu 22.04.5 LTS
system x86_64, linux-gnu
hostname fv-az1345-193
ui X11
language (EN)
collate C.UTF-8
ctype C.UTF-8
tz UTC
date 2024-12-15
pandoc 3.6 @ /opt/hostedtoolcache/pandoc/3.6/x64/ (via rmarkdown)
quarto 1.7.3 @ /usr/local/bin/quarto
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
base * 4.4.2 2024-10-31 [3] local
BayesFactor 0.9.12-4.7 2024-01-24 [1] RSPM
bayestestR 0.15.0 2024-10-17 [1] RSPM
bitops 1.0-9 2024-10-03 [1] RSPM
BWStest 0.2.3 2023-10-10 [1] RSPM
cachem 1.1.0 2024-05-16 [1] RSPM
cli 3.6.3 2024-06-21 [1] RSPM
coda 0.19-4.1 2024-01-31 [1] RSPM
colorspace 2.1-1 2024-07-26 [1] RSPM
compiler 4.4.2 2024-10-31 [3] local
correlation 0.8.6 2024-10-26 [1] RSPM
cranlogs 2.1.1 2019-04-29 [1] RSPM
curl 6.0.1 2024-11-14 [1] RSPM
data.table 1.16.4 2024-12-06 [1] RSPM
datasets * 4.4.2 2024-10-31 [3] local
datawizard 0.13.0 2024-10-05 [1] RSPM
digest 0.6.37 2024-08-19 [1] RSPM
dplyr 1.1.4 2023-11-17 [1] RSPM
effectsize 1.0.0 2024-12-10 [1] RSPM
evaluate 1.0.1 2024-10-10 [1] RSPM
fansi 1.0.6 2023-12-08 [1] RSPM
farver 2.1.2 2024-05-13 [1] RSPM
fastmap 1.2.0 2024-05-15 [1] RSPM
generics 0.1.3 2022-07-05 [1] RSPM
ggplot2 * 3.5.1 2024-04-23 [1] RSPM
ggrepel 0.9.6 2024-09-07 [1] RSPM
ggsignif 0.6.4 2022-10-13 [1] RSPM
ggstatsplot * 0.13.0.9000 2024-12-15 [1] Github (IndrajeetPatil/ggstatsplot@1c57eac)
glue 1.8.0 2024-09-30 [1] RSPM
gmp 0.7-5 2024-08-23 [1] RSPM
graphics * 4.4.2 2024-10-31 [3] local
grDevices * 4.4.2 2024-10-31 [3] local
grid 4.4.2 2024-10-31 [3] local
gtable 0.3.6 2024-10-25 [1] RSPM
htmltools 0.5.8.1 2024-04-04 [1] RSPM
httr 1.4.7 2023-08-15 [1] RSPM
insight 1.0.0 2024-11-26 [1] RSPM
jsonlite 1.8.9 2024-09-20 [1] RSPM
knitr 1.49 2024-11-08 [1] RSPM
kSamples 1.2-10 2023-10-07 [1] RSPM
labeling 0.4.3 2023-08-29 [1] RSPM
lattice 0.22-6 2024-03-20 [3] CRAN (R 4.4.2)
lifecycle 1.0.4 2023-11-07 [1] RSPM
lubridate 1.9.4 2024-12-08 [1] RSPM
magrittr 2.0.3 2022-03-30 [1] RSPM
MASS 7.3-61 2024-06-13 [3] CRAN (R 4.4.2)
Matrix 1.7-1 2024-10-18 [3] CRAN (R 4.4.2)
MatrixModels 0.5-3 2023-11-06 [1] RSPM
memoise 2.0.1 2021-11-26 [1] RSPM
methods * 4.4.2 2024-10-31 [3] local
mgcv 1.9-1 2023-12-21 [3] CRAN (R 4.4.2)
multcompView 0.1-10 2024-03-08 [1] RSPM
munsell 0.5.1 2024-04-01 [1] RSPM
mvtnorm 1.3-2 2024-11-04 [1] RSPM
nlme 3.1-166 2024-08-14 [3] CRAN (R 4.4.2)
packageRank * 0.9.4 2024-11-13 [1] RSPM
paletteer 1.6.0 2024-01-21 [1] RSPM
parallel 4.4.2 2024-10-31 [3] local
parameters 0.24.0 2024-11-27 [1] RSPM
patchwork 1.3.0 2024-09-16 [1] RSPM
pbapply 1.7-2 2023-06-27 [1] RSPM
performance 0.12.4 2024-10-18 [1] RSPM
pillar 1.9.0 2023-03-22 [1] RSPM
pkgconfig 2.0.3 2019-09-22 [1] RSPM
pkgsearch 3.1.3 2023-12-10 [1] RSPM
PMCMRplus 1.9.12 2024-09-08 [1] RSPM
prismatic 1.1.2 2024-04-10 [1] RSPM
purrr 1.0.2 2023-08-10 [1] RSPM
R.methodsS3 1.8.2 2022-06-13 [1] RSPM
R.oo 1.27.0 2024-11-01 [1] RSPM
R.utils 2.12.3 2023-11-18 [1] RSPM
R6 2.5.1 2021-08-19 [1] RSPM
Rcpp 1.0.13-1 2024-11-02 [1] RSPM
RcppParallel 5.1.9 2024-08-19 [1] RSPM
RCurl 1.98-1.16 2024-07-11 [1] RSPM
rematch2 2.1.2 2020-05-01 [1] RSPM
rlang 1.1.4 2024-06-04 [1] RSPM
rmarkdown 2.29 2024-11-04 [1] RSPM
Rmpfr 1.0-0 2024-11-18 [1] RSPM
rstantools 2.4.0 2024-01-31 [1] RSPM
scales 1.3.0 2023-11-28 [1] RSPM
sessioninfo 1.2.2.9000 2024-11-10 [1] Github (r-lib/sessioninfo@37c81af)
splines 4.4.2 2024-10-31 [3] local
stats * 4.4.2 2024-10-31 [3] local
statsExpressions 1.6.2.9000 2024-12-15 [1] Github (IndrajeetPatil/statsExpressions@04d47fe)
stringi 1.8.4 2024-05-06 [1] RSPM
stringr 1.5.1 2023-11-14 [1] RSPM
sugrrants 0.2.9 2024-03-12 [1] RSPM
SuppDists 1.1-9.8 2024-09-03 [1] RSPM
tibble 3.2.1 2023-03-20 [1] RSPM
tidyr 1.3.1 2024-01-24 [1] RSPM
tidyselect 1.2.1 2024-03-11 [1] RSPM
timechange 0.3.0 2024-01-18 [1] RSPM
tools 4.4.2 2024-10-31 [3] local
utf8 1.2.4 2023-10-22 [1] RSPM
utils * 4.4.2 2024-10-31 [3] local
vctrs 0.6.5 2023-12-01 [1] RSPM
withr 3.0.2 2024-10-28 [1] RSPM
xfun 0.49 2024-10-31 [1] RSPM
yaml 2.3.10 2024-07-26 [1] RSPM
zeallot 0.1.0 2018-01-28 [1] RSPM
[1] /home/runner/work/_temp/Library
[2] /opt/R/4.4.2/lib/R/site-library
[3] /opt/R/4.4.2/lib/R/library
* ── Packages attached to the search path.
──────────────────────────────────────────────────────────────────────────────
ggwithinstats()
Hypothesis about group differences: repeated measures design
Important
✏️ Defaults
Statistical approaches available
gghistostats()
Distribution of a numeric variable
Important
✏️ Defaults
Statistical approaches available
ggdotplotstats()
Labeled numeric variable
Important
✏️ Defaults
Statistical approaches available
ggscatterstats()
Hypothesis about correlation: Two numeric variables
ggcorrmat()
Hypothesis about correlation: Multiple numeric variables
ggpiestats()
Hypothesis about composition of categorical variables
ggbarstats()
Hypothesis about composition of categorical variables
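A small sketch of the categorical case, assuming {ggstatsplot} is installed. The `mtcars` columns `am` and `cyl` are illustrative choices, not taken from the slides. `ggpiestats()` accepts the same `data`, `x`, and `y` arguments.

```r
# Sketch only: am (transmission) and cyl (cylinders) are illustrative variables.
if (requireNamespace("ggstatsplot", quietly = TRUE)) {
  # Composition of `am` within each level of `cyl`, with an association test
  p_bar <- ggstatsplot::ggbarstats(
    data = mtcars,
    x = am,
    y = cyl
  )

  print(p_bar)
}
```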
ggcoefstats()
Hypothesis about regression coefficients
Important
✏️ Defaults
Supports all regression models supported in the {easystats} ecosystem.
Meta-analysis is also supported!
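A minimal sketch of the regression case, assuming {ggstatsplot} is installed. The linear model below is a hypothetical example fitted on `mtcars`, not a model from the slides.

```r
# Sketch only: the model formula is an illustrative assumption.
model <- lm(mpg ~ wt + hp, data = mtcars)

if (requireNamespace("ggstatsplot", quietly = TRUE)) {
  # Dot-and-whisker plot of coefficient estimates with confidence intervals
  p_coef <- ggstatsplot::ggcoefstats(model)

  print(p_coef)
}
```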
Iterating over a grouping variable
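A hedged sketch of the grouped variants, assuming {ggstatsplot} is installed. The `grouped_*()` functions repeat the same analysis for each level of `grouping.var` and combine the panels. The dataset and variables are illustrative assumptions.

```r
# Sketch only: iris and its columns are illustrative, not from the slides.
if (requireNamespace("ggstatsplot", quietly = TRUE)) {
  # One histogram (with a one-sample test) per species, combined in one figure
  p_grouped <- ggstatsplot::grouped_gghistostats(
    data = iris,
    x = Sepal.Length,
    grouping.var = Species
  )

  print(p_grouped)
}
```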
“What if I don’t like the default plots?” 🤔
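One possible answer, sketched under the assumption that {ggstatsplot} (and therefore {ggplot2}) is installed: the returned object is a regular ggplot, so it can be modified with {ggplot2} functions, and the underlying statistics can be extracted for use in a custom plot. The title text and dataset here are illustrative.

```r
# Sketch only: dataset, title, and theme tweak are illustrative assumptions.
if (requireNamespace("ggstatsplot", quietly = TRUE)) {
  p <- ggstatsplot::ggbetweenstats(
    data = iris,
    x = Species,
    y = Sepal.Length
  )

  # Option 1: modify the default plot with ordinary ggplot2 functions
  p_custom <- p +
    ggplot2::labs(title = "Sepal length by species") +
    ggplot2::theme(legend.position = "none")

  # Option 2: keep only the statistics and build the plot yourself
  stats <- ggstatsplot::extract_stats(p)
  print(stats$subtitle_data)
}
```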
{ggstatsplot}: Details about statistical reporting
Note
Functions | Description | Parametric | Non-parametric | Robust | Bayesian |
---|---|---|---|---|---|
ggbetweenstats() | Between group comparisons | ✅ | ✅ | ✅ | ✅ |
ggwithinstats() | Within group comparisons | ✅ | ✅ | ✅ | ✅ |
gghistostats(), ggdotplotstats() | Distribution of a numeric variable | ✅ | ✅ | ✅ | ✅ |
ggcorrmat() | Correlation matrix | ✅ | ✅ | ✅ | ✅ |
ggscatterstats() | Correlation between two variables | ✅ | ✅ | ✅ | ✅ |
ggpiestats(), ggbarstats() | Association between categorical variables | ✅ | NA | NA | ✅ |
ggpiestats(), ggbarstats() | Equal proportions for categorical variable levels | ✅ | NA | NA | ✅ |
ggcoefstats() | Regression modeling | ✅ | ✅ | ✅ | ✅ |
ggcoefstats() | Random-effects meta-analysis | ✅ | NA | ✅ | ✅ |
Parametric
Hunting for packages
📦 for inferential statistics ({stats})
📦 computing effect size + CIs ({effectsize})
📦 for descriptive statistics ({skimr})
📦 pairwise comparisons ({multcomp})
📦 Bayesian hypothesis testing ({BayesFactor})
📦 Bayesian estimation ({bayestestR})
📦 …
Inconsistent APIs
🤔 accepts data frame, vector, matrix?
🤔 long/wide format data?
🤔 works with NAs?
🤔 returns data frame, vector, matrix?
🤔 works with tibbles?
🤔 has all necessary details?
🤔 …
{ggstatsplot} combines data visualization and statistical analysis in a single step.
It…
✅ Quick insight into data by combining visualization and modeling!
11 contributors
3 reverse dependencies
Widely covered in YouTube videos and social media posts
Almost 100% resolution rate on StackOverflow (> 150 questions)
Over 100 daily visitors on GitHub repo
Usage in a wide range of fields: psychology, biology, medicine, economics, etc.
Usage in data science training programs
❌ Promotes mindless application of statistical tests.