The function ggstatsplot::ggscatterstats is meant to provide a publication-ready scatterplot that shows the association between two continuous variables, with all statistical details included in the plot itself. This function is also helpful during the data exploration phase. In this vignette, we will see examples of how to use this function with the ggplot2movies dataset.

To begin with, here are some instances where you would want to use ggscatterstats:

  • to check the linear association between two continuous variables
  • to check the distributions of two continuous variables

Note: The following demo uses the pipe operator (%>%). If you are not familiar with this operator, here is a good explanation: http://r4ds.had.co.nz/pipes.html

Correlation plot with ggscatterstats

To illustrate how this function can be used, we will rely on the ggplot2movies dataset. This dataset provides information about movies scraped from IMDB. Specifically, we will be using a cleaned version of this dataset included in the ggstatsplot package itself.
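Before plotting, it helps to look at the structure of the data. The sketch below is only a stand-in: it starts from the raw ggplot2movies::movies data frame rather than the cleaned copy bundled with ggstatsplot, and the cleaning steps (dropping movies without a budget or MPAA rating, expressing the budget in millions) as well as the name movies_clean are assumptions made for illustration.

```r
library(dplyr)

# Assumption: a rough stand-in for the cleaned dataset, built from the raw
# ggplot2movies data; the cleaning used by ggstatsplot itself may differ.
movies_clean <- ggplot2movies::movies %>%
  filter(!is.na(budget), mpaa != "") %>%   # keep movies with a known budget and MPAA rating
  mutate(budget = budget / 1e6) %>%        # express budget in millions of US$
  select(title, year, budget, rating, votes, mpaa)

# one line per column: name, type, and the first few values
glimpse(movies_clean)
```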

Now that we have a clean dataset, we can start asking some interesting questions. For example, let’s see if the average IMDB rating for a movie has any relationship to its budget. Additionally, let’s also see which movies had a high budget but low IMDB rating by labeling those data points.

To reduce the processing time, let’s only work with 30% of the dataset.
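A minimal sketch of such a call could look as follows. It assumes the same stand-in data built from ggplot2movies::movies (not the bundled cleaned dataset), and the axis labels and title are illustrative choices. Point labels are left out to keep the sketch small.

```r
library(dplyr)
library(ggstatsplot)

set.seed(123)  # make the random 30% subsample reproducible

# Assumption: stand-in for the cleaned dataset, built from raw ggplot2movies data
movies_sample <- ggplot2movies::movies %>%
  filter(!is.na(budget), mpaa != "") %>%
  mutate(budget = budget / 1e6) %>%   # budget in millions of US$
  sample_frac(size = 0.30)            # work with 30% of the rows

p <- ggscatterstats(
  data  = movies_sample,
  x     = budget,
  y     = rating,
  xlab  = "Movie budget (in million US$)",
  ylab  = "IMDB rating",
  title = "Movie budget and IMDB rating"
)

p
```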

There is indeed a small, but significant, positive correlation between the amount of money a studio invests in a movie and the ratings given by audiences.

The type (of test) argument also accepts the following abbreviations: "p" (for parametric/Pearson’s), "np" (for nonparametric/Spearman), "r" (for robust).
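For orientation, the parametric and nonparametric options correspond to familiar correlation tests that are also available in base R. The sketch below only illustrates the statistics involved, using stand-in data built from ggplot2movies::movies; it makes no claim about how ggscatterstats computes them internally, and the robust option is not shown.

```r
library(dplyr)

# Assumption: stand-in for the cleaned dataset, built from raw ggplot2movies data
movies_clean <- ggplot2movies::movies %>%
  filter(!is.na(budget), mpaa != "") %>%
  mutate(budget = budget / 1e6)

# parametric: Pearson's product-moment correlation
pearson <- cor.test(movies_clean$budget, movies_clean$rating, method = "pearson")

# nonparametric: Spearman's rank correlation
# (exact = FALSE requests the asymptotic p-value, which is appropriate with tied values)
spearman <- cor.test(movies_clean$budget, movies_clean$rating,
                     method = "spearman", exact = FALSE)

pearson$estimate
spearman$estimate
```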

Important: In contrast to all other functions in this package, the ggscatterstats function returns an object that cannot be further modified with ggplot2. This can be avoided by not plotting the marginal distributions (marginal = FALSE). A workaround for this problem is still being sought.
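As a sketch of the marginal = FALSE route, the plot below is built without marginal distributions and then modified with ordinary ggplot2 functions. The data are a stand-in built from ggplot2movies::movies, and the theme and title are arbitrary choices for illustration.

```r
library(dplyr)
library(ggplot2)
library(ggstatsplot)

# Assumption: stand-in for the cleaned dataset, built from raw ggplot2movies data
movies_clean <- ggplot2movies::movies %>%
  filter(!is.na(budget), mpaa != "") %>%
  mutate(budget = budget / 1e6)

# without marginal distributions, the result can be extended like any ggplot
p <- ggscatterstats(
  data     = movies_clean,
  x        = budget,
  y        = rating,
  marginal = FALSE
)

# further modifications with ggplot2
p_modified <- p +
  theme_classic() +
  labs(title = "Movie budget and IMDB rating")

p_modified
```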

Using ggscatterstats() in R Notebooks or Rmarkdown

If you try including a ggscatterstats() plot inside an R Notebook or Rmarkdown code chunk, you’ll notice that the plot doesn’t get output. To get a ggscatterstats() plot to show up in these contexts, you need to save the plot as a variable in one code chunk and explicitly print it using the grid package in another chunk.
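The two steps can be sketched as follows, with comments marking where each chunk would begin. The data are a stand-in built from ggplot2movies::movies, and the variable name p is arbitrary.

```r
# --- first chunk: build the plot and store it, without printing it ---
library(dplyr)
library(ggstatsplot)

# Assumption: stand-in for the cleaned dataset, built from raw ggplot2movies data
movies_clean <- ggplot2movies::movies %>%
  filter(!is.na(budget), mpaa != "") %>%
  mutate(budget = budget / 1e6)

p <- ggscatterstats(
  data     = movies_clean,
  x        = budget,
  y        = rating,
  marginal = TRUE
)

# --- second chunk: draw the stored plot explicitly with grid ---
grid::grid.newpage()  # start from an empty page
grid::grid.draw(p)    # render the stored plot object
```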

Grouped analysis with grouped_ggscatterstats

What if we want to do the same analysis for movies with different MPAA (Motion Picture Association of America) film ratings (NC-17, PG, PG-13, R)? In that case, we will have to either write a for loop or use purrr, neither of which seems like an exciting prospect.

ggstatsplot provides a special helper function for such instances: grouped_ggscatterstats. This is merely a wrapper function around ggstatsplot::combine_plots. It applies ggscatterstats across all levels of a specified grouping variable and then combines the list of individual plots into a single plot. Note that the grouping variable can be anything: conditions in a given study, groups in a study sample, different studies, etc.

Let’s see how we can use this function to apply ggscatterstats for all MPAA ratings. We will be running parametric tests (i.e., Pearson’s r). If you set type = "np" or type = "r", results from the non-parametric or robust test will be displayed instead.
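A minimal sketch of the grouped call is given below. It assumes stand-in data built from ggplot2movies::movies, so group sizes may differ from those in the bundled cleaned dataset, and it passes only the core arguments.

```r
library(dplyr)
library(ggstatsplot)

# Assumption: stand-in for the cleaned dataset, built from raw ggplot2movies data
movies_clean <- ggplot2movies::movies %>%
  filter(!is.na(budget), mpaa != "") %>%
  mutate(budget = budget / 1e6)

# one scatterplot per MPAA rating, combined into a single figure
grouped_plot <- grouped_ggscatterstats(
  data         = movies_clean,
  x            = budget,
  y            = rating,
  grouping.var = mpaa,
  type         = "p"   # parametric test, i.e., Pearson's r
)

grouped_plot
```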

As seen from the plot, this analysis has revealed something interesting: The relationship we found between budget and IMDB rating holds only for PG-13 and R-rated movies.

Grouped analysis with ggscatterstats + purrr

Although grouped_ggscatterstats is a quick and dirty way to explore a large amount of data with minimal effort, it does come with an important limitation: reduced flexibility. For example, if we wanted to add, let’s say, a separate type of marginal distribution plot for each MPAA rating, or if we wanted to use different types of correlations across different levels of MPAA ratings (NC-17 has only 6 movies, so a robust correlation would be a good idea), this would not be possible. But this can be easily done using purrr.

See the associated vignette here: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/purrr_examples.html
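As a rough sketch of the idea, the data can be split by MPAA rating and ggscatterstats applied to each piece with its own type of test. The data are a stand-in built from ggplot2movies::movies, and the choice of a robust correlation for NC-17 and a parametric one elsewhere is only an illustration of per-group settings.

```r
library(dplyr)
library(purrr)
library(ggstatsplot)

# Assumption: stand-in for the cleaned dataset, built from raw ggplot2movies data
movies_clean <- ggplot2movies::movies %>%
  filter(!is.na(budget), mpaa != "") %>%
  mutate(budget = budget / 1e6)

# one data frame per MPAA rating
movies_by_mpaa <- split(movies_clean, movies_clean$mpaa)

# illustrative choice: robust correlation for NC-17, parametric for the rest
types <- ifelse(names(movies_by_mpaa) == "NC-17", "r", "p")

# build one plot per rating, each with its own type of test and title
plots <- pmap(
  list(movies_by_mpaa, types, names(movies_by_mpaa)),
  function(df, type, mpaa_label) {
    ggscatterstats(
      data  = df,
      x     = budget,
      y     = rating,
      type  = type,
      title = paste("MPAA rating:", mpaa_label)
    )
  }
)

# inspect a single plot from the list
plots[[1]]
```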

Different smoothing methods

Different smoothing methods can also be specified. For example, if a robust correlation (percentage bend correlation coefficient) is used, we can use a robust smoothing function (MASS::rlm). We can also specify different formulas for the smoothing function. It is important to set results.subtitle = FALSE in such cases, since the results in the subtitle will no longer be relevant for the smoothing function used. Below, four different examples are given for how to use different smoothing functions.
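To give a feel for what different smoothers look like, here is a sketch that uses ggplot2::geom_smooth directly rather than ggscatterstats, so it does not depend on how the smoothing arguments are named in ggscatterstats. These are not the vignette's original examples: the data are a stand-in built from ggplot2movies::movies, and the four method and formula combinations are illustrative choices.

```r
library(dplyr)
library(ggplot2)

# Assumption: stand-in for the cleaned dataset, built from raw ggplot2movies data
movies_clean <- ggplot2movies::movies %>%
  filter(!is.na(budget), mpaa != "") %>%
  mutate(budget = budget / 1e6)

# shared scatterplot layer
base_plot <- ggplot(movies_clean, aes(x = budget, y = rating)) +
  geom_point(alpha = 0.3)

# four smoothers: method and formula are varied independently
smooth_plots <- list(
  linear    = base_plot + geom_smooth(method = "lm", formula = y ~ x),
  robust    = base_plot + geom_smooth(method = MASS::rlm, formula = y ~ x),
  quadratic = base_plot + geom_smooth(method = "lm", formula = y ~ poly(x, 2)),
  loess     = base_plot + geom_smooth(method = "loess", formula = y ~ x)
)

smooth_plots$robust
```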

Suggestions

If you find any bugs or have any suggestions/remarks, please file an issue on GitHub: https://github.com/IndrajeetPatil/ggstatsplot/issues

Session Information

Summarizing session information for reproducibility.

options(width = 200)
devtools::session_info()
#> - Session info ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.5.1 (2018-07-02)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  English_United States.1252  
#>  ctype    English_United States.1252  
#>  tz       America/New_York            
#>  date     2018-10-21                  
#> 
#> - Packages -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
#>  package      * version     date       lib source                                      
#>  assertthat     0.2.0       2017-04-11 [1] CRAN (R 3.5.0)                              
#>  backports      1.1.2       2017-12-13 [1] CRAN (R 3.5.0)                              
#>  base64enc      0.1-3       2015-07-28 [1] CRAN (R 3.5.0)                              
#>  BayesFactor    0.9.12-4.2  2018-05-19 [1] CRAN (R 3.5.0)                              
#>  bayesplot      1.6.0       2018-08-02 [1] CRAN (R 3.5.1)                              
#>  bindr          0.1.1       2018-03-13 [1] CRAN (R 3.5.0)                              
#>  bindrcpp     * 0.2.2       2018-03-29 [1] CRAN (R 3.5.0)                              
#>  boot           1.3-20      2017-08-06 [2] CRAN (R 3.5.1)                              
#>  broom          0.5.0.9001  2018-10-03 [1] Github (tidymodels/broom@140eb58)           
#>  broom.mixed    0.2.2.9000  2018-10-09 [1] Github (bbolker/broom.mixed@fb9163a)        
#>  callr          3.0.0       2018-08-24 [1] CRAN (R 3.5.1)                              
#>  cli            1.0.1       2018-09-25 [1] CRAN (R 3.5.1)                              
#>  cluster        2.0.7-1     2018-04-13 [2] CRAN (R 3.5.1)                              
#>  coda           0.19-2      2018-10-08 [1] CRAN (R 3.5.1)                              
#>  codetools      0.2-15      2016-10-05 [2] CRAN (R 3.5.1)                              
#>  coin           1.2-2       2017-11-28 [1] CRAN (R 3.5.0)                              
#>  colorspace     1.3-2       2016-12-14 [1] CRAN (R 3.5.0)                              
#>  commonmark     1.6         2018-09-30 [1] CRAN (R 3.5.1)                              
#>  cowplot        0.9.99      2018-08-23 [1] Github (wilkelab/cowplot@374c3e9)           
#>  crayon         1.3.4       2018-09-26 [1] Github (r-lib/crayon@3e751fb)               
#>  data.table     1.11.8      2018-09-30 [1] CRAN (R 3.5.1)                              
#>  debugme        1.1.0       2017-10-22 [1] CRAN (R 3.5.0)                              
#>  DEoptimR       1.0-8       2016-11-19 [1] CRAN (R 3.5.0)                              
#>  desc           1.2.0       2018-05-01 [1] CRAN (R 3.5.0)                              
#>  devtools       2.0.0       2018-10-19 [1] CRAN (R 3.5.1)                              
#>  digest         0.6.18      2018-10-10 [1] CRAN (R 3.5.1)                              
#>  dplyr          0.7.7       2018-10-16 [1] CRAN (R 3.5.1)                              
#>  effsize        0.7.1       2017-03-21 [1] CRAN (R 3.5.0)                              
#>  emmeans        1.2.4       2018-09-22 [1] CRAN (R 3.5.1)                              
#>  estimability   1.3         2018-02-11 [1] CRAN (R 3.5.0)                              
#>  evaluate       0.12        2018-10-09 [1] CRAN (R 3.5.1)                              
#>  exact2x2       1.6.3       2018-07-27 [1] CRAN (R 3.5.1)                              
#>  exactci        1.3-3       2017-10-02 [1] CRAN (R 3.5.0)                              
#>  fit.models     0.5-14      2017-04-06 [1] CRAN (R 3.5.0)                              
#>  forcats        0.3.0       2018-02-19 [1] CRAN (R 3.5.0)                              
#>  foreign        0.8-70      2017-11-28 [2] CRAN (R 3.5.1)                              
#>  fs             1.2.6       2018-08-23 [1] CRAN (R 3.5.1)                              
#>  generics       0.0.1.9000  2018-10-03 [1] Github (r-lib/generics@24ba515)             
#>  ggcorrplot     0.1.2       2018-09-11 [1] CRAN (R 3.5.1)                              
#>  ggExtra        0.8         2018-08-14 [1] Github (daattali/ggExtra@76d1618)           
#>  ggplot2        3.0.0.9000  2018-09-26 [1] Github (tidyverse/ggplot2@e9f7ded)          
#>  ggrepel        0.8.0.9000  2018-09-09 [1] Github (slowkow/ggrepel@91877ca)            
#>  ggridges       0.5.1       2018-10-04 [1] Github (clauswilke/ggridges@1d5131f)        
#>  ggstatsplot  * 0.0.6.9000  2018-10-21 [1] local                                       
#>  ggthemes       4.0.1       2018-08-24 [1] CRAN (R 3.5.1)                              
#>  glmmTMB        0.2.2.0     2018-07-03 [1] CRAN (R 3.5.1)                              
#>  glue           1.3.0       2018-09-17 [1] Github (tidyverse/glue@4e74901)             
#>  groupedstats   0.0.3.9000  2018-10-12 [1] local                                       
#>  gtable         0.2.0       2016-02-26 [1] CRAN (R 3.5.0)                              
#>  gtools         3.8.1       2018-06-26 [1] CRAN (R 3.5.0)                              
#>  haven          1.1.2       2018-06-27 [1] CRAN (R 3.5.0)                              
#>  hms            0.4.2       2018-03-10 [1] CRAN (R 3.5.0)                              
#>  htmldeps       0.1.1       2018-09-17 [1] Github (rstudio/htmldeps@c1023e0)           
#>  htmltools      0.3.6       2017-04-28 [1] CRAN (R 3.5.0)                              
#>  httpuv         1.4.5       2018-07-19 [1] CRAN (R 3.5.1)                              
#>  jmv            0.9.4       2018-09-29 [1] Github (jamovi/jmv@7dac133)                 
#>  jmvcore        0.9.4       2018-09-17 [1] CRAN (R 3.5.1)                              
#>  knitr          1.20.19     2018-10-10 [1] Github (yihui/knitr@3abb642)                
#>  labeling       0.3         2014-08-23 [1] CRAN (R 3.5.0)                              
#>  later          0.7.5       2018-09-18 [1] CRAN (R 3.5.1)                              
#>  lattice        0.20-35     2017-03-25 [2] CRAN (R 3.5.1)                              
#>  lazyeval       0.2.1       2017-10-29 [1] CRAN (R 3.5.0)                              
#>  lme4           1.1-18-1    2018-08-17 [1] CRAN (R 3.5.1)                              
#>  magrittr       1.5         2014-11-22 [1] CRAN (R 3.5.0)                              
#>  MASS           7.3-50      2018-04-30 [2] CRAN (R 3.5.1)                              
#>  Matrix         1.2-14      2018-04-13 [2] CRAN (R 3.5.1)                              
#>  MatrixModels   0.4-1       2015-08-22 [1] CRAN (R 3.5.0)                              
#>  mc2d           0.1-18      2017-03-06 [1] CRAN (R 3.5.0)                              
#>  memoise        1.1.0       2017-04-21 [1] CRAN (R 3.5.0)                              
#>  mgcv         * 1.8-24      2018-06-23 [2] CRAN (R 3.5.1)                              
#>  mime           0.6         2018-10-05 [1] CRAN (R 3.5.1)                              
#>  miniUI         0.1.1.1     2018-05-18 [1] CRAN (R 3.5.0)                              
#>  minqa          1.2.4       2014-10-09 [1] CRAN (R 3.5.0)                              
#>  mnormt         1.5-5       2016-10-15 [1] CRAN (R 3.5.0)                              
#>  modelr         0.1.2       2018-05-11 [1] CRAN (R 3.5.0)                              
#>  modeltools     0.2-22      2018-07-16 [1] CRAN (R 3.5.1)                              
#>  multcomp       1.4-8       2017-11-08 [1] CRAN (R 3.5.0)                              
#>  munsell        0.5.0       2018-06-12 [1] CRAN (R 3.5.0)                              
#>  mvtnorm        1.0-8       2018-05-31 [1] CRAN (R 3.5.0)                              
#>  nlme         * 3.1-137     2018-04-07 [2] CRAN (R 3.5.1)                              
#>  nloptr         1.2.1       2018-10-03 [1] CRAN (R 3.5.1)                              
#>  paletteer      0.1.0       2018-07-10 [1] CRAN (R 3.5.1)                              
#>  pbapply        1.3-4       2018-01-10 [1] CRAN (R 3.5.0)                              
#>  pcaPP          1.9-73      2018-01-14 [1] CRAN (R 3.5.0)                              
#>  pillar         1.3.0.9000  2018-10-01 [1] Github (r-lib/pillar@7be5b8a)               
#>  pkgbuild       1.0.2       2018-10-16 [1] CRAN (R 3.5.1)                              
#>  pkgconfig      2.0.2       2018-08-16 [1] CRAN (R 3.5.1)                              
#>  pkgdown        1.1.0.9000  2018-10-21 [1] Github (metrumresearchgroup/pkgdown@3836984)
#>  pkgload        1.0.1       2018-10-11 [1] CRAN (R 3.5.1)                              
#>  plyr           1.8.4       2016-06-08 [1] CRAN (R 3.5.0)                              
#>  prediction     0.3.6       2018-05-22 [1] CRAN (R 3.5.0)                              
#>  prettyunits    1.0.2       2015-07-13 [1] CRAN (R 3.5.0)                              
#>  processx       3.2.0       2018-08-16 [1] CRAN (R 3.5.1)                              
#>  promises       1.0.1       2018-04-13 [1] CRAN (R 3.5.0)                              
#>  ps             1.2.0       2018-10-16 [1] CRAN (R 3.5.1)                              
#>  psych          1.8.9       2018-10-08 [1] local                                       
#>  purrr          0.2.5       2018-05-29 [1] CRAN (R 3.5.0)                              
#>  purrrlyr       0.0.3       2018-05-29 [1] CRAN (R 3.5.0)                              
#>  pwr            1.2-2       2018-03-03 [1] CRAN (R 3.5.0)                              
#>  R6             2.3.0       2018-10-04 [1] CRAN (R 3.5.1)                              
#>  Rcpp           0.12.19     2018-10-01 [1] CRAN (R 3.5.1)                              
#>  remotes        2.0.1       2018-10-19 [1] CRAN (R 3.5.1)                              
#>  reshape        0.8.7       2017-08-06 [1] CRAN (R 3.5.0)                              
#>  reshape2       1.4.3       2017-12-11 [1] CRAN (R 3.5.0)                              
#>  rjson          0.2.20      2018-06-08 [1] CRAN (R 3.5.0)                              
#>  rlang          0.2.99.0000 2018-10-10 [1] Github (r-lib/rlang@fb09ff3)                
#>  rmarkdown      1.10.14     2018-10-10 [1] Github (rstudio/rmarkdown@dcc7d37)          
#>  robust         0.4-18      2017-04-27 [1] CRAN (R 3.5.0)                              
#>  robustbase     0.93-3      2018-09-21 [1] CRAN (R 3.5.1)                              
#>  roxygen2       6.1.0.9000  2018-09-13 [1] Github (klutometis/roxygen@cc34200)         
#>  rprojroot      1.3-2       2018-01-03 [1] CRAN (R 3.5.0)                              
#>  rrcov          1.4-4       2018-05-24 [1] CRAN (R 3.5.0)                              
#>  rstudioapi     0.8         2018-10-02 [1] CRAN (R 3.5.1)                              
#>  sandwich       2.5-0       2018-08-17 [1] CRAN (R 3.5.1)                              
#>  scales         1.0.0       2018-08-09 [1] CRAN (R 3.5.1)                              
#>  sessioninfo    1.1.0       2018-09-25 [1] CRAN (R 3.5.1)                              
#>  shiny          1.1.0       2018-05-17 [1] CRAN (R 3.5.0)                              
#>  sjlabelled     1.0.14      2018-09-12 [1] CRAN (R 3.5.1)                              
#>  sjmisc         2.7.5       2018-09-13 [1] CRAN (R 3.5.1)                              
#>  sjstats        0.17.1      2018-10-04 [1] Github (strengejacke/sjstats@9eafc5e)       
#>  skimr          1.0.3       2018-07-06 [1] Github (ropenscilabs/skimr@c67559a)         
#>  snakecase      0.9.2       2018-08-14 [1] CRAN (R 3.5.1)                              
#>  ssanv          1.1         2015-06-23 [1] CRAN (R 3.5.0)                              
#>  stringdist     0.9.5.1     2018-06-08 [1] CRAN (R 3.5.0)                              
#>  stringi        1.2.4       2018-07-20 [1] CRAN (R 3.5.1)                              
#>  stringr        1.3.1       2018-05-10 [1] CRAN (R 3.5.0)                              
#>  survival       2.42-3      2018-04-16 [2] CRAN (R 3.5.1)                              
#>  testthat       2.0.1       2018-10-13 [1] CRAN (R 3.5.1)                              
#>  TH.data        1.0-9       2018-07-10 [1] CRAN (R 3.5.1)                              
#>  tibble         1.4.2       2018-01-22 [1] CRAN (R 3.5.1)                              
#>  tidyr          0.8.1.9000  2018-10-06 [1] Github (tidyverse/tidyr@3b62137)            
#>  tidyselect     0.2.5       2018-10-11 [1] CRAN (R 3.5.1)                              
#>  TMB            1.7.14      2018-06-23 [1] CRAN (R 3.5.0)                              
#>  usethis        1.4.0.9000  2018-09-23 [1] Github (r-lib/usethis@1e3c6a6)              
#>  withr          2.1.2       2018-03-15 [1] CRAN (R 3.5.0)                              
#>  WRS2           0.10-0      2018-06-15 [1] CRAN (R 3.5.0)                              
#>  xfun           0.3         2018-07-06 [1] CRAN (R 3.5.1)                              
#>  xml2           1.2.0       2018-01-24 [1] CRAN (R 3.5.0)                              
#>  xtable         1.8-3       2018-08-29 [1] CRAN (R 3.5.1)                              
#>  yaml           2.2.0       2018-07-25 [1] CRAN (R 3.5.1)                              
#>  zoo            1.8-4       2018-09-19 [1] CRAN (R 3.5.1)                              
#> 
#> [1] C:/Users/inp099/Documents/R/win-library/3.5
#> [2] C:/Program Files/R/R-3.5.1/library