Dot-and-whisker plots for regression analyses

ggcoefstats(
  x,
  output = "plot",
  statistic = NULL,
  scales = NULL,
  component = "survival",
  bf.message = TRUE,
  d = "norm",
  d.par = c(mean = 0, sd = 0.3),
  tau = "halfcauchy",
  tau.par = c(scale = 0.5),
  iter = 5000,
  summarize = "stan",
  p.adjust.method = "none",
  coefficient.type = c("beta", "location", "coefficient"),
  by.class = FALSE,
  effsize = "eta",
  partial = TRUE,
  nboot = 500,
  meta.analytic.effect = FALSE,
  point.color = "blue",
  point.size = 3,
  point.shape = 16,
  conf.int = TRUE,
  conf.level = 0.95,
  se.type = "nid",
  k = 2,
  k.caption.summary = 0,
  exclude.intercept = TRUE,
  exponentiate = FALSE,
  errorbar.color = "black",
  errorbar.height = 0,
  errorbar.linetype = "solid",
  errorbar.size = 0.5,
  vline = TRUE,
  vline.color = "black",
  vline.linetype = "dashed",
  vline.size = 1,
  sort = "none",
  xlab = "regression coefficient",
  ylab = "term",
  title = NULL,
  subtitle = NULL,
  stats.labels = TRUE,
  only.significant = FALSE,
  caption = NULL,
  caption.summary = TRUE,
  stats.label.color = NULL,
  stats.label.args = list(size = 3, fontface = "bold", segment.color = "grey50",
    direction = "y"),
  package = "RColorBrewer",
  palette = "Dark2",
  direction = 1,
  ggtheme = ggplot2::theme_bw(),
  ggstatsplot.layer = TRUE,
  messages = FALSE,
  return = NULL,
  ...
)

Arguments

x

A model object to be tidied with broom::tidy, or a tidy data frame containing results. If a data frame is to be plotted, it must contain columns named term (names of predictors) and estimate (corresponding estimates of coefficients or other quantities of interest). Optional columns are conf.low and conf.high (for confidence intervals) and p.value. All term names must be unique.
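As a rough illustration of the data frame layout described above, the following sketch assembles such a table by hand from an lm fit using base R only. The object names (mod, coefs, ci, tidy_df) and the choice of model are illustrative assumptions, and the final call is left as a comment because it requires ggstatsplot to be installed.

```r
# Minimal sketch: build a tidy data frame with the columns described above.
# The model and all object names are illustrative, not part of the package.
mod <- lm(mpg ~ wt + hp, data = mtcars)

coefs <- summary(mod)$coefficients   # estimate, std. error, t value, p-value
ci <- confint(mod, level = 0.95)     # confidence limits, same row order

tidy_df <- data.frame(
  term = rownames(coefs),
  estimate = coefs[, "Estimate"],
  std.error = coefs[, "Std. Error"],
  statistic = coefs[, "t value"],
  p.value = coefs[, "Pr(>|t|)"],
  conf.low = ci[, 1],
  conf.high = ci[, 2],
  df.residual = mod$df.residual,
  row.names = NULL,
  stringsAsFactors = FALSE
)

# term names must be unique
stopifnot(anyDuplicated(tidy_df$term) == 0)

# With ggstatsplot installed, the data frame could then be passed as:
# ggstatsplot::ggcoefstats(x = tidy_df, statistic = "t")
```

Because a data frame carries no information about the underlying model, the statistic argument is supplied explicitly in the commented call.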

output, return

Character describing the expected output from this function: "plot" (visualization of regression coefficients) or "tidy" (tidy dataframe of results from broom::tidy) or "glance" (object from broom::glance) or "augment" (object from broom::augment).

statistic

Which statistic is to be displayed in the label (either "t", "f", or "z"). This is especially important if the x argument in ggcoefstats is a data frame, in which case the function cannot know what kind of model it is dealing with.

scales

Scales on which to report the variables: for random effects, the choices are "sdcor" (standard deviations and correlations; the default if scales is NULL) or "vcov" (variances and covariances). NA means no transformation, appropriate e.g. for fixed effects.

component

Character specifying whether to tidy the survival or the longitudinal component of the model. Must be either "survival" or "longitudinal". Defaults to "survival".

bf.message

Logical that decides whether results from running a Bayesian meta-analysis, assuming that the effect size \(d\) varies across studies with standard deviation \(\tau\) (i.e., a random-effects analysis), should be displayed in the caption. Defaults to TRUE.

d

the prior distribution of the average effect size \(d\) specified either as the type of family (e.g., "norm") or via prior.

d.par

prior parameters for \(d\) (only used if d specifies the type of family).

tau

the prior distribution of the between-study heterogeneity \(\tau\) specified either as a character value (e.g., "halfcauchy") or via prior.

tau.par

prior parameters for \(\tau\) (only used if tau specifies the type of family).

iter

number of MCMC iterations using Stan.

summarize

how to estimate parameter summaries (mean, median, SD, etc.): Either by numerical integration (summarize = "integrate") or based on MCMC/Stan samples (summarize = "stan").

p.adjust.method

Adjustment method for p-values for multiple comparisons. Possible methods are: "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none". Default is no correction ("none"). This argument is relevant for multiplicity correction in multiway ANOVA designs (see Cramer et al., 2015).
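The method names listed above are the same ones accepted by stats::p.adjust. The sketch below applies a few of them to a made-up vector of p-values (p_raw is purely illustrative) to show how the corrections differ. It does not claim to reproduce what ggcoefstats does internally.

```r
# Illustrative raw p-values (made up for this sketch)
p_raw <- c(0.001, 0.012, 0.030, 0.045)

# "none" leaves the p-values untouched (the default described above)
p_none <- stats::p.adjust(p_raw, method = "none")

# Holm controls the family-wise error rate
p_holm <- stats::p.adjust(p_raw, method = "holm")

# Benjamini-Hochberg controls the false discovery rate
p_bh <- stats::p.adjust(p_raw, method = "BH")

print(data.frame(p_raw, p_none, p_holm, p_bh))
```

With these inputs, the Holm correction pushes the two largest p-values above 0.05, while the Benjamini-Hochberg correction keeps all four below it.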

coefficient.type

Relevant only for ordinal regression models (clm, clmm, svyolr, and polr), this argument decides which parameters are displayed in the plot. Available parameters are: the parameter that measures the intercept, i.e. the log-odds distance between response values ("alpha"); effects on the location ("beta"); or effects on the scale ("zeta"). For clm and clmm models, by default, only "beta" parameters (a vector of regression parameters) will be shown. Other options are "alpha" (a vector of threshold parameters) or "both". For polr models, by default, only "coefficient" will be shown. The other option is to show "zeta" parameters. Note that, from broom 0.7.0 onward, coefficients will be renamed: "intercept" type coefficients will correspond to "alpha" parameters, "location" type coefficients will correspond to "beta" parameters, and "scale" type coefficients will correspond to "zeta" parameters.

by.class

A logical indicating whether or not to show performance measures broken down by class. Defaults to FALSE. When by.class = FALSE, only a tibble with accuracy and kappa statistics is returned. Mostly relevant for an object of class "confusionMatrix".

effsize

Character describing the effect size to be displayed: "eta" (default) or "omega". This argument is relevant only for model objects of class aov, anova, and aovlist.

partial

Logical that decides if partial eta-squared or omega-squared are returned (Default: TRUE). If FALSE, eta-squared or omega-squared will be returned. Valid only for objects of class aov, anova, or aovlist.
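To make the distinction between eta-squared and partial eta-squared concrete, the sketch below computes both by hand from the sums of squares of an aov fit. The model and object names are illustrative assumptions, and it uses the sequential (Type I) sums of squares reported by summary(); the values that ggcoefstats reports may be computed differently.

```r
# Illustrative two-factor ANOVA on a built-in dataset
fit <- aov(mpg ~ factor(cyl) + factor(am), data = mtcars)

# Sums of squares from the ANOVA table (row names are padded, so trim them)
tab <- summary(fit)[[1]]
ss <- tab[["Sum Sq"]]
names(ss) <- trimws(rownames(tab))

ss_error <- ss[["Residuals"]]
ss_effects <- ss[names(ss) != "Residuals"]

# eta-squared: effect SS relative to the total SS
eta_sq <- ss_effects / sum(ss)

# partial eta-squared: effect SS relative to effect SS plus error SS
partial_eta_sq <- ss_effects / (ss_effects + ss_error)

print(round(cbind(eta_sq, partial_eta_sq), 3))
```

Partial eta-squared is never smaller than eta-squared for the same term, because its denominator leaves out the variance attributed to the other terms.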

nboot

Number of bootstrap samples for confidence intervals for partial eta-squared and omega-squared (Default: 500). This argument is relevant only for model objects of class aov, anova, and aovlist.

meta.analytic.effect

Logical that decides whether subtitle for meta-analysis via linear (mixed-effects) models - as implemented in the metafor package - is to be displayed (default: FALSE). If TRUE, input to argument subtitle will be ignored. This will be mostly relevant if a data frame with estimates and their standard errors is entered as input to x argument.

point.color

Character describing color for the point (Default: "blue").

point.size

Numeric specifying size for the point (Default: 3).

point.shape

Numeric specifying shape to draw the points (Default: 16 (a dot)).

conf.int

Logical. Decides whether to display confidence intervals as error bars (Default: TRUE).

conf.level

Numeric deciding level of confidence intervals (Default: 0.95). For MCMC model objects (Stan, JAGS, etc.), this will be probability level for CI.

se.type

Character specifying the method used to compute standard errors for quantile regression (Default: "nid"). To see all available methods, see quantreg::summary.rq().

k

Number of decimal places expected for results displayed in labels (Default: k = 2).

k.caption.summary

Number of decimal places expected for results displayed in captions (Default: k.caption.summary = 0).

exclude.intercept

Logical that decides whether the intercept should be excluded from the plot (Default: TRUE).

exponentiate

If TRUE, the x-axis will be logarithmic (Default: FALSE).

errorbar.color

Character deciding color of the error bars (Default: "black").

errorbar.height

Numeric specifying the height of the error bars (Default: 0).

errorbar.linetype

Line type of the error bars (Default: "solid").

errorbar.size

Numeric specifying the size of the error bars (Default: 0.5).

vline

Logical. Decides whether to display a vertical line (Default: TRUE).

vline.color

Character specifying color of the vertical line (Default: "black").

vline.linetype

Character specifying line type of the vertical line (Default: "dashed").

vline.size

Numeric specifying the size of the vertical line (Default: 1).

sort

If "none" (default), terms are not sorted; "ascending" sorts by increasing coefficient value, and "descending" sorts by decreasing coefficient value.
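The three orderings can be illustrated with base R alone. The sketch below uses the rounded estimates from the lm example on this page (cyl, am, and cyl:am); the vector name estimates is an illustrative assumption, and the code only shows what each ordering means, not how the function arranges terms internally.

```r
# Rounded coefficient estimates from the mpg ~ cyl * am example
estimates <- c(cyl = -1.98, am = 10.18, "cyl:am" = -1.31)

# "none": keep the order in which the terms appear
order_none <- names(estimates)

# "ascending": increasing coefficient value
order_ascending <- names(sort(estimates))

# "descending": decreasing coefficient value
order_descending <- names(sort(estimates, decreasing = TRUE))

print(order_ascending)
print(order_descending)
```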

xlab

Label for x axis variable (Default: "regression coefficient").

ylab

Label for y axis variable (Default: "term").

title

The text for the plot title.

subtitle

The text for the plot subtitle. The input to this argument will be ignored if meta.analytic.effect is set to TRUE.

stats.labels

Logical. Decides whether the statistic and p-values for each coefficient are to be attached to each dot as a text label using ggrepel (Default: TRUE).

only.significant

If TRUE, only stats labels for significant effects are shown (Default: FALSE). This can be helpful when a large number of regression coefficients are to be displayed in a single plot. Relevant only when the output is a plot.

caption

The text for the plot caption.

caption.summary

Logical. Decides whether the model summary should be displayed as a caption to the plot (Default: TRUE).

stats.label.color

Color for the labels. If stats.label.color is NULL, colors will be chosen from the specified package (Default: "RColorBrewer") and palette (Default: "Dark2").

stats.label.args

Additional arguments that will be passed to ggrepel::geom_label_repel. Please see the documentation for that function to learn more about these arguments.

package

Name of package from which the palette is desired as string or symbol.

palette

Name of palette as string or symbol.

direction

Either 1 or -1. If -1 the palette will be reversed.

ggtheme

A function, ggplot2 theme name. Default value is ggplot2::theme_bw(). Any of the ggplot2 themes, or themes from extension packages are allowed (e.g., ggthemes::theme_fivethirtyeight(), hrbrthemes::theme_ipsum_ps(), etc.).

ggstatsplot.layer

Logical that decides whether theme_ggstatsplot theme elements are to be displayed along with the selected ggtheme (Default: TRUE). theme_ggstatsplot is an opinionated theme layer that overrides some aspects of the selected ggtheme.

messages

Logical. Decides whether messages, references, notes, and warnings are to be displayed (Default: FALSE).

...

Additional arguments to tidying method.

Value

Plot with the regression coefficients' point estimates as dots with confidence interval whiskers and other statistical details included as labels.

References

https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggcoefstats.html

Examples

# \donttest{
# for reproducibility
set.seed(123)

# -------------- with model object --------------------------------------

# model object
mod <- lm(formula = mpg ~ cyl * am, data = mtcars)

# to get a plot
ggstatsplot::ggcoefstats(x = mod, output = "plot")

# to get a tidy dataframe
ggstatsplot::ggcoefstats(x = mod, output = "tidy")
#> # A tibble: 3 x 11
#>   term   estimate conf.low conf.high std.error statistic  p.value significance
#>   <fct>     <dbl>    <dbl>     <dbl>     <dbl> <chr>        <dbl> <chr>
#> 1 cyl       -1.98    -2.89    -1.06      0.449 -4.40     0.000141 ***
#> 2 am        10.2      1.36    19.0       4.30  2.36      0.0253   *
#> 3 cyl:am    -1.31    -2.75     0.143     0.707 -1.85     0.0755   ns
#>   p.value.formatted df.residual
#>   <chr>                   <int>
#> 1 <= 0.001                   28
#> 2 == 0.025                   28
#> 3 == 0.076                   28
#>   label
#>   <chr>
#> 1 list(~italic(beta)==-1.98, ~italic(t)(28)==-4.40, ~italic(p)<= 0.001)
#> 2 list(~italic(beta)==10.18, ~italic(t)(28)==2.36, ~italic(p)== 0.025)
#> 3 list(~italic(beta)==-1.31, ~italic(t)(28)==-1.85, ~italic(p)== 0.076)

# to get a glance summary
ggstatsplot::ggcoefstats(x = mod, output = "glance")
#> # A tibble: 1 x 12
#>   r.squared adj.r.squared sigma statistic       p.value    df logLik   AIC   BIC
#>       <dbl>         <dbl> <dbl>     <dbl>         <dbl> <dbl>  <dbl> <dbl> <dbl>
#> 1     0.785         0.762  2.94      34.1 0.00000000173     3  -77.8  166.  173.
#>   deviance df.residual  nobs
#>      <dbl>       <int> <int>
#> 1     242.          28    32

# to get augmented dataframe
ggstatsplot::ggcoefstats(x = mod, output = "augment")
#> # A tibble: 32 x 10
#>    .rownames           mpg   cyl    am .fitted .resid .std.resid   .hat .sigma
#>    <chr>             <dbl> <dbl> <dbl>   <dbl>  <dbl>      <dbl>  <dbl>  <dbl>
#>  1 Mazda RX4          21       6     1    21.4  0.364    -0.131  0.106    2.99
#>  2 Mazda RX4 Wag      21       6     1    21.4  0.364    -0.131  0.106    2.99
#>  3 Datsun 710         22.8     4     1    27.9  5.13     -1.86   0.117    2.80
#>  4 Hornet 4 Drive     21.4     6     0    19.0 -2.38      0.842  0.0735   2.96
#>  5 Hornet Sportabout  18.7     8     0    15.1 -3.63      1.29   0.0784   2.90
#>  6 Valiant            18.1     6     0    19.0  0.919    -0.325  0.0735   2.99
#>  7 Duster 360         14.3     8     0    15.1  0.768    -0.272  0.0784   2.99
#>  8 Merc 240D          24.4     4     0    23.0 -1.43      0.563  0.255    2.98
#>  9 Merc 230           22.8     4     0    23.0  0.171    -0.0672 0.255    2.99
#> 10 Merc 280           19.2     6     0    19.0 -0.181     0.0639 0.0735   2.99
#>      .cooksd
#>        <dbl>
#>  1 0.000510
#>  2 0.000510
#>  3 0.114
#>  4 0.0141
#>  5 0.0353
#>  6 0.00209
#>  7 0.00157
#>  8 0.0271
#>  9 0.000387
#> 10 0.0000811
#> # ... with 22 more rows

# -------------- with custom dataframe -----------------------------------

# creating a dataframe
df <- structure(
  list(
    term = structure(
      c(3L, 4L, 1L, 2L, 5L),
      .Label = c("Africa", "Americas", "Asia", "Europe", "Oceania"),
      class = "factor"
    ),
    estimate = c(
      0.382047603321706, 0.780783111514665, 0.425607573765058,
      0.558365541235078, 0.956473848429961
    ),
    std.error = c(
      0.0465576338644502, 0.0330218199731529, 0.0362834986178494,
      0.0480571500648261, 0.062215818388157
    ),
    statistic = c(
      8.20590677855356, 23.6444603038067, 11.7300588415607,
      11.6187818146078, 15.3734833553524
    ),
    conf.low = c(
      0.290515146096969, 0.715841986960399, 0.354354575031406,
      0.46379116008131, 0.827446138277154
    ),
    conf.high = c(
      0.473580060546444, 0.845724236068931, 0.496860572498711,
      0.652939922388847, 1.08550155858277
    ),
    p.value = c(
      3.28679518728519e-15, 4.04778497135963e-75, 7.59757330804449e-29,
      5.45155840151592e-26, 2.99171217913312e-13
    ),
    df.residual = c(394L, 358L, 622L, 298L, 22L)
  ),
  row.names = c(NA, -5L),
  class = c("tbl_df", "tbl", "data.frame")
)

# plotting the dataframe
ggstatsplot::ggcoefstats(
  x = df,
  statistic = "t",
  meta.analytic.effect = TRUE,
  k = 3
)
#> Warning: There were 9 divergent transitions after warmup. Increasing adapt_delta above 0.95 may help. See #> http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> Warning: Examine the pairs() plot to diagnose sampling problems

# -------------- getting model summary ------------------------------

# model
library(lme4)
#> Loading required package: Matrix
lmm1 <- lme4::lmer(
  formula = Reaction ~ Days + (Days | Subject),
  data = sleepstudy
)

# dataframe with model summary
ggstatsplot::ggcoefstats(x = lmm1, output = "glance")
#> # A tibble: 1 x 6
#>   sigma logLik   AIC   BIC REMLcrit df.residual
#>   <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int>
#> 1  25.6  -872. 1756. 1775.    1744.         174

# -------------- getting augmented dataframe ------------------------------

# setup
set.seed(123)
library(survival)

# fit
cfit <- survival::coxph(formula = Surv(time, status) ~ age + sex, data = lung)

# augmented dataframe
ggstatsplot::ggcoefstats(
  x = cfit,
  data = lung,
  output = "augment",
  type.predict = "risk"
)
#> # A tibble: 228 x 13
#>     inst  time status   age   sex ph.ecog ph.karno pat.karno meal.cal wt.loss
#>    <dbl> <dbl>  <dbl> <dbl> <dbl>   <dbl>    <dbl>     <dbl>    <dbl>   <dbl>
#>  1     3   306      2    74     1       1       90       100     1175      NA
#>  2     3   455      2    68     1       0       90        90     1225      15
#>  3     3  1010      1    56     1       0       90        90       NA      15
#>  4     5   210      2    57     1       1       90        60     1150      11
#>  5     1   883      2    60     1       0      100        90       NA       0
#>  6    12  1022      1    74     1       1       50        80      513       0
#>  7     7   310      2    68     2       2       70        60      384      10
#>  8    11   361      2    71     2       2       60        80      538       1
#>  9     1   218      2    53     1       1       70        80      825      16
#> 10     7   166      2    61     1       2       70        70      271      34
#>    .fitted .se.fit   .resid
#>      <dbl>   <dbl>    <dbl>
#>  1   1.49   0.149   0.00439
#>  2   1.35   0.0944 -0.506
#>  3   1.10   0.0956 -3.13
#>  4   1.12   0.0900  0.533
#>  5   1.17   0.0770 -2.35
#>  6   1.49   0.149  -4.25
#>  7   0.806  0.104   0.448
#>  8   0.848  0.121   0.290
#>  9   1.04   0.115   0.548
#> 10   1.19   0.0745  0.681
#> # ... with 218 more rows
# }