Making expression for Mann-Whitney U-test/Wilcoxon test results

expr_t_nonparametric(
  data,
  x,
  y, = NULL,
  paired = FALSE,
  k = 2L,
  conf.level = 0.95,
  conf.type = "norm",
  nboot = 100,
  output = "expression",
  ...
)



data: A dataframe (or a tibble) from which the specified variables are to be taken. A matrix or a table will not be accepted.


x: The grouping variable from the dataframe data.


y: The response (a.k.a. outcome or dependent) variable from the dataframe data.

In case of a repeated measures design (i.e., paired = TRUE), this argument specifies the subject or repeated measures id. If this argument is NULL (the default), the function assumes that the user has already sorted the data by such an id, and it creates an internal identifier. If your data is not sorted and you leave this argument unspecified, observations can be paired incorrectly and the results can be inaccurate.
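As a minimal sketch of the sorting this description asks for, the base R snippet below shuffles the built-in sleep data and then restores an order in which the rows of each group line up by subject. It assumes the id column is called ID, as it is in sleep; your own column names will differ.

```r
# Shuffle the built-in `sleep` data to mimic an unsorted repeated measures dataset
set.seed(123)
shuffled <- sleep[sample(nrow(sleep)), ]

# Sort by the grouping variable first and by the subject id second, so that
# the i-th row of each group belongs to the same subject
sorted <- shuffled[order(shuffled$group, shuffled$ID), ]

# Check the alignment: the ids should match position by position
ids_group1 <- sorted$ID[sorted$group == 1]
ids_group2 <- sorted$ID[sorted$group == 2]
all(ids_group1 == ids_group2)
```

Passing the id column explicitly is the safer choice whenever one is available, since it does not rely on row order at all.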


paired: Logical that decides whether the experimental design is repeated measures/within-subjects (TRUE) or between-subjects (FALSE). The default is FALSE.


k: Number of digits after the decimal point (should be an integer). Default: k = 2L.


conf.level: Confidence level, a scalar between 0 and 1. If unspecified, the default (0.95) returns the lower and upper bounds of the 95% confidence interval.


conf.type: A vector of character strings representing the type of intervals required. The value should be any subset of the values "norm", "basic", "perc", "bca". For more, see ?


nboot: Number of bootstrap samples used to compute the confidence interval for the effect size (Default: 100).


output: If "expression", returns an expression with the statistical details; if "dataframe", returns a dataframe containing the results.


...: Additional arguments (currently ignored).


For the two independent samples case, the Mann-Whitney U-test is calculated and W is reported from stats::wilcox.test. For the paired samples case the Wilcoxon signed rank test is run and V is reported.
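To see where the W and V labels come from, the following base R sketch calls stats::wilcox.test directly on the built-in sleep data, once treating the two groups as independent and once as paired. It only illustrates the underlying test; it is not the internal code of this function.

```r
# Split the built-in `sleep` data into the two conditions
g1 <- sleep$extra[sleep$group == 1]
g2 <- sleep$extra[sleep$group == 2]

# exact = FALSE requests the normal approximation, which also avoids
# warnings about ties in these data
res_between <- stats::wilcox.test(g1, g2, exact = FALSE)
res_within  <- stats::wilcox.test(g1, g2, paired = TRUE, exact = FALSE)

# The name attached to the test statistic differs between the two designs
names(res_between$statistic)
names(res_within$statistic)
```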

Since there is no single commonly accepted method for reporting effect size for these tests, we compute and report r (computed as \(Z/\sqrt{N}\)) along with the confidence interval associated with the estimate. Note that N here corresponds to the total sample size for independent/between-subjects designs, and to the total number of pairs (and not observations) for repeated measures/within-subjects designs.
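The formula can be illustrated with a rough base R sketch. Recovering Z from the two-sided p-value of the normal approximation, and the helper z_from_p, are assumptions made for this illustration only; the function itself may obtain Z differently, and the bootstrap confidence interval is not reproduced here.

```r
g1 <- sleep$extra[sleep$group == 1]
g2 <- sleep$extra[sleep$group == 2]

# Hypothetical helper: recover |Z| from a two-sided p-value based on the
# normal approximation and attach the supplied sign
z_from_p <- function(p, sign) sign * stats::qnorm(1 - p / 2)

# Between-subjects: N is the total number of observations
res_b <- stats::wilcox.test(g1, g2, exact = FALSE, correct = FALSE)
n1 <- length(g1)
n2 <- length(g2)
# W below its null expectation (n1 * n2 / 2) means a negative Z
z_b <- z_from_p(res_b$p.value, sign(unname(res_b$statistic) - n1 * n2 / 2))
r_between <- z_b / sqrt(n1 + n2)

# Within-subjects: N is the number of pairs, not the number of observations
res_w <- stats::wilcox.test(g1, g2, paired = TRUE, exact = FALSE, correct = FALSE)
n_pairs <- length(g1)
n_nonzero <- sum(g1 - g2 != 0)  # zero differences are dropped by the test
# V below its null expectation (n * (n + 1) / 4) means a negative Z
z_w <- z_from_p(
  res_w$p.value,
  sign(unname(res_w$statistic) - n_nonzero * (n_nonzero + 1) / 4)
)
r_within <- z_w / sqrt(n_pairs)

c(between = r_between, within = r_within)
```

Note how the same 20 observations give N = 20 in the between-subjects case but N = 10 in the within-subjects case.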

Note: The stats::wilcox.test function does not follow the same convention as stats::t.test. The sign of the V test statistic will always be positive since it is the sum of the positive signed ranks. Therefore, V will vary in magnitude but not significance based solely on the order of the grouping variable. Consider manually reordering your factor levels if appropriate as shown in the second example below.
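A small base R sketch of this behaviour, using the built-in sleep data: swapping which condition comes first changes V but leaves the p-value untouched. The last lines show one way to reverse factor levels in a dataframe; the name sleep_reordered is only illustrative.

```r
g1 <- sleep$extra[sleep$group == 1]
g2 <- sleep$extra[sleep$group == 2]

# Same paired data, opposite order of the two conditions
v_12 <- stats::wilcox.test(g1, g2, paired = TRUE, exact = FALSE)
v_21 <- stats::wilcox.test(g2, g1, paired = TRUE, exact = FALSE)

# V differs in magnitude, the p-value does not
c(V_12 = unname(v_12$statistic), V_21 = unname(v_21$statistic))
c(p_12 = v_12$p.value, p_21 = v_21$p.value)

# Reordering the levels of the grouping factor in a dataframe
sleep_reordered <- transform(sleep, group = factor(group, levels = c("2", "1")))
levels(sleep_reordered$group)
```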


For more details, see-


# for reproducibility
set.seed(123)
library(statsExpressions)

# -------------- between-subjects design ------------------------

expr_t_nonparametric(
  data = sleep,
  x = group,
  y = extra
)
#> paste("log"["e"](italic("W")["Mann-Whitney"]), " = ", "3.24",
#> ", ", italic("p"), " = ", "0.069", ", ", widehat(italic("r")),
#> " = ", "-0.41", ", CI"["95%"], " [", "-0.84", ", ", "-0.04",
#> "]", ", ", italic("n")["obs"], " = ", 20L)

# -------------- within-subjects design ------------------------

expr_t_nonparametric(
  data = VR_dilemma,
  x = modality,
  y = score,
  paired = TRUE,
  = id
)
#> paste("log"["e"](italic("V")["Wilcoxon"]), " = ", "1.50", ", ",
#> italic("p"), " = ", "0.001", ", ", widehat(italic("r")),
#> " = ", "-0.57", ", CI"["95%"], " [", "-0.67", ", ", "-0.44",
#> "]", ", ", italic("n")["pairs"], " = ", 34L)