Note
Inference labs use the five-step template: Hypothesis → Visualise → Assumptions → Conduct → Conclude.
Labs 3.4, 4.1.
Non-parametric tests make weaker distributional assumptions than the parametric tests that dominate the preceding labs. They tend to test exchangeability of distributions rather than equality of means, and they operate on ranks of the data rather than on the raw values. When the data are heavily skewed, ordinal, or small-sample, the non-parametric alternatives are often the defensible default.
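To make the point about ranks concrete, the sketch below uses simulated data (the seed and parameters are assumptions of this illustration, not part of the lab's scenarios). It shows that a strictly increasing transformation such as the log leaves the pooled ranks unchanged, so the Mann-Whitney result is the same on either scale, whereas a t-test works on the raw values and will generally give a different answer.

```r
# Illustrative data only: the seed and parameters are assumptions of this sketch.
set.seed(1)
x <- exp(rnorm(20, mean = 3.0, sd = 0.5))
y <- exp(rnorm(20, mean = 3.4, sd = 0.5))

# A strictly increasing transform preserves the ordering of the pooled sample.
ranks_raw <- rank(c(x, y))
ranks_log <- rank(log(c(x, y)))
identical(ranks_raw, ranks_log)

# The rank-sum test therefore gives the same statistic and p-value on either scale.
mw_raw <- wilcox.test(x, y)
mw_log <- wilcox.test(log(x), log(y))
c(raw = mw_raw$p.value, log = mw_log$p.value)

# The t-test uses the raw values, so its p-value generally depends on the scale.
c(raw = t.test(x, y)$p.value, log = t.test(log(x), log(y))$p.value)
```

This invariance holds for the two-sample test. The signed-rank test ranks the paired differences, and differences on the raw scale are not a monotone function of differences on the log scale, so its result can change with the scale.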
The workhorse set: Wilcoxon signed-rank (wilcox.test with paired = TRUE) for paired continuous data; Mann-Whitney U (wilcox.test with paired = FALSE, the default) for two independent groups; Kruskal-Wallis (kruskal.test) for three or more independent groups; and the sign test, a cruder paired-data option that uses only the signs of the differences.
A common mistake is to read these tests as “comparing medians”. They compare distributions. Under additional assumptions — notably that the two groups’ distributions differ only by a location shift — the Mann-Whitney U test does estimate a location shift, but that is a stronger assumption than the test itself requires.
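A small simulation can illustrate why "comparing medians" is the wrong reading. The sketch below is an assumed example, not one of the lab's scenarios: two samples are drawn from distributions that share a population median of 0 but are mirror images in shape (one right-skewed, one left-skewed). The sample size and seed are arbitrary choices.

```r
# Assumed illustration: equal population medians, different shapes.
set.seed(42)
n <- 5000
x <- rexp(n) - log(2)   # right-skewed, population median 0
y <- log(2) - rexp(n)   # left-skewed mirror image, population median 0

# The sample medians are both close to 0.
c(median_x = median(x), median_y = median(y))

# The rank-sum test still rejects, because P(X > Y) is not 0.5.
mw <- wilcox.test(x, y)
mw$p.value

# In R, W counts the (x, y) pairs with x > y, so W / (n * n) estimates P(X > Y).
prob_x_gt_y <- unname(mw$statistic) / (n * n)
prob_x_gt_y
```

For these two distributions P(X > Y) is about 0.60, which is what the test detects. A small p-value here says nothing about a difference in medians.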
Three scenarios:
A. Paired: pre/post in 25 patients, skewed outcome. H0: no shift.
B. Independent two-sample: outcome in two arms, skewed. H0: same distribution.
C. Three-group comparison: outcome across three centres. H0: same distribution across all three.
library(tidyverse)  # provides tibble(), used below
# Scenario A: paired, lognormal-ish
n_a <- 25
pre <- exp(rnorm(n_a, 3, 0.3))
post <- pre * exp(rnorm(n_a, -0.2, 0.3))
df_a <- tibble(id = seq_len(n_a), pre, post,
               delta = post - pre)
# Scenario B: independent two groups, lognormal
n_b <- 30
grp <- rep(c("A", "B"), each = n_b)
y_b <- c(exp(rnorm(n_b, 3, 0.5)),
         exp(rnorm(n_b, 3.4, 0.5)))
df_b <- tibble(grp, y = y_b)
# Scenario C: three groups
n_c <- 25
centre <- rep(c("X", "Y", "Z"), each = n_c)
y_c <- c(exp(rnorm(n_c, 3.0, 0.4)),
         exp(rnorm(n_c, 3.2, 0.4)),
         exp(rnorm(n_c, 3.5, 0.4)))
df_c <- tibble(centre, y = y_c)
Wilcoxon signed-rank: paired observations; differences distributed symmetrically about zero under the null.
Mann-Whitney U: independent observations within and between groups; if you want a location-shift interpretation, the two distributions should have the same shape.
Kruskal-Wallis: independent observations; the same-shape assumption extends to all k groups.
# A: Wilcoxon signed-rank
wt_a <- wilcox.test(df_a$post, df_a$pre, paired = TRUE,
conf.int = TRUE)
# Sign test: a simple binomial test on positive differences
signs <- sum(df_a$delta > 0)
sign_test <- binom.test(signs, n_a, p = 0.5)
# B: Mann-Whitney U
wt_b <- wilcox.test(y ~ grp, data = df_b, conf.int = TRUE)
# C: Kruskal-Wallis
kw_c <- kruskal.test(y ~ centre, data = df_c)
list(wilcox_paired = wt_a,
     sign_test = sign_test,
     mann_whitney = wt_b,
     kruskal_wallis = kw_c)
$wilcox_paired
Wilcoxon signed rank exact test
data: df_a$post and df_a$pre
V = 29, p-value = 0.000103
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
-7.462659 -2.298395
sample estimates:
(pseudo)median
-4.709926
$sign_test
Exact binomial test
data: signs and n_a
number of successes = 4, number of trials = 25, p-value = 0.0009105
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.04537945 0.36082845
sample estimates:
probability of success
0.16
$mann_whitney
Wilcoxon rank sum exact test
data: y by grp
W = 239, p-value = 0.001525
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
-17.224482 -3.592832
sample estimates:
difference in location
-9.54957
$kruskal_wallis
Kruskal-Wallis rank sum test
data: y by centre
Kruskal-Wallis chi-squared = 27.751, df = 2, p-value = 9.417e-07
Follow-up pairwise comparisons with Holm correction for C:
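A minimal sketch of such a follow-up, assuming base R's pairwise.wilcox.test with p.adjust.method = "holm". So that the block runs on its own, it regenerates Scenario C in a plain data.frame with an arbitrary seed; both are assumptions of this sketch, so the adjusted p-values will not reproduce the figures quoted in this lab.

```r
# Regenerate Scenario C so the block is self-contained.
# The seed is an assumption of this sketch; results will not match the lab output.
set.seed(123)
n_c <- 25
df_c <- data.frame(
  centre = rep(c("X", "Y", "Z"), each = n_c),
  y = c(exp(rnorm(n_c, 3.0, 0.4)),
        exp(rnorm(n_c, 3.2, 0.4)),
        exp(rnorm(n_c, 3.5, 0.4)))
)

# All pairwise rank-sum tests, with Holm adjustment across the three p-values.
pw_c <- pairwise.wilcox.test(df_c$y, df_c$centre, p.adjust.method = "holm")

# Lower-triangular matrix of adjusted p-values (rows Y, Z; columns X, Y).
pw_c$p.value
```

Holm's step-down procedure controls the family-wise error rate and is never less powerful than a plain Bonferroni correction, which is why it is a reasonable default for a handful of contrasts.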
Paired comparison (A). 25 patients, median change = -4.1; Wilcoxon signed-rank p = 1.0 × 10^{-4}, Hodges-Lehmann estimate (pseudomedian) -4.71, 95% CI -7.46 to -2.30. Sign test p = 9.1 × 10^{-4}.
Two-group (B). Mann-Whitney p = 0.0015; estimated location shift (A minus B) -9.55, 95% CI -17.22 to -3.59.
Three-group (C). Kruskal-Wallis χ² = 27.75, df = 2, p = 9.4 × 10^{-7}. Pairwise Wilcoxon tests with Holm correction indicated the signal lay in the X vs Z contrast.
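The Hodges-Lehmann estimates reported above can be reproduced by hand, which helps clarify what they measure. The sketch below assumes the exact, tie-free case that wilcox.test uses for samples this small; the data are simulated in the spirit of Scenarios A and B with an arbitrary seed, so the numbers will differ from the lab output.

```r
# Illustrative data; the seed is an assumption of this sketch.
set.seed(7)

# Two-sample case: median of all pairwise differences x_i - y_j.
x <- exp(rnorm(30, 3.0, 0.5))
y <- exp(rnorm(30, 3.4, 0.5))
hl_two <- median(outer(x, y, "-"))
wt_two <- wilcox.test(x, y, conf.int = TRUE)

# Paired case: median of the Walsh averages (d_i + d_j) / 2 for i <= j.
pre <- exp(rnorm(25, 3, 0.3))
post <- pre * exp(rnorm(25, -0.2, 0.3))
d <- post - pre
walsh <- outer(d, d, "+") / 2
hl_paired <- median(walsh[upper.tri(walsh, diag = TRUE)])
wt_paired <- wilcox.test(post, pre, paired = TRUE, conf.int = TRUE)

# Hand-computed values next to the estimates returned by wilcox.test.
c(by_hand = hl_two, wilcox_test = unname(wt_two$estimate))
c(by_hand = hl_paired, wilcox_test = unname(wt_paired$estimate))
```

Neither quantity is a difference of sample medians or the median of the differences in general, which is why the pseudomedian can differ from a reported median change.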
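Non-parametric tests are not a free lunch. They trade a modest amount of efficiency (about 5% relative to a t-test when the t-test's assumptions hold; the asymptotic relative efficiency of the Wilcoxon tests under normality is 3/π ≈ 0.955) for robustness to distributional misspecification.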
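The size of that trade-off can be checked by simulation. The sketch below is an assumed setup, not part of the lab: two normal samples of 30 with a 0.6 standard deviation shift, 2000 replicates, and an arbitrary seed. It estimates the rejection rate of the t-test and of the Mann-Whitney test at the 5% level on the same simulated data sets.

```r
# Assumed simulation settings; none of these values come from the lab.
set.seed(2025)
n_sim <- 2000
n_per_group <- 30
shift <- 0.6

# One replicate: does each test reject at the 5% level?
one_run <- function() {
  x <- rnorm(n_per_group)
  y <- rnorm(n_per_group, mean = shift)
  c(t = t.test(x, y)$p.value < 0.05,
    wilcoxon = wilcox.test(x, y)$p.value < 0.05)
}

# 2 x n_sim logical matrix; row means are the estimated powers.
rejections <- replicate(n_sim, one_run())
power_est <- rowMeans(rejections)
power_est
```

Under normality the two estimated powers should be close, with the t-test slightly ahead. The 3/π figure is an asymptotic efficiency statement about sample sizes, so the gap in power at a fixed n need not be exactly 5 percentage points. Replacing rnorm with a heavy-tailed or skewed generator is a natural next experiment.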
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
time zone: UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.9.5 forcats_1.0.1 stringr_1.6.0 dplyr_1.2.1
[5] purrr_1.2.2 readr_2.2.0 tidyr_1.3.2 tibble_3.3.1
[9] ggplot2_4.0.3 tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] gtable_0.3.6 jsonlite_2.0.0 compiler_4.4.1 tidyselect_1.2.1
[5] scales_1.4.0 yaml_2.3.12 fastmap_1.2.0 R6_2.6.1
[9] labeling_0.4.3 generics_0.1.4 knitr_1.51 htmlwidgets_1.6.4
[13] pillar_1.11.1 RColorBrewer_1.1-3 tzdb_0.5.0 rlang_1.2.0
[17] stringi_1.8.7 xfun_0.57 S7_0.2.2 otel_0.2.0
[21] timechange_0.4.0 cli_3.6.6 withr_3.0.2 magrittr_2.0.5
[25] digest_0.6.39 grid_4.4.1 hms_1.1.4 lifecycle_1.0.5
[29] vctrs_0.7.3 evaluate_1.0.5 glue_1.8.1 farver_2.1.2
[33] rmarkdown_2.31 tools_4.4.1 pkgconfig_2.0.3 htmltools_0.5.9