Course 3 — #courses
Note
Inference lab using the five-step template: Hypothesis → Visualise → Assumptions → Conduct → Conclude.
pwr.simr.Course 1 inference and Course 2 mixed-model basics (we will fit a small lmer).
Power is the probability of rejecting the null hypothesis when a specific alternative is true. Four inputs set it: the effect size you care about, the variability of the outcome, the significance level, and the sample size (with the design). Closed-form formulas exist for standard one- and two-sample tests and for some simple regression settings. For anything more elaborate — clustered data, non-linear models, interim looks, complex survival designs — you simulate.
Simulation-based power is conceptually straightforward: for a given scenario, generate many datasets under the alternative, run the planned analysis on each, and compute the fraction of times you reject the null. The simr package does this for models fit with lme4, allowing you to vary the sample size or the effect and trace out a power curve.
A useful rule of thumb: if your closed-form calculation is for a model simpler than the one you will actually fit, the closed-form is giving you a lower bound on the sample size you need, not an answer.
Detect a standardised mean difference of d = 0.4 between two groups with 80% power at a 5% two-sided significance level. Then power a mixed model with 20 clusters and 10 observations each to detect a fixed effect of 0.3.
ns <- seq(20, 400, by = 20)
pw <- map_dbl(ns, \(n) pwr.t.test(
n = n / 2, d = 0.4, sig.level = 0.05, type = "two.sample"
)$power)
tibble(n = ns, power = pw) |>
ggplot(aes(n, power)) +
geom_line() +
geom_hline(yintercept = 0.8, linetype = "dashed", colour = "grey40") +
labs(x = "Total sample size", y = "Power")Closed-form assumes equal variance across arms, independent observations, and the nominal effect size. The simulation below adds random intercepts at the cluster level and therefore relaxes independence.
Two-sample t test power calculation
n = 99.08032
d = 0.4
sig.level = 0.05
power = 0.8
alternative = two.sided
NOTE: n is number in *each* group
# Simulation: a small two-arm cluster study
J <- 20 # clusters
nj <- 10 # obs per cluster
dat <- tibble(
cluster = factor(rep(seq_len(J), each = nj)),
arm = rep(rep(c(0, 1), each = nj), length.out = J * nj),
y = rnorm(J * nj) # placeholder; simr overwrites
)
m0 <- makeLmer(
y ~ arm + (1 | cluster),
fixef = c(0, 0.3),
VarCorr = 0.5,
sigma = 1.0,
data = dat
)
m0Linear mixed model fit by REML ['lmerMod']
Formula: y ~ arm + (1 | cluster)
Data: dat
REML criterion at convergence: 576.9954
Random effects:
Groups Name Std.Dev.
cluster (Intercept) 0.7071
Residual 1.0000
Number of obs: 200, groups: cluster, 20
Fixed Effects:
(Intercept) arm
0.0 0.3
Power for predictor 'arm', (95% confidence interval):
16.00% ( 7.17, 29.11)
Test: Kenward Roger (package pbkrtest)
Effect size for arm is 0.30
Based on 50 simulations, (0 errors, 0 warnings, 0 messages)
Time elapsed: 0 h 0 m 4 s
Fifty simulations is enough for a teaching demo; in practice use at least 1000 for a stable estimate.
A two-sample t-test targeting d = 0.4 at 80% power requires about 200 participants in total. A mixed model with 20 clusters of size 10 and a fixed effect of 0.3 has estimated power of 16% in this simulation (50 replicates).
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
time zone: UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] simr_1.0.9 lmerTest_3.2-1 lme4_2.0-1 Matrix_1.7-0
[5] pwr_1.3-0 lubridate_1.9.5 forcats_1.0.1 stringr_1.6.0
[9] dplyr_1.2.1 purrr_1.2.2 readr_2.2.0 tidyr_1.3.2
[13] tibble_3.3.1 ggplot2_4.0.3 tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] gtable_0.3.6 xfun_0.57 htmlwidgets_1.6.4
[4] lattice_0.22-6 tzdb_0.5.0 numDeriv_2016.8-1.1
[7] vctrs_0.7.3 tools_4.4.1 Rdpack_2.6.6
[10] generics_0.1.4 parallel_4.4.1 pbkrtest_0.5.5
[13] pkgconfig_2.0.3 RColorBrewer_1.1-3 S7_0.2.2
[16] lifecycle_1.0.5 binom_1.1-1.1 compiler_4.4.1
[19] farver_2.1.2 carData_3.0-6 htmltools_0.5.9
[22] yaml_2.3.12 Formula_1.2-5 pillar_1.11.1
[25] car_3.1-5 nloptr_2.2.1 MASS_7.3-60.2
[28] reformulas_0.4.4 iterators_1.0.14 boot_1.3-30
[31] abind_1.4-8 nlme_3.1-164 tidyselect_1.2.1
[34] digest_0.6.39 stringi_1.8.7 labeling_0.4.3
[37] splines_4.4.1 fastmap_1.2.0 grid_4.4.1
[40] RLRsim_3.1-9 cli_3.6.6 magrittr_2.0.5
[43] broom_1.0.12 withr_3.0.2 backports_1.5.1
[46] scales_1.4.0 plotrix_3.8-14 timechange_0.4.0
[49] rmarkdown_2.31 otel_0.2.0 hms_1.1.4
[52] evaluate_1.0.5 knitr_1.51 rbibutils_2.4.1
[55] mgcv_1.9-1 rlang_1.2.0 Rcpp_1.1.1-1.1
[58] glue_1.8.1 minqa_1.2.8 jsonlite_2.0.0
[61] plyr_1.8.9 R6_2.6.1