Week 1, Session 3 — Adaptive, non-inferiority, equivalence trials

Course 3

Author

R. Heller

Note

Inference lab using the five-step template: Hypothesis → Visualise → Assumptions → Conduct → Conclude.

Learning objectives

  • Distinguish superiority, non-inferiority, and equivalence trials by their null hypotheses.
  • Choose a non-inferiority margin and say what it means clinically.
  • Run and plot a non-inferiority test with a confidence interval versus the margin.

Prerequisites

Course 1 inference and Course 2 confidence intervals.

Background

A superiority trial asks whether a new treatment beats the comparator. A non-inferiority (NI) trial asks whether the new treatment is not worse than the comparator by more than a pre-specified margin (often called Δ). An equivalence trial asks whether the new treatment is neither better nor worse than the comparator by more than Δ on either side. These are distinct statistical questions and they require different designs, analyses, and sample sizes.
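The three questions can be written as null and alternative hypotheses on θ = μ_new − μ_comparator. This formalisation assumes higher outcome values are better and Δ > 0. Superiority is shown one-sided here; in practice it is often tested two-sided at 5%, which has the same rejection region on the favourable side as a one-sided 2.5% test.

```latex
\begin{aligned}
\text{Superiority:}\quad     & H_0:\ \theta \le 0       && \text{vs}\quad H_1:\ \theta > 0 \\
\text{Non-inferiority:}\quad & H_0:\ \theta \le -\Delta  && \text{vs}\quad H_1:\ \theta > -\Delta \\
\text{Equivalence:}\quad     & H_0:\ |\theta| \ge \Delta && \text{vs}\quad H_1:\ -\Delta < \theta < \Delta
\end{aligned}
```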

Choosing the NI margin is the hardest design decision in an NI trial. It must be small enough that clinicians and patients would accept a loss of up to Δ in exchange for the new treatment's other benefits (lower cost, fewer side effects, easier administration), but not so small that the trial becomes infeasible: with a true difference of zero, halving Δ roughly quadruples the required sample size. Regulators often ask for a margin no larger than a defined fraction of the comparator's historical treatment effect versus placebo, so that a non-inferior new treatment still preserves part of that effect.
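A minimal sketch of both trade-offs, using hypothetical numbers throughout. It assumes one common version of the fixed-margin approach: M1 is the lower 95% confidence limit of the comparator's historical effect versus placebo, and M2 keeps an assumed 50% of M1. The historical estimate (8 units, SE 1.5), the preservation fraction, and the helper `ni_n_per_arm()` are illustrative assumptions, not values from this lab. The helper uses the usual normal-approximation sample size for a continuous outcome, n = 2σ²(z₁₋α + z₁₋β)² / (θ + Δ)² per arm, where θ is the assumed true difference (new − comparator).

```r
# Hypothetical historical meta-analysis of comparator vs placebo (illustrative numbers)
hist_est <- 8      # estimated comparator effect vs placebo, in outcome units
hist_se  <- 1.5    # its standard error

# M1: conservative (lower 95% limit) estimate of the comparator's effect
m1 <- hist_est - qnorm(0.975) * hist_se
# M2: margin that preserves an assumed 50% of M1
preserve <- 0.5
m2 <- (1 - preserve) * m1

# Hypothetical helper: per-arm sample size for an NI test on a continuous outcome
# (normal approximation, equal allocation, common SD sigma, one-sided alpha)
ni_n_per_arm <- function(sigma, delta, theta = 0, alpha = 0.025, power = 0.9) {
  stopifnot(delta > 0, theta + delta > 0)  # NI must actually hold under the assumed theta
  z <- qnorm(1 - alpha) + qnorm(power)
  ceiling(2 * sigma^2 * z^2 / (theta + delta)^2)
}

c(M1 = m1, M2 = m2)

# Halving the margin roughly quadruples the per-arm sample size
sapply(c(3, 1.5), function(d) ni_n_per_arm(sigma = 10, delta = d))
```

Note that a true difference below zero (θ < 0) shrinks θ + Δ and inflates the required sample size further, which is why the assumed true difference belongs in the protocol alongside Δ.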

Adaptive trials modify some aspect of the design (sample size, arm allocation, the decision rule) during the trial based on interim results, using pre-specified rules. Group sequential designs, for example, allow early stopping for overwhelming efficacy or for futility, and control the overall Type I error by spending it across the interim looks according to a chosen alpha-spending function.
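To make "spending Type I error" concrete, here is a minimal sketch of two widely used Lan-DeMets spending functions evaluated at four equally spaced looks (the look schedule is an illustrative assumption). Each function returns the cumulative one-sided alpha spent by information fraction t and reaches the full α = 0.025 at t = 1. These are cumulative alpha amounts, not the z boundaries themselves; turning them into boundaries needs recursive integration over the correlated test statistics, which dedicated packages such as gsDesign or rpact handle.

```r
# Cumulative one-sided alpha spent at information fractions t (illustrative looks)
alpha <- 0.025
t <- c(0.25, 0.50, 0.75, 1.00)

# Lan-DeMets O'Brien-Fleming-type spending: very little alpha at early looks
obf_spend <- 2 * (1 - pnorm(qnorm(1 - alpha / 2) / sqrt(t)))

# Lan-DeMets Pocock-type spending: alpha spread more evenly across looks
pocock_spend <- alpha * log(1 + (exp(1) - 1) * t)

round(data.frame(t, obf_spend, pocock_spend), 5)
```

The O'Brien-Fleming-type choice keeps almost all of the alpha for the final analysis, which is why it is popular when early stopping should require overwhelming evidence.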

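In most NI trials the operational test is whether the lower bound of a one-sided 97.5% confidence interval for the difference (equivalently, the lower limit of a two-sided 95% CI) lies on the favourable side of the margin, that is, above −Δ when higher outcomes are better. Graphically, this is a forest plot where you check that the interval does not cross the margin line.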
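The interval-versus-margin rules for the three trial types can be sketched as a small hypothetical helper, `classify_ci()`, that takes an interval for new − comparator and assumes higher outcomes are better. For equivalence, the two one-sided tests (TOST) procedure at one-sided level α corresponds to checking a 100(1 − 2α)% interval, so the right interval level depends on the α you pre-specify; the helper simply uses whatever interval you pass in.

```r
# Hypothetical helper: which claims does an interval for (new - comparator) support?
# Assumes higher outcomes are better and a positive margin delta.
classify_ci <- function(low, high, delta) {
  stopifnot(low <= high, delta > 0)
  c(
    superiority     = low > 0,                        # whole interval above 0
    non_inferiority = low > -delta,                   # lower bound above -delta
    equivalence     = low > -delta && high < delta    # interval inside (-delta, delta)
  )
}

# Illustrative intervals (hypothetical numbers), margin delta = 3
classify_ci(low =  0.4, high = 3.5, delta = 3)   # superior and NI, not equivalent
classify_ci(low = -1.2, high = 2.5, delta = 3)   # NI and equivalent, not superior
classify_ci(low = -3.4, high = 1.0, delta = 3)   # none of the three
```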

Setup

library(tidyverse)
set.seed(42)
theme_set(theme_minimal(base_size = 12))

1. Hypothesis

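H0: μ_new − μ_comparator ≤ −Δ, with Δ = 3 units on a continuous outcome where higher values are better; that is, the new treatment is worse than the comparator by Δ or more. H1: μ_new − μ_comparator > −Δ; the new treatment is worse by less than Δ, or better.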

2. Visualise

n <- 150
# Simulated trial: 150 patients per arm, SD 10, true difference (new - comparator) = -1
trial <- tibble(
  arm = rep(c("comparator", "new"), each = n),
  y   = c(rnorm(n, 70, 10), rnorm(n, 69, 10))
)

trial |>
  ggplot(aes(arm, y, fill = arm)) +
  geom_boxplot(alpha = 0.6, colour = "grey30") +
  labs(x = NULL, y = "Outcome") +
  theme(legend.position = "none")

3. Assumptions

Independent observations, roughly normal within-arm residuals (the Welch test does not assume equal variances), and a margin pre-specified before looking at the data. The direction of the test matters: we test whether the lower bound of the confidence interval for the mean difference (new − comparator) lies above −Δ.
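The one-sided NI hypothesis can also be tested directly by shifting the null value of the Welch t-test to −Δ. This is a minimal sketch on freshly simulated, illustrative data whose arm sizes, means and SD mirror the lab. With `t.test(x, y)` the reported difference is x − y, so passing `new` first fixes the direction as new − comparator. The block also checks the equivalence claimed in the background: the one-sided 97.5% lower bound equals the lower limit of the two-sided 95% interval, and p < 0.025 exactly when that bound lies above −Δ.

```r
set.seed(42)
n <- 150
comparator <- rnorm(n, 70, 10)   # simulated comparator arm
new        <- rnorm(n, 69, 10)   # simulated new arm (true difference -1)
delta <- 3

# One-sided Welch test of H0: mu_new - mu_comp <= -delta vs H1: mu_new - mu_comp > -delta
ni_test <- t.test(new, comparator, mu = -delta, alternative = "greater",
                  conf.level = 0.975)

# Two-sided 95% Welch interval for the same difference (new - comparator)
ci95 <- t.test(new, comparator, conf.level = 0.95)$conf.int

c(p_value              = ni_test$p.value,
  lower_97.5_one_sided = ni_test$conf.int[1],
  lower_95_two_sided   = ci95[1])
```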

4. Conduct

fit <- t.test(y ~ arm, data = trial)
fit

    Welch Two Sample t-test

data:  y by arm
t = 0.75382, df = 297.66, p-value = 0.4516
alternative hypothesis: true difference in means between group comparator and group new is not equal to 0
95 percent confidence interval:
 -1.386395  3.107936
sample estimates:
mean in group comparator        mean in group new 
                69.71260                 68.85182 
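delta <- 3
# t.test(y ~ arm) reports comparator - new (first factor level minus second),
# as the "alternative hypothesis" line above shows. Flip it to new - comparator.
est <- unname(fit$estimate["new"] - fit$estimate["comparator"])
ci  <- -rev(fit$conf.int)          # 95% CI for new - comparator
ni_pass <- ci[1] > -delta          # NI if the lower bound lies above -delta
tibble(estimate = est, low = ci[1], high = ci[2], margin = -delta, ni_pass)
# A tibble: 1 × 5
  estimate   low  high margin ni_pass
     <dbl> <dbl> <dbl>  <dbl> <lgl>  
1   -0.861 -3.11  1.39     -3 FALSE  
tibble(estimate = est, low = ci[1], high = ci[2]) |>
  ggplot(aes(x = estimate, y = 1)) +
  geom_point(size = 3) +
  # horizontal error bar; geom_errorbarh() is deprecated in recent ggplot2 releases
  geom_errorbar(aes(xmin = low, xmax = high), width = 0.1, orientation = "y") +
  geom_vline(xintercept = 0, linetype = "dashed", colour = "grey40") +
  geom_vline(xintercept = -delta, colour = "firebrick") +
  labs(x = "Mean difference (new - comparator)", y = NULL) +
  theme(axis.text.y = element_blank())

5. Concluding statement

The new treatment had a mean difference of −0.86 units versus the comparator (new − comparator; 95% CI −3.11 to 1.39). With a pre-specified non-inferiority margin of Δ = 3, non-inferiority was not established: the lower bound (−3.11) lies just below −Δ = −3. The interval also contains 0, so the data are compatible both with no difference and with a loss larger than the margin; failing to show a difference is not the same as showing non-inferiority.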

A word on adaptive designs

Adaptive designs are valid only when the adaptation rules and the alpha spent at each look are pre-specified. Running an unplanned interim analysis, in the hope of extending the trial if the results look promising, without a pre-specified rule inflates the Type I error rate and destroys the trial's inferential warranty.
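A small simulation illustrates the simplest version of the problem: an unplanned interim look at half the data, tested at the same nominal one-sided 2.5% threshold as the final analysis, with no alpha spending. It assumes the null is true and uses the standard fact that z-statistics at 50% and 100% information behave like scaled Brownian motion, so the final z combines the interim z with an independent second half. The seed and simulation count are arbitrary choices. The planned single analysis should reject close to 2.5% of the time, while "reject if either look crosses" should land noticeably higher (around 4%).

```r
set.seed(2024)
n_sim <- 1e5
crit  <- qnorm(0.975)   # one-sided 2.5% critical value, reused naively at every look

# Under H0: interim z from the first half of the data,
# final z adds the independent second half (correlation sqrt(0.5) with the interim z)
z_interim <- rnorm(n_sim)
z_final   <- (z_interim + rnorm(n_sim)) / sqrt(2)

final_only  <- mean(z_final > crit)                      # planned single analysis
naive_looks <- mean(z_interim > crit | z_final > crit)   # extra look, no spending rule

c(final_only = final_only, naive_two_looks = naive_looks)
```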

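A common exam trap is to treat “failing to reject the null of no difference” in a superiority test as if it “establishes non-inferiority”. They are not the same thing: a wide confidence interval can include 0 and still extend below −Δ.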

Common pitfalls

  • Picking the NI margin after seeing the data.
  • Using a two-sided superiority test and claiming non-inferiority.
  • Running an “interim look” without a pre-specified spending rule.
  • Treating equivalence and non-inferiority as interchangeable.

Further reading

  • Piaggio G et al. (2012). CONSORT extension for non-inferiority and equivalence trials.
  • Jennison C, Turnbull BW (2000). Group Sequential Methods with Applications to Clinical Trials.
  • FDA (2016). Non-Inferiority Clinical Trials to Establish Effectiveness.

Session info

sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] lubridate_1.9.5 forcats_1.0.1   stringr_1.6.0   dplyr_1.2.1    
 [5] purrr_1.2.2     readr_2.2.0     tidyr_1.3.2     tibble_3.3.1   
 [9] ggplot2_4.0.3   tidyverse_2.0.0

loaded via a namespace (and not attached):
 [1] gtable_0.3.6       jsonlite_2.0.0     compiler_4.4.1     tidyselect_1.2.1  
 [5] scales_1.4.0       yaml_2.3.12        fastmap_1.2.0      R6_2.6.1          
 [9] labeling_0.4.3     generics_0.1.4     knitr_1.51         htmlwidgets_1.6.4 
[13] pillar_1.11.1      RColorBrewer_1.1-3 tzdb_0.5.0         rlang_1.2.0       
[17] stringi_1.8.7      xfun_0.57          S7_0.2.2           otel_0.2.0        
[21] timechange_0.4.0   cli_3.6.6          withr_3.0.2        magrittr_2.0.5    
[25] digest_0.6.39      grid_4.4.1         hms_1.1.4          lifecycle_1.0.5   
[29] vctrs_0.7.3        evaluate_1.0.5     glue_1.8.1         farver_2.1.2      
[33] rmarkdown_2.31     tools_4.4.1        pkgconfig_2.0.3    htmltools_0.5.9