Course 3 — #courses
Note
Workflow lab: Goal → Approach → Execution → Check → Report.
A pre-registration is a timestamped public record of a study’s hypotheses, design, and primary analysis. A statistical analysis plan (SAP) is a more detailed, often confidential companion document that specifies exactly how the data will be handled — variable definitions, handling of missing data, subgroup analyses, sensitivity analyses, and reporting conventions. Between them, these two documents make the difference between “we planned this” and “we planned this, we can prove it, and here is the file to show when it was signed.”
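One way to make "we can prove it" concrete is a content fingerprint. The following is a minimal sketch, not part of any registry's workflow: the function name `fingerprint` and the sample text are hypothetical, and the local clock reading is only illustrative. A public registry's own timestamp is what carries evidential weight, because a locally recorded time is only as trustworthy as whoever keeps the record.

```python
import hashlib
from datetime import datetime, timezone


def fingerprint(document_text: str) -> dict:
    """Return a SHA-256 digest of the text plus the current UTC time."""
    digest = hashlib.sha256(document_text.encode("utf-8")).hexdigest()
    recorded = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return {"sha256": digest, "recorded_utc": recorded}


# Hypothetical stand-in for the frozen pre-registration text.
frozen = "H1: difference in [outcome] != 0. Two-sided, alpha = 0.05."
edited = frozen.replace("Two-sided", "One-sided")

record = fingerprint(frozen)
print(record)
# Any later edit, however small, yields a different digest.
print(fingerprint(edited)["sha256"] == record["sha256"])  # False
```

If the digest is quoted alongside the registration, anyone holding the file can recompute it and confirm that the plan they are reading is the plan that was deposited.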
Flexibility during analysis, often called researcher degrees of freedom, is not fraud. It is usually well-meaning curiosity. But aggregated across a field it inflates the share of spurious findings in the literature, and within any one study it makes the reported result less likely to replicate. A pre-registration does not forbid curiosity. It separates confirmatory analyses (specified in advance) from exploratory ones, and requires each to be labelled as such when reported.
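The cost of unlabelled flexibility can be put in numbers. The sketch below assumes a hypothetical analyst who runs k independent analyses of an effect that is truly zero and reports a finding if any of them reaches p < 0.05. The value k = 5, the function names, and the independence assumption are all illustrative choices; real analysis paths are usually correlated, which reduces the inflation but does not remove it.

```python
import random


def familywise_error_rate(k: int, alpha: float = 0.05) -> float:
    """Chance that at least one of k independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** k


def simulate(k: int, alpha: float = 0.05, n_studies: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Under a true null hypothesis, each p-value is uniform on [0, 1].
        p_values = [rng.random() for _ in range(k)]
        if min(p_values) < alpha:
            hits += 1
    return hits / n_studies


analytic = familywise_error_rate(k=5)  # about 0.226
simulated = simulate(k=5)
print(f"analytic: {analytic:.3f}, simulated: {simulated:.3f}")
```

With a single pre-specified primary analysis (k = 1) the rate stays at the nominal 0.05, which is the sense in which the confirmatory label carries weight.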
Produce a minimal-but-complete pre-registration template for a fictional two-arm randomised trial, plus a SAP skeleton.
A pre-registration has four mandatory elements:
A SAP adds:
# Pre-registration — Trial X
Protocol version 1.0 | Date 2026-04-18 | PI [NAME]
## 1. Research question
Primary: Does [intervention] reduce [outcome] relative to [control]
in [population] over [time frame]?
## 2. Hypotheses
H0: difference in [outcome] = 0.
H1: difference in [outcome] != 0. Two-sided, alpha = 0.05.
## 3. Design
Two-arm, parallel-group, double-blind, placebo-controlled trial.
1:1 allocation, block randomisation (block size 4), stratified by site.
## 4. Primary analysis
Intention-to-treat (ITT). Linear regression of outcome at follow-up on
arm, adjusted for baseline value and stratification factors. Primary
estimand: adjusted mean difference between arms, reported with 95% CI.
## 5. Sample size
Target n = 200 per arm; power = 0.80 for d = 0.3, alpha = 0.05.
## 6. Data handling
- Missing data: multiple imputation under MAR (m = 20).
- Outliers: retained in primary analysis.
- Adherence: per-protocol analysis as sensitivity.
Three questions to ask of any pre-registration before posting:
A pre-registration for Trial X was deposited on the OSF on [date] (DOI [doi]). The primary analysis, statistical model, and sample-size justification are specified therein. Any deviation from the pre-registered plan is reported in the Deviations section of the manuscript with its rationale.
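The sample-size line in section 5 of the template is worth checking before the document is posted. The sketch below uses the standard normal approximation for a two-sided, two-sample comparison of means with d as the standardised effect size. The function names are hypothetical, and an exact t-based calculation would give a slightly larger n.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def n_per_arm(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm sample size under the normal approximation."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


def achieved_power(n: int, d: float, alpha: float = 0.05) -> float:
    """Two-sided power for n participants per arm (normal approximation)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n / 2)  # expected z-statistic under the alternative
    return STD_NORMAL.cdf(shift - z_alpha) + STD_NORMAL.cdf(-shift - z_alpha)


required = n_per_arm(d=0.3)                  # 175
power_at_200 = achieved_power(n=200, d=0.3)  # about 0.85
print(f"required per arm: {required}, power at n = 200: {power_at_200:.3f}")
```

Under these assumptions, 0.80 power needs about 175 participants per arm, so the template's target of 200 leaves some headroom. One plausible reading is an allowance for attrition, although the template does not say so; a real pre-registration would state the reason for the margin explicitly.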
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
time zone: UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.9.5 forcats_1.0.1 stringr_1.6.0 dplyr_1.2.1
[5] purrr_1.2.2 readr_2.2.0 tidyr_1.3.2 tibble_3.3.1
[9] ggplot2_4.0.3 tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] gtable_0.3.6 jsonlite_2.0.0 compiler_4.4.1 tidyselect_1.2.1
[5] scales_1.4.0 yaml_2.3.12 fastmap_1.2.0 R6_2.6.1
[9] generics_0.1.4 knitr_1.51 htmlwidgets_1.6.4 pillar_1.11.1
[13] RColorBrewer_1.1-3 tzdb_0.5.0 rlang_1.2.0 stringi_1.8.7
[17] xfun_0.57 S7_0.2.2 otel_0.2.0 timechange_0.4.0
[21] cli_3.6.6 withr_3.0.2 magrittr_2.0.5 digest_0.6.39
[25] grid_4.4.1 hms_1.1.4 lifecycle_1.0.5 vctrs_0.7.3
[29] evaluate_1.0.5 glue_1.8.1 farver_2.1.2 rmarkdown_2.31
[33] tools_4.4.1 pkgconfig_2.0.3 htmltools_0.5.9