Week 4, Session 5 — Explanation vs prediction; reporting

Course 2

Author

R. Heller

Note

Workflow lab: Goal → Approach → Execution → Check → Report.

Learning objectives

Distinguish explanatory from predictive modelling, and choose the correct evaluation metric for each.
Map STROBE, TRIPOD, STARD, and CONSORT onto the study designs each is meant for.
Produce a publication-ready regression table with gtsummary whose numbers trace directly to the fitted model.

Prerequisites

The regression tools covered across Course 2.

Background

Breiman’s 2001 essay Statistical Modeling: The Two Cultures and Shmueli’s 2010 essay To Explain or to Predict? draw the same line from opposite sides. In explanatory modelling the goal is inference (what is the estimated effect of X on Y, adjusted for confounders?), and the right evaluation criteria are unbiasedness, interval coverage, and interpretability. In predictive modelling the goal is accuracy on unseen data, and the right evaluation criteria are out-of-sample loss, calibration, and decision utility. The statistical machinery overlaps but the habits around it do not; a regression that makes an excellent explanation often makes a lacklustre prediction, and vice versa.

Reporting guidelines are the mechanism by which the field enforces discipline around these two cultures. STROBE covers observational studies, CONSORT randomised trials, STARD diagnostic-accuracy studies, and TRIPOD (updated in 2024 as TRIPOD+AI) prediction models; each specifies the items a reader needs to evaluate the claim. A checklist filled in as you write is much easier to complete than one filled in at submission.

Setup

library(tidyverse)
library(gtsummary)
library(broom)
library(palmerpenguins)
set.seed(42)
theme_set(theme_minimal(base_size = 12))

1. Goal

Fit a single linear model to the penguins data and present it two ways: once as an explanatory analysis (what drives body mass?) and once as a predictive model (can we predict body mass on held-out birds?).

2. Approach

peng <- penguins |> drop_na(body_mass_g, flipper_length_mm, sex, species)
fit_expl <- lm(body_mass_g ~ flipper_length_mm + sex + species, data = peng)

3. Execution

peng |>
  ggplot(aes(flipper_length_mm, body_mass_g, colour = species)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 0.8) +
  labs(x = "Flipper length (mm)", y = "Body mass (g)")

4. Check

Explanatory evaluation: effect sizes with intervals, residual diagnostics, and adjusted R².

glance(fit_expl) |> select(r.squared, adj.r.squared, sigma, df, df.residual)

# A tibble: 1 × 5
  r.squared adj.r.squared sigma    df df.residual
      <dbl>         <dbl> <dbl> <dbl>       <int>
1     0.867         0.865  296.     4         328
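
The glance() summary covers the fit statistics but not the residual diagnostics named above. A minimal sketch of that step, assuming the model from the Approach step, could use broom::augment() to attach fitted values and residuals to each bird. The name diag_df and the choice of these two plots are illustrative assumptions, not part of the original lab; the model is refitted here only so the block runs on its own.

```r
library(palmerpenguins)
library(tidyr)
library(broom)
library(ggplot2)

# Same data and model as in the Approach step
peng <- penguins |> drop_na(body_mass_g, flipper_length_mm, sex, species)
fit_expl <- lm(body_mass_g ~ flipper_length_mm + sex + species, data = peng)

# One row per bird, with .fitted, .resid and .std.resid attached
diag_df <- augment(fit_expl)

# Residuals vs fitted: look for curvature or a funnel shape
ggplot(diag_df, aes(.fitted, .resid)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Fitted body mass (g)", y = "Residual (g)")

# Normal Q-Q plot of the standardised residuals
ggplot(diag_df, aes(sample = .std.resid)) +
  stat_qq() +
  stat_qq_line()
```

A roughly flat, evenly spread cloud in the first plot and points close to the line in the second would be consistent with the linear-model assumptions; neither plot says anything about predictive performance.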

Predictive evaluation: a 5-fold split, hand-coded to keep the example transparent.

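k <- 5
# Randomly assign each bird to one of k roughly equal folds
folds <- sample(rep(seq_len(k), length.out = nrow(peng)))
rmse_k <- sapply(seq_len(k), function(i) {
  tr <- peng[folds != i, ]  # training rows: every fold except i
  te <- peng[folds == i, ]  # held-out rows: fold i
  f  <- lm(body_mass_g ~ flipper_length_mm + sex + species, data = tr)
  # RMSE on the held-out fold
  sqrt(mean((predict(f, te) - te$body_mass_g)^2))
})
# Average the k fold-level RMSEs
mean_rmse <- mean(rmse_k)
mean_rmse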

[1] 294.9655
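
The Background lists calibration alongside out-of-sample loss. One way to look at it with the same kind of split is to keep every out-of-fold prediction and regress observed on predicted mass; a slope near 1 and an intercept near 0 would suggest well-calibrated predictions. This is a sketch under assumptions: the names oof_pred and calib are illustrative, and the folds are regenerated inside the block, so the values need not match the output above.

```r
library(palmerpenguins)
library(tidyr)

set.seed(42)
peng <- penguins |> drop_na(body_mass_g, flipper_length_mm, sex, species)

k <- 5
folds <- sample(rep(seq_len(k), length.out = nrow(peng)))

# Collect one out-of-fold prediction per bird
oof_pred <- numeric(nrow(peng))
for (i in seq_len(k)) {
  held_out <- folds == i
  f <- lm(body_mass_g ~ flipper_length_mm + sex + species,
          data = peng[!held_out, ])
  oof_pred[held_out] <- predict(f, newdata = peng[held_out, ])
}

# Pooled out-of-fold RMSE; it pools squared errors across folds, so it can
# differ slightly from the mean of the per-fold RMSEs
sqrt(mean((oof_pred - peng$body_mass_g)^2))

# Calibration: intercept and slope of observed on predicted
calib <- lm(peng$body_mass_g ~ oof_pred)
coef(calib)
```

Because each prediction comes from a model that never saw that bird, the calibration line describes out-of-sample behaviour rather than in-sample fit.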

5. Report

tbl_regression(fit_expl, intercept = TRUE) |>
  modify_caption("**Table 1. Linear-regression estimates for body mass (g).**")

Table 1. Linear-regression estimates for body mass (g).
Characteristic	Beta	95% CI	p-value
(Intercept)	-366	-1,412, 681	0.5
flipper_length_mm	20	14, 26	<0.001
sex
female	—	—
male	530	456, 605	<0.001
species
Adelie	—	—
Chinstrap	-88	-179, 3.5	0.060
Gentoo	836	669, 1,004	<0.001
Abbreviation: CI = Confidence Interval

In the Palmer penguins dataset (n = 333), body mass was associated with flipper length, sex, and species (adjusted R² = 0.87). Out-of-sample performance, estimated by 5-fold cross-validation, was RMSE = 295 g.
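
To check that the reported numbers trace to the fitted model, the same quantities can be pulled from the model object and compared with the table and the sentence above by eye. A minimal sketch, assuming the model from the Approach step; check is an illustrative name, and the rounding is chosen only to make the comparison easy.

```r
library(palmerpenguins)
library(tidyr)
library(broom)

# Same data and model as in the Approach step
peng <- penguins |> drop_na(body_mass_g, flipper_length_mm, sex, species)
fit_expl <- lm(body_mass_g ~ flipper_length_mm + sex + species, data = peng)

# Estimates, 95% intervals, and p-values as computed by broom
check <- tidy(fit_expl, conf.int = TRUE)

# Rounded view for side-by-side comparison with the regression table
data.frame(
  term      = check$term,
  beta      = round(check$estimate),
  conf.low  = round(check$conf.low),
  conf.high = round(check$conf.high),
  p.value   = signif(check$p.value, 2)
)

# Sample size and adjusted R-squared quoted in the prose
nobs(fit_expl)
summary(fit_expl)$adj.r.squared
```

Any mismatch between this output and the published table would point to a stale table or a refitted model, which is the failure this check is meant to catch.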

Reporting-guideline map

Design	Guideline	URL
Randomised trial	CONSORT	https://www.consort-statement.org/
Observational study	STROBE	https://www.strobe-statement.org/
Diagnostic-accuracy study	STARD	https://www.equator-network.org/reporting-guidelines/stard/
Prediction-model study	TRIPOD / TRIPOD+AI	https://www.tripod-statement.org/
Systematic review	PRISMA	http://prisma-statement.org/

The distinction to impress on a PhD audience: an effect size in an explanatory model is an answer to a scientific question; in a predictive model it is an implementation detail.

Common pitfalls

Choosing a predictor set by in-sample R² and then reporting a predictive claim, or tuning a predictive model and then reporting causal-sounding coefficients.
Filling in a reporting checklist at submission rather than while drafting.
Assuming in-sample R² generalises.
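
The last pitfall is easy to demonstrate with simulated data. The sketch below is not part of the original lab: it fits a linear model to an outcome that is pure noise, using hypothetical predictors x1 to x20 that are also pure noise, and compares in-sample R² with R² on fresh data. The sample sizes and the helper make_data() are arbitrary illustrative choices.

```r
set.seed(1)

n <- 40  # training observations
p <- 20  # pure-noise predictors, unrelated to the outcome

# Simulate an outcome and p predictors that are all independent noise
make_data <- function(n_rows) {
  x <- matrix(rnorm(n_rows * p), nrow = n_rows,
              dimnames = list(NULL, paste0("x", seq_len(p))))
  data.frame(y = rnorm(n_rows), x)
}

train <- make_data(n)
test  <- make_data(1000)

fit <- lm(y ~ ., data = train)

# In-sample R-squared: inflated because the model fits noise
r2_in <- summary(fit)$r.squared

# Out-of-sample R-squared on fresh data: 1 - SSE / SST
r2_out <- 1 - sum((test$y - predict(fit, test))^2) /
              sum((test$y - mean(test$y))^2)

c(in_sample = r2_in, out_of_sample = r2_out)
```

With this many predictors relative to the sample size, in-sample R² is typically far above zero even though nothing is related to the outcome, while the out-of-sample value is typically at or below zero.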

Session info

sessionInfo()

R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] palmerpenguins_0.1.1 broom_1.0.12         gtsummary_2.5.0     
 [4] lubridate_1.9.5      forcats_1.0.1        stringr_1.6.0       
 [7] dplyr_1.2.1          purrr_1.2.2          readr_2.2.0         
[10] tidyr_1.3.2          tibble_3.3.1         ggplot2_4.0.3       
[13] tidyverse_2.0.0     

loaded via a namespace (and not attached):
 [1] gt_1.3.0             sass_0.4.10          generics_0.1.4      
 [4] xml2_1.5.2           stringi_1.8.7        lattice_0.22-6      
 [7] hms_1.1.4            digest_0.6.39        magrittr_2.0.5      
[10] evaluate_1.0.5       grid_4.4.1           timechange_0.4.0    
[13] RColorBrewer_1.1-3   cards_0.7.1          fastmap_1.2.0       
[16] broom.helpers_1.22.0 jsonlite_2.0.0       Matrix_1.7-0        
[19] backports_1.5.1      mgcv_1.9-1           scales_1.4.0        
[22] labelled_2.16.0      cli_3.6.6            rlang_1.2.0         
[25] litedown_0.9         commonmark_2.0.0     splines_4.4.1       
[28] base64enc_0.1-6      withr_3.0.2          yaml_2.3.12         
[31] otel_0.2.0           tools_4.4.1          tzdb_0.5.0          
[34] vctrs_0.7.3          R6_2.6.1             lifecycle_1.0.5     
[37] fs_2.1.0             htmlwidgets_1.6.4    pkgconfig_2.0.3     
[40] pillar_1.11.1        gtable_0.3.6         glue_1.8.1          
[43] haven_2.5.5          xfun_0.57            tidyselect_1.2.1    
[46] knitr_1.51           farver_2.1.2         htmltools_0.5.9     
[49] nlme_3.1-164         rmarkdown_2.31       labeling_0.4.3      
[52] compiler_4.4.1       S7_0.2.2             markdown_2.0
