Week 4, Session 5 — Explanation vs prediction; reporting

Course 2 — #courses

Author

R. Heller

Note

Workflow lab: Goal → Approach → Execution → Check → Report.

Learning objectives

  • Distinguish explanatory from predictive modelling, and choose the correct evaluation metric for each.
  • Map STROBE, TRIPOD, STARD, and CONSORT onto the study designs each is meant for.
  • Produce a publication-ready regression table with gtsummary whose numbers trace directly to the fitted model.

Prerequisites

The regression tools covered across Course 2.

Background

Breiman’s 1998 essay The Two Cultures and Shmueli’s 2010 essay To Explain or to Predict? draw the same line from opposite sides. In explanatory modelling the goal is inference — what is the estimated effect of X on Y, adjusted for confounders? — and the right evaluation metric is bias-unbiasedness, interval coverage, and interpretability. In predictive modelling the goal is accuracy on unseen data, and the right evaluation metric is out-of-sample loss, calibration, and decision utility. The statistical machinery overlaps but the habits around it do not; a regression that makes an excellent explanation often makes a lacklustre prediction, and vice versa.

Reporting guidelines are the mechanism by which the field enforces discipline around these two cultures. STROBE for observational studies, CONSORT for randomised trials, STARD for diagnostic accuracy, and TRIPOD (now TRIPOD-AI) for prediction models each specify the items a reader needs to evaluate the claim. A checklist filled in as you write is much easier than one filled in at submission.

Setup

library(tidyverse)
library(gtsummary)
library(broom)
library(palmerpenguins)
set.seed(42)
theme_set(theme_minimal(base_size = 12))

1. Goal

Fit a single linear model to the penguins data and present it two ways: once as an explanatory analysis (what drives body mass?) and once as a predictive model (can we predict body mass on held-out birds?).

2. Approach

peng <- penguins |> drop_na(body_mass_g, flipper_length_mm, sex, species)
fit_expl <- lm(body_mass_g ~ flipper_length_mm + sex + species, data = peng)

3. Execution

peng |>
  ggplot(aes(flipper_length_mm, body_mass_g, colour = species)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 0.8) +
  labs(x = "Flipper length (mm)", y = "Body mass (g)")

4. Check

Explanatory evaluation: effect sizes with intervals, residual diagnostics, and adjusted R².

glance(fit_expl) |> select(r.squared, adj.r.squared, sigma, df, df.residual)
# A tibble: 1 × 5
  r.squared adj.r.squared sigma    df df.residual
      <dbl>         <dbl> <dbl> <dbl>       <int>
1     0.867         0.865  296.     4         328

Predictive evaluation: a 5-fold split, hand-coded to keep the example transparent.

k <- 5
folds <- sample(rep(seq_len(k), length.out = nrow(peng)))
rmse_k <- sapply(seq_len(k), function(i) {
  tr <- peng[folds != i, ]
  te <- peng[folds == i, ]
  f  <- lm(body_mass_g ~ flipper_length_mm + sex + species, data = tr)
  sqrt(mean((predict(f, te) - te$body_mass_g)^2))
})
mean_rmse <- mean(rmse_k)
mean_rmse
[1] 294.9655

5. Report

tbl_regression(fit_expl, intercept = TRUE) |>
  modify_caption("**Table 1. Linear-regression estimates for body mass (g).**")
Table 1. Linear-regression estimates for body mass (g).
Characteristic Beta 95% CI p-value
(Intercept) -366 -1,412, 681 0.5
flipper_length_mm 20 14, 26 <0.001
sex


    female
    male 530 456, 605 <0.001
species


    Adelie
    Chinstrap -88 -179, 3.5 0.060
    Gentoo 836 669, 1,004 <0.001
Abbreviation: CI = Confidence Interval

In the Palmer penguins dataset (n = 333), body mass was associated with flipper length, sex, and species (adjusted R² = 0.87). Out-of-sample performance, estimated by 5-fold cross-validation, was RMSE = 295 g.

Reporting-guideline map

Design Guideline URL
Randomised trial CONSORT https://www.consort-statement.org/
Observational study STROBE https://www.strobe-statement.org/
Diagnostic-accuracy study STARD https://www.equator-network.org/reporting-guidelines/stard/
Prediction-model study TRIPOD / TRIPOD-AI https://www.tripod-statement.org/
Systematic review PRISMA http://prisma-statement.org/

The distinction to impress on a PhD audience: an effect size in an explanatory model is an answer to a scientific question; in a predictive model it is an implementation detail.

Common pitfalls

  • Choosing a predictor set by R² and reporting a predictive claim, or tuning a predictive model and reporting causal-sounding coefficients.
  • Filling in a reporting checklist at submission rather than while drafting.
  • Assuming in-sample R² generalises.

Further reading

  • Breiman L. (2001). Statistical Modeling: The Two Cultures.
  • Shmueli G. (2010). To Explain or to Predict? Statistical Science.
  • Writing a report.

Session info

sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] palmerpenguins_0.1.1 broom_1.0.12         gtsummary_2.5.0     
 [4] lubridate_1.9.5      forcats_1.0.1        stringr_1.6.0       
 [7] dplyr_1.2.1          purrr_1.2.2          readr_2.2.0         
[10] tidyr_1.3.2          tibble_3.3.1         ggplot2_4.0.3       
[13] tidyverse_2.0.0     

loaded via a namespace (and not attached):
 [1] gt_1.3.0             sass_0.4.10          generics_0.1.4      
 [4] xml2_1.5.2           stringi_1.8.7        lattice_0.22-6      
 [7] hms_1.1.4            digest_0.6.39        magrittr_2.0.5      
[10] evaluate_1.0.5       grid_4.4.1           timechange_0.4.0    
[13] RColorBrewer_1.1-3   cards_0.7.1          fastmap_1.2.0       
[16] broom.helpers_1.22.0 jsonlite_2.0.0       Matrix_1.7-0        
[19] backports_1.5.1      mgcv_1.9-1           scales_1.4.0        
[22] labelled_2.16.0      cli_3.6.6            rlang_1.2.0         
[25] litedown_0.9         commonmark_2.0.0     splines_4.4.1       
[28] base64enc_0.1-6      withr_3.0.2          yaml_2.3.12         
[31] otel_0.2.0           tools_4.4.1          tzdb_0.5.0          
[34] vctrs_0.7.3          R6_2.6.1             lifecycle_1.0.5     
[37] fs_2.1.0             htmlwidgets_1.6.4    pkgconfig_2.0.3     
[40] pillar_1.11.1        gtable_0.3.6         glue_1.8.1          
[43] haven_2.5.5          xfun_0.57            tidyselect_1.2.1    
[46] knitr_1.51           farver_2.1.2         htmltools_0.5.9     
[49] nlme_3.1-164         rmarkdown_2.31       labeling_0.4.3      
[52] compiler_4.4.1       S7_0.2.2             markdown_2.0