Week 3, Session 4 — Propensity scores and IPTW

Course 3

Author

R. Heller

Note

Inference lab using the five-step template: Hypothesis → Visualise → Assumptions → Conduct → Conclude.

Learning objectives

Estimate a propensity score and perform 1:1 nearest-neighbour matching with MatchIt.
Construct inverse-probability-of-treatment weights (IPTW) and fit a weighted outcome model.
Diagnose covariate balance with cobalt::love.plot.

Prerequisites

Session 3 (DAGs). Course 2 logistic regression.

Background

The propensity score e(X) is the probability of being treated given the covariates X. Under conditional exchangeability and positivity, conditioning on the propensity score removes confounding by the measured covariates, so the treatment effect can be estimated without bias. Matching and inverse-probability-of-treatment weighting are two ways to do that conditioning.

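Matching pairs each treated unit with one or more untreated units that have similar propensity scores, then analyses the matched sample. Because every treated unit is kept and controls are chosen to resemble them, this targets the average treatment effect in the treated (ATT). IPTW gives each unit a weight of 1/e(X) if treated and 1/(1 − e(X)) if untreated, creating a pseudo-population in which treatment is independent of X; these weights target the average treatment effect (ATE) in the whole population. Both estimators require the propensity model to be correctly specified; neither handles unmeasured confounding.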
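
The pseudo-population idea can be checked by hand on a toy table. The sketch below is an illustration only: the counts and the two propensity values (0.2 and 0.8) are made up, and the data frame toy is not part of the lab data. It weights each cell by 1/e(X) or 1/(1 − e(X)) and shows that, after weighting, each treatment arm contains a full copy of both covariate strata.

```r
# Toy population: one binary covariate x, 100 units per stratum.
# Assumed (made-up) propensities: e(x = 0) = 0.2, e(x = 1) = 0.8.
toy <- data.frame(
  x   = c(0, 0, 1, 1),
  trt = c(1, 0, 1, 0),
  n   = c(20, 80, 80, 20)
)
toy$e <- ifelse(toy$x == 1, 0.8, 0.2)

# IPTW: 1 / e for treated cells, 1 / (1 - e) for untreated cells
toy$w <- ifelse(toy$trt == 1, 1 / toy$e, 1 / (1 - toy$e))

# Size of each cell in the weighted pseudo-population
toy$pseudo_n <- toy$n * toy$w

# Cross-tabulate the pseudo-population by covariate and treatment
pseudo <- xtabs(pseudo_n ~ x + trt, data = toy)
pseudo
```

Every cell of the weighted table has the same size, so x no longer predicts treatment in the pseudo-population.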

Balance diagnostics assess whether the matched or weighted sample has covariate distributions that are similar across treatment groups. The standardised mean difference (SMD) is the workhorse metric; an absolute SMD below 0.1 is usually considered balanced.
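
To make the SMD concrete, here is a minimal base-R sketch that computes it by hand before and after weighting. The helper smd() is hypothetical and uses one common convention (difference in means divided by the pooled unweighted standard deviation); cobalt may choose a different denominator depending on the estimand, so its numbers need not match exactly. The data are simulated with the same coefficients as the lab.

```r
set.seed(42)
n   <- 1000
age <- rnorm(n, 60, 10)
sev <- rnorm(n, 0, 1)
sex <- rbinom(n, 1, 0.5)
ps  <- plogis(-1 + 0.04 * (age - 60) + 0.8 * sev + 0.3 * sex)
trt <- rbinom(n, 1, ps)

# Hypothetical helper: (weighted) difference in group means divided by
# the pooled SD of the unweighted groups.
smd <- function(x, trt, w = rep(1, length(x))) {
  m1 <- weighted.mean(x[trt == 1], w[trt == 1])
  m0 <- weighted.mean(x[trt == 0], w[trt == 0])
  s  <- sqrt((var(x[trt == 1]) + var(x[trt == 0])) / 2)
  (m1 - m0) / s
}

# IPTW from the true simulated propensity (unknown in real data)
w    <- ifelse(trt == 1, 1 / ps, 1 / (1 - ps))
covs <- data.frame(age, sev, sex)

smd_raw      <- sapply(covs, smd, trt = trt)
smd_weighted <- sapply(covs, smd, trt = trt, w = w)
round(cbind(raw = smd_raw, weighted = smd_weighted), 3)
```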

Large weights arise when a treated unit has a propensity near 0 or an untreated unit has a propensity near 1, and they are a sign of poor positivity. Stabilised weights and truncation can help, but the underlying problem, a region of covariate space where one treatment is never seen, cannot be fixed statistically.
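
Stabilised and truncated weights can be sketched in a few lines of base R. This is an illustration under assumptions: the propensity is estimated with a main-effects logistic regression, the stabilising numerator is the marginal treatment probability, and the 1st and 99th percentile cut-offs are arbitrary choices rather than a recommendation.

```r
set.seed(42)
n   <- 1000
age <- rnorm(n, 60, 10)
sev <- rnorm(n, 0, 1)
sex <- rbinom(n, 1, 0.5)
trt <- rbinom(n, 1, plogis(-1 + 0.04 * (age - 60) + 0.8 * sev + 0.3 * sex))

# Estimated propensity score from a main-effects logistic regression
e_hat <- fitted(glm(trt ~ age + sev + sex, family = binomial))

# Unstabilised IPTW
w <- ifelse(trt == 1, 1 / e_hat, 1 / (1 - e_hat))

# Stabilised IPTW: numerator is the marginal probability of the
# treatment actually received
p_trt <- mean(trt)
sw <- ifelse(trt == 1, p_trt / e_hat, (1 - p_trt) / (1 - e_hat))

# Truncate the stabilised weights at the 1st and 99th percentiles
# (illustrative cut-offs)
bounds   <- quantile(sw, c(0.01, 0.99))
sw_trunc <- pmin(pmax(sw, bounds[1]), bounds[2])

rbind(raw = summary(w), stabilised = summary(sw), truncated = summary(sw_trunc))
```

Truncation trades a little bias for lower variance; it changes the estimate, so the cut-offs should be fixed before looking at the outcome.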

Setup

library(tidyverse)
library(MatchIt)
library(cobalt)
set.seed(42)
theme_set(theme_minimal(base_size = 12))

1. Hypothesis

The simulated treatment lowers the outcome by 1.5 units. We will estimate the effect with covariate-adjusted regression, matching, and IPTW, and compare each estimate with that simulated truth.

2. Visualise

n <- 1000
dat <- tibble(
  age  = rnorm(n, 60, 10),
  sev  = rnorm(n, 0,   1),
  sex  = rbinom(n, 1, 0.5)
) |>
  mutate(ps = plogis(-1 + 0.04 * (age - 60) + 0.8 * sev + 0.3 * sex),
         trt = rbinom(n, 1, ps),
         y   = 2 - 1.5 * trt + 0.05 * (age - 60) +
                 0.8 * sev + 0.3 * sex + rnorm(n, 0, 1))

ggplot(dat, aes(ps, fill = factor(trt))) +
  geom_density(alpha = 0.5) +
  labs(x = "Propensity score", y = "Density", fill = "Treated?")

3. Assumptions

No unmeasured confounding (conditional exchangeability); positivity (everyone has a non-zero chance of each treatment); correct specification of the propensity model.
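
Positivity is the one assumption here that can be partly checked from the data. The sketch below is one simple way to do it, not the only one: it estimates the propensity score, finds the region of common support (the overlap of the two arms' score ranges), and counts units that fall outside it. The names by_arm, lower, upper and outside are introduced for illustration.

```r
set.seed(42)
n   <- 1000
age <- rnorm(n, 60, 10)
sev <- rnorm(n, 0, 1)
sex <- rbinom(n, 1, 0.5)
trt <- rbinom(n, 1, plogis(-1 + 0.04 * (age - 60) + 0.8 * sev + 0.3 * sex))

# Estimated propensity score
e_hat <- fitted(glm(trt ~ age + sev + sex, family = binomial))

# Score range within each arm
by_arm <- split(e_hat, trt)
sapply(by_arm, range)

# Common support: from the larger of the two minima to the smaller of
# the two maxima
lower <- max(sapply(by_arm, min))
upper <- min(sapply(by_arm, max))

# Units whose score has no counterpart in the other arm
outside <- e_hat < lower | e_hat > upper
table(trt, outside)
```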

4. Conduct

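# Regression baseline: adjusts for age, sev and sex, and the outcome model
# matches the simulation, so this is not a truly naive (unadjusted) fit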
fit_naive <- lm(y ~ trt + age + sev + sex, data = dat)
coef(fit_naive)["trt"]

      trt 
-1.502417

# Nearest-neighbour matching
m <- matchit(trt ~ age + sev + sex, data = dat,
             method = "nearest", ratio = 1)
m

A `matchit` object
 - method: 1:1 nearest neighbor matching without replacement
 - distance: Propensity score
             - estimated with logistic regression
 - number of obs.: 1000 (original), 688 (matched)
 - target estimand: ATT
 - covariates: age, sev, sex

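# Extract the matched sample; match.data() adds a `weights` column
matched <- match.data(m)

# Difference in means within the matched sample (an ATT estimate)
fit_match <- lm(y ~ trt, data = matched,
                weights = weights)
coef(fit_match)["trt"]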

      trt 
-1.281375

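# IPTW. Note: ps is the true simulated propensity; with real data it
# must be estimated, e.g. with logistic regression.
dat <- dat |>
  mutate(w = if_else(trt == 1, 1 / ps, 1 / (1 - ps)))
# Weighted outcome model; the default lm standard errors ignore the weighting
fit_iptw <- lm(y ~ trt, data = dat, weights = w)
coef(fit_iptw)["trt"]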

      trt 
-1.592614
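
With real data the propensity score is unknown, so the weights are built from a fitted model. The following base-R sketch shows that variant on data simulated with the lab's coefficients. The main-effects logistic regression and the names sim, e_hat, w_hat and ess are assumptions for illustration. It also reports the effective sample size, a quick summary of how much information the weights leave.

```r
set.seed(42)
n   <- 1000
age <- rnorm(n, 60, 10)
sev <- rnorm(n, 0, 1)
sex <- rbinom(n, 1, 0.5)
ps  <- plogis(-1 + 0.04 * (age - 60) + 0.8 * sev + 0.3 * sex)
trt <- rbinom(n, 1, ps)
y   <- 2 - 1.5 * trt + 0.05 * (age - 60) + 0.8 * sev + 0.3 * sex + rnorm(n)
sim <- data.frame(age, sev, sex, trt, y)

# Step 1: estimate the propensity score
ps_fit    <- glm(trt ~ age + sev + sex, family = binomial, data = sim)
sim$e_hat <- fitted(ps_fit)

# Step 2: build ATE weights from the estimated score
sim$w_hat <- ifelse(sim$trt == 1, 1 / sim$e_hat, 1 / (1 - sim$e_hat))

# Step 3: weighted outcome model (point estimate only; the default
# standard error is not valid for weighted data)
fit <- lm(y ~ trt, data = sim, weights = w_hat)
est <- unname(coef(fit)["trt"])
est

# Kish effective sample size: (sum of weights)^2 / sum of squared weights
ess <- sum(sim$w_hat)^2 / sum(sim$w_hat^2)
ess
```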

# Love plot: absolute SMDs before and after matching, with the 0.1 threshold
love.plot(m, thresholds = c(m = 0.1), abs = TRUE)

5. Concluding statement

The simulated treatment effect was −1.5. Covariate-adjusted regression gave −1.50; nearest-neighbour matching gave −1.28; IPTW gave −1.59. The love plot showed that matching brought every absolute SMD below 0.1.

Make sure students understand that a well-balanced sample is a necessary but not sufficient condition for an unbiased effect estimate: balance on the measured covariates says nothing about unmeasured confounders.

Common pitfalls

Matching with replacement without adjusting inference for the repeated use of controls.
Reporting the model-based p-value for the treatment effect from the weighted model without a sandwich or bootstrap standard error.
Ignoring extreme weights.
Picking the matching method that gives the effect you wanted.
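
As a rough illustration of the standard-error pitfall, the sketch below bootstraps the IPTW estimate in base R, re-estimating the propensity score inside every resample so that its uncertainty is carried through. The helper iptw_est(), the choice of 200 resamples and the percentile interval are assumptions for illustration; a sandwich estimator is an alternative.

```r
set.seed(42)
n   <- 1000
age <- rnorm(n, 60, 10)
sev <- rnorm(n, 0, 1)
sex <- rbinom(n, 1, 0.5)
trt <- rbinom(n, 1, plogis(-1 + 0.04 * (age - 60) + 0.8 * sev + 0.3 * sex))
y   <- 2 - 1.5 * trt + 0.05 * (age - 60) + 0.8 * sev + 0.3 * sex + rnorm(n)
sim <- data.frame(age, sev, sex, trt, y)

# Hypothetical helper: full IPTW pipeline on one data set
iptw_est <- function(d) {
  e   <- fitted(glm(trt ~ age + sev + sex, family = binomial, data = d))
  d$w <- ifelse(d$trt == 1, 1 / e, 1 / (1 - e))
  unname(coef(lm(y ~ trt, data = d, weights = w))["trt"])
}

# Nonparametric bootstrap: resample rows, rerun the whole pipeline
B    <- 200
boot <- replicate(B, iptw_est(sim[sample(nrow(sim), replace = TRUE), ]))

se_boot <- sd(boot)                          # bootstrap standard error
ci      <- quantile(boot, c(0.025, 0.975))   # percentile interval
se_boot
ci
```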

Session info

sessionInfo()

R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cobalt_4.6.2    MatchIt_4.7.2   lubridate_1.9.5 forcats_1.0.1  
 [5] stringr_1.6.0   dplyr_1.2.1     purrr_1.2.2     readr_2.2.0    
 [9] tidyr_1.3.2     tibble_3.3.1    ggplot2_4.0.3   tidyverse_2.0.0

loaded via a namespace (and not attached):
 [1] gtable_0.3.6       jsonlite_2.0.0     compiler_4.4.1     Rcpp_1.1.1-1.1    
 [5] tidyselect_1.2.1   scales_1.4.0       yaml_2.3.12        fastmap_1.2.0     
 [9] R6_2.6.1           labeling_0.4.3     generics_0.1.4     knitr_1.51        
[13] backports_1.5.1    htmlwidgets_1.6.4  chk_0.10.0         pillar_1.11.1     
[17] RColorBrewer_1.1-3 tzdb_0.5.0         rlang_1.2.0        stringi_1.8.7     
[21] xfun_0.57          S7_0.2.2           otel_0.2.0         timechange_0.4.0  
[25] cli_3.6.6          withr_3.0.2        magrittr_2.0.5     digest_0.6.39     
[29] grid_4.4.1         hms_1.1.4          lifecycle_1.0.5    vctrs_0.7.3       
[33] evaluate_1.0.5     glue_1.8.1         farver_2.1.2       rmarkdown_2.31    
[37] tools_4.4.1        pkgconfig_2.0.3    htmltools_0.5.9
