library(tidyverse)
library(broom)
set.seed(42)
theme_set(theme_minimal(base_size = 12))
Week 3, Session 5: G-methods, IV, DiD, RDD; heterogeneity of treatment effect
Course 3
Workflow lab: Goal → Approach → Execution → Check → Report.
Learning objectives
- Use inverse-probability weighting as a g-method for time-varying confounding.
- Recognise the data-generating situations that justify an instrumental-variable (IV), difference-in-differences (DiD), or regression-discontinuity (RDD) design.
- Estimate and interpret heterogeneity of treatment effect (HTE) with a subgroup-agnostic method.
Prerequisites
Propensity-score and IPTW basics (W3 S4).
Background
Traditional regression adjustment fails for time-varying confounders that are affected by prior treatment: adjusting for them blocks the part of the earlier treatment's effect that runs through the confounder, while failing to adjust leaves residual confounding of later treatment. Robins’s g-methods solve this by reweighting or standardising in a way that respects the temporal order. IV methods exploit a variable that affects treatment but affects the outcome only through treatment; DiD compares changes over time between groups assumed to share a counterfactual trend; RDD exploits a sharp assignment rule around a threshold.
Heterogeneity of treatment effect is the shift from “is the average effect non-zero?” to “for whom is it largest?” Model-based HTE uses causal forests, meta-learners (S-, T-, X-learner), or Bayesian hierarchical models to estimate conditional average treatment effects (CATEs) as a function of covariates, rather than testing many subgroups one at a time, which is where multiplicity problems arise.
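To make the meta-learner idea concrete, here is a minimal T-learner sketch. It is not part of the lab's simulation: the data, the linear form of the true CATE, and the use of `lm()` as the base learner are all assumptions chosen for illustration, and treatment is randomised so that confounding does not enter.

```r
set.seed(42)
n   <- 4000
x   <- runif(n, -1, 1)        # effect modifier
a   <- rbinom(n, 1, 0.5)      # randomised treatment (assumption: no confounding)
tau <- 1 + 2 * x              # assumed true CATE, linear in x
y   <- 0.5 * x + tau * a + rnorm(n)
dat <- data.frame(x, a, y)

# T-learner: fit one outcome model per treatment arm
fit1 <- lm(y ~ x, data = dat, subset = a == 1)
fit0 <- lm(y ~ x, data = dat, subset = a == 0)

# CATE estimate at each x = difference between the two arms' predictions
grid <- data.frame(x = c(-0.5, 0, 0.5))
grid$cate_hat  <- predict(fit1, grid) - predict(fit0, grid)
grid$cate_true <- 1 + 2 * grid$x
grid
```

With flexible base learners in place of `lm()`, the same two-model structure applies; only the fitting step changes.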
Setup
1. Goal
Simulate a time-varying-confounding scenario, estimate the causal effect naively and then with IPTW (a marginal structural model), and finish with a simple HTE estimate.
2. Approach
n <- 2000
df <- tibble(
  l0 = rnorm(n),
  a0 = rbinom(n, 1, plogis(0.2 * l0)),
  l1 = rnorm(n, mean = 0.5 * a0 + 0.3 * l0),
  a1 = rbinom(n, 1, plogis(0.3 * l1 + 0.2 * a0)),
  y = 1.0 * a0 + 0.8 * a1 + 0.5 * l0 + 0.5 * l1 + rnorm(n)
)
The true effect of each treatment is positive and additive. The direct effect of a0 on y is 1.0, and a further 0.5 × 0.5 = 0.25 flows through l1, so the total effect of a0 is 1.25; the effect of a1 is 0.8. l1 is a time-varying confounder because it is affected by a0 and predicts both a1 and y.
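The total effect of a0 includes the path through l1, and one way to confirm the arithmetic is to rerun the same data-generating process with treatment set by intervention. The helper `simulate_y()` below is hypothetical; it copies the coefficients of the simulation above but fixes a0 and a1 instead of drawing them from the covariates.

```r
# Hypothetical helper: mean outcome when a0 and a1 are set by intervention,
# using the same coefficients as the lab's simulation.
simulate_y <- function(a0, a1, n = 1e6) {
  l0 <- rnorm(n)
  l1 <- rnorm(n, mean = 0.5 * a0 + 0.3 * l0)  # l1 still responds to a0
  y  <- 1.0 * a0 + 0.8 * a1 + 0.5 * l0 + 0.5 * l1 + rnorm(n)
  mean(y)
}

set.seed(42)
effect_a0 <- simulate_y(1, 0) - simulate_y(0, 0)  # direct path plus path via l1
effect_a1 <- simulate_y(0, 1) - simulate_y(0, 0)  # direct path only
round(c(a0 = effect_a0, a1 = effect_a1), 2)
```

These two contrasts are the targets that a marginal structural model for y on a0 and a1 aims to recover.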
3. Execution
Naive regression adjusting for l1 blocks the part of the a0 effect that runs through l1, so its a0 coefficient targets only the direct effect.
naive <- lm(y ~ a0 + a1 + l0 + l1, data = df)
tidy(naive, conf.int = TRUE) |>
  filter(term %in% c("a0", "a1"))
# A tibble: 2 × 7
  term  estimate std.error statistic  p.value conf.low conf.high
  <chr>    <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 a0       0.956    0.0459      20.8 1.91e-87    0.866     1.05
2 a1       0.804    0.0450      17.9 2.55e-66    0.716     0.893
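Dropping l1 from the model is not a fix either, because l1 confounds a1. The sketch below compares the two regression specifications on a fresh draw from the same data-generating process. The larger sample size (n = 50000) is an assumption made here so that bias is visible above sampling noise; it is not the lab's n.

```r
set.seed(42)
n  <- 50000  # assumption: larger than the lab's n, to separate bias from noise
l0 <- rnorm(n)
a0 <- rbinom(n, 1, plogis(0.2 * l0))
l1 <- rnorm(n, mean = 0.5 * a0 + 0.3 * l0)
a1 <- rbinom(n, 1, plogis(0.3 * l1 + 0.2 * a0))
y  <- 1.0 * a0 + 0.8 * a1 + 0.5 * l0 + 0.5 * l1 + rnorm(n)

# Adjusting for l1 removes the a0 -> l1 -> y path from the a0 coefficient
with_l1    <- coef(lm(y ~ a0 + a1 + l0 + l1))[c("a0", "a1")]
# Omitting l1 leaves a1 confounded by l1
without_l1 <- coef(lm(y ~ a0 + a1 + l0))[c("a0", "a1")]

# Targets: total effect of a0 = 1.25, effect of a1 = 0.8
round(rbind(with_l1, without_l1), 2)
```

Neither row should match both targets at once, which is the motivation for weighting instead of conditioning.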
IPTW with stabilised weights:
# Numerator models condition on treatment history only
num0 <- glm(a0 ~ 1, data = df, family = binomial)
num1 <- glm(a1 ~ a0, data = df, family = binomial)
# Denominator models add the covariate history
den0 <- glm(a0 ~ l0, data = df, family = binomial)
den1 <- glm(a1 ~ a0 + l0 + l1, data = df, family = binomial)
# Each ratio uses the probability of the treatment actually received
w0 <- ifelse(df$a0 == 1, fitted(num0), 1 - fitted(num0)) /
  ifelse(df$a0 == 1, fitted(den0), 1 - fitted(den0))
w1 <- ifelse(df$a1 == 1, fitted(num1), 1 - fitted(num1)) /
  ifelse(df$a1 == 1, fitted(den1), 1 - fitted(den1))
df$sw <- w0 * w1
# Weighted outcome model: the marginal structural model
msm <- lm(y ~ a0 + a1, data = df, weights = sw)
tidy(msm, conf.int = TRUE) |>
  filter(term %in% c("a0", "a1"))
# A tibble: 2 × 7
  term  estimate std.error statistic  p.value conf.low conf.high
  <chr>    <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 a0       1.23     0.0585      21.1 2.51e-89    1.12      1.35
2 a1       0.806    0.0587      13.7 4.19e-41    0.691     0.921
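The standard errors that `lm()` reports for a weighted fit treat the weights as fixed and known, so they are not the usual choice for IPTW estimates; a robust (sandwich) variance or a bootstrap that re-estimates the weights is more common. The following is a minimal bootstrap sketch under that assumption. The helper `fit_msm()` and the choice of 200 resamples are illustrative, and the data are re-simulated with base R so the block runs on its own.

```r
set.seed(42)
n   <- 2000
dat <- data.frame(l0 = rnorm(n))
dat$a0 <- rbinom(n, 1, plogis(0.2 * dat$l0))
dat$l1 <- rnorm(n, mean = 0.5 * dat$a0 + 0.3 * dat$l0)
dat$a1 <- rbinom(n, 1, plogis(0.3 * dat$l1 + 0.2 * dat$a0))
dat$y  <- 1.0 * dat$a0 + 0.8 * dat$a1 + 0.5 * dat$l0 + 0.5 * dat$l1 + rnorm(n)

# Hypothetical helper: re-estimate the weights and the MSM on one data set
fit_msm <- function(d) {
  p <- function(formula, a) {
    ph <- fitted(glm(formula, data = d, family = binomial))
    ifelse(a == 1, ph, 1 - ph)  # probability of the treatment actually received
  }
  d$sw <- p(a0 ~ 1, d$a0) / p(a0 ~ l0, d$a0) *
    p(a1 ~ a0, d$a1) / p(a1 ~ a0 + l0 + l1, d$a1)
  coef(lm(y ~ a0 + a1, data = d, weights = sw))[c("a0", "a1")]
}

est  <- fit_msm(dat)
# Resample rows with replacement; weights are refitted inside each resample
boot <- replicate(200, fit_msm(dat[sample(n, replace = TRUE), ]))
ci   <- apply(boot, 1, quantile, probs = c(0.025, 0.975))  # percentile intervals
rbind(estimate = est, ci)
```

Refitting the treatment models inside each resample is the design choice that matters here: it lets the interval reflect uncertainty in the weights as well as in the outcome model.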
4. Check
Balance and weight sanity:
summary(df$sw)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.5802  0.8982  0.9813  0.9997  1.0849  1.9544
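Stabilised weights should have a mean close to 1 and no extreme tail. Here the mean is 0.9997 and the maximum is 1.95, so neither is a concern. If the mean drifts from 1 or a few weights dominate, suspect a misspecified treatment model or a near-violation of positivity.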
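The weight summary covers the "weight sanity" half of this step. For the balance half, one option is to compare the standardised mean difference (SMD) of l1 between a1 groups before and after weighting. The sketch below re-simulates the data so it runs on its own; the helper `smd()` is hypothetical, and balance is checked within levels of a0 because the stabilised numerator conditions on a0.

```r
set.seed(42)
n  <- 2000
l0 <- rnorm(n)
a0 <- rbinom(n, 1, plogis(0.2 * l0))
l1 <- rnorm(n, mean = 0.5 * a0 + 0.3 * l0)
a1 <- rbinom(n, 1, plogis(0.3 * l1 + 0.2 * a0))

# Stabilised weight for the second treatment, built as in the lab's w1
p_num <- fitted(glm(a1 ~ a0, family = binomial))
p_den <- fitted(glm(a1 ~ a0 + l0 + l1, family = binomial))
w1 <- ifelse(a1 == 1, p_num, 1 - p_num) / ifelse(a1 == 1, p_den, 1 - p_den)

# Hypothetical helper: (weighted) standardised mean difference of x by group a
smd <- function(x, a, w = rep(1, length(x))) {
  m1 <- weighted.mean(x[a == 1], w[a == 1])
  m0 <- weighted.mean(x[a == 0], w[a == 0])
  (m1 - m0) / sd(x)
}

# Balance of l1 across a1, separately for a0 = 0 and a0 = 1
balance <- sapply(0:1, function(k) {
  s <- a0 == k
  c(unweighted = smd(l1[s], a1[s]), weighted = smd(l1[s], a1[s], w1[s]))
})
colnames(balance) <- c("a0 = 0", "a0 = 1")
round(balance, 3)
```

A weighted SMD near zero in both columns suggests the treatment model has balanced l1; a common rule of thumb treats absolute values below 0.1 as acceptable.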
5. Report
In a simulated cohort with time-varying confounding (n = 2000), naive regression that adjusted for the time-varying confounder underestimated the total effect of initial treatment (0.96 against a true value of 1.25). An inverse-probability-weighted marginal structural model recovered effects close to the true data-generating values (1.23 for a0 and 0.81 for a1, against 1.25 and 0.8). Instrumental variable, difference-in-differences, and regression-discontinuity designs exploit alternative identification strategies, each with its own assumptions. Heterogeneity of treatment effect can be explored with causal forests or meta-learners.
Brief sketches
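- IV: scale the association between y and an instrument z by the association between treatment a and z, β = cov(y, z) / cov(a, z).
- DiD: fit lm(y ~ time * group) on repeated measures; the interaction coefficient is the DiD estimate.
- RDD: fit local linear regression on each side of the assignment cutoff.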
Flag the identifying assumption for each method — IV: exclusion restriction; DiD: parallel trends; RDD: continuity at the cutoff.
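The three sketches above can be tried on toy data. Everything in the block below is an assumption made for illustration: the simulated data, the true effects (2 for IV, 0.5 for DiD, 1 for RDD), and the hand-picked RDD bandwidth `h`. Each simulation is built so that its identifying assumption holds by construction.

```r
set.seed(42)
n <- 5000

# IV: z shifts treatment a; u confounds a and y; true effect of a is 2
z <- rbinom(n, 1, 0.5)
u <- rnorm(n)
a <- 0.8 * z + u + rnorm(n)
y_iv <- 2 * a + 1.5 * u + rnorm(n)      # z enters y only through a
ols_est <- coef(lm(y_iv ~ a))[["a"]]    # confounded by u
iv_est  <- cov(y_iv, z) / cov(a, z)     # ratio (Wald) estimator

# DiD: two groups, two periods, common trend of 1, true effect of 0.5
did <- expand.grid(id = 1:n, time = 0:1)
did$group <- as.integer(did$id <= n / 2)
did$y <- 1 + 0.3 * did$group + 1.0 * did$time +
  0.5 * did$group * did$time + rnorm(nrow(did))
did_est <- coef(lm(y ~ time * group, data = did))[["time:group"]]

# RDD: treatment assigned when running variable x >= 0, true jump of 1
x <- runif(n, -1, 1)
d <- as.integer(x >= 0)
y_rd <- 0.5 * x + 1.0 * d + rnorm(n, sd = 0.5)
h <- 0.25  # assumed bandwidth, chosen by hand for illustration
# Separate slopes on each side; the coefficient on d is the jump at the cutoff
rdd_est <- coef(lm(y_rd ~ d * x, subset = abs(x) <= h))[["d"]]

round(c(ols = ols_est, iv = iv_est, did = did_est, rdd = rdd_est), 2)
```

In real data none of these assumptions is guaranteed, and the DiD fit on repeated measures would usually need standard errors clustered by unit.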
Common pitfalls
- Using ordinary regression adjustment for time-varying confounders.
- Trimming extreme weights silently, hiding model misspecification.
- Reporting subgroup p-values without multiplicity control as HTE evidence.
Further reading
- Hernán MA, Robins JM. (2020). Causal Inference: What If.
- Athey S, Imbens GW. (2019). Machine Learning Methods That Economists Should Know About.
Session info
sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
time zone: UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] broom_1.0.12 lubridate_1.9.5 forcats_1.0.1 stringr_1.6.0
[5] dplyr_1.2.1 purrr_1.2.2 readr_2.2.0 tidyr_1.3.2
[9] tibble_3.3.1 ggplot2_4.0.3 tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] gtable_0.3.6 jsonlite_2.0.0 compiler_4.4.1 tidyselect_1.2.1
[5] scales_1.4.0 yaml_2.3.12 fastmap_1.2.0 R6_2.6.1
[9] generics_0.1.4 knitr_1.51 backports_1.5.1 htmlwidgets_1.6.4
[13] pillar_1.11.1 RColorBrewer_1.1-3 tzdb_0.5.0 rlang_1.2.0
[17] utf8_1.2.6 stringi_1.8.7 xfun_0.57 S7_0.2.2
[21] otel_0.2.0 timechange_0.4.0 cli_3.6.6 withr_3.0.2
[25] magrittr_2.0.5 digest_0.6.39 grid_4.4.1 hms_1.1.4
[29] lifecycle_1.0.5 vctrs_0.7.3 evaluate_1.0.5 glue_1.8.1
[33] farver_2.1.2 rmarkdown_2.31 tools_4.4.1 pkgconfig_2.0.3
[37] htmltools_0.5.9