flowchart TD
Start{What is the question?}
Start --> Q1[Compare two or more groups]
Start --> Q2[Quantify an association]
Start --> Q3[Model an outcome from predictors]
Start --> Q4[Handle clustered or repeated data]
Start --> Q5[Deal with high-dimensional data]
Start --> Q6[Estimate a causal effect]
Q1 --> G1[Continuous outcome]
G1 --> G1a[Two groups → t-test / Wilcoxon]
G1 --> G1b[Many groups → ANOVA / Kruskal-Wallis]
Q1 --> G2[Binary outcome]
G2 --> G2a[Two proportions → chi-square / Fisher]
G2 --> G2b[Risk or odds ratio]
Q2 --> A1[Two continuous → Pearson / Spearman]
Q2 --> A2[Continuous & categorical → ANOVA]
Q2 --> A3[Categorical & categorical → chi-square]
Q3 --> M1[Continuous outcome → linear regression]
Q3 --> M2[Binary → logistic]
Q3 --> M3[Count → Poisson / NB]
Q3 --> M4[Ordinal → ordinal logit]
Q3 --> M5[Time-to-event → Cox / KM]
Q4 --> C1[Linear mixed model]
Q4 --> C2[GLMM or GEE]
Q4 --> C3[Repeated measures ANOVA]
Q5 --> H1[Regularisation → lasso / ridge]
Q5 --> H2[Dimension reduction → PCA / UMAP]
Q5 --> H3[Tree ensemble → RF / xgboost]
Q6 --> CA1[Randomised → ITT analysis]
Q6 --> CA2[Observational → DAG → adjustment set]
Q6 --> CA3[Non-randomised → IPTW / matching]
Q6 --> CA4[Strong instrument → IV]
Decision tree: which method?
A branching guide from what you are asking to the lab that shows you how to answer it.
The tree is a scaffold for thinking, not a substitute for it. Start from your scientific question, follow the branches, and land on a lab. Every leaf links into the curriculum.
Using the tree
The first branch is always about the shape of your question: comparison, association, modelling, clustered or repeated data, high dimensionality, or causation. The later branches are about the shape of your data: continuous, binary, count, ordinal, or time-to-event.
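The two-step lookup described above (question first, data second) can be sketched as a nested dictionary. This is only an illustration: the name `TREE`, the function `suggest`, and the key strings are hypothetical, and the leaf labels are copied from the diagram and the lists in this appendix.

```python
# Hypothetical encoding of the first two levels of the tree.
# First key: shape of the question. Second key: shape of the data or design.
TREE = {
    "comparison": {
        "continuous, two groups": "t-test / Wilcoxon",
        "continuous, many groups": "ANOVA / Kruskal-Wallis",
        "two proportions": "chi-square / Fisher",
    },
    "association": {
        "two continuous": "Pearson / Spearman",
        "continuous and categorical": "ANOVA",
        "two categorical": "chi-square",
    },
    "modelling": {
        "continuous": "linear regression",
        "binary": "logistic regression",
        "count": "Poisson / negative binomial",
        "ordinal": "ordinal logit",
        "time-to-event": "Cox proportional hazards",
    },
    "clustered or repeated": {
        "continuous": "linear mixed model",
        "binary or count": "GLMM or GEE",
    },
    "high-dimensional": {
        "many predictors": "regularisation (lasso / ridge)",
        "fewer dimensions wanted": "PCA / UMAP / t-SNE",
        "messy tabular prediction": "tree ensembles",
    },
    "causation": {
        "randomised trial": "ITT analysis",
        "observational data": "DAG, then adjustment set",
        "non-randomised treatment": "propensity scores or IPTW",
        "plausible instrument": "instrumental-variable regression",
    },
}


def suggest(question: str, data: str) -> str:
    """Follow the first branch (question), then the second (data)."""
    if question not in TREE:
        raise ValueError(f"unknown question {question!r}; choose from {sorted(TREE)}")
    branch = TREE[question]
    if data not in branch:
        raise ValueError(f"unknown data shape {data!r}; choose from {sorted(branch)}")
    return branch[data]


print(suggest("modelling", "count"))
```

A lookup like this only names a starting method; checking its assumptions is still up to you.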
Comparing groups
- Two groups, continuous: start with a t-test if the outcome is approximately normal within each group; otherwise use the Wilcoxon rank-sum test. See Course 1 W4 S1 and W4 S4.
- More than two groups, continuous: one-way ANOVA, or Kruskal-Wallis when normality is doubtful. See Course 2 W2 S1.
- Two proportions: chi-square test when the expected cell counts are large, Fisher's exact test when they are small. See Course 1 W4 S2.
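For the two-proportions bullet, the choice between chi-square and Fisher depends on the expected cell counts under independence, not the observed ones. The sketch below computes them for a contingency table. The threshold of 5 is a common rule of thumb, assumed here and not taken from this appendix, and the function names and example tables are made up for illustration.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def suggest_test(table, min_expected=5.0):
    """Return 'chi-square' if every expected count reaches min_expected, else 'Fisher'.

    min_expected=5.0 is an assumed rule-of-thumb threshold.
    """
    smallest = min(min(row) for row in expected_counts(table))
    return "chi-square" if smallest >= min_expected else "Fisher"


# Made-up 2x2 tables: rows are groups, columns are outcome yes / no.
large = [[30, 20], [25, 25]]
small = [[3, 1], [2, 6]]

print(suggest_test(large))
print(suggest_test(small))
```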
Associations
- Two continuous variables: Pearson if the relationship is roughly linear and both variables are roughly normal; Spearman if it is monotonic but not linear, or if there are outliers. See Course 1 W4 S3.
- A continuous and a categorical variable: ANOVA, which is equivalent to a linear model with a categorical predictor. See Course 2 W1 S2.
- Two categorical variables: chi-square test of independence, or a log-linear model for more than two factors.
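To see why the Pearson versus Spearman choice matters, the sketch below applies both to a perfectly monotonic but non-linear relationship. Spearman is computed as Pearson on average ranks. The data (`x` and its cube) are made up for illustration, and the helper names are not from the course.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up data: y increases with x, but along a curve.
x = list(range(1, 11))
y = [v ** 3 for v in x]

print(f"Pearson:  {pearson(x, y):.3f}")
print(f"Spearman: {spearman(x, y):.3f}")
```

Spearman reports a perfect monotonic association here, while Pearson falls short of 1 because the relationship is not a straight line.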
Regression modelling
- Continuous outcome → linear regression. Course 2 W1.
- Binary outcome → logistic regression. Course 2 W3 S1.
- Count outcome → Poisson / negative binomial. Course 2 W3 S4.
- Ordinal outcome → ordinal logit. Course 2 W3 S3.
- Time-to-event → Cox proportional hazards. Course 2 W4 S3.
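For count outcomes, the choice between Poisson and negative binomial usually depends on overdispersion: the Poisson model assumes the variance equals the mean, while the negative binomial allows the variance to exceed it. A quick first look, sketched here on made-up counts, is the ratio of sample variance to sample mean. This is a crude marginal check that ignores predictors. A fuller check would examine the dispersion of the fitted model's residuals.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by sample mean; about 1 for Poisson-like counts."""
    return variance(counts) / mean(counts)


# Made-up count data for illustration only.
poisson_like = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5]
overdispersed = [0, 0, 0, 1, 1, 2, 3, 8, 12, 20]

for name, counts in [("poisson_like", poisson_like), ("overdispersed", overdispersed)]:
    print(f"{name}: variance / mean = {dispersion_ratio(counts):.2f}")
```

A ratio well above 1 suggests trying the negative binomial.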
Clustered or repeated data
- Continuous outcome: linear mixed model. Course 3 W2 S3.
- Binary or count outcome: GLMM or GEE. Course 3 W2 S4.
High-dimensional data
- Many predictors: regularisation with glmnet. Course 4 W1 S2.
- Many variables, fewer dimensions wanted: PCA, UMAP, t-SNE. Course 4 W1 S3 and S5.
- Prediction on messy tabular data: tree ensembles. Course 4 W2 S1.
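The difference between the two regularisers is easiest to see in the special case of orthonormal predictors, where both have closed forms. Assuming the objectives ½‖y − Xβ‖² + λ‖β‖₁ (lasso) and ½‖y − Xβ‖² + (λ/2)‖β‖² (ridge) with XᵀX = I, the lasso soft-thresholds each least-squares coefficient and ridge shrinks it proportionally. This scaling is chosen for simplicity and is not necessarily the one glmnet uses. The coefficients and λ below are made up.

```python
def soft_threshold(z, lam):
    """Lasso update for one coefficient: move towards zero by lam, stop at zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def ridge_shrink(z, lam):
    """Ridge update for one coefficient: proportional shrinkage, never exactly zero."""
    return z / (1.0 + lam)


# Made-up least-squares coefficients and penalty, for illustration only.
ols = [3.0, -0.4, 1.2, 0.1]
lam = 0.5

lasso = [soft_threshold(b, lam) for b in ols]
ridge = [ridge_shrink(b, lam) for b in ols]

for b, l, r in zip(ols, lasso, ridge):
    print(f"ols={b:+.2f}  lasso={l:+.2f}  ridge={r:+.2f}")
```

The lasso sets the small coefficients exactly to zero, which is why it also acts as variable selection. Ridge keeps every predictor but shrinks all of them.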
Estimating a causal effect
- Randomised trial: analyse by intention to treat (ITT). Course 3 W1 S2.
- Observational data: draw a DAG, pick an adjustment set. Course 3 W3 S3.
- Non-randomised treatment: propensity-score matching or IPTW. Course 3 W3 S4.
- A plausible instrument: instrumental-variable regression. Course 3 W3 S5.
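As a minimal illustration of IPTW, the sketch below uses a made-up dataset with one binary confounder `x`, a treatment `t`, and an outcome built as `y = 2*t + 3*x`, so the true effect is 2 by construction. The propensity score is estimated as the treated proportion within each level of `x`. Each unit is weighted by the inverse probability of the treatment it actually received, and the weighted group means are compared. All names and data here are hypothetical. Real analyses estimate the propensity score with a model and need every stratum to contain both treated and untreated units.

```python
from collections import defaultdict

# Made-up rows of (confounder x, treatment t, outcome y), with y = 2*t + 3*x.
data = [
    (0, 1, 2.0), (0, 0, 0.0), (0, 0, 0.0), (0, 0, 0.0),
    (1, 1, 5.0), (1, 1, 5.0), (1, 1, 5.0), (1, 0, 3.0),
]


def propensity_by_stratum(rows):
    """Estimated P(t = 1 | x): treated proportion within each level of x."""
    treated, total = defaultdict(int), defaultdict(int)
    for x, t, _ in rows:
        treated[x] += t
        total[x] += 1
    return {x: treated[x] / total[x] for x in total}


def weighted_mean(pairs):
    """Mean of y over (weight, y) pairs, normalised by the sum of weights."""
    pairs = list(pairs)
    return sum(w * y for w, y in pairs) / sum(w for w, _ in pairs)


def naive_difference(rows):
    """Unadjusted difference in mean outcome, treated minus control."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def iptw_difference(rows):
    """Weighted difference in means; requires 0 < e < 1 in every stratum."""
    e = propensity_by_stratum(rows)
    treated = [(1 / e[x], y) for x, t, y in rows if t == 1]
    control = [(1 / (1 - e[x]), y) for x, t, y in rows if t == 0]
    return weighted_mean(treated) - weighted_mean(control)


print(f"naive: {naive_difference(data):.2f}")
print(f"IPTW:  {iptw_difference(data):.2f}")
```

The naive comparison is confounded because units with `x = 1` are both more likely to be treated and have higher outcomes. Weighting removes that imbalance in this toy example.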