Decision tree: which method?

A branching guide from what you are asking to the lab that shows you how to answer it.

The tree below is not a substitute for thinking; it is a scaffold for thinking. Start from your scientific question, follow the branches, and land on a lab. Every leaf links into the curriculum.

flowchart TD
  Start{What is the question?}
  Start --> Q1[Compare two or more groups]
  Start --> Q2[Quantify an association]
  Start --> Q3[Model an outcome from predictors]
  Start --> Q4[Handle clustered or repeated data]
  Start --> Q5[Deal with high-dimensional data]
  Start --> Q6[Estimate a causal effect]

  Q1 --> G1[Continuous outcome]
  G1 --> G1a[Two groups → t-test / Wilcoxon]
  G1 --> G1b[Many groups → ANOVA / Kruskal-Wallis]

  Q1 --> G2[Binary outcome]
  G2 --> G2a[Two proportions → chi-square / Fisher]
  G2 --> G2b[Risk or odds ratio]

  Q2 --> A1[Two continuous → Pearson / Spearman]
  Q2 --> A2[Continuous & categorical → ANOVA]
  Q2 --> A3[Categorical & categorical → chi-square]

  Q3 --> M1[Continuous outcome → linear regression]
  Q3 --> M2[Binary → logistic]
  Q3 --> M3[Count → Poisson / NB]
  Q3 --> M4[Ordinal → ordinal logit]
  Q3 --> M5[Time-to-event → Cox / KM]

  Q4 --> C1[Linear mixed model]
  Q4 --> C2[GLMM or GEE]
  Q4 --> C3[Repeated measures ANOVA]

  Q5 --> H1[Regularisation → lasso / ridge]
  Q5 --> H2[Dimension reduction → PCA / UMAP]
  Q5 --> H3[Tree ensemble → RF / xgboost]

  Q6 --> CA1[Randomised → ITT analysis]
  Q6 --> CA2[Observational → DAG → adjustment set]
  Q6 --> CA3[Non-randomised → IPTW / matching]
  Q6 --> CA4[Strong instrument → IV]

Using the tree

The first branch is always about the shape of your question: comparison, association, modelling, clustering or repetition, high dimensionality, or causation. The later branches are about the shape of your data: continuous, binary, count, ordinal, or time-to-event.
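
The two-stage reading (question first, data second) can be sketched as a nested lookup. This is only an illustration: the dictionary covers two of the six top-level branches, the leaf labels are copied from the flowchart, and the names DECISION_TREE and suggest_method are hypothetical, not part of the course material.

```python
# Hypothetical encoding of part of the tree: question -> data shape -> method.
# Leaf labels are copied from the flowchart; the structure is illustrative only.
DECISION_TREE = {
    "compare groups": {
        "continuous, two groups": "t-test / Wilcoxon",
        "continuous, many groups": "ANOVA / Kruskal-Wallis",
        "binary, two proportions": "chi-square / Fisher",
    },
    "model an outcome": {
        "continuous": "linear regression",
        "binary": "logistic",
        "count": "Poisson / NB",
        "ordinal": "ordinal logit",
        "time-to-event": "Cox / KM",
    },
}


def suggest_method(question: str, data_shape: str) -> str:
    """Follow the branches in order: the question first, then the data shape."""
    try:
        return DECISION_TREE[question][data_shape]
    except KeyError:
        raise KeyError(
            f"no leaf for question={question!r}, data_shape={data_shape!r}"
        ) from None


print(suggest_method("model an outcome", "count"))
```

The lookup returns a label, not a verdict: as with the tree itself, the leaf is a starting point to check against the assumptions of the method.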

Comparing groups

  • Two groups, continuous: start with a t-test if the outcome is roughly normal within each group; otherwise use the Wilcoxon rank-sum test. See Course 1 W4 S1 and W4 S4.
  • More than two groups, continuous: ANOVA, or Kruskal-Wallis when the normality assumption is doubtful. See Course 2 W2 S1.
  • Two proportions: the chi-square test when expected cell counts are large, Fisher's exact test when they are small. See Course 1 W4 S2.
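
The "large cells" versus "small cells" choice for two proportions can be made concrete by computing expected counts under independence. The sketch below assumes the common rule of thumb that every expected count should be at least 5 for the chi-square approximation; that threshold is an assumption here, not something the guide specifies, and the function names are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N.

    Assumes a non-empty table of non-negative counts with a positive total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def suggest_proportion_test(table, min_expected=5.0):
    """Suggest chi-square if every expected count reaches min_expected.

    min_expected=5.0 is a conventional rule of thumb (an assumption),
    not a threshold taken from the course.
    """
    smallest = min(min(row) for row in expected_counts(table))
    return "chi-square" if smallest >= min_expected else "Fisher"


# Rows are groups, columns are outcome yes/no.
print(suggest_proportion_test([[30, 20], [25, 25]]))  # all expected counts are large
print(suggest_proportion_test([[3, 1], [1, 4]]))      # expected counts are below 5
```

The rule is applied to expected counts rather than observed counts, so a table with one small observed cell can still qualify for the chi-square test.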

Associations

  • Two continuous variables: Pearson if the relationship is roughly linear and both variables are approximately normal; Spearman otherwise. See Course 1 W4 S3.
  • A continuous and a categorical variable: ANOVA, which is equivalent to a linear model with a categorical predictor. See Course 2 W1 S2.
  • Two categorical variables: the chi-square test of independence, or a log-linear model for more than two factors.
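
To see why linearity matters for the Pearson versus Spearman choice, the following sketch computes both coefficients by hand on a monotonic but non-linear relationship. It uses only the standard library; the function names and the toy data are illustrative assumptions, and in practice a statistics package would normally be used instead.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: y grows as the cube of x, so the relationship is monotonic, not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(f"Pearson:  {pearson(x, y):.3f}")
print(f"Spearman: {spearman(x, y):.3f}")
```

On this toy data Spearman reports a perfect monotonic association while Pearson falls short of 1, because Pearson measures only the linear component.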

Regression modelling

Clustered or repeated data

High-dimensional data

Estimating a causal effect