Course 3 — #courses
Note: This is an inference lab that follows the five-step template: Hypothesis → Visualise → Assumptions → Conduct → Conclude.
Functions: forecast::auto.arima, changepoint::cpt.mean.
Prerequisites: Course 2 regression; basic familiarity with the ts class.
A time series differs from cross-sectional data in one crucial way: observations close in time are correlated with one another, so standard errors based on independence are wrong. Classical decomposition splits a series into a slowly varying trend, a periodic seasonal component, and a residual. ARIMA models capture the residual autocorrelation with autoregressive and moving-average terms, possibly after differencing to remove a unit root.
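To make the point about standard errors concrete, the following is a minimal sketch in base R, not part of the original lab. It simulates a linear trend with AR(1) errors, fits it once with lm() and once with arima() using the time index as a regressor, and compares the slope standard errors. The sample size, the AR coefficient of 0.8, and the variable names are all illustrative assumptions.

```r
# Sketch: OLS vs. AR(1)-aware standard errors for a trend slope.
# All parameter values below are illustrative assumptions.
set.seed(1)
n <- 120
time_index <- seq_len(n)

# AR(1) errors with phi = 0.8, added to a linear trend
errors <- as.numeric(arima.sim(model = list(ar = 0.8), n = n))
y <- 10 + 0.05 * time_index + errors

# Naive fit that treats the observations as independent
fit_ols <- lm(y ~ time_index)
se_ols <- summary(fit_ols)$coefficients["time_index", "Std. Error"]

# Lag-1 autocorrelation of the OLS residuals (element 1 is lag 0)
resid_acf1 <- acf(resid(fit_ols), plot = FALSE)$acf[2]

# Same mean structure, with the AR(1) error term estimated jointly
fit_ar1 <- arima(y, order = c(1, 0, 0),
                 xreg = cbind(time_index = time_index), method = "ML")
se_ar1 <- unname(sqrt(diag(fit_ar1$var.coef))["time_index"])

c(se_ols = se_ols, se_ar1 = se_ar1, resid_acf1 = resid_acf1)
```

When the residual autocorrelation is strongly positive, as it is by construction here, the lm() standard error for the slope tends to be noticeably smaller than the one from the model that accounts for the correlation.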
Change-point detection asks a different question: given a series, when (if ever) did the underlying mean or variance shift? The changepoint package implements several methods; cpt.mean with a penalised likelihood criterion is a common starting point. In epidemiology, change-points flag outbreaks, policy changes, and data-collection transitions.
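The idea behind a penalised mean-shift search can be sketched in a few lines of base R. This is a hand-rolled illustration of an at-most-one-change search, not the changepoint package's implementation: the function name find_single_shift, the Gaussian cost, and the default penalty of 3 * log(n) are all assumptions made for the example.

```r
# Sketch: search for at most one shift in the mean of a series.
# find_single_shift() and its default penalty are hypothetical.
find_single_shift <- function(x, penalty = 3 * log(length(x))) {
  n <- length(x)
  rss <- function(v) sum((v - mean(v))^2)

  rss_null <- rss(x)        # cost with a single constant mean
  taus <- 2:(n - 2)         # keep at least two points per segment

  # Cost of splitting after observation tau, for every candidate tau
  rss_split <- vapply(
    taus,
    function(tau) rss(x[1:tau]) + rss(x[(tau + 1):n]),
    numeric(1)
  )

  best <- which.min(rss_split)
  # Gain in Gaussian -2 log-likelihood (common, unknown variance)
  gain <- n * log(rss_null / rss_split[best])

  list(tau = taus[best], gain = gain, detected = gain > penalty)
}

# Simulated example: the mean rises by 5 after observation 80
set.seed(3)
x <- c(rnorm(80, mean = 0), rnorm(64, mean = 5))
result <- find_single_shift(x)
result$tau
result$detected
```

Here tau is the last observation of the first segment. With the changepoint package installed, cpt.mean(x, method = "AMOC") is the packaged counterpart of this single-change search, and method = "PELT" extends the idea to multiple change-points.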
A seasonally adjusted series is not the same as a detrended series. Seasonal adjustment removes only the regular period; detrending removes the slow evolution. Most anomalies you care about (an outbreak, an intervention) live in what remains.
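The distinction can be illustrated with stl() from base R. The sketch below uses a simulated series whose slope, seasonal amplitude, and noise level are assumptions chosen for the example. It builds a seasonally adjusted series, a detrended series, and the remainder from the same decomposition.

```r
# Sketch: seasonal adjustment vs. detrending with stl().
# The simulated series and its parameters are illustrative assumptions.
set.seed(7)
month <- seq_len(144)
y <- ts(0.1 * month + 3 * sin(2 * pi * month / 12) + rnorm(144),
        frequency = 12)

fit <- stl(y, s.window = "periodic")
components <- fit$time.series   # columns: seasonal, trend, remainder

seasonally_adjusted <- y - components[, "seasonal"]  # trend + remainder
detrended           <- y - components[, "trend"]     # seasonal + remainder
remainder           <- components[, "remainder"]     # neither trend nor season

# The seasonally adjusted series still rises with time; the detrended one does not
cor_adjusted  <- cor(month, as.numeric(seasonally_adjusted))
cor_detrended <- cor(month, as.numeric(detrended))
c(adjusted = cor_adjusted, detrended = cor_detrended)
```

The remainder is the component in which a short-lived anomaly would be expected to show up, since both the regular cycle and the slow evolution have been taken out.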
A simulated monthly series has a linear trend, a yearly seasonal cycle, and a mean shift at month 80. Decomposition, ARIMA, and change-point detection should each recover an interpretable piece.
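A series of this shape could be simulated roughly as follows. The lab's own simulation code is not shown, so everything numeric here is an assumption: the length of 144 months, the slope, the seasonal amplitude, the shift size, and the noise standard deviation. Only the object name ts_y and the shift at month 80 come from the text.

```r
# Sketch of a series matching the description; every numeric value
# (length, slope, amplitude, shift size, noise SD) is an assumption.
set.seed(42)
n_months <- 144
shift_month <- 80
month <- seq_len(n_months)

trend    <- 0.1 * month
seasonal <- 3 * sin(2 * pi * month / 12)
shift    <- ifelse(month > shift_month, 5, 0)  # level rises after month 80
noise    <- rnorm(n_months, sd = 1.5)

ts_y <- ts(20 + trend + seasonal + shift + noise, frequency = 12)

# Sanity check: once the known trend and seasonal terms are subtracted,
# the mean should jump by roughly the shift size
known_removed <- as.numeric(ts_y) - trend - seasonal
mean_before <- mean(known_removed[month <= shift_month])
mean_after  <- mean(known_removed[month > shift_month])
mean_after - mean_before
```

Setting frequency = 12 is what tells downstream functions such as stl() and auto.arima() that the series is monthly with a yearly cycle.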
STL decomposition assumes the seasonal period is fixed and known. ARIMA assumes a linear, time-invariant generating process after differencing. cpt.mean assumes the variance is approximately constant — change-points in variance would need cpt.var.
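Two of these assumptions can be given a rough numerical check on the STL remainder. The sketch below is one possible set of checks, not something the lab prescribes: a split-half F test for roughly constant variance and a Ljung-Box test for leftover autocorrelation. The simulated series, the split point, and the lag of 12 are assumptions.

```r
# Sketch: rough assumption checks on an STL remainder.
# Series parameters, split point, and lag choice are illustrative assumptions.
set.seed(11)
month <- seq_len(144)
y <- ts(0.1 * month + 3 * sin(2 * pi * month / 12) + rnorm(144, sd = 1.5),
        frequency = 12)

decomposition <- stl(y, s.window = "periodic")
remainder <- as.numeric(decomposition$time.series[, "remainder"])

# Roughly constant variance: compare the two halves of the remainder
first_half  <- remainder[1:72]
second_half <- remainder[73:144]
variance_check <- var.test(first_half, second_half)

# Leftover autocorrelation that an ARIMA model would need to absorb
autocorr_check <- Box.test(remainder, lag = 12, type = "Ljung-Box")

c(variance_p = variance_check$p.value, autocorr_p = autocorr_check$p.value)
```

A small p-value from the variance test would suggest looking at cpt.var rather than cpt.mean alone. A small p-value from the Ljung-Box test indicates structure left in the remainder, which is common after loess smoothing and is not by itself a sign that the decomposition failed.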
Series: ts_y
ARIMA(1,0,1)(0,1,2)[12] with drift

Coefficients:
         ar1      ma1     sma1    sma2   drift
      0.9432  -0.6838  -0.9214  0.0946  0.1178
s.e.  0.0474   0.0926   0.1075  0.1096  0.0189

sigma^2 = 3.441:  log likelihood = -273.6
AIC=559.2   AICc=559.87   BIC=576.5
[1] 5 11 17 23 29 36 43 47 54 59 66 72 77 80 85 90 97 102 108
[20] 113 119 126 130 132 138
STL cleanly separated a linear trend, a 12-month seasonal cycle, and a residual containing the simulated mean shift.
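auto.arima selected ARIMA(1,0,1)(0,1,2)[12] with drift, that is p = 1, d = 0, q = 1 for the non-seasonal part and P = 0, D = 1, Q = 2 for the seasonal part, with period 12. PELT detected 25 change-points, at months 5, 11, 17, 23, 29, 36, 43, 47, 54, 59, 66, 72, 77, 80, 85, 90, 97, 102, 108, 113, 119, 126, 130, 132, 138. One of them coincides with the simulated truth of month 80, but the other 24 do not correspond to any simulated shift. Their roughly six-month spacing is consistent with cpt.mean fitting a piecewise-constant mean to the remaining trend and seasonal cycle, so the true change-point is recovered only among many false positives.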
Common pitfall: running lm() on a time series as if the residuals were independent.
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
time zone: UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] changepoint_2.3 zoo_1.8-15 forecast_9.0.2 lubridate_1.9.5
[5] forcats_1.0.1 stringr_1.6.0 dplyr_1.2.1 purrr_1.2.2
[9] readr_2.2.0 tidyr_1.3.2 tibble_3.3.1 ggplot2_4.0.3
[13] tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] generics_0.1.4 stringi_1.8.7 lattice_0.22-6 hms_1.1.4
[5] digest_0.6.39 magrittr_2.0.5 evaluate_1.0.5 grid_4.4.1
[9] timechange_0.4.0 RColorBrewer_1.1-3 fastmap_1.2.0 jsonlite_2.0.0
[13] scales_1.4.0 cli_3.6.6 rlang_1.2.0 withr_3.0.2
[17] yaml_2.3.12 otel_0.2.0 tools_4.4.1 parallel_4.4.1
[21] tzdb_0.5.0 colorspace_2.1-2 vctrs_0.7.3 R6_2.6.1
[25] lifecycle_1.0.5 htmlwidgets_1.6.4 pkgconfig_2.0.3 urca_1.3-4
[29] pillar_1.11.1 gtable_0.3.6 glue_1.8.1 Rcpp_1.1.1-1.1
[33] xfun_0.57 tidyselect_1.2.1 knitr_1.51 farver_2.1.2
[37] htmltools_0.5.9 nlme_3.1-164 labeling_0.4.3 rmarkdown_2.31
[41] timeDate_4052.112 fracdiff_1.5-4 compiler_4.4.1 S7_0.2.2