Note
Workflow labs use the variant template: Goal → Approach → Execution → Check → Report.
Prerequisites: R, RStudio, and Quarto installed (see Get started).
A statistical analysis is only as trustworthy as the environment that produced it. A result that depends on the version of a package, the operating system of the analyst, or an undocumented installation step cannot be confidently rerun by anyone else — including the analyst six months later. The job of the toolchain is to make the set of dependencies explicit and easy to restore.
R provides the language; RStudio provides an integrated environment that keeps project files, editor, console, and version control in one place; Quarto renders source documents into articles, slides, and reports while interleaving text and executable code. The last piece, renv, pins package versions to a lockfile that lives with the project. Together these give you a project that is self-contained, rebuildable on another machine, and testable against changes in its own dependencies.
Beginners often think of reproducibility as a requirement imposed by journals. The real payoff is more selfish: a colleague can pick up a project without you having to answer questions, a revision cycle does not force you to reconstruct the environment you used six months ago, and a silent upgrade of a package does not break a graph you no longer remember how to draw.
Stand up a minimal reproducible R project and demonstrate the commands that keep it reproducible: version reporting, environment snapshotting, and rendering.
A reproducible project has four minimum ingredients: a working directory, a package lockfile, a document that can be re-rendered, and a record of the R session. In this lab we simulate a tiny dataset, draw a figure from it, and record the environment used to produce both.
# A tibble: 30 × 3
   subject group value
     <int> <chr> <dbl>
 1       1 A     12.7
 2       2 A      8.87
 3       3 A     10.7
 4       4 A     11.3
 5       5 A     10.8
 6       6 A      9.79
 7       7 A     13.0
 8       8 A      9.81
 9       9 A     14.0
10      10 A      9.87
# ℹ 20 more rows
The chunk above is what a reader would rerun. It must be self-contained: the data are simulated inside the document, the seed is set, and only tidyverse is required.
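A chunk of this kind might look like the following minimal sketch. The seed value, the group means and standard deviation, the 15/15 split, and the object names sim and fig are assumptions chosen for illustration; the lab's own chunk is not reproduced here, so the simulated values will not match the output shown above.

```r
library(tidyverse)

# Hypothetical seed: any fixed value makes the simulation repeatable.
set.seed(2024)

# 30 subjects, 15 per group. The means (10 and 12) and sd (2) are assumed values.
sim <- tibble(
  subject = 1:30,
  group   = rep(c("A", "B"), each = 15),
  value   = rnorm(30, mean = if_else(group == "A", 10, 12), sd = 2)
)

# Boxplot of the simulated values by group.
fig <- ggplot(sim, aes(x = group, y = value)) +
  geom_boxplot() +
  labs(x = "Group", y = "Value")

sim
fig
```

Because the seed is set before rnorm() is called, rerunning the chunk in the same environment regenerates the same 30 values and therefore the same figure.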
A minimal project contains:
my-project.Rproj: the RStudio project file (an anchor for the working directory).
renv.lock: the JSON lockfile of pinned package versions.
renv/: the project's private package library.
index.qmd or report.qmd: your narrative.
code/ or R/ folder: your scripts.
data/ folder: ideally small and read-only; never raw CSVs you edited by hand.
The renv::status() command reports drift between the library and the lockfile. In a shared project, your rule of thumb is: if status() reports anything other than a project that is in sync with the lockfile, do not commit.
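To make the idea of drift concrete, the sketch below reads a lockfile and compares each pinned version with the version installed in the current library. It is a hand-rolled approximation for illustration, not how renv implements status(). The function name lockfile_drift is hypothetical, and the code assumes the jsonlite package is installed and that the lockfile has a top-level Packages entry whose members each carry a Version field.

```r
library(jsonlite)

# Compare versions recorded in an renv lockfile with the versions installed
# in the current library. Returns one row per package listed in the lockfile.
lockfile_drift <- function(lockfile = "renv.lock") {
  lock <- fromJSON(lockfile, simplifyVector = FALSE)
  pinned <- vapply(lock$Packages, function(p) p$Version, character(1))

  rows <- lapply(names(pinned), function(pkg) {
    # packageVersion() errors when the package is not installed.
    installed <- tryCatch(utils::packageVersion(pkg), error = function(e) NULL)
    data.frame(
      package   = pkg,
      pinned    = pinned[[pkg]],
      installed = if (is.null(installed)) NA_character_ else as.character(installed),
      # Compare as version objects so "1.1-3" and "1.1.3" count as equal.
      in_sync   = !is.null(installed) &&
        installed == package_version(pinned[[pkg]])
    )
  })
  do.call(rbind, rows)
}

# Only runs inside a project that already has a lockfile.
if (file.exists("renv.lock")) {
  drift <- lockfile_drift("renv.lock")
  print(drift[!drift$in_sync, ])
}
```

In day-to-day work renv::status() remains the command to run; the sketch only shows what kind of comparison sits behind the word "drift".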
# A tibble: 3 × 2
  package   version
  <chr>     <chr>
1 tidyverse 2.0.0
2 ggplot2   4.0.3
3 dplyr     1.2.1
The two outputs above — R version plus a small table of key package versions — are what a reviewer needs to rerun the figure. Everything larger (sessionInfo()) lives at the bottom of the document.
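One way to produce a version report of this shape is sketched below. The object names r_version, key_pkgs, and versions are illustrative, and the choice of which packages to list is an assumption; the code requires those packages to be installed.

```r
library(tibble)

# R version as a single string, suitable for an inline statement in the report.
r_version <- R.version.string

# Packages to report; this vector is an illustrative choice.
key_pkgs <- c("tidyverse", "ggplot2", "dplyr")

# One row per package, with the installed version as text.
versions <- tibble(
  package = key_pkgs,
  version = vapply(
    key_pkgs,
    function(p) as.character(packageVersion(p)),
    character(1),
    USE.NAMES = FALSE
  )
)

r_version
versions
```

Keeping the list short is deliberate: the table names the packages a reader is most likely to ask about, while the full sessionInfo() output covers everything else.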
A reproducible analysis is carried out in a project folder anchored by an .Rproj file, with dependencies pinned in renv.lock and narrative plus code in a Quarto document. In this lab, 30 simulated observations were generated from two normal distributions and plotted as a boxplot; the document was rendered with R version 4.4.1 (2024-06-14) and tidyverse 2.0.0.
The important phrase is self-contained. A file that someone else can run by cloning the repository and typing quarto render is the unit of reproducibility in this course.
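For readers who prefer to stay in the R console, the clone-and-render step can be sketched as follows. The file name index.qmd is an assumption, and the guards are there so that the script does nothing outside an renv project or on a machine without the Quarto command-line tool on the PATH.

```r
# Step 1: reinstall the package versions pinned in renv.lock.
# Guarded so the script is a no-op outside an renv project.
if (file.exists("renv.lock") && requireNamespace("renv", quietly = TRUE)) {
  renv::restore(prompt = FALSE)
}

# Step 2: render the document with the Quarto command-line tool, if found.
input <- "index.qmd"  # hypothetical file name
quarto_bin <- unname(Sys.which("quarto"))
render_status <- NA_integer_
if (nzchar(quarto_bin) && file.exists(input)) {
  # system2() returns the exit status of the command; 0 means success.
  render_status <- system2(quarto_bin, c("render", input))
}
```

Restoring before rendering matters: rendering against whatever happens to be installed would test the machine, not the project.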
renv::restore() on another machine produces different output.
Editing renv.lock by hand. Do not. Use snapshot().
Committing renv/library/ to git. The lockfile is what is tracked; the library is restored locally.
.qmd.
See vignette("renv").
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
time zone: UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.9.5 forcats_1.0.1 stringr_1.6.0 dplyr_1.2.1
[5] purrr_1.2.2 readr_2.2.0 tidyr_1.3.2 tibble_3.3.1
[9] ggplot2_4.0.3 tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] gtable_0.3.6 jsonlite_2.0.0 compiler_4.4.1 tidyselect_1.2.1
[5] scales_1.4.0 yaml_2.3.12 fastmap_1.2.0 R6_2.6.1
[9] labeling_0.4.3 generics_0.1.4 knitr_1.51 htmlwidgets_1.6.4
[13] pillar_1.11.1 RColorBrewer_1.1-3 tzdb_0.5.0 rlang_1.2.0
[17] utf8_1.2.6 stringi_1.8.7 xfun_0.57 S7_0.2.2
[21] otel_0.2.0 timechange_0.4.0 cli_3.6.6 withr_3.0.2
[25] magrittr_2.0.5 digest_0.6.39 grid_4.4.1 hms_1.1.4
[29] lifecycle_1.0.5 vctrs_0.7.3 evaluate_1.0.5 glue_1.8.1
[33] farver_2.1.2 rmarkdown_2.31 tools_4.4.1 pkgconfig_2.0.3
[37] htmltools_0.5.9