Joins cached enrichment data into the matching sm_corpus sub-tibbles by
their key columns, returning an updated, schema-valid sm_corpus. This
replaces hand-written cache-to-corpus joins, which are easy to get wrong
(e.g. a bind_rows() that coerces a NULL element to a logical column).
Usage
sm_materialise(
  corpus,
  sources,
  .by = NULL,
  overwrite = FALSE,
  call = rlang::caller_env()
)
Arguments
- corpus
  An sm_corpus.
- sources
  Either a named list whose names are corpus sub-tables (works, authors, authorships, sources, institutions, references, concepts, ...) and whose elements are tibbles or paths to cached .rds/.parquet files; or a single directory path containing <table>.rds/<table>.parquet files.
- .by
  Optional named list mapping table name to its join key column(s). Defaults to each table's natural key (e.g. works -> work_id).
- overwrite
  Logical (default FALSE). When FALSE, enrichment only fills NA cells of overlapping columns; populated cells are never overwritten. When TRUE, non-NA enrichment values win.
- call
  Caller environment for error reporting.
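The two overwrite modes described above can be pictured with a small base-R sketch. This is not the package's implementation, only an illustration of the documented rule; the works and enrich data frames and their values are made up for the example.

```r
# Hypothetical stand-ins for a corpus sub-table and a cached enrichment table
# that share the column `cnci`.
works  <- data.frame(work_id = c("W1", "W2", "W3"), cnci = c(1.2, NA, 0.8))
enrich <- data.frame(work_id = c("W1", "W2", "W3"), cnci = c(9.9, 1.5, NA))

# Align enrichment values to the corpus rows by key.
new_vals <- enrich$cnci[match(works$work_id, enrich$work_id)]

# overwrite = FALSE: only NA cells are filled; populated cells are kept.
fill_only <- ifelse(is.na(works$cnci), new_vals, works$cnci)

# overwrite = TRUE: non-NA enrichment values win; NA enrichment changes nothing.
overwritten <- ifelse(!is.na(new_vals), new_vals, works$cnci)

fill_only
overwritten
```

In both modes an NA in the enrichment never erases an existing value; the modes differ only in which side wins when both cells are populated.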
Value
An updated, validated sm_corpus with the enrichment columns merged
into the relevant sub-tables. New columns are added; existing rows are
preserved (this is a column-enrichment join, not a row append).
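As a rough illustration of what "column-enrichment join, not a row append" means, the base-R sketch below adds a column by key lookup. The data are invented, and the handling of unmatched keys (NA for corpus rows without a match, enrichment rows with unknown keys ignored) is an assumption about how such a join would typically behave, not a statement about sm_materialise() itself.

```r
# Hypothetical corpus sub-table and enrichment table.
works   <- data.frame(work_id = c("W1", "W2", "W3"), title = c("a", "b", "c"))
metrics <- data.frame(work_id = c("W3", "W1", "W9"), cnci = c(0.8, 1.2, 2.0))

# Look up each corpus key in the enrichment table; unmatched keys give NA.
enriched <- works
enriched$cnci <- metrics$cnci[match(works$work_id, metrics$work_id)]

nrow(enriched)   # same number of rows as `works`
names(enriched)  # one extra column
```

The row for "W9" in metrics has no counterpart in works, so it does not appear in the result; the row count and order of works are unchanged.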
Details
Missing keys produce a warning via cli::cli_warn() and that source is
skipped rather than raising an error. Internally, row-binds use a
type-safe helper so that a NULL or empty source never corrupts a
column's type.
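The internal helper is not shown on this page. The sketch below is one way such a type-safe row-bind could look in base R: safe_rbind() and ptype are hypothetical names, and casting to a zero-row prototype is an assumed strategy, not the package's actual code.

```r
# Hypothetical helper: row-bind a list of data frames, skipping NULL or
# zero-row inputs, then coerce each column to the type of a prototype.
safe_rbind <- function(parts, ptype) {
  keep <- Filter(function(p) !is.null(p) && nrow(p) > 0, parts)
  if (length(keep) == 0) {
    return(ptype[0, , drop = FALSE])  # nothing to bind: typed, empty result
  }
  out <- do.call(rbind, unname(keep))
  for (col in names(ptype)) {
    # An all-NA column is logical by default; cast it to the prototype's type.
    out[[col]] <- as.vector(out[[col]], mode = typeof(ptype[[col]]))
  }
  out
}

# Zero-row prototype describing the expected column types.
ptype <- data.frame(work_id = character(), cnci = numeric(),
                    stringsAsFactors = FALSE)

# A NULL source plus a source whose only value is a bare (logical) NA.
parts <- list(NULL, data.frame(work_id = "W2", cnci = NA,
                               stringsAsFactors = FALSE))

out   <- safe_rbind(parts, ptype)
empty <- safe_rbind(list(NULL), ptype)

typeof(out$cnci)    # double rather than logical
typeof(empty$cnci)  # double, even with no rows
```

Without the cast, the cnci column of the second source would stay logical, which is the kind of silent type drift the Details paragraph warns about.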
Examples
corpus <- sm_example_corpus(n_works = 10, seed = 1)
metrics <- tibble::tibble(
  work_id = corpus$works$work_id,
  cnci = runif(10, 0.5, 2)
)
corpus2 <- sm_materialise(corpus, sources = list(works = metrics))
#> ✔ Materialised 10 enrichment rows into works (+1 column).
"cnci" %in% names(corpus2$works)
#> [1] TRUE
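The directory form of sources (one <table>.rds file per sub-table) can be pictured as resolving to the same named list used in the example above. The base-R sketch below shows only that resolution step for .rds files; the temporary directory, file contents, and the way names are derived from file names are assumptions for illustration.

```r
# Create a throwaway cache directory holding one hypothetical enrichment file.
cache_dir <- tempfile("sm_cache_")
dir.create(cache_dir)
saveRDS(
  data.frame(work_id = c("W1", "W2"), cnci = c(1.2, 0.8)),
  file.path(cache_dir, "works.rds")
)

# Read every .rds file and name each element after its file (minus extension).
files   <- list.files(cache_dir, pattern = "\\.rds$", full.names = TRUE)
sources <- lapply(files, readRDS)
names(sources) <- tools::file_path_sans_ext(basename(files))

names(sources)
```

A list built this way has the shape that the named-list form of sources expects, with element names matching corpus sub-tables.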