Produces a detailed comparison between two sm_corpus objects, reporting
works added, removed, and changed; author differences; reference
differences; and screening changes. This is useful for auditing how a
corpus evolved between snapshots or after a refresh.
This function compares strictly by internal work_id. For content-based
reconciliation across corpora that do not share identifiers (matching by
normalised DOI with a fuzzy title fallback), use sm_reconcile().
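The identifier-based comparison described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the data frames `works1` and `works2`, the helper `diff_by_id()`, and the `title` column are hypothetical stand-ins, and real sm_corpus objects are assumed to carry more structure than a single table.

```r
# Hypothetical stand-ins for the works tables of two corpus snapshots.
works1 <- data.frame(
  work_id = c("W1", "W2", "W3"),
  title   = c("Alpha", "Beta", "Gamma"),
  stringsAsFactors = FALSE
)
works2 <- data.frame(
  work_id = c("W2", "W3", "W4"),
  title   = c("Beta", "Gamma (revised)", "Delta"),
  stringsAsFactors = FALSE
)

# Illustrative helper (not part of the package): classify rows purely by id.
diff_by_id <- function(a, b, id = "work_id") {
  # Rows whose id appears in only one of the two tables.
  added   <- b[!b[[id]] %in% a[[id]], , drop = FALSE]
  removed <- a[!a[[id]] %in% b[[id]], , drop = FALSE]

  # Rows present in both tables, joined side by side on the id.
  shared <- merge(a, b, by = id, suffixes = c(".1", ".2"))
  fields <- setdiff(intersect(names(a), names(b)), id)

  # Flag a shared row as changed if any common field differs (NA-safe).
  differs <- rep(FALSE, nrow(shared))
  for (f in fields) {
    x <- shared[[paste0(f, ".1")]]
    y <- shared[[paste0(f, ".2")]]
    same <- (is.na(x) & is.na(y)) | (!is.na(x) & !is.na(y) & x == y)
    differs <- differs | !same
  }

  list(
    added   = added,
    removed = removed,
    changed = shared[differs, , drop = FALSE]
  )
}

res <- diff_by_id(works1, works2)
res$added$work_id
res$removed$work_id
res$changed$work_id
```

Under this scheme, a work whose identifier differs between the two tables shows up as one removal plus one addition even if its content is identical, which is the case the content-based matching in sm_reconcile() is meant to handle.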
Usage
sm_diff_corpora(corpus1, corpus2, call = rlang::caller_env())
# S3 method for class 'sm_corpus_diff'
print(x, ...)
Value
An sm_corpus_diff S3 object (a list) with components:
- added
Tibble of works in corpus2 but not corpus1.
- removed
Tibble of works in corpus1 but not corpus2.
- changed
Tibble of works present in both but with differing fields.
- summary
A one-row summary tibble with counts.
- hash1
Hash of corpus1.
- hash2
Hash of corpus2.
See also
Other reproducibility:
sm_certificate(),
sm_cite_corpus(),
sm_hash_corpus(),
sm_provenance(),
sm_snapshot()
Examples
# \donttest{
c1 <- sm_example_corpus(seed = 1L)
c2 <- sm_example_corpus(seed = 2L)
d <- sm_diff_corpora(c1, c2)
print(d)
#>
#> ── <sm_corpus_diff> ────────────────────────────────────────────────────────────
#> Works added: 0
#> Works removed: 0
#> Works changed: 1590
#> Works unchanged: -1390
#>
#> Authors added: 0
#> Authors removed: 0
#> References: 1932 -> 1846
#> Screening: 0 -> 0
#>
#> Hash (before): 8c148d1bfae1
#> Hash (after): 5d1d1da46a2d
# }