Returns the full provenance table from a corpus, documenting the data lineage of every work: where it was fetched from, when, with which query, and which version of scimapR performed the ingestion.
Value
A tibble with columns work_id, source, source_id_external,
fetch_date, query, engine, scimapR_version, and prompt_hash.
See also
Other reproducibility:
sm_certificate(),
sm_cite_corpus(),
sm_diff_corpora(),
sm_hash_corpus(),
sm_snapshot()
Examples
corpus <- sm_example_corpus()
sm_provenance(corpus)
#> # A tibble: 200 × 8
#>    work_id    source    source_id_external fetch_date          query      engine
#>    <chr>      <chr>     <chr>              <dttm>              <chr>      <chr>
#>  1 W000000001 synthetic NA                 2026-06-01 16:46:53 sm_exampl… native
#>  2 W000000002 synthetic NA                 2026-06-01 16:46:53 sm_exampl… native
#>  3 W000000003 synthetic NA                 2026-06-01 16:46:53 sm_exampl… native
#>  4 W000000004 synthetic NA                 2026-06-01 16:46:53 sm_exampl… native
#>  5 W000000005 synthetic NA                 2026-06-01 16:46:53 sm_exampl… native
#>  6 W000000006 synthetic NA                 2026-06-01 16:46:53 sm_exampl… native
#>  7 W000000007 synthetic NA                 2026-06-01 16:46:53 sm_exampl… native
#>  8 W000000008 synthetic NA                 2026-06-01 16:46:53 sm_exampl… native
#>  9 W000000009 synthetic NA                 2026-06-01 16:46:53 sm_exampl… native
#> 10 W000000010 synthetic NA                 2026-06-01 16:46:53 sm_exampl… native
#> # ℹ 190 more rows
#> # ℹ 2 more variables: scimapR_version <chr>, prompt_hash <chr>
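
Because the result is an ordinary table with one row per work, it can be summarised with standard R tools. The following is a minimal sketch and not part of the package's own examples. It builds a small stand-in data frame named prov with the eight documented columns, so it runs without scimapR installed. Every value in it is invented for illustration, including the second source label, the external identifier, the queries, and the version strings. With a real corpus, prov would instead be the result of sm_provenance(corpus).

```r
# Hypothetical stand-in for the table returned by sm_provenance().
# Column names follow the Value section; all row values are invented.
prov <- data.frame(
  work_id            = c("W000000001", "W000000002", "W000000003"),
  source             = c("synthetic", "synthetic", "other_source"),
  source_id_external = c(NA, NA, "X123"),
  fetch_date         = as.POSIXct(
    c("2026-06-01 16:46:53", "2026-06-01 16:46:53", "2026-06-02 09:00:00"),
    tz = "UTC"
  ),
  query              = c("q1", "q1", "q2"),
  engine             = "native",
  scimapR_version    = c("0.1.0", "0.1.0", "0.2.0"),
  prompt_hash        = NA_character_,
  stringsAsFactors   = FALSE
)

# How many works came from each source?
works_per_source <- table(prov$source)

# Was the whole corpus ingested by a single scimapR version?
single_version <- length(unique(prov$scimapR_version)) == 1

# Earliest and latest fetch timestamps.
fetch_window <- range(prov$fetch_date)

print(works_per_source)
print(single_version)
print(fetch_window)
```

A check like single_version may be useful before comparing two corpora, since works ingested by different package versions might not be directly comparable.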