Fetch pre-computed SPECTER embeddings from the Semantic Scholar API for works in the corpus and attach them as the embeddings matrix.

SPECTER embeddings are 768-dimensional vectors useful for computing document similarity, clustering, and visualisation. No Python installation is required, because this function retrieves pre-computed vectors from the API.
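As a rough illustration of the similarity and clustering uses mentioned above, the following base R sketch computes cosine similarity and a simple hierarchical clustering. It assumes the embeddings are available as a plain numeric matrix with one row per work and 768 columns. The matrix here is random stand-in data, and the name emb is hypothetical rather than part of the package.

```r
# Stand-in for an embeddings matrix: 4 works x 768 dimensions.
# Random data for illustration only, not real SPECTER vectors.
set.seed(1)
emb <- matrix(
  rnorm(4 * 768),
  nrow = 4,
  dimnames = list(paste0("work_", 1:4), NULL)
)

# Scale each row to unit length so that the cross-product of rows
# equals their cosine similarity.
row_norms <- sqrt(rowSums(emb^2))
unit <- emb / row_norms

# 4 x 4 similarity matrix with (numerically) ones on the diagonal.
cos_sim <- tcrossprod(unit)

# Hierarchical clustering on cosine distance, cut into two groups.
clusters <- cutree(hclust(as.dist(1 - cos_sim)), k = 2)
```

With real embeddings, cos_sim[i, j] close to 1 would indicate that works i and j are semantically similar.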

This enricher is idempotent: works that already have embeddings are skipped.
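The skip-if-present behaviour can be pictured with a small base R sketch. This is not the package's implementation: the variable names, the use of row names as work identifiers, and the placeholder matrices are all assumptions made for illustration.

```r
# Hypothetical inputs: IDs of all works in a corpus, and an embeddings
# matrix whose row names are the IDs that already have vectors.
work_ids <- c("w1", "w2", "w3", "w4")
existing <- matrix(
  0, nrow = 2, ncol = 768,
  dimnames = list(c("w1", "w3"), NULL)
)

# Only works without a row in the matrix would need a request.
to_fetch <- setdiff(work_ids, rownames(existing))

# Placeholder for the vectors an API call would return for those works.
fetched <- matrix(
  1, nrow = length(to_fetch), ncol = 768,
  dimnames = list(to_fetch, NULL)
)

# Combine and restore the corpus order.
updated <- rbind(existing, fetched)[work_ids, , drop = FALSE]

# A second pass finds nothing left to fetch, so it would be a no-op.
remaining <- setdiff(work_ids, rownames(updated))
```

Under these assumptions, running the enricher twice costs no extra API requests for works that were embedded on the first run.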

Usage

sm_enrich_specter(
  corpus,
  api_key = Sys.getenv("SEMANTIC_SCHOLAR_API_KEY"),
  verbose = TRUE,
  call = rlang::caller_env()
)

Arguments

corpus

An sm_corpus object.

api_key

Semantic Scholar API key. Defaults to the value of the SEMANTIC_SCHOLAR_API_KEY environment variable.

verbose

Print progress messages?

call

Caller environment for error reporting.
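Because the api_key default is read from an environment variable, it can help to check that the variable is set before enriching a corpus. The sketch below uses only base R. How sm_enrich_specter() behaves when the key is empty is not stated here, so the check is a precaution rather than a documented requirement.

```r
# Sys.getenv() returns "" when the variable is unset.
key <- Sys.getenv("SEMANTIC_SCHOLAR_API_KEY")

if (!nzchar(key)) {
  message("SEMANTIC_SCHOLAR_API_KEY is not set.")
  # To set it for the current session only (placeholder value):
  # Sys.setenv(SEMANTIC_SCHOLAR_API_KEY = "your-key-here")
}
```

To make the key available in every session, a line of the form SEMANTIC_SCHOLAR_API_KEY=your-key-here can be added to the user's .Renviron file, which R reads at startup.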

Value

An sm_corpus object with an embeddings matrix, plus new provenance rows.

Examples

if (FALSE) { # \dontrun{
corpus <- sm_example_corpus()
corpus <- sm_enrich_specter(corpus)
} # }