Query the OpenAlex API for scholarly works and return the results as an sm_corpus.

Supports both free-text search (query) and structured filter syntax (filter). Uses cursor-based pagination to retrieve up to n_max works. A mailto address (which places requests in the polite pool) and an API key are strongly recommended for higher rate limits.
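The cursor-based pagination mentioned above can be pictured with a small offline sketch. This is not the package's implementation: fetch_all() and fake_fetch() are hypothetical names, and the page shape (a list with results and next_cursor) is an assumption modelled on the OpenAlex convention of starting with cursor=* and following the returned next cursor until it is empty.

```r
# Hypothetical helper: follow a cursor until n_max records are collected.
# `fetch_page(cursor, per_page)` is an assumed callback returning
# list(results = <list>, next_cursor = <string or NULL>).
fetch_all <- function(fetch_page, n_max = 200L, per_page = 200L) {
  results <- list()
  cursor <- "*"  # OpenAlex starts cursor paging with cursor=*
  while (!is.null(cursor) && length(results) < n_max) {
    page <- fetch_page(cursor, per_page)
    if (length(page$results) == 0L) break  # nothing more to read
    results <- c(results, page$results)
    cursor <- page$next_cursor             # NULL once the last page is reached
  }
  utils::head(results, n_max)              # trim any overshoot from the last page
}

# Offline stand-in for the API: 450 fake work IDs served page by page.
fake_ids <- sprintf("W%03d", 1:450)
fake_fetch <- function(cursor, per_page) {
  start <- if (identical(cursor, "*")) 1L else as.integer(cursor)
  end <- min(start + per_page - 1L, length(fake_ids))
  list(
    results = as.list(fake_ids[start:end]),
    next_cursor = if (end < length(fake_ids)) as.character(end + 1L) else NULL
  )
}

works <- fetch_all(fake_fetch, n_max = 300L, per_page = 200L)
length(works)
```

Injecting the page fetcher keeps the loop testable without network access; a real client would replace fake_fetch() with an HTTP request.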

Usage

sm_fetch_openalex(
  query = NULL,
  filter = NULL,
  n_max = 200L,
  per_page = 200L,
  mailto = Sys.getenv("SCIMAPR_MAILTO"),
  api_key = Sys.getenv("OPENALEX_API_KEY"),
  engine = c("native", "openalexR", "auto"),
  batch_size = 50L,
  verbose = TRUE,
  call = rlang::caller_env()
)

Arguments

query

Free-text search query passed to the search parameter. If NULL, filter must be supplied.

filter

A character string using OpenAlex filter syntax, e.g. "from_publication_date:2020-01-01,type:journal-article".

n_max

Maximum number of works to return (default 200).

per_page

Number of results requested per page (default 200, which is also the maximum).

mailto

Email address for the polite pool. Read from SCIMAPR_MAILTO env var by default.

api_key

OpenAlex API key. Read from OPENALEX_API_KEY env var by default.

engine

One of "native" (built-in httr2 client), "openalexR" (use the openalexR package), or "auto" (use openalexR if available, otherwise native).

batch_size

Integer; maximum number of |-joined values allowed in a single OpenAlex filter clause before the request is automatically split into multiple batched requests. OpenAlex limits the number of values in an OR filter and the overall URL length, so long DOI (or other ID) lists are chunked, fetched per batch, and row-bound with de-duplication. Default 50, which stays comfortably under the API's documented limit of 100.

verbose

Print progress messages?

call

Caller environment for error reporting.
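As a rough illustration of the chunking described under batch_size, the sketch below splits a long DOI list into groups of at most batch_size values and builds one |-joined filter clause per group. build_batched_filters() is a hypothetical helper, not part of the package; the DOIs are made-up placeholders, and removing duplicate input IDs up front is an assumption of this sketch. Fetching each batch and de-duplicating the combined results is not shown.

```r
# Hypothetical helper: one OR-filter clause per chunk of at most batch_size IDs.
build_batched_filters <- function(ids, field = "doi", batch_size = 50L) {
  ids <- unique(ids)                               # assumption: drop repeated inputs
  groups <- ceiling(seq_along(ids) / batch_size)   # 1,1,...,2,2,...
  chunks <- split(ids, groups)
  clauses <- vapply(
    chunks,
    function(chunk) paste0(field, ":", paste(chunk, collapse = "|")),
    character(1)
  )
  unname(clauses)
}

# Placeholder DOIs for illustration only.
dois <- sprintf("10.1234/example.%03d", 1:120)
filters <- build_batched_filters(dois, batch_size = 50L)
length(filters)  # 120 IDs in chunks of 50 give 3 clauses (50, 50, 20)
```

Each element of filters could then be passed as the filter argument of a separate request.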

Value

An sm_corpus object.

Examples

if (FALSE) { # \dontrun{
corpus <- sm_fetch_openalex(query = "bibliometrics", n_max = 10)
print(corpus)
} # }
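A filter-based call might look like the following sketch. The filter string is assembled with base R from the two clauses used in the filter argument description; the clauses vector is just an illustrative way to build it. The call itself is left unevaluated, as in the example above, because it needs the package and network access.

```r
# Assemble "key:value" clauses into a comma-separated OpenAlex filter string.
clauses <- c(from_publication_date = "2020-01-01", type = "journal-article")
filter <- paste(names(clauses), clauses, sep = ":", collapse = ",")
filter

if (FALSE) {
  # Not run: requires the package and a network connection.
  corpus <- sm_fetch_openalex(filter = filter, n_max = 50L, engine = "native")
  print(corpus)
}
```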