Parse OpenAlex JSON export files into an sm_corpus object. Handles both JSON arrays of works and newline-delimited JSON (JSONL) files. Extracts fields from the OpenAlex works schema including id, doi, title, publication year, type, cited_by_count, authorships, host venue, concepts, open access status, and referenced works.

Usage

sm_read_openalex_json(
  path,
  encoding = "UTF-8",
  verbose = TRUE,
  call = rlang::caller_env()
)

Arguments

path

Character scalar. Path to a .json or .jsonl file.

encoding

Character scalar. File encoding (default "UTF-8").

verbose

Logical. Whether to print progress messages (default TRUE).

call

Caller environment for error reporting.

Value

An sm_corpus object.

Implementation

The parser follows the OpenAlex works schema documented at https://developers.openalex.org/api-entities/works/work-object. Both standard JSON arrays and newline-delimited JSON (one work per line) are supported. No bibliometrix engine is available for this reader, because OpenAlex JSON is not a shared bibliometric export format.
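The difference between the two supported layouts can be illustrated with a small base R sketch. It is not the package's actual implementation: detect_json_layout() is a hypothetical helper, and the "W1"/"W2" ids are placeholder values rather than real OpenAlex identifiers. The sketch assumes that a file whose first non-blank character is "[" holds a JSON array, and that one starting with "{" holds one work per line.

```r
# Hypothetical helper (not part of the package): guess whether a file is a
# JSON array of works or newline-delimited JSON from its first character.
detect_json_layout <- function(path, encoding = "UTF-8") {
  lines <- trimws(readLines(path, encoding = encoding, warn = FALSE))
  lines <- lines[nzchar(lines)]            # drop blank lines
  if (length(lines) == 0L) {
    stop("File is empty: ", path)
  }
  first_char <- substr(lines[[1]], 1L, 1L)
  if (first_char == "[") {
    "array"                                # e.g. [{...}, {...}]
  } else if (first_char == "{") {
    # Assumption: one complete work object per line. A single
    # pretty-printed object would also land here.
    "jsonl"
  } else {
    stop("Unrecognised JSON layout in: ", path)
  }
}

# Placeholder files for illustration only.
array_file <- tempfile(fileext = ".json")
writeLines('[{"id": "W1"}, {"id": "W2"}]', array_file)

jsonl_file <- tempfile(fileext = ".jsonl")
writeLines(c('{"id": "W1"}', '{"id": "W2"}'), jsonl_file)

detect_json_layout(array_file)
detect_json_layout(jsonl_file)
```

Checking the content rather than the file extension means a mislabelled file (for example, JSONL saved with a .json extension) is still classified by what it contains. Whether sm_read_openalex_json() uses the extension, the content, or both is not stated here.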

References

Aria, M. & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959–975. doi:10.1016/j.joi.2017.08.007

Examples

if (FALSE) { # \dontrun{
corpus <- sm_read_openalex_json("openalex_works.json")
corpus$works
} # }