Parse a Web of Science (WoS) plaintext export file into an sm_corpus object. Handles the standard WoS tagged format, in which each field is introduced by a two-character tag such as AU or C1.

Usage

sm_read_wos(
  path,
  encoding = "UTF-8",
  engine = c("native", "bibliometrix", "auto"),
  verbose = TRUE,
  call = rlang::caller_env()
)

Arguments

path

Character scalar. Path to a WoS plaintext file (.txt).

encoding

Character scalar. File encoding (default "UTF-8").

engine

Character scalar. One of "native" (built-in parser), "bibliometrix" (delegate to bibliometrix::convert2df()), or "auto" (try bibliometrix first, fall back to native).

verbose

Logical. Print progress messages?

call

Caller environment for error reporting.
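
The engine choices described above can be sketched as a small resolver. This is an illustration only: resolve_engine() is a hypothetical helper, not a function exported by this package, and it assumes that "auto" falls back whenever bibliometrix is not installed. The actual fallback may also be triggered by a parse failure.

```r
# Hypothetical helper (not part of the package): map the `engine`
# argument to the name of the parser that would be used.
resolve_engine <- function(engine = c("native", "bibliometrix", "auto")) {
  # match.arg() picks the first choice ("native") when nothing is supplied
  # and raises an error for values outside the allowed set.
  engine <- match.arg(engine)

  has_bibliometrix <- requireNamespace("bibliometrix", quietly = TRUE)

  if (engine == "bibliometrix" && !has_bibliometrix) {
    stop("engine = \"bibliometrix\" requires the bibliometrix package.",
         call. = FALSE)
  }

  # Assumption: "auto" prefers bibliometrix and otherwise uses the native parser.
  if (engine == "auto") {
    engine <- if (has_bibliometrix) "bibliometrix" else "native"
  }

  engine
}

resolve_engine()        # uses the first choice
resolve_engine("auto")  # result depends on whether bibliometrix is installed
```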

Value

An sm_corpus object.

Implementation

The native parser follows the Web of Science Core Collection tagged export format. Each record begins with a PT (publication type) line and ends with an ER line. A field tag is a two-character code, an uppercase letter followed by an uppercase letter or a digit (for example AU or C1), followed by a single space and the field value. Continuation lines begin with three spaces and extend the preceding field. Key tags parsed: AU (authors), TI (title), SO (source), AB (abstract), DI (DOI), PY (year), DT (document type), C1 (addresses), RP (reprint author), CR (cited references), NR (number of references), TC (times cited), SC (subject category), UT (unique identifier), LA (language).
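
The record structure described above can be illustrated with a minimal sketch. It is not the package's native parser: parse_wos_lines() is a hypothetical helper written for this example, it keeps every field as a character vector with one element per line, and it does no type conversion or per-tag handling. The sample input reuses the reference cited on this page.

```r
# Hypothetical helper (not part of the package): split WoS tagged text,
# given as a character vector of lines, into a list of records.
parse_wos_lines <- function(lines) {
  records <- list()
  current <- NULL  # record being built; NULL while outside a record
  tag <- NULL      # most recent field tag, reused by continuation lines

  for (line in lines) {
    if (!nzchar(trimws(line))) next            # skip blank lines

    if (grepl("^ER\\s*$", line)) {             # ER closes the record
      if (!is.null(current)) records[[length(records) + 1L]] <- current
      current <- NULL
      tag <- NULL
    } else if (grepl("^[A-Z][A-Z0-9] ", line)) {
      # Two-character tag, one space, then the value
      tag <- substr(line, 1L, 2L)
      value <- trimws(substring(line, 4L))
      if (tag == "PT") current <- list()       # PT opens a new record
      # File-level lines such as FN and VR fall outside any record
      if (!is.null(current)) current[[tag]] <- value
    } else if (grepl("^   ", line) && !is.null(current) && !is.null(tag)) {
      # Continuation line: append to the field opened by the last tag
      current[[tag]] <- c(current[[tag]], trimws(line))
    }
  }

  records
}

# Illustrative input built from the reference cited on this page
sample_lines <- c(
  "FN Clarivate Analytics Web of Science",
  "VR 1.0",
  "PT J",
  "AU Aria, M",
  "   Cuccurullo, C",
  "TI bibliometrix: An R-tool for comprehensive science mapping analysis",
  "PY 2017",
  "DI 10.1016/j.joi.2017.08.007",
  "ER",
  "",
  "EF"
)

recs <- parse_wos_lines(sample_lines)
length(recs)    # 1
recs[[1]]$AU    # "Aria, M" "Cuccurullo, C"
```

Keeping one element per line suits list-like fields such as AU and CR. A fuller parser would probably join the continuation lines of free-text fields such as TI and AB with a space instead.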

References

Aria, M. & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959–975. doi:10.1016/j.joi.2017.08.007

Examples

if (FALSE) { # \dontrun{
corpus <- sm_read_wos("savedrecs.txt")
corpus$works
} # }