Parse a Web of Science (WoS) plaintext export file into an sm_corpus
object. Handles the standard WoS tagged format with two-character field tags.
Usage
sm_read_wos(
  path,
  encoding = "UTF-8",
  engine = c("native", "bibliometrix", "auto"),
  verbose = TRUE,
  call = rlang::caller_env()
)
Arguments
- path
Character scalar. Path to a WoS plaintext file (.txt).
- encoding
Character scalar. File encoding (default "UTF-8").
- engine
Character scalar. One of "native" (built-in parser), "bibliometrix" (delegate to bibliometrix::convert2df()), or "auto" (try bibliometrix first, fall back to native).
- verbose
Logical. Print progress messages?
- call
Caller environment for error reporting.
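The "auto" engine described above can be pictured as a try-then-fall-back wrapper. The following is only a minimal sketch of that idea, not the package's actual code: read_with_fallback(), failing_engine() and native_engine() are hypothetical names, and it assumes that any error from the first engine triggers the fallback.

```r
# Hypothetical sketch of "auto" engine selection. The engine arguments are
# stand-in functions, not the package's internal parsers.
read_with_fallback <- function(path, primary, fallback, verbose = TRUE) {
  tryCatch(
    primary(path),
    error = function(e) {
      # Assumption: any error from the primary engine triggers the fallback
      if (verbose) {
        message("Primary engine failed (", conditionMessage(e), "); using fallback.")
      }
      fallback(path)
    }
  )
}

# Stand-in engines for demonstration only
failing_engine <- function(path) stop("bibliometrix not available")
native_engine  <- function(path) paste("parsed", path, "natively")

result <- read_with_fallback(
  "savedrecs.txt", failing_engine, native_engine, verbose = FALSE
)
```

Here the first engine fails, so result holds the value returned by the stand-in native engine. With verbose = TRUE the wrapper would also report why the first engine was skipped.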
Value
An sm_corpus object.
Implementation
The native parser follows the Web of Science Core Collection export format.
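Each record begins with a PT (publication type) line and ends with an ER line.
Field tags are two characters, an uppercase letter followed by an uppercase
letter or a digit (for example AU or C1), followed by a single space.
Continuation lines begin with three spaces and belong to the most recent tag. Key tags parsed: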
AU (authors), TI (title), SO (source), AB (abstract), DI (DOI),
PY (year), DT (document type), C1 (addresses), RP (reprint author),
CR (cited references), NR (number of references), TC (times cited),
SC (subject category), UT (unique identifier), LA (language).
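The record and tag rules above can be illustrated with a small line-by-line parser. This is a rough sketch under stated assumptions, not the package's native parser: parse_wos_lines() is a hypothetical helper, it keeps every field as a character vector with one element per physical line, and it silently ignores anything outside a PT ... ER block (such as the file header and footer lines).

```r
# Hypothetical helper, not part of the package: turn WoS tagged lines into
# a list of records, each a named list of character vectors.
parse_wos_lines <- function(lines) {
  records <- list()
  current <- NULL  # record currently being built
  tag <- NULL      # most recent field tag, used by continuation lines

  for (line in lines) {
    if (grepl("^ER\\s*$", line)) {
      # End of record: store it and reset the state
      if (!is.null(current)) records[[length(records) + 1L]] <- current
      current <- NULL
      tag <- NULL
    } else if (grepl("^[A-Z][A-Z0-9] ", line)) {
      # New field: two-character tag, one space, then the value
      tag <- substr(line, 1L, 2L)
      value <- trimws(substring(line, 4L))
      if (tag == "PT") current <- list()  # PT opens a new record
      if (!is.null(current)) current[[tag]] <- c(current[[tag]], value)
    } else if (grepl("^   ", line) && !is.null(current) && !is.null(tag)) {
      # Continuation line: append to the field opened by the last tag
      current[[tag]] <- c(current[[tag]], trimws(line))
    }
    # Everything else (header, footer, blank lines) is ignored
  }
  records
}

# Made-up excerpt in the tagged format, for illustration only
sample_lines <- c(
  "FN Clarivate Analytics Web of Science",
  "VR 1.0",
  "PT J",
  "AU Aria, M",
  "   Cuccurullo, C",
  "TI bibliometrix: An R-tool for comprehensive science mapping",
  "   analysis",
  "PY 2017",
  "ER",
  "",
  "EF"
)
recs <- parse_wos_lines(sample_lines)
title <- paste(recs[[1]]$TI, collapse = " ")
```

Multi-valued fields such as AU come back with one element per line, while wrapped text fields such as TI need to be joined with paste(), as in the last line. A real parser would decide this per tag.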
References
Aria, M. & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959–975. doi:10.1016/j.joi.2017.08.007