Detect the bibliographic file format from the file extension and content, then dispatch to the appropriate reader function.
Usage
sm_read_auto(
path,
encoding = "UTF-8",
engine = c("native", "bibliometrix", "auto"),
verbose = TRUE,
call = rlang::caller_env()
)
Arguments
- path
Character scalar. Path to a bibliographic file.
- encoding
Character scalar. File encoding (default "UTF-8").
- engine
Character scalar. One of "native" (built-in parser), "bibliometrix" (delegate to bibliometrix::convert2df()), or "auto" (try bibliometrix first, fall back to native). Passed through to the selected reader. Ignored for formats without engine support (OpenAlex JSON, Zotero, EndNote XML).
- verbose
Logical. Print progress messages?
- call
Caller environment for error reporting.
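The "auto" engine behaviour described above (try bibliometrix first, fall back to native) can be illustrated with a minimal sketch. This is not the package's implementation: read_with_engine(), native_stub() and failing_stub() are hypothetical names introduced here, and the two readers are passed in as plain functions so the example runs without any package installed. Whether sm_read_auto() resolves its default engine with match.arg() is also an assumption.

```r
# Hypothetical helper: choose a parser according to `engine`.
# `native_reader` and `bibliometrix_reader` are placeholder functions supplied
# by the caller; they are not the package's real internals.
read_with_engine <- function(path,
                             native_reader,
                             bibliometrix_reader,
                             engine = c("native", "bibliometrix", "auto")) {
  # match.arg() validates the value and picks the first choice ("native")
  # when `engine` is left at its default
  engine <- match.arg(engine)

  switch(engine,
    native = native_reader(path),
    bibliometrix = bibliometrix_reader(path),
    auto = tryCatch(
      bibliometrix_reader(path),
      # Any error from the bibliometrix path triggers the native fallback
      error = function(e) native_reader(path)
    )
  )
}

# Stand-in readers so the sketch runs on its own
native_stub <- function(path) list(path = path, parsed_by = "native")
failing_stub <- function(path) stop("bibliometrix is not available")

result <- read_with_engine("records.bib", native_stub, failing_stub,
                           engine = "auto")
result$parsed_by
```

Catching every error in the "auto" branch keeps the fallback simple, at the cost of hiding why the first attempt failed; a real reader might report that reason when verbose is TRUE.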
Value
An sm_corpus object.
Implementation
Format detection proceeds in two stages:
- Extension-based: .bib (BibTeX), .ris (RIS), .json/.jsonl (OpenAlex JSON), .xml (PubMed XML or EndNote XML).
- Content-based: For .csv, .tsv, and .txt files, the first few lines are inspected for format-specific signatures:
  - WoS plaintext: begins with FN or PT tags
  - Scopus CSV: contains EID column header
  - Lens CSV: contains Lens ID column header
  - Dimensions CSV: contains Dimensions ID or PubYear header
  - Cochrane CSV: contains Cochrane in header or record-like structure
  - Zotero CSV: contains Key and Item Type columns
  - RIS-format content in non-.ris files
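The two stages can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: detect_format_sketch(), its return labels, the order in which signatures are checked, and the use of a leading "TY  - " tag to recognise RIS content are all choices made for this example. Only the header line is inspected, so the "record-like structure" test for Cochrane is not covered.

```r
# Hypothetical sketch of two-stage format detection; names and labels are
# illustrative only.
detect_format_sketch <- function(path, n_lines = 5L) {
  ext <- tolower(tools::file_ext(path))

  # Stage 1: extensions that identify the format family on their own
  by_ext <- c(bib = "bibtex", ris = "ris", json = "openalex",
              jsonl = "openalex", xml = "xml")
  if (ext %in% names(by_ext)) return(by_ext[[ext]])

  # Stage 2: inspect the first few lines of .csv/.tsv/.txt files
  if (!ext %in% c("csv", "tsv", "txt")) return("unknown")
  head_lines <- readLines(path, n = n_lines, warn = FALSE)
  if (length(head_lines) == 0L) return("unknown")
  header <- head_lines[1]

  # Column names: split on comma or tab, then strip surrounding quotes
  cols <- gsub('^"|"$', "", trimws(strsplit(header, "[,\t]")[[1]]))

  if (grepl("^(FN|PT) ", header)) return("wos")
  if ("EID" %in% cols) return("scopus")
  if ("Lens ID" %in% cols) return("lens")
  if (any(c("Dimensions ID", "PubYear") %in% cols)) return("dimensions")
  if (grepl("Cochrane", header, fixed = TRUE)) return("cochrane")
  if (all(c("Key", "Item Type") %in% cols)) return("zotero")
  # Assumption: RIS records open with a "TY  - " tag
  if (grepl("^TY  - ", header)) return("ris")
  "unknown"
}

# Small temporary files to exercise the content-based stage
scopus_file <- tempfile(fileext = ".csv")
writeLines(c('"Authors","Title","Year","EID"',
             '"Doe J.","A title","2020","2-s2.0-1"'), scopus_file)

wos_file <- tempfile(fileext = ".txt")
writeLines(c("FN Clarivate Analytics Web of Science", "VR 1.0", "PT J"),
           wos_file)

detect_format_sketch("refs.bib")   # resolved by extension; file is not read
detect_format_sketch(scopus_file)
detect_format_sketch(wos_file)
```

Checking the extension first avoids opening files whose format is already unambiguous; the content checks run only for the generic .csv, .tsv, and .txt extensions.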
For XML files, the root element or DTD is inspected to distinguish
PubMed XML (PubmedArticleSet or PubmedArticle) from
EndNote XML (xml/records or records).
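A rough base R sketch of the root-element check is shown below. It is an assumption-laden illustration: detect_xml_flavour() and its return labels are hypothetical, it uses a regular expression on the first lines of the file instead of a real XML parser, and it looks only at element names, not at the DTD.

```r
# Hypothetical sketch: classify an XML export by its root element.
detect_xml_flavour <- function(path, n_lines = 20L) {
  txt <- paste(readLines(path, n = n_lines, warn = FALSE), collapse = "\n")

  # First start tag that begins with a letter; this skips <?xml ...?>,
  # <!DOCTYPE ...> and comment openers
  first_tag <- regmatches(txt, regexpr("<[A-Za-z_][^\\s>/]*", txt, perl = TRUE))
  if (length(first_tag) == 0L) return("unknown_xml")
  root <- sub("^<", "", first_tag)

  if (root %in% c("PubmedArticleSet", "PubmedArticle")) return("pubmed_xml")
  if (root == "records") return("endnote_xml")
  # The xml/records case: a <records> element inside an <xml> root
  if (root == "xml" && grepl("<records[\\s>]", txt, perl = TRUE)) {
    return("endnote_xml")
  }
  "unknown_xml"
}

pubmed_file <- tempfile(fileext = ".xml")
writeLines(c('<?xml version="1.0"?>',
             "<!DOCTYPE PubmedArticleSet>",
             "<PubmedArticleSet>",
             "<PubmedArticle></PubmedArticle>",
             "</PubmedArticleSet>"), pubmed_file)

endnote_file <- tempfile(fileext = ".xml")
writeLines(c('<?xml version="1.0" encoding="UTF-8"?>',
             "<xml><records><record></record></records></xml>"), endnote_file)

detect_xml_flavour(pubmed_file)
detect_xml_flavour(endnote_file)
```

A parser-based check would be more robust against comments or unusual whitespace before the root element; the regular expression is used here only to keep the sketch dependency-free.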
References
Aria, M. & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959–975. doi:10.1016/j.joi.2017.08.007