Detect the bibliographic file format from the file extension and content, then dispatch to the appropriate reader function.

Usage

sm_read_auto(
  path,
  encoding = "UTF-8",
  engine = c("native", "bibliometrix", "auto"),
  verbose = TRUE,
  call = rlang::caller_env()
)

Arguments

path

Character scalar. Path to a bibliographic file.

encoding

Character scalar. File encoding (default "UTF-8").

engine

Character scalar. One of "native" (built-in parser), "bibliometrix" (delegate to bibliometrix::convert2df()), or "auto" (try bibliometrix first, fall back to native). Passed through to the selected reader. Ignored for formats without engine support (OpenAlex JSON, Zotero, EndNote XML).

verbose

Logical. Print progress messages?

call

Caller environment for error reporting.

Value

An sm_corpus object.

Implementation

Format detection proceeds in two stages:

  1. Extension-based: .bib (BibTeX), .ris (RIS), .json/.jsonl (OpenAlex JSON), .xml (PubMed XML or EndNote XML, told apart by the XML inspection described below).

  2. Content-based: For .csv, .tsv, and .txt files, the first few lines are inspected for format-specific signatures:

    • WoS plaintext: begins with FN or PT tags

    • Scopus CSV: contains EID column header

    • Lens CSV: contains Lens ID column header

    • Dimensions CSV: contains Dimensions ID or PubYear header

    • Cochrane CSV: contains Cochrane in header or record-like structure

    • Zotero CSV: contains Key and Item Type columns

    • RIS-format content in non-.ris files
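The two stages above can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal code: the helper name sketch_detect_format(), the returned labels, the number of lines read, and the order in which signatures are checked are all hypothetical. The Cochrane check covers only the "Cochrane in header" case.

```r
# Hypothetical helper (not part of the package): rough two-stage format sniffing.
sketch_detect_format <- function(path, n_lines = 5L) {
  ext <- tolower(tools::file_ext(path))

  # Stage 1: extensions that identify the format on their own
  by_ext <- c(bib = "bibtex", ris = "ris", json = "openalex_json",
              jsonl = "openalex_json", xml = "xml")
  if (ext %in% names(by_ext)) {
    return(by_ext[[ext]])
  }

  # Stage 2: ambiguous extensions are resolved from the first few lines
  if (!ext %in% c("csv", "tsv", "txt")) {
    return("unknown")
  }
  head_lines <- readLines(path, n = n_lines, warn = FALSE)
  if (length(head_lines) == 0L) {
    return("unknown")
  }
  first <- head_lines[1]

  # Tagged plaintext formats are recognised by their leading tags
  if (grepl("^(FN|PT) ", first)) return("wos_plaintext")
  if (any(grepl("^TY  - ", head_lines))) return("ris")

  # Delimited formats are recognised by column names in the header row
  cols <- trimws(gsub('"', "", strsplit(first, "[,\t]")[[1]]))
  if ("EID" %in% cols) return("scopus_csv")
  if ("Lens ID" %in% cols) return("lens_csv")
  if (any(c("Dimensions ID", "PubYear") %in% cols)) return("dimensions_csv")
  if (all(c("Key", "Item Type") %in% cols)) return("zotero_csv")
  if (grepl("Cochrane", first, fixed = TRUE)) return("cochrane_csv")

  "unknown"
}
```

Checking the tagged formats before the header columns keeps a RIS export saved as .txt from being split on commas and misread as a CSV header.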

For XML files, the root element or DTD is inspected to distinguish PubMed XML (PubmedArticleSet or PubmedArticle) from EndNote XML (xml/records or records).
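The root-element check can be approximated without an XML parser by reading the start of the file and finding the first element name. The sketch below is an assumption about how such a check might look, not the package's implementation: sketch_xml_flavor() and its return labels are hypothetical, and it looks only at the root element name, not at the DTD.

```r
# Hypothetical helper (not part of the package): guess the XML flavour
# from the name of the first element in the file.
sketch_xml_flavor <- function(path, n_lines = 20L) {
  txt <- paste(readLines(path, n = n_lines, warn = FALSE), collapse = "\n")

  # Drop the XML declaration, comments, and DOCTYPE so that the first
  # remaining tag is the root element
  txt <- gsub("(?s)<\\?.*?\\?>", "", txt, perl = TRUE)
  txt <- gsub("(?s)<!--.*?-->", "", txt, perl = TRUE)
  txt <- gsub("<!DOCTYPE[^>]*>", "", txt)

  m <- regmatches(txt, regexpr("<[A-Za-z_][^\\s/>]*", txt, perl = TRUE))
  if (length(m) == 0L) {
    return("unknown")
  }
  root <- sub("^<", "", m)

  if (root %in% c("PubmedArticleSet", "PubmedArticle")) {
    "pubmed_xml"
  } else if (root %in% c("xml", "records")) {
    "endnote_xml"
  } else {
    "unknown"
  }
}
```

A production reader would more likely parse the document properly, since a regex scan can be misled by an unusual prolog such as a DOCTYPE with an internal subset.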

References

Aria, M. & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959–975. doi:10.1016/j.joi.2017.08.007

Examples

if (FALSE) { # \dontrun{
corpus <- sm_read_auto("references.bib")
corpus <- sm_read_auto("exported_data.csv")
} # }