Extract structured information from a pathology PDF report using pdftools::pdf_text() and regex-based section parsing.

Usage

mp_read_pdf_report(path, template = "generic")

Arguments

path

Character scalar. Path to the .pdf file.

template

Character scalar. Name of the template that controls which regex patterns are applied. Currently only "generic" (the default) is supported.

Value

A molpath_parsed object whose data slot is a tibble with columns section and content, holding the sections extracted from the report (Patient, Sample, Findings, Interpretation, Recommendations).

Examples

# \donttest{
pdf_file <- system.file("extdata", "report.pdf", package = "molpathR")
if (nzchar(pdf_file)) {
  result <- mp_read_pdf_report(pdf_file)
  print(result)
}
# }
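
The regex-based section parsing mentioned in the description can be illustrated with a minimal base R sketch. This is not the package's implementation: parse_sections() is a hypothetical helper, the "Heading:" line format is an assumption about how a report might be laid out, and the page text is made-up sample data standing in for the output of pdftools::pdf_text(), which returns one character string per page.

```r
# Made-up page text standing in for pdftools::pdf_text(path) output
# (one string per page). The "Heading: value" layout is an assumption.
pages <- c(
  "Patient: Jane Doe\nSample: FFPE block A1\nFindings: KRAS G12D detected",
  "Interpretation: Pathogenic variant\nRecommendations: Refer to oncology"
)

# Section names taken from the Value description.
headings <- c("Patient", "Sample", "Findings",
              "Interpretation", "Recommendations")

# Hypothetical helper: split text into sections at "<Heading>:" markers.
parse_sections <- function(pages, headings) {
  text <- paste(pages, collapse = "\n")
  # (?m) makes ^ match at the start of every line, not only the string.
  pattern <- paste0("(?m)^(", paste(headings, collapse = "|"), "):")
  m <- gregexpr(pattern, text, perl = TRUE)[[1]]
  if (m[1] == -1) {
    # No heading found: return an empty result with the same columns.
    return(data.frame(section = character(), content = character(),
                      stringsAsFactors = FALSE))
  }
  starts <- as.integer(m)
  lens <- attr(m, "match.length")
  # Each section runs until just before the next heading (or end of text).
  ends <- c(starts[-1] - 1L, nchar(text))
  data.frame(
    section = substring(text, starts, starts + lens - 2L),  # drop the ":"
    content = trimws(substring(text, starts + lens, ends)),
    stringsAsFactors = FALSE
  )
}

sections <- parse_sections(pages, headings)
print(sections)
```

The sketch returns a plain data frame with the same two columns, section and content, described under Value. A real report would likely need more tolerant patterns, for example for headings that wrap across lines or pages.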