Overview
molpathR provides dedicated parsers for each molecular pathology data
source. All parsers return molpath_parsed objects that can
be combined into a unified database using
mp_build_db().
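The examples later in this document read two fields from a parsed object: `$data` and `$source_type`. The sketch below mocks an object of that shape in base R so the access pattern can be tried without any input files. The constructor `mock_parsed()`, the column names, and the value `"vcf"` are assumptions made for illustration; none of them come from molpathR, and real `molpath_parsed` objects may carry other fields.

```r
# Hypothetical stand-in for a molpath_parsed object (not molpathR code).
# Only the two fields used elsewhere in this document are modelled.
mock_parsed <- function(data, source_type) {
  stopifnot(
    is.data.frame(data),
    is.character(source_type),
    length(source_type) == 1
  )
  structure(
    list(data = data, source_type = source_type),
    class = "molpath_parsed"
  )
}

# Illustrative variant table; the column names are invented for this sketch
variants <- data.frame(
  chrom = c("chr1", "chr7"),
  pos   = c(12345L, 67890L),
  ref   = c("A", "G"),
  alt   = c("T", "C"),
  stringsAsFactors = FALSE
)

obj <- mock_parsed(variants, "vcf")

inherits(obj, "molpath_parsed")  # class check
obj$source_type                  # which kind of source was parsed
head(obj$data)                   # the parsed records
```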
Available parsers
| Function | File types | Description |
|---|---|---|
| mp_read_vcf() | .vcf, .vcf.gz | Variant Call Format files |
| mp_read_fastq() | .fastq, .fq | FASTQ sequence files |
| mp_read_bam() | .bam | BAM alignment files |
| mp_read_xml_report() | .xml | XML variant interpretation reports |
| mp_read_pdf_report() | .pdf | Pathology PDF reports |
| mp_read_nexus_pathology() | .csv, .xml | Nexus Pathology exports |
| mp_read_nexus_clinical() | .csv | Nexus clinical data exports |
| mp_read_survival() | .xlsx, .csv | Survival/outcome data |
| mp_read_auto() | any | Auto-detect and dispatch |
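The table above shows that some extensions map to exactly one parser while others (.csv, .xml) are shared by several. The sketch below illustrates what that means for extension-based dispatch. It is a minimal base R illustration, not a description of how mp_read_auto() works internally: the helper `guess_parser()` is hypothetical, and it returns parser names as strings rather than calling anything in molpathR.

```r
# Hypothetical extension-to-parser lookup based on the table above.
# This is NOT the package's detection logic; it is one plausible design.
guess_parser <- function(path) {
  # Drop a trailing .gz so "x.vcf.gz" is treated like "x.vcf"
  base <- sub("\\.gz$", "", tolower(basename(path)))
  ext <- tools::file_ext(base)
  switch(ext,
    vcf   = "mp_read_vcf",
    fastq = ,                 # falls through to the next entry
    fq    = "mp_read_fastq",
    bam   = "mp_read_bam",
    # .csv and .xml are claimed by more than one parser, so the
    # extension alone cannot decide; anything else is unknown.
    stop("No unambiguous parser for extension: '", ext, "'", call. = FALSE)
  )
}

guess_parser("sample1.vcf.gz")
guess_parser("reads.fq")
```

Because .csv and .xml each appear in more than one row, a dispatcher built only on extensions has to give up on those files. A real auto-detector presumably also looks at file contents, though the source does not say how.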
Example: parsing VCF files
library(molpathR)
# Parse a single VCF file
parsed_vcf <- mp_read_vcf("path/to/variants.vcf")
parsed_vcf
# View the data
head(parsed_vcf$data)
Building a database from multiple sources
# Parse several files
vcf1 <- mp_read_vcf("sample1.vcf")
vcf2 <- mp_read_vcf("sample2.vcf")
surv <- mp_read_survival("survival_data.xlsx")
# Build the database
db <- mp_build_db(vcf1, vcf2, surv)
# Validate integrity
mp_validate_db(db)
# Save for later use
mp_save_db(db, "my_database.rds")
Auto-detection with mp_read_auto()
# Automatically detects file type and uses the correct parser
parsed <- mp_read_auto("unknown_file.vcf")
parsed$source_type
Handling parse errors
All parsers use defensive parsing with informative error messages:
# Malformed files produce warnings, not crashes
result <- mp_read_vcf("possibly_broken.vcf")
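Since malformed input is reported through warnings instead of errors, a caller who wants to log or inspect those warnings has to capture them explicitly. The sketch below does this with base R's `withCallingHandlers()`. The wrapper `collect_warnings()` and the stand-in `fake_parser()` are hypothetical and exist only so the example runs without input files; in practice the wrapped call would be something like `mp_read_vcf("possibly_broken.vcf")`.

```r
# Evaluate an expression, recording any warnings instead of printing them.
collect_warnings <- function(expr) {
  msgs <- character()
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")  # resume evaluation after recording
    }
  )
  list(value = value, warnings = msgs)
}

# Hypothetical stand-in for a parser that warns about a bad record
# but still returns the rows it could read.
fake_parser <- function(path) {
  warning("line 12: malformed record skipped")
  data.frame(id = 1:2)
}

res <- collect_warnings(fake_parser("possibly_broken.vcf"))
res$warnings     # messages raised during parsing
nrow(res$value)  # rows that were still returned
```

A calling handler is used here instead of `tryCatch()` because `tryCatch()` would abandon the parse at the first warning, whereas muffling the warning lets the call run to completion and return its result.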