Overview

molpathR provides a dedicated parser for each supported molecular pathology data source. Every parser returns a molpath_parsed object, and these objects can be combined into a unified database with mp_build_db().

Available parsers

| Function                  | File types    | Description                       |
|---------------------------|---------------|-----------------------------------|
| mp_read_vcf()             | .vcf, .vcf.gz | Variant Call Format files         |
| mp_read_fastq()           | .fastq, .fq   | FASTQ sequence files              |
| mp_read_bam()             | .bam          | BAM alignment files               |
| mp_read_xml_report()      | .xml          | XML variant interpretation reports |
| mp_read_pdf_report()      | .pdf          | Pathology PDF reports             |
| mp_read_nexus_pathology() | .csv, .xml    | Nexus Pathology exports           |
| mp_read_nexus_clinical()  | .csv          | Nexus clinical data exports       |
| mp_read_survival()        | .xlsx, .csv   | Survival/outcome data             |
| mp_read_auto()            | any           | Auto-detect and dispatch          |

Example: parsing VCF files

library(molpathR)

# Parse a single VCF file
parsed_vcf <- mp_read_vcf("path/to/variants.vcf")
parsed_vcf

# View the data
head(parsed_vcf$data)

Building a database from multiple sources

# Parse several files
vcf1 <- mp_read_vcf("sample1.vcf")
vcf2 <- mp_read_vcf("sample2.vcf")
surv <- mp_read_survival("survival_data.xlsx")

# Build the database
db <- mp_build_db(vcf1, vcf2, surv)

# Validate integrity
mp_validate_db(db)

# Save for later use
mp_save_db(db, "my_database.rds")
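
When there are many input files, naming each parsed object by hand gets tedious. The sketch below shows one way to parse a vector of paths and pass the results to a builder in a single call. `build_from_files()` is a hypothetical helper, not part of molpathR, and it assumes that mp_build_db() accepts any number of parsed objects as separate arguments, as the three-argument call above suggests. The demo uses toy stand-ins so that it runs without the package installed.

```r
# Hypothetical helper (not part of molpathR): parse every path with `reader`
# and pass all parsed objects to `builder` as separate arguments.
# The defaults refer to molpathR functions and are only looked up when the
# caller does not supply replacements.
build_from_files <- function(paths, reader = mp_read_auto, builder = mp_build_db) {
  parsed <- lapply(paths, reader)
  do.call(builder, parsed)
}

# Toy stand-ins for illustration only; they mimic the call pattern, not the
# real return values of molpathR.
toy_reader <- function(path) list(path = path)
toy_builder <- function(...) list(...)

db_demo <- build_from_files(c("sample1.vcf", "sample2.vcf"),
                            reader = toy_reader, builder = toy_builder)
length(db_demo)
```

With molpathR attached, the intended call would simply be `build_from_files(paths)`, under the assumption stated above.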

Auto-detection with mp_read_auto()

# Automatically detects file type and uses the correct parser
parsed <- mp_read_auto("unknown_file.vcf")
parsed$source_type
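
To make the idea of auto-detection concrete, here is a minimal base-R sketch of extension-based dispatch built from the "Available parsers" table. It is an illustration only: `guess_parsers()` is a hypothetical function, and molpathR's actual detection logic is not documented here and may well inspect file contents rather than names.

```r
# Hypothetical sketch (not molpathR's implementation): list the parsers
# whose documented file types match a path's extension.
guess_parsers <- function(path) {
  # Treat "variants.vcf.gz" like "variants.vcf", the only gzipped type
  # listed in the table
  name <- sub("\\.vcf\\.gz$", ".vcf", tolower(basename(path)))
  ext <- tools::file_ext(name)
  # Mapping copied from the "Available parsers" table
  candidates <- list(
    vcf   = "mp_read_vcf",
    fastq = "mp_read_fastq",
    fq    = "mp_read_fastq",
    bam   = "mp_read_bam",
    pdf   = "mp_read_pdf_report",
    xml   = c("mp_read_xml_report", "mp_read_nexus_pathology"),
    csv   = c("mp_read_nexus_pathology", "mp_read_nexus_clinical",
              "mp_read_survival"),
    xlsx  = "mp_read_survival"
  )
  # Unknown or missing extensions yield an empty character vector
  if (!ext %in% names(candidates)) character(0) else candidates[[ext]]
}

guess_parsers("sample1.vcf.gz")
guess_parsers("export.csv")
```

The sketch shows that an extension alone cannot distinguish the parsers that share .csv or .xml, so when the data source is already known it is likely safer to call the dedicated parser directly.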

Handling parse errors

All parsers parse defensively and report problems through informative messages rather than failing outright:

# Malformed files produce warnings, not crashes
result <- mp_read_vcf("possibly_broken.vcf")
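
Because malformed input surfaces as warnings, it can be useful to collect them alongside the result instead of letting them print to the console. The following is a minimal base-R sketch, not a molpathR feature: `collect_warnings()` and `toy_parser()` are hypothetical names, and the toy parser only stands in for a real call such as mp_read_vcf().

```r
# Hypothetical helper (not part of molpathR): run a parser and collect any
# warnings it raises instead of printing them.
collect_warnings <- function(parser, path) {
  warnings_seen <- character(0)
  value <- withCallingHandlers(
    parser(path),
    warning = function(w) {
      # Record the message, then resume parsing without printing the warning
      warnings_seen <<- c(warnings_seen, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = warnings_seen)
}

# Toy stand-in for a real parser; it warns once and still returns a value.
# The warning text and return value are made up for illustration.
toy_parser <- function(path) {
  warning("line 12: malformed record skipped")
  list(data = data.frame())
}

res <- collect_warnings(toy_parser, "possibly_broken.vcf")
res$warnings
```

With molpathR attached, the same helper could be called as `collect_warnings(mp_read_vcf, "possibly_broken.vcf")`, which keeps the parsed object and its warnings together for later review.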