Orchestrates the complete RNA-seq analysis pipeline from FASTQ/BAM files or a count matrix through alignment, counting, normalization, differential expression, and visualization.
Arguments
- fastq_dir
Character or NULL. Directory containing FASTQ files. If provided, the pipeline starts from alignment.
- bam_dir
Character or NULL. Directory containing BAM files. If provided, the pipeline starts from read counting.
- count_matrix
A numeric matrix or NULL. Pre-computed count matrix. If provided, the pipeline starts from normalization/DE.
- output_dir
Character. Output directory for results. Default
"bambamR_output".- genome_index
Character or NULL. Path to genome index. Required if starting from FASTQ.
- annotation
Character or NULL. Path to GTF annotation. Required if starting from FASTQ or BAM.
- sample_info
A data.frame with sample metadata. Must contain a
conditioncolumn (or the variable indesign). Rownames should match sample identifiers.- aligner
Character. Aligner for FASTQ alignment. Default
"STAR".- de_method
Character. DE method:
"DESeq2","edgeR", or"limma". Default"DESeq2".- design
A formula for DE analysis. Default
~ condition.- skip
Character vector. Steps to skip:
"qc","align","count","de","viz". Defaultcharacter(0).- threads
Integer. Number of threads. Default
4.
Details
The pipeline allows entry at any stage:
From FASTQ: requires
fastq_dir,genome_index,annotation, andsample_infoFrom BAM: requires
bam_dir,annotation, andsample_infoFrom counts: requires
count_matrixandsample_info
Examples
# \donttest{
# Starting from a count matrix (minimal mode, no Bioconductor needed)
set.seed(42)
counts <- matrix(rpois(600, 100), nrow = 100, ncol = 6,
dimnames = list(paste0("gene", 1:100), paste0("S", 1:6)))
sample_info <- data.frame(
condition = factor(rep(c("ctrl", "treat"), each = 3)),
row.names = paste0("S", 1:6)
)
# result <- bb_pipeline(count_matrix = counts, sample_info = sample_info)
# }