SpliceWiz: interactive analysis and visualization of alternative splicing in R

processBAM

R Documentation

Runs the OpenMP/C++ based SpliceWiz algorithm

Description

These functions call the SpliceWiz C++ routine on one or more BAM files.

The routine is an improved version of the original IRFinder, with OpenMP-based multi-threading and the production of compact "COV" files to record alignment coverage. A SpliceWiz reference built using Build-Reference-methods is required.

After processBAM() is run, users should call collateData to collate individual outputs into an experiment / dataset.

BAM2COV creates COV files from BAM files without running processBAM().

See details for performance info.

Usage

BAM2COV(
  bamfiles = "./Unsorted.bam",
  sample_names = "sample1",
  output_path = "./cov_folder",
  n_threads = 1,
  useOpenMP = TRUE,
  overwrite = FALSE,
  verbose = FALSE,
  multiRead = FALSE
)

processBAM(
  bamfiles = "./Unsorted.bam",
  sample_names = "sample1",
  reference_path = "./Reference",
  output_path = "./SpliceWiz_Output",
  n_threads = 1,
  useOpenMP = TRUE,
  overwrite = FALSE,
  run_featureCounts = FALSE,
  verbose = FALSE,
  skipCOVfiles = FALSE,
  multiRead = FALSE
)

Arguments

`bamfiles`	A vector containing file paths of 1 or more BAM files
`sample_names`	The sample names of the given BAM files. Must be a vector of the same length as `bamfiles`
`output_path`	The output directory of this function
`n_threads`	(default `1`) The number of threads to use. See details.
`useOpenMP`	(default `TRUE`) Whether to use OpenMP. If set to `FALSE`, BiocParallel will be used if `n_threads` is set
`overwrite`	(default `FALSE`) If `FALSE`, samples whose output files already exist are skipped rather than re-run. If `run_featureCounts` is `TRUE`, gene counts from a previous run are overwritten only when `overwrite` is `TRUE`.
`verbose`	(default `FALSE`) Set to `TRUE` to allow `processBAM()` to output progress bars and messages
`multiRead`	(default `FALSE`) Whether SpliceWiz/ompBAM should use one thread (`FALSE`) or all available threads (`TRUE`) to read BAM files from the storage drive. On SSD drives or high performance computing clusters, setting this to `TRUE` may slightly improve performance, whereas if reading from disk is the speed bottleneck, the default setting `FALSE` should result in higher performance.
`reference_path`	The directory containing the SpliceWiz reference
`run_featureCounts`	(default `FALSE`) Whether this function will run Rsubread::featureCounts on the BAM files after counting spliced reads. If so, the output will be saved to `"main.FC.Rds"` in the `output_path` directory as a list object.
`skipCOVfiles`	(default `FALSE`) Whether processBAM should skip the production of COV files (containing coverage data). Default is to create COV files unless this is set to `TRUE`. COV files can be generated separately using BAM2COV
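
As an illustration of the `bamfiles` / `sample_names` requirement above, the following base R sketch derives one sample name per BAM file and checks that the two vectors line up. The file paths are hypothetical placeholders, and the duplicate-name check rests on the assumption that, because output files are named after `sample_names`, repeated names would collide.

```r
# Hypothetical BAM paths; replace these with your own files.
bamfiles <- c(
    "/data/bams/control_1.bam",
    "/data/bams/control_2.bam",
    "/data/bams/treated_1.bam"
)

# Derive one sample name per BAM file by dropping the directory and extension.
sample_names <- tools::file_path_sans_ext(basename(bamfiles))

# The two vectors must have the same length.
stopifnot(length(sample_names) == length(bamfiles))

# Assumption: output files are named after sample_names, so names should be unique.
stopifnot(anyDuplicated(sample_names) == 0)
```

The resulting vectors can then be passed as the first two arguments of `processBAM()` or `BAM2COV()`.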

Details

A typical run on a BAM file containing 100 million paired-end alignments takes about 10 minutes using a single core. Using 8 threads, the run-time is approximately 2-5 minutes, depending on your system's file input / output speeds. Approximately 10 GB of RAM is used when OpenMP is used. If OpenMP is not used (see below), this memory usage is multiplied by the number of processor threads (e.g. 40 GB if n_threads = 4).
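
The memory figures above can be turned into a rough planning estimate. The sketch below is only arithmetic on the approximate 10 GB figure quoted in this section; `estimate_ram_gb` is a hypothetical helper, not a SpliceWiz function, and actual usage will vary with the data.

```r
# Rough RAM estimate in gigabytes, based on the ~10 GB per-process figure.
# `estimate_ram_gb` is a hypothetical helper, not part of SpliceWiz.
estimate_ram_gb <- function(n_threads, useOpenMP = TRUE, per_process_gb = 10) {
    if (useOpenMP) {
        # With OpenMP, threads share one process, so memory does not scale.
        per_process_gb
    } else {
        # Without OpenMP, memory usage is multiplied by the number of threads.
        per_process_gb * n_threads
    }
}

estimate_ram_gb(n_threads = 8, useOpenMP = TRUE)   # 10
estimate_ram_gb(n_threads = 4, useOpenMP = FALSE)  # 40
```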

OpenMP is natively available to Linux / Windows compilers. If useOpenMP is set to TRUE, OpenMP will be used, with multiple threads processing each BAM file. On Macs, if OpenMP is not available at compilation, BiocParallel will be used instead, processing BAM files simultaneously with one BAM file per thread.
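
One practical consequence of the two threading modes can be sketched as follows. This assumes that, in BiocParallel mode, threads beyond the number of BAM files have no work to do, which is an inference from "one BAM file per thread" rather than documented behaviour. `choose_threads` is a hypothetical helper, not part of SpliceWiz.

```r
# Pick a value for n_threads given the number of BAM files and available cores.
# `choose_threads` is a hypothetical helper, not part of SpliceWiz.
choose_threads <- function(n_bams, cores, useOpenMP = TRUE) {
    if (useOpenMP) {
        # OpenMP: multiple threads work on each BAM file.
        cores
    } else {
        # BiocParallel: one BAM file per thread, so cap at the number of files
        # (assumption: extra threads would otherwise sit idle).
        min(cores, n_bams)
    }
}

choose_threads(n_bams = 3, cores = 8, useOpenMP = TRUE)
choose_threads(n_bams = 3, cores = 8, useOpenMP = FALSE)
```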

Value

Output will be saved to output_path. Output files will be named using the given sample_names. For processBAM():

sample.txt.gz: The main output file containing the quantitation of IR and splice junctions, as well as QC information
sample.cov: Contains coverage information in compressed binary. See getCoverage
main.FC.Rds: A single file containing gene counts for the whole dataset (only if run_featureCounts == TRUE)

For BAM2COV():

sample.cov: Contains coverage information in compressed binary. See getCoverage
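
To check which outputs are present after a run, the expected file paths can be built from `output_path` and `sample_names`. This base R sketch assumes the naming pattern listed above (`<sample_name>.txt.gz` and `<sample_name>.cov`); the sample names are placeholders.

```r
# Placeholder values; use the same output_path and sample_names as in your run.
output_path <- file.path(tempdir(), "SpliceWiz_Output")
sample_names <- c("sample1", "sample2")

# Build the expected output paths, assuming files are named after each sample.
expected <- data.frame(
    sample = sample_names,
    txt = file.path(output_path, paste0(sample_names, ".txt.gz")),
    cov = file.path(output_path, paste0(sample_names, ".cov"))
)

# Flag which of the expected files currently exist on disk.
expected$txt_exists <- file.exists(expected$txt)
expected$cov_exists <- file.exists(expected$cov)

expected
```

For `BAM2COV()`, only the `cov` column is relevant.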

Functions

BAM2COV(): Converts BAM files to COV files without running processBAM()
processBAM(): Processes BAM files. Requires a SpliceWiz reference generated by buildRef()

Examples


library(SpliceWiz)

# Run BAM2COV, which produces COV files only, without running `processBAM()`:

bams <- SpliceWiz_example_bams()

BAM2COV(bams$path, bams$sample,
  output_path = file.path(tempdir(), "SpliceWiz_Output"),
  n_threads = 2, overwrite = TRUE
)

# Run processBAM(), which produces:
# - text output of intron coverage and spliced read counts
# - COV files which record read coverages

example_ref <- file.path(tempdir(), "Reference")

buildRef(
    reference_path = example_ref,
    fasta = chrZ_genome(),
    gtf = chrZ_gtf()
)

bams <- SpliceWiz_example_bams()

processBAM(bams$path, bams$sample,
  reference_path = example_ref,
  output_path = file.path(tempdir(), "SpliceWiz_Output"),
  n_threads = 2
)

alexchwong/SpliceWiz documentation built on April 17, 2025, 5:15 p.m.