IRFinder: Runs the OpenMP/C++-based NxtIRF/IRFinder algorithm


IRFinder    R Documentation

Runs the OpenMP/C++-based NxtIRF/IRFinder algorithm

Description

These functions call the IRFinder C++ routine on one or more BAM files.

The routine is an improved version of the original IRFinder, with OpenMP-based multi-threading and the production of compact "COV" files to record alignment coverage. A NxtIRF reference built using BuildReference is required.

After IRFinder is run, users should call CollateData to collate individual outputs into an experiment / dataset.

BAM2COV creates COV files from BAM files without running the full IRFinder algorithm.

See Details for performance information.

Usage

BAM2COV(
  bamfiles = "./Unsorted.bam",
  sample_names = "sample1",
  output_path = "./cov_folder",
  n_threads = 1,
  Use_OpenMP = TRUE,
  overwrite = FALSE,
  verbose = FALSE
)

IRFinder(
  bamfiles = "./Unsorted.bam",
  sample_names = "sample1",
  reference_path = "./Reference",
  output_path = "./IRFinder_Output",
  n_threads = 1,
  Use_OpenMP = TRUE,
  overwrite = FALSE,
  run_featureCounts = FALSE,
  verbose = FALSE
)

Arguments

bamfiles

A vector containing file paths of 1 or more BAM files

sample_names

The sample names of the given BAM files. Must be a vector of the same length as bamfiles

output_path

The output directory of this function

n_threads

(default 1) The number of threads to use. See details.

Use_OpenMP

(default TRUE) Whether to use OpenMP to run IRFinder. If set to FALSE, BiocParallel will be used if n_threads is set

overwrite

(default FALSE) If FALSE and IRFinder output files already exist, IRFinder will not be re-run for those files. If run_featureCounts is TRUE, gene counts from a previous run are overwritten only if overwrite is TRUE.

verbose

(default FALSE) Set to TRUE to allow IRFinder to output progress bars and messages

reference_path

The directory containing the NxtIRF reference

run_featureCounts

(default FALSE) Whether this function will run Rsubread::featureCounts on the BAM files after running IRFinder. If so, the output will be saved to "main.FC.Rds" in the output_path directory as a list object.
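
As a minimal sketch of preparing the bamfiles and sample_names arguments, the base R code below derives one sample name per BAM file from its file name and checks that the two vectors have the same length. The file paths are hypothetical placeholders, and stripping the ".bam" extension is only one possible naming convention, not something the package requires.

```r
# Hypothetical BAM file paths (placeholders, not files shipped with the package)
bamfiles <- c("./bams/sampleA.bam", "./bams/sampleB.bam")

# Derive sample names by dropping the directory and the ".bam" extension
sample_names <- tools::file_path_sans_ext(basename(bamfiles))

# sample_names must be a vector of the same length as bamfiles
stopifnot(length(sample_names) == length(bamfiles))

# Output files are named after the sample names, so duplicated names
# would point two BAM files at the same output file names
stopifnot(!anyDuplicated(sample_names))
```

The resulting vectors can then be passed as the first two arguments of BAM2COV or IRFinder.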

Details

Processing a BAM file of 100 million paired-end alignments typically takes 10 minutes using a single core. Using 8 threads, the runtime is approximately 2 minutes. Approximately 10 GB of RAM is used when OpenMP is used. If OpenMP is not used (see below), this memory usage is multiplied by the number of processor threads (i.e. 40 GB if n_threads = 4).

OpenMP is natively available to Linux / Windows compilers, and OpenMP will be used if Use_OpenMP is set to TRUE, using multiple threads to process each BAM file. On Macs, if OpenMP is not available at compilation, BiocParallel will be used, processing BAM files simultaneously, with one BAM file per thread.
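
The memory arithmetic described above can be sketched in base R. The helper estimate_ram_gb() is hypothetical and not part of NxtIRF; it simply restates the figures quoted in this section (about 10 GB per process), which should be treated as rough guides rather than guarantees.

```r
# Rough peak-RAM estimate based on the figures quoted in Details.
# estimate_ram_gb() is a hypothetical helper, not a NxtIRF function.
estimate_ram_gb <- function(n_threads, Use_OpenMP = TRUE, gb_per_process = 10) {
  stopifnot(n_threads >= 1)
  if (Use_OpenMP) {
    # OpenMP: multiple threads work on one BAM file at a time,
    # so memory use does not scale with n_threads
    gb_per_process
  } else {
    # BiocParallel: one BAM file per thread, so memory use is
    # multiplied by the number of threads
    gb_per_process * n_threads
  }
}

estimate_ram_gb(n_threads = 4, Use_OpenMP = TRUE)   # 10
estimate_ram_gb(n_threads = 4, Use_OpenMP = FALSE)  # 40
```

On machines with limited RAM and no OpenMP support, this suggests choosing n_threads so that the estimate stays below the available memory.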

Value

IRFinder output will be saved to output_path. Output files will be named using the given sample names.

  • sample.txt.gz: The main IRFinder output file containing the quantitation of IR and splice junctions, as well as QC information

  • sample.cov: Contains coverage information in compressed binary. See GetCoverage

  • main.FC.Rds: A single file containing gene counts for the whole dataset (only if run_featureCounts == TRUE)
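
To illustrate the naming scheme listed above, the following base R sketch builds the expected output file paths for a set of sample names and checks which of them exist. The output directory and sample names are hypothetical; only the file suffixes and the "main.FC.Rds" name are taken from this page.

```r
# Hypothetical output directory and sample names
output_path <- file.path(tempdir(), "IRFinder_output")
sample_names <- c("sample1", "sample2")

# Expected per-sample outputs, named after the sample names
expected <- data.frame(
  sample = sample_names,
  irf_file = file.path(output_path, paste0(sample_names, ".txt.gz")),
  cov_file = file.path(output_path, paste0(sample_names, ".cov")),
  stringsAsFactors = FALSE
)

# Flag which outputs are actually present on disk
expected$irf_exists <- file.exists(expected$irf_file)
expected$cov_exists <- file.exists(expected$cov_file)

# Gene counts are written only if run_featureCounts = TRUE was used
fc_file <- file.path(output_path, "main.FC.Rds")
if (file.exists(fc_file)) {
  gene_counts <- readRDS(fc_file)  # a list object, per the argument description
}
```

A check like this can be run before CollateData to confirm that every sample produced its output files.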

Functions

  • BAM2COV: Converts BAM files to COV files without running IRFinder algorithm

  • IRFinder: Runs IRFinder algorithm on BAM files. Requires a NxtIRF/IRFinder reference generated by BuildReference()

See Also

BuildReference, CollateData, IsCOV

Examples


# Run BAM2COV, which only produces COV files but does not run IRFinder:

bams <- NxtIRF_example_bams()

BAM2COV(bams$path, bams$sample,
  output_path = file.path(tempdir(), "IRFinder_output"),
  n_threads = 2, overwrite = TRUE
)

# Run IRFinder algorithm, which produces:
# - text output of intron coverage and spliced read counts
# - COV files which record read coverages

example_ref <- file.path(tempdir(), "Reference")

BuildReference(
  reference_path = example_ref,
  fasta = chrZ_genome(),
  gtf = chrZ_gtf()
)

bams <- NxtIRF_example_bams()

IRFinder(bams$path, bams$sample,
  reference_path = file.path(tempdir(), "Reference"),
  output_path = file.path(tempdir(), "IRFinder_output"),
  n_threads = 2
)

alexchwong/NxtIRFcore documentation built on Oct. 31, 2022, 9:14 a.m.