collateData: Collates a dataset from (processBAM) output files of...
In alexchwong/SpliceWiz: interactive analysis and visualization of alternative splicing in R

collateData

R Documentation

Collates a dataset from (processBAM) output files of individual samples

Description

collateData() creates a dataset from a collection of processBAM output files belonging to an experiment.

Usage

collateData(
  Experiment,
  reference_path,
  output_path,
  IRMode = c("SpliceOver", "SpliceMax"),
  packageCOVfiles = FALSE,
  novelSplicing = FALSE,
  forceStrandAgnostic = FALSE,
  novelSplicing_minSamples = 3,
  novelSplicing_countThreshold = 10,
  novelSplicing_minSamplesAboveThreshold = 1,
  novelSplicing_requireOneAnnotatedSJ = TRUE,
  novelSplicing_useTJ = TRUE,
  overwrite = FALSE,
  n_threads = 1,
  lowMemoryMode = TRUE
)

Arguments

`Experiment`	(Required) A 2- or 3-column data frame, ideally generated by findSpliceWizOutput or findSamples. The first column designates the sample names, and the second column contains the paths to the processBAM output files (of type `sample.txt.gz`). Optionally, a third column contains the coverage files (of type `sample.cov`) of the corresponding samples. NB: all other columns are ignored.
`reference_path`	(Required) The path to the reference generated by Build-Reference-methods.
`output_path`	(Required) The path to contain the output files for the collated dataset.
`IRMode`	(default `SpliceOver`) The algorithm used to calculate 'splice abundance' in IR quantification. Valid options are `SpliceOver` and `SpliceMax`. See Details.
`packageCOVfiles`	(default `FALSE`) Whether COV files should be copied over to the NxtSE object. This is useful if one wishes to transfer the NxtSE folder to a collaborator, who can then open the NxtSE object with valid COV file paths.
`novelSplicing`	(default `FALSE`) Whether collateData will use novel junction reads detected in samples to infer novel splice variants. All tandem split reads (those bridging two consecutive splice junctions), as well as novel split reads that satisfy abundance criteria (see `novelSplicing_minSamples`, `novelSplicing_minSamplesAboveThreshold`, and `novelSplicing_countThreshold`), are used to synthesise a dataset-specific SpliceWiz reference. See Details.
`forceStrandAgnostic`	(default `FALSE`) In poorly-prepared stranded libraries, it may be better to quantify in unstranded mode. Set this to `TRUE` if your stranded libraries may be contaminated with unstranded reads.
`novelSplicing_minSamples`	(default 3) Novel junctions are included in building of the novel reference if the number of samples with non-zero counts exceeds this number.
`novelSplicing_countThreshold`	(default 10) Threshold of split reads across novel junctions; used in conjunction with `novelSplicing_minSamplesAboveThreshold`.
`novelSplicing_minSamplesAboveThreshold`	(default 1) Novel junctions are included in building of the novel reference if the number of samples in which their split-read count is above `novelSplicing_countThreshold` exceeds this number.
`novelSplicing_requireOneAnnotatedSJ`	(default `TRUE`) The default requires novel junctions to have one annotated splice site. If this is disabled, collateData will include novel junctions where neither splice site is annotated.
`novelSplicing_useTJ`	(default `TRUE`) For novel splicing, should SpliceWiz use reads with 2 or more junctions to find novel exons? Ignored if `novelSplicing` is set to `FALSE`.
`overwrite`	(default `FALSE`) If `collateData()` has previously been run using the same set of samples, the existing output will not be overwritten unless this is set to `TRUE`.
`n_threads`	(default `1`) The number of threads to use. If you run out of memory, try lowering the number of threads.
`lowMemoryMode`	(default `TRUE`) `collateData()` will perform optimizations to conserve memory if this is set to `TRUE`. Otherwise, it will prioritise performance.
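When findSpliceWizOutput or findSamples is not convenient, the `Experiment` data frame can also be assembled by hand. The following is a minimal sketch, assuming processBAM wrote its output to `file.path(tempdir(), "SpliceWiz_Output")` and named each sample's files `<sample>.txt.gz` and `<sample>.cov`. The sample names are hypothetical, and the column names are illustrative only, since the description above specifies column order, not names.

```r
# Hypothetical sample names; replace with your own
samples <- c("sample_A", "sample_B", "sample_C")

# Assumed processBAM output directory (same location as in the Examples)
out_dir <- file.path(tempdir(), "SpliceWiz_Output")

# Column order matters: 1 = sample name, 2 = processBAM output, 3 = COV file (optional)
Experiment <- data.frame(
  sample   = samples,
  sw_file  = file.path(out_dir, paste0(samples, ".txt.gz")),
  cov_file = file.path(out_dir, paste0(samples, ".cov")),
  stringsAsFactors = FALSE
)

# Report any processBAM output files that do not exist yet
absent <- Experiment$sw_file[!file.exists(Experiment$sw_file)]
if (length(absent) > 0) {
  message("processBAM output not found: ", paste(basename(absent), collapse = ", "))
}
```

The resulting data frame would then be passed as the first argument of collateData().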

Details

On Windows, collateData runs using only one thread, as BiocParallel's MulticoreParam is not supported.

It is assumed that all sample processBAM outputs were generated using the same reference.

The combination of junction counts and IR quantification from processBAM is used to calculate the percentage spliced in (PSI) of alternative splice events, and the intron retention ratios (IR-ratio) of retained introns. QC information is also collated. Data is organised in an H5 file and FST files for memory- and processor-efficient downstream access using makeSE.
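As a rough illustration of the PSI concept, PSI is the fraction of reads supporting the included isoform out of all reads informative for an event. The sketch below uses made-up counts and a hypothetical helper `psi()`; it does not show how SpliceWiz derives inclusion and exclusion counts for each event type, which is not described on this page.

```r
# Hypothetical helper: PSI = inclusion / (inclusion + exclusion)
psi <- function(inclusion, exclusion) {
  total <- inclusion + exclusion
  # Return NA where an event has no informative reads, rather than NaN
  ifelse(total > 0, inclusion / total, NA_real_)
}

# Made-up counts for three events
psi(inclusion = c(30, 0, 5), exclusion = c(10, 0, 15))
# returns c(0.75, NA, 0.25)
```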

The original IRFinder algorithm (see the IRFinder wiki) uses SpliceMax to estimate the abundance of spliced transcripts. This calculates the number of mapped splice events that share the boundary coordinate of either the left or right flanking exon (`SpliceLeft`, `SpliceRight`), and estimates splice abundance as the larger of the two values.
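A minimal sketch of SpliceMax, paired with the IR-ratio as defined by IRFinder (intron depth divided by the sum of intron depth and splice abundance). The helper names and counts are hypothetical, and how SpliceWiz estimates intron depth is not described here.

```r
# Hypothetical helper: SpliceMax takes the larger of the two flanking-junction counts
splice_max <- function(splice_left, splice_right) {
  pmax(splice_left, splice_right)
}

# Hypothetical helper: IR-ratio = intron depth / (intron depth + splice abundance)
ir_ratio <- function(intron_depth, splice_abundance) {
  intron_depth / (intron_depth + splice_abundance)
}

# Made-up counts for two introns
splice_left  <- c(40, 12)
splice_right <- c(25, 30)
intron_depth <- c(10, 10)

sa <- splice_max(splice_left, splice_right)  # c(40, 30)
ir_ratio(intron_depth, sa)                   # c(0.2, 0.25)
```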

SpliceWiz proposes a new algorithm, SpliceOver, to account for the possibility that the major isoform shares neither boundary but arises from either of the flanking exon clusters. Exon clusters are contiguous regions covered by exons from any transcript (except those designated as retained_intron or sense_intronic), and are separated by obligate intronic regions (genomic regions that are introns in all transcripts). For introns that are internal to a single exon cluster (i.e. akin to "known-exon" introns from IRFinder), SpliceOver uses GenomicRanges::findOverlaps to sum all splice reads that overlap the same genomic region as the intron of interest.
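The overlap-sum idea can be sketched in base R. This is a simplification under stated assumptions: it replaces GenomicRanges::findOverlaps with a closed-interval overlap test, ignores chromosome, strand and exon clusters, and uses a hypothetical junction table and helper `splice_over()`.

```r
# Hypothetical split-read junctions on one chromosome/strand (closed intervals)
junctions <- data.frame(
  start = c(1000, 1000, 1200, 3000),
  end   = c(2000, 1500, 1900, 3500),
  count = c(50,   20,   15,   40)
)

# Hypothetical helper: sum counts of all junctions overlapping the intron
splice_over <- function(intron_start, intron_end, junctions) {
  hits <- junctions$start <= intron_end & junctions$end >= intron_start
  sum(junctions$count[hits])
}

splice_over(1000, 2000, junctions)  # 50 + 20 + 15 = 85

# For comparison: a SpliceMax-style estimate only counts boundary-sharing junctions
splice_left  <- sum(junctions$count[junctions$start == 1000])  # 70
splice_right <- sum(junctions$count[junctions$end == 2000])    # 50
max(splice_left, splice_right)                                 # 70
```

In this toy example, the 1200-1900 junction shares neither boundary of the 1000-2000 intron, so it contributes to the SpliceOver estimate but not to the SpliceMax one.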

Detection of novel ASEs: When novelSplicing is set to TRUE, novel junctions (split reads across unannotated junctions from samples of the dataset being collated) are used in conjunction with the reference to compile a list of novel ASEs. To avoid being overwhelmed by a large number of false-positive novel junctions (often due to misalignments), a simple filtering strategy is used. Novel junctions are included only if they occur in a minimum number of samples (default 3), or if the number of split reads across a novel junction is above a pre-defined threshold (default 10) in a certain number of samples (default 1). These parameters can be set using novelSplicing_minSamples, novelSplicing_countThreshold and novelSplicing_minSamplesAboveThreshold, respectively.
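The two filtering criteria can be sketched on a junction-by-sample count matrix. The matrix and the helper `keep_novel()` are hypothetical, its arguments only mirror the collateData parameters, and treating both thresholds as inclusive (`>=`) is an assumption, since this page does not pin down the boundary handling. The `novelSplicing_requireOneAnnotatedSJ` check is not modelled.

```r
# Hypothetical split-read counts: rows = novel junctions, columns = samples
counts <- matrix(
  c(2, 1,  3, 0,   # jn1: non-zero in 3 samples, all counts low
    0, 0, 25, 0,   # jn2: non-zero in 1 sample, count above threshold
    1, 0,  0, 2),  # jn3: non-zero in 2 samples, counts low
  nrow = 3, byrow = TRUE,
  dimnames = list(c("jn1", "jn2", "jn3"), paste0("s", 1:4))
)

# Hypothetical filter; assumption: both comparisons are inclusive (>=)
keep_novel <- function(counts, minSamples = 3, countThreshold = 10,
                       minSamplesAboveThreshold = 1) {
  n_nonzero <- rowSums(counts > 0)               # samples with any reads
  n_above   <- rowSums(counts >= countThreshold) # samples at/above the count threshold
  n_nonzero >= minSamples | n_above >= minSamplesAboveThreshold
}

keep_novel(counts)
# jn1 = TRUE, jn2 = TRUE, jn3 = FALSE
```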

Value

collateData() writes to the directory given by output_path. This output directory is portable (i.e. it can be moved to a different location after running collateData() and before running makeSE), but individual files within the output folder should not be moved.

Also, the processBAM and collateData output folders should be copied to the same destination, with their relative paths preserved. Otherwise, the locations of the "COV" files will not be recorded in the collated data and will have to be re-assigned using `covfile(se) <-`. See makeSE.
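One way to keep the relative layout is to copy both folders into the same destination in a single step with base R's `file.copy()`. This is a minimal sketch: the source and destination paths are hypothetical, placeholder folders are created only so the sketch runs, and it does not check how SpliceWiz records COV file paths internally.

```r
# Hypothetical source layout, created here only so the sketch is runnable
src_root <- file.path(tempdir(), "analysis")
for (d in c("SpliceWiz_Output", "Collated_output")) {
  dir.create(file.path(src_root, d), recursive = TRUE, showWarnings = FALSE)
  writeLines("placeholder", file.path(src_root, d, "placeholder.txt"))
}

# Hypothetical destination (e.g. a shared drive)
dest <- file.path(tempdir(), "shared_copy")
dir.create(dest, showWarnings = FALSE)

# Copy both folders side by side into dest, preserving their relative positions
ok <- file.copy(
  from = file.path(src_root, c("SpliceWiz_Output", "Collated_output")),
  to = dest,
  recursive = TRUE
)
```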

Examples

library(SpliceWiz)

buildRef(
    reference_path = file.path(tempdir(), "Reference"),
    fasta = chrZ_genome(),
    gtf = chrZ_gtf()
)

bams <- SpliceWiz_example_bams()
processBAM(bams$path, bams$sample,
  reference_path = file.path(tempdir(), "Reference"),
  output_path = file.path(tempdir(), "SpliceWiz_Output")
)

expr <- findSpliceWizOutput(file.path(tempdir(), "SpliceWiz_Output"))
collateData(expr,
  reference_path = file.path(tempdir(), "Reference"),
  output_path = file.path(tempdir(), "Collated_output")
)

# Enable novel splicing (written to a separate folder so that it does not
# collide with the output collated above):

collateData(expr,
  reference_path = file.path(tempdir(), "Reference"),
  output_path = file.path(tempdir(), "Collated_output_novel"),
  novelSplicing = TRUE
)

alexchwong/SpliceWiz documentation built on April 17, 2025, 5:15 p.m.
