CollateData: Processes data from IRFinder output
In alexchwong/NxtIRFcore: Core Engine for NxtIRF: a User-Friendly Intron Retention and Alternative Splicing Analysis using the IRFinder Engine

CollateData

R Documentation

Processes data from IRFinder output

Description

CollateData unifies a list of IRFinder output files belonging to an experiment.

Usage

CollateData(
  Experiment,
  reference_path,
  output_path,
  IRMode = c("SpliceOverMax", "SpliceMax"),
  overwrite = FALSE,
  n_threads = 1,
  samples_per_block = 16
)

Arguments

`Experiment`	(Required) A 2- or 3-column data frame, ideally generated by Find_IRFinder_Output or Find_Samples. The first column designates the sample names, and the second column contains the path to the IRFinder output file (of type `sample.txt.gz`). Optionally, a third column contains the coverage files (of type `sample.cov`) of the corresponding samples. NB: all other columns are ignored.
`reference_path`	(Required) The path to the reference generated by BuildReference.
`output_path`	(Required) The path to contain the output files for this function.
`IRMode`	(default `SpliceOverMax`) The algorithm used to calculate 'splice abundance' in IR quantification. Valid options are `SpliceOverMax` and `SpliceMax`. See Details.
`overwrite`	(default `FALSE`) If `CollateData()` has previously been run using the same set of samples, the existing output will not be overwritten unless this is set to `TRUE`.
`n_threads`	(default `1`) The number of threads to use. On low-memory systems, reduce `n_threads` and `samples_per_block`.
`samples_per_block`	(default `16`) The maximum number of samples to process per thread. Setting this to a lower value may help on memory-constrained systems.
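
When Find_IRFinder_Output or Find_Samples is not used, the `Experiment` data frame can be assembled by hand. The following is a minimal sketch in base R, assuming hypothetical sample names and file paths. The column names `sample`, `path` and `cov_file` are illustrative only; the argument description above specifies column order, not names.

```r
# Hypothetical output directory and sample names, for illustration only.
irf_dir <- file.path(tempdir(), "IRFinder_output")
samples <- c("sample_A", "sample_B")

# Column 1: sample names; column 2: IRFinder output (.txt.gz);
# column 3 (optional): coverage files (.cov).
experiment <- data.frame(
  sample   = samples,
  path     = file.path(irf_dir, paste0(samples, ".txt.gz")),
  cov_file = file.path(irf_dir, paste0(samples, ".cov")),
  stringsAsFactors = FALSE
)

# Basic sanity checks before handing the table to CollateData().
stopifnot(
  ncol(experiment) %in% c(2, 3),
  !anyDuplicated(experiment$sample),
  all(grepl("\\.txt\\.gz$", experiment$path))
)
```

A table built this way would be passed as the first argument of `CollateData()`, in place of the output of Find_IRFinder_Output.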

Details

All sample IRFinder outputs must be generated using the same reference.

Junction counts and IR quantification from IRFinder are combined to calculate percent spliced in (PSI) of alternative splicing events and percent intron retention (PIR) of retained introns. QC information is also extracted. Data is organised in an H5 file and FST files for memory- and processor-efficient downstream access using MakeSE.

The original IRFinder algorithm (described in the IRFinder wiki) uses SpliceMax to estimate the abundance of spliced transcripts. It counts the mapped splice events that share the boundary coordinate of either the left or the right flanking exon (SpliceLeft and SpliceRight, respectively) and estimates splice abundance as the larger of the two values.
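
As a small worked illustration of the SpliceMax rule, the sketch below takes the per-intron maximum of two count vectors. The counts and the variable names `splice_left`, `splice_right` and `splice_max` are hypothetical and are not taken from the package.

```r
# Hypothetical junction read counts for three introns.
splice_left  <- c(40, 12, 0)  # reads sharing the left flanking exon boundary
splice_right <- c(35, 20, 0)  # reads sharing the right flanking exon boundary

# SpliceMax: the larger of the two counts, taken per intron.
splice_max <- pmax(splice_left, splice_right)
splice_max
```

Here the second intron is assigned a splice abundance of 20 because more reads share its right boundary than its left.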

NxtIRF proposes a new algorithm, SpliceOverMax, to account for the possibility that the major isoform shares neither boundary, but arises from either of the flanking "exon islands". Exon islands are contiguous regions covered by exons from any transcript (except those designated as retained_intron or sense_intronic), and are separated by obligate intronic regions (genomic regions that are introns for all transcripts). For introns that are internal to a single exon island (i.e. akin to "known-exon" introns from IRFinder), SpliceOverMax uses GenomicRanges::findOverlaps to sum all splice reads that overlap the same genomic region as the intron of interest.
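
The overlap-summing step for introns internal to an exon island can be illustrated without Bioconductor. The sketch below is a base R stand-in for the any-overlap query that the text attributes to GenomicRanges::findOverlaps, using hypothetical junction coordinates and counts on a single chromosome and strand. It is not the package's implementation, and it does not show how SpliceOverMax combines this sum with the other counts.

```r
# Hypothetical splice junctions (1-based, closed intervals) with read counts.
junctions <- data.frame(
  start = c(100, 150, 400, 120),
  end   = c(200, 300, 500, 180),
  count = c(10, 5, 7, 3)
)

# Hypothetical intron of interest.
intron <- c(start = 160, end = 250)

# Two closed intervals overlap when each starts no later than the other ends.
overlaps <- junctions$start <= intron[["end"]] &
  junctions$end >= intron[["start"]]

# Sum the reads of every junction that overlaps the intron.
splice_over <- sum(junctions$count[overlaps])
splice_over
```

In this toy data the junction at 400-500 lies outside the intron and is excluded, so only the other three junctions contribute to the sum.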

Value

CollateData() writes to the directory given by output_path. This output directory is portable (i.e. it can be moved to a different location after running CollateData() before running MakeSE), but individual files within the output folder should not be moved.

Also, the IRFinder and CollateData output folders should be copied to the same destination with their relative paths preserved. Otherwise, the locations of the "COV" files will not be recorded in the collated data and will have to be re-assigned using covfile(se) <-. See MakeSE.

Examples

BuildReference(
    reference_path = file.path(tempdir(), "Reference"),
    fasta = chrZ_genome(),
    gtf = chrZ_gtf()
)

bams <- NxtIRF_example_bams()
IRFinder(bams$path, bams$sample,
  reference_path = file.path(tempdir(), "Reference"),
  output_path = file.path(tempdir(), "IRFinder_output")
)

expr <- Find_IRFinder_Output(file.path(tempdir(), "IRFinder_output"))
CollateData(expr,
  reference_path = file.path(tempdir(), "Reference"),
  output_path = file.path(tempdir(), "NxtIRF_output")
)

alexchwong/NxtIRFcore documentation built on Oct. 31, 2022, 9:14 a.m.
