CollateData | R Documentation |
CollateData unifies a list of IRFinder output files belonging to an experiment.
CollateData(
    Experiment,
    reference_path,
    output_path,
    IRMode = c("SpliceOverMax", "SpliceMax"),
    overwrite = FALSE,
    n_threads = 1,
    samples_per_block = 16
)
Experiment |
(Required) A 2 or 3 column data frame, ideally generated by
Find_IRFinder_Output or Find_Samples.
The first column designates the sample names, and the 2nd column
contains the path to the IRFinder output file (of type |
reference_path |
(Required) The path to the reference generated by BuildReference |
output_path |
(Required) The path to contain the output files for this function |
IRMode |
(default |
overwrite |
(default |
n_threads |
(default |
samples_per_block |
(default |
All sample IRFinder outputs must be generated using the same reference.
Junction counts and IR quantification from IRFinder are combined to calculate the percentage spliced in (PSI) of alternative splice events and the percent intron retention (PIR) of retained introns. QC information is also extracted. The data are organised into an H5 file and FST files for memory- and processor-efficient downstream access using MakeSE.
The original IRFinder algorithm (see the IRFinder wiki) uses SpliceMax to estimate the abundance of spliced transcripts. This calculates the number of mapped splice events that share the boundary coordinate of either the left or right flanking exon (SpliceLeft, SpliceRight), estimating splice abundance as the larger of the two values.
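The SpliceMax rule described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the `junctions` data frame, its column layout, and the counts are made up for the example and are not taken from actual IRFinder output.

```r
# Toy junction counts for three introns (hypothetical values, illustration only).
junctions <- data.frame(
    intron      = c("intron_1", "intron_2", "intron_3"),
    SpliceLeft  = c(40, 5, 0),   # reads sharing the left flanking exon boundary
    SpliceRight = c(25, 18, 0)   # reads sharing the right flanking exon boundary
)

# SpliceMax: for each intron, take the larger of the two boundary counts.
junctions$SpliceMax <- pmax(junctions$SpliceLeft, junctions$SpliceRight)

print(junctions)
```

Using `pmax` keeps the calculation vectorised, so the same line works for any number of introns.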
NxtIRF proposes a new algorithm, SpliceOverMax, to account for the possibility that the major isoform shares neither boundary, but arises from either of the flanking "exon islands". Exon islands are contiguous regions covered by exons from any transcript (except those designated as retained_intron or sense_intronic), and are separated by obligate intronic regions (genomic regions that are introns for all transcripts). For introns that are internal to a single exon island (i.e. akin to "known-exon" introns from IRFinder), SpliceOverMax uses GenomicRanges::findOverlaps to sum all splice reads that overlap the same genomic region as the intron of interest.
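The overlap-summing step for an intron internal to a single exon island might look roughly like the sketch below. It is not NxtIRF's actual implementation: the `introns` and `junctions` objects, the `reads` metadata column, and all coordinates are hypothetical, and only the use of GenomicRanges::findOverlaps comes from the description above.

```r
library(GenomicRanges)

# Hypothetical intron of interest (coordinates are made up).
introns <- GRanges("chrZ", IRanges(start = 1000, end = 2000), strand = "+")

# Hypothetical splice junctions; `reads` is an assumed metadata column
# holding the number of spliced reads supporting each junction.
junctions <- GRanges(
    "chrZ",
    IRanges(start = c(900, 1500, 1800, 3000),
            end   = c(1200, 2500, 1900, 4000)),
    strand = "+",
    reads = c(10, 4, 7, 50)
)

# Pair each intron with every junction overlapping its genomic region.
hits <- findOverlaps(introns, junctions)

# Sum the junction reads per intron; introns without any overlap stay at 0.
splice_over <- numeric(length(introns))
sums <- tapply(junctions$reads[subjectHits(hits)], queryHits(hits), sum)
splice_over[as.integer(names(sums))] <- sums

print(splice_over)
```

In this toy data the first three junctions overlap the intron and the fourth lies entirely downstream of it, so only the first three contribute to the sum.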
CollateData() writes to the directory given by output_path. This output directory is portable (i.e. it can be moved to a different location after running CollateData() and before running MakeSE), but individual files within the output folder should not be moved.
Also, the IRFinder and CollateData output folders should be copied to the same destination and their relative paths preserved. Otherwise, the locations of the "COV" files will not be recorded in the collated data and will have to be re-assigned using covfile(se)<-. See MakeSE.
IRFinder, MakeSE
BuildReference(
    reference_path = file.path(tempdir(), "Reference"),
    fasta = chrZ_genome(),
    gtf = chrZ_gtf()
)

bams <- NxtIRF_example_bams()
IRFinder(bams$path, bams$sample,
    reference_path = file.path(tempdir(), "Reference"),
    output_path = file.path(tempdir(), "IRFinder_output")
)

expr <- Find_IRFinder_Output(file.path(tempdir(), "IRFinder_output"))
CollateData(expr,
    reference_path = file.path(tempdir(), "Reference"),
    output_path = file.path(tempdir(), "NxtIRF_output")
)