MakeSE: Constructs a NxtSE object from the collated data

View source: R/MakeSE.R

MakeSER Documentation

Constructs a NxtSE object from the collated data

Description

Creates a NxtSE object from the data from IRFinder output collated using CollateData. This object is used for downstream differential analysis of IR and alternative splicing events using ASE-methods as well as visualisation using Plot_Coverage

Usage

MakeSE(collate_path, colData, RemoveOverlapping = TRUE, realize = FALSE)

Arguments

collate_path

(Required) The output path of CollateData pointing to the collated data

colData

(Optional) A data frame containing the sample annotation information. The first column must contain the sample names. Omit colData to generate a NxtSE object of the whole dataset without any assigned annotations. Alternatively, if the names of only a subset of samples are given, then MakeSE() will construct the NxtSE object based only on the samples given. The colData can be set later using colData()

RemoveOverlapping

(default = TRUE) Whether to filter out overlapping novel IR events belonging to minor isoforms. See details.

realize

(default = FALSE) Whether to load all assay data into memory. See details

Details

MakeSE retrieves the generic SummarizedExperiment structure saved by CollateData, and initialises a NxtSE object. It references the required on-disk assay data using DelayedArrays, thereby utilising 'on-disk' memory to conserve memory usage.

For extremely large datasets, loading the entire data into memory may consume too much memory. In such cases, make a subset of the NxtSE object (e.g. subset by samples) before loading the data into memory (RAM) using realize_NxtSE

It should be noted that downstream applications of NxtIRF, including ASE-methods, Plot_Coverage, are much faster if the NxtSE is realized. It is recommended to realize the NxtSE object before extensive usage.

If COV files assigned via CollateData have been moved relative to the collate_path, the created NxtSE object will not have any linked COV files and Plot_Coverage cannot be used. To reassign these files, a vector of file paths corresponding to all the COV files of the data set can be assigned using covfile(se) <- vector_of_cov_files. See example below for details.

If RemoveOverlapping = TRUE, MakeSE will try to identify which introns belong to major isoforms, then remove introns of minor introns that overlaps those of major isoforms. Non-overlapping introns are then reassessed iteratively, until all introns are included or excluded in this way. This is important to ensure that overlapping novel IR events are not 'double-counted'.

Value

A NxtSE object containing the compiled data in DelayedArrays pointing to the assay data contained in the given collate_path

Examples


# The following code can be used to reproduce the NxtSE object
# that can be fetched with NxtIRF_example_NxtSE()

BuildReference(
    reference_path = file.path(tempdir(), "Reference"),
    fasta = chrZ_genome(),
    gtf = chrZ_gtf()
)

bams <- NxtIRF_example_bams()
IRFinder(bams$path, bams$sample,
  reference_path = file.path(tempdir(), "Reference"),
  output_path = file.path(tempdir(), "IRFinder_output")
)

expr <- Find_IRFinder_Output(file.path(tempdir(), "IRFinder_output"))
CollateData(expr,
  reference_path = file.path(tempdir(), "Reference"),
  output_path = file.path(tempdir(), "NxtIRF_output")
)

se <- MakeSE(collate_path = file.path(tempdir(), "NxtIRF_output"))

# "Realize" NxtSE object to load all H5 assays into memory:

se <- realize_NxtSE(se)

# If COV files have been removed since the last call to CollateData()
# reassign them to the NxtSE object, for example:

covfile_path <- system.file("extdata", package = "NxtIRFcore")
covfile_df <- Find_Samples(covfile_path, ".cov")

covfile(se) <- covfile_df$path

# Check that the produced object is identical to the example NxtSE

example_se <- NxtIRF_example_NxtSE()
identical(se, example_se) # should return TRUE

alexchwong/NxtIRFcore documentation built on Oct. 31, 2022, 9:14 a.m.