The r BiocStyle::Biocpkg("RaggedExperiment") package provides a flexible data
representation for copy number, mutation and other ragged array schema for
genomic location data. It aims to provide a framework for a set of samples
that have differing numbers of genomic ranges.
The RaggedExperiment class derives from a GRangesList representation and
provides a semblance of a rectangular dataset. The row and column dimensions of
the RaggedExperiment correspond to the number of ranges in the entire dataset
and the number of samples represented in the data, respectively.
if (!require("BiocManager")) install.packages("BiocManager") BiocManager::install("RaggedExperiment")
Loading the package:
library(RaggedExperiment) library(GenomicRanges)
RaggedExperiment class overviewA schematic showing the class geometry and supported transformations of the
RaggedExperiment class is show below. There are three main operations for
transforming the RaggedExperiment representation:
1) sparseAssay
2) compactAssay
3) qreduceAssay
knitr::include_graphics("RaggedExperiment.svg")
RaggedExperiment objectWe start with a couple of GRanges objects, each representing an individual
sample:
sample1 <- GRanges( c(A = "chr1:1-10:-", B = "chr1:8-14:+", C = "chr2:15-18:+"), score = 3:5) sample2 <- GRanges( c(D = "chr1:1-10:-", E = "chr2:11-18:+"), score = 1:2)
Include column data colData to describe the samples:
colDat <- DataFrame(id = 1:2)
GRanges objectsragexp <- RaggedExperiment(sample1 = sample1, sample2 = sample2, colData = colDat) ragexp
GRangesList instancegrl <- GRangesList(sample1 = sample1, sample2 = sample2) RaggedExperiment(grl, colData = colDat)
list of GRangesrangeList <- list(sample1 = sample1, sample2 = sample2) RaggedExperiment(rangeList, colData = colDat)
List of GRanges with metadataNote: In cases where a SimpleGenomicRangesList is provided along with
accompanying metadata (accessed by mcols), the metadata is used as the
colData for the RaggedExperiment.
grList <- List(sample1 = sample1, sample2 = sample2) mcols(grList) <- colDat RaggedExperiment(grList)
rowRanges(ragexp)
dimnames(ragexp)
colDatacolData(ragexp)
Subsetting a RaggedExperiment is akin to subsetting a matrix object. Rows
correspond to genomic ranges and columns to samples or specimen. It is possible
to subset using integer, character, and logical indices.
The overlapsAny and subsetByOverlaps functionalities are available for use
for RaggedExperiment. Please see the corresponding documentation in
RaggedExperiment and GenomicRanges.
RaggedExperiment package provides several different functions for representing
ranged data in a rectangular matrix via the *Assay methods.
The most straightforward matrix representation of a RaggedExperiment will
return a matrix of dimensions equal to the product of the number of ranges and
samples.
dim(ragexp) Reduce(`*`, dim(ragexp)) sparseAssay(ragexp) length(sparseAssay(ragexp))
Samples with identical ranges are placed in the same row. Non-disjoint ranges are not collapsed.
compactAssay(ragexp)
This function returns a matrix of disjoint ranges across all samples. Elements
of the matrix are summarized by applying the simplifyDisjoin functional
argument to assay values of overlapping ranges.
disjoinAssay(ragexp, simplifyDisjoin = mean)
The qreduceAssay function works with a query parameter that highlights
a window of ranges for the resulting matrix. The returned matrix will have
dimensions length(query) by ncol(x). Elements contain assay values for the
i th query range and the j th sample, summarized according to the
simplifyReduce functional argument.
For demonstration purposes, we can have a look at the original GRangesList
and the associated scores from which the current ragexp object is derived:
unlist(grl, use.names = FALSE)
This data is represented as rowRanges and assays in RaggedExperiment:
rowRanges(ragexp) assay(ragexp, "score")
Here we provide the "query" region of interest:
(query <- GRanges(c("chr1:1-14:-", "chr2:11-18:+")))
The simplifyReduce argument in qreduceAssay allows the user to summarize
overlapping regions with a custom method for the given "query" region of
interest. We provide one for calculating a weighted average score per
query range, where the weight is proportional to the overlap widths between
overlapping ranges and a query range.
Note that there are three arguments to this function. See the documentation for additional details.
weightedmean <- function(scores, ranges, qranges) { isects <- pintersect(ranges, qranges) sum(scores * width(isects)) / sum(width(isects)) }
A call to qreduceAssay involves the RaggedExperiment, the GRanges query
and the simplifyReduce functional argument.
qreduceAssay(ragexp, query, simplifyReduce = weightedmean)
See the schematic for a visual representation.
The RaggedExperiment provides a family of parallel functions for coercing to
the SummarizedExperiment class. By selecting a particular assay index (i), a
parallel assay coercion method can be achieved.
Here is the list of function names:
sparseSummarizedExperimentcompactSummarizedExperimentdisjoinSummarizedExperimentqreduceSummarizedExperimentSee the documentation for details.
sessionInfo()
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.