simplifyTCGA | R Documentation |
This group of functions converts row annotations, given as either gene symbols or miRNA symbols, to row ranges using the 'TxDb' and 'org.Hs' annotation packages. The functions also simplify the representation of RaggedExperiment objects to RangedSummarizedExperiment.
simplifyTCGA(obj, keep.assay = FALSE, unmapped = TRUE)
symbolsToRanges(obj, keep.assay = FALSE, unmapped = TRUE)
mirToRanges(obj, keep.assay = FALSE, unmapped = TRUE)
CpGtoRanges(obj, keep.assay = FALSE, unmapped = TRUE)
qreduceTCGA(obj, keep.assay = FALSE, suffix = "_simplified")
obj |
A |
keep.assay |
logical (default FALSE) Whether to keep the
|
unmapped |
logical (default TRUE) Whether to include an assay of the data that could not be mapped in the reference database |
suffix |
character (default "_simplified") A character string to append to the newly modified assay for |
The original SummarizedExperiment containing either gene symbol or miR annotations is replaced or supplemented by a RangedSummarizedExperiment for those that could be mapped to GRanges, and optionally another SummarizedExperiment for annotations that could not be mapped to GRanges.
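The mapped/unmapped split described above can be pictured with a small base-R sketch. Everything here is illustrative: the lookup vector, the toy matrix, and the variable names are assumptions for the example, and the real functions consult the TxDb and org.Hs resources and return SummarizedExperiment-class objects rather than plain matrices.

```r
# Hypothetical set of symbols that have a known genomic range.
# (The real functions look these up in TxDb / org.Hs packages.)
known_symbols <- c("TP53", "BRCA1")

# Toy assay: three features by two samples; "NOT_A_GENE" cannot be mapped.
toy_assay <- matrix(
    1:6, nrow = 3,
    dimnames = list(c("TP53", "NOT_A_GENE", "BRCA1"), c("s1", "s2"))
)

# Rows whose annotation can be mapped would receive row ranges ...
is_mapped <- rownames(toy_assay) %in% known_symbols
mapped <- toy_assay[is_mapped, , drop = FALSE]

# ... and the remainder corresponds to the optional unmapped assay
# (the one controlled by `unmapped = TRUE`).
unmapped <- toy_assay[!is_mapped, , drop = FALSE]

print(rownames(mapped))
print(rownames(unmapped))
```

Both pieces keep all samples; only the features are partitioned.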
A MultiAssayExperiment with any gene expression, miRNA, copy number, and mutations converted to RangedSummarizedExperiment objects
Using TxDb.Hsapiens.UCSC.hg19.knownGene as the reference, qreduceTCGA reduces the data by applying either the weightedmean or nonsilent function (see below) to non-mutation or mutation data, respectively. Internally, it uses RaggedExperiment::qreduceAssay() to reduce the ranges to the gene-level.
qreduceTCGA will update genome(x) based on the NCBI reference annotation which includes the patch number, e.g., GRCh37.p14, as provided by the seqlevelsStyle setter, seqlevelsStyle(gn) <- "NCBI". qreduceTCGA uses the NCBI genome annotation as the default reference.
nonsilent <- function(scores, ranges, qranges) any(scores != "Silent")
RaggedExperiment mutation objects become a genes by patients RangedSummarizedExperiment object containing '1' if there is a non-silent mutation somewhere in the gene, and '0' otherwise as obtained from the Variant_Classification column in the data.
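As a small illustration of the rule above, the documented nonsilent function can be called directly on character vectors. The input values below are hypothetical Variant_Classification entries, and as.integer() is used only to show the '1'/'0' coding; how qreduceTCGA performs that conversion internally is not shown here.

```r
# Reduction function as given in the documentation; `ranges` and `qranges`
# are part of the signature but are not used by this particular function.
nonsilent <- function(scores, ranges, qranges) any(scores != "Silent")

# Hypothetical Variant_Classification values overlapping two genes
# in a single patient.
gene_a <- c("Silent", "Missense_Mutation")
gene_b <- c("Silent", "Silent")

# Convert the logical result to the 1/0 coding described in the text.
call_a <- as.integer(nonsilent(gene_a, NULL, NULL))
call_b <- as.integer(nonsilent(gene_b, NULL, NULL))

print(c(gene_a = call_a, gene_b = call_b))
```

A gene is flagged as soon as any overlapping mutation is not "Silent", so gene_a is coded 1 and gene_b is coded 0.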
weightedmean <- function(scores, ranges, qranges) {
    isects <- GenomicRanges::pintersect(ranges, qranges)
    sum(scores * BiocGenerics::width(isects)) / sum(BiocGenerics::width(isects))
}
"CNA" and "CNV" segmented copy number data are reduced using a weighted mean in the rare cases of overlapping (non-disjoint) copy number regions.
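The arithmetic behind the weighted mean can be checked with a base-R stand-in. This is only a sketch: weightedmean_sketch is a hypothetical helper, plain start/end vectors replace the GRanges inputs, the segment and gene coordinates are made up, and every segment is assumed to overlap the query range.

```r
# Base-R stand-in for the documented weightedmean(): overlap widths are
# computed by hand instead of with GenomicRanges::pintersect().
# Assumes inclusive coordinates and that every segment overlaps the query.
weightedmean_sketch <- function(scores, starts, ends, qstart, qend) {
    widths <- pmin(ends, qend) - pmax(starts, qstart) + 1
    sum(scores * widths) / sum(widths)
}

# Two hypothetical overlapping copy number segments and one gene range.
seg_scores <- c(0.2, 0.8)
seg_starts <- c(1, 51)
seg_ends   <- c(100, 150)

gene_mean <- weightedmean_sketch(
    seg_scores, seg_starts, seg_ends,
    qstart = 41, qend = 120
)

# Overlap widths are 60 and 70, so this is (0.2 * 60 + 0.8 * 70) / 130.
print(gene_mean)
```

Each segment contributes in proportion to how much of it falls inside the gene, so the wider overlap pulls the result toward its score.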
These functions rely on TxDb.Hsapiens.UCSC.hg19.knownGene and org.Hs.eg.db to map to the 'hg19' NCBI build. Use the liftOver procedure for datasets that are provided against a different reference genome (usually 'hg18'). See an example in the vignette.
L. Waldron
library(TCGAutils)
library(curatedTCGAData)
library(GenomeInfoDb)

## download a small set of ACC assays as a MultiAssayExperiment
accmae <-
    curatedTCGAData(diseaseCode = "ACC",
        assays = c("CNASNP", "Mutation", "miRNASeqGene", "GISTICT"),
        version = "1.1.38",
        dry.run = FALSE)

## update genome annotation
rex <- accmae[["ACC_Mutation-20160128"]]

## Translate build to "hg19"
tgenome <- vapply(genome(rex), translateBuild, character(1L))
genome(rex) <- tgenome

## put the updated mutation assay back before simplifying
accmae[["ACC_Mutation-20160128"]] <- rex

simplifyTCGA(accmae)