simplifyTCGA    R Documentation

Functions to convert row annotations to ranges and RaggedExperiment to RangedSummarizedExperiment

Description

This group of functions converts row annotations, given as either gene symbols or miRNA symbols, to row ranges using the 'TxDb' and 'org.Hs' database packages. It also simplifies the representation of RaggedExperiment objects to RangedSummarizedExperiment.

Usage

simplifyTCGA(obj, keep.assay = FALSE, unmapped = TRUE)

symbolsToRanges(obj, keep.assay = FALSE, unmapped = TRUE)

mirToRanges(obj, keep.assay = FALSE, unmapped = TRUE)

CpGtoRanges(obj, keep.assay = FALSE, unmapped = TRUE)

qreduceTCGA(obj, keep.assay = FALSE, suffix = "_simplified")

Arguments

obj

A MultiAssayExperiment object obtained from curatedTCGAData

keep.assay

logical (default FALSE) Whether to keep the SummarizedExperiment assays that have been converted to RangedSummarizedExperiment

unmapped

logical (default TRUE) Whether to include an assay containing the data that could not be mapped to the reference database

suffix

character (default "_simplified") A character string to append to the name of the newly modified assay for qreduceTCGA.

Details

The original SummarizedExperiment containing either gene symbol or miR annotations is replaced or supplemented by a RangedSummarizedExperiment for those that could be mapped to GRanges, and optionally another SummarizedExperiment for annotations that could not be mapped to GRanges.
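The split described above can be pictured with a small base-R sketch. It is only an illustration of the idea, not the package's implementation: the `lookup` table is a hypothetical stand-in for the TxDb and org.Hs resources, and every name and coordinate in it is made up.

```r
# Toy assay: rows are annotations, columns are samples (all values made up)
assay <- matrix(1:6, nrow = 3,
    dimnames = list(c("GENE_A", "GENE_B", "NOT_IN_DB"), c("s1", "s2")))

# Hypothetical lookup standing in for the TxDb / org.Hs resources
lookup <- data.frame(
    symbol = c("GENE_A", "GENE_B"),
    chr = c("1", "2"),
    start = c(100L, 500L),
    end = c(199L, 650L)
)

# Rows whose annotation is found in the lookup can be given ranges
is_mapped <- rownames(assay) %in% lookup$symbol

# Mapped rows would become the ranged assay; the rest form the optional
# unmapped assay (compare the 'unmapped' argument)
mapped <- assay[is_mapped, , drop = FALSE]
unmapped <- assay[!is_mapped, , drop = FALSE]

# Coordinates for the mapped rows, in the same order as the rows of 'mapped'
row_ranges <- lookup[match(rownames(mapped), lookup$symbol), ]
```

In this toy case, `mapped` holds the two rows that have coordinates and `unmapped` holds the single row that does not.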

Value

A MultiAssayExperiment with any gene expression, miRNA, copy number, and mutation assays converted to RangedSummarizedExperiment objects

qreduceTCGA

Using TxDb.Hsapiens.UCSC.hg19.knownGene as the reference, qreduceTCGA reduces the data by applying either the weightedmean or nonsilent function (see below) to non-mutation or mutation data, respectively. Internally, it uses RaggedExperiment::qreduceAssay() to reduce the ranges to the gene level.

qreduceTCGA updates genome(x) to the NCBI reference annotation, which includes the patch number (e.g., GRCh37.p14), as provided by the seqlevelsStyle setter, seqlevelsStyle(gn) <- "NCBI". The NCBI genome annotation is the default reference.

nonsilent <- function(scores, ranges, qranges)
    any(scores != "Silent")

RaggedExperiment mutation objects become a genes-by-patients RangedSummarizedExperiment object containing '1' if there is a non-silent mutation somewhere in the gene and '0' otherwise, as obtained from the Variant_Classification column in the data.
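As a minimal sketch of how this reduction behaves, nonsilent can be called directly on character vectors. The classification values and the `gene_a` / `gene_b` names below are hypothetical; in practice the function is applied per gene and per patient by the reduction step rather than by hand.

```r
# Reduction function as given above; only 'scores' is actually used
nonsilent <- function(scores, ranges, qranges)
    any(scores != "Silent")

# Hypothetical Variant_Classification values for two genes in one patient
gene_a <- c("Silent", "Missense_Mutation")
gene_b <- c("Silent", "Silent")

# 'ranges' and 'qranges' are never evaluated, so they can be left out here
hit_a <- nonsilent(gene_a)
hit_b <- nonsilent(gene_b)

# Coerce the logical results to the 1 / 0 coding described above
coded <- as.integer(c(hit_a, hit_b))
```

Here `hit_a` is TRUE because one mutation is not silent, while `hit_b` is FALSE because both are.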

weightedmean <- function(scores, ranges, qranges) {
    isects <- GenomicRanges::pintersect(ranges, qranges)
    sum(scores * BiocGenerics::width(isects)) /
        sum(BiocGenerics::width(isects))
}

"CNA" and "CNV" segmented copy number data are reduced using a weighted mean in the rare cases of overlapping (non-disjoint) copy number regions.
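The arithmetic behind the weighted mean can be checked with a base-R analogue. This is a sketch under stated assumptions, not the package's code: `weightedmean_base` is a hypothetical helper that takes plain start and end vectors in place of GRanges objects, treats ranges as closed integer intervals, and assumes every segment overlaps the query range. The segment values are made up.

```r
# Base-R analogue of weightedmean(); ranges are closed integer intervals
weightedmean_base <- function(scores, starts, ends, qstart, qend) {
    # Width of each segment's overlap with the query (gene) range
    w <- pmin(ends, qend) - pmax(starts, qstart) + 1
    # Average the scores, weighting each by its overlap width
    sum(scores * w) / sum(w)
}

# Hypothetical case: two overlapping segments covering a gene at 100-199
scores <- c(0.2, 0.8)
starts <- c(50, 150)
ends <- c(174, 300)

result <- weightedmean_base(scores, starts, ends, qstart = 100, qend = 199)
```

The first segment overlaps the gene over 75 positions and the second over 50, so the result is (0.2 * 75 + 0.8 * 50) / 125 = 0.44.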

These functions rely on TxDb.Hsapiens.UCSC.hg19.knownGene and org.Hs.eg.db to map to the 'hg19' NCBI build. Use the liftOver procedure for datasets that are provided against a different reference genome (usually 'hg18'). See an example in the vignette.

Author(s)

L. Waldron

Examples


library(curatedTCGAData)
library(GenomeInfoDb)

accmae <- curatedTCGAData(
    diseaseCode = "ACC",
    assays = c("CNASNP", "Mutation", "miRNASeqGene", "GISTICT"),
    version = "1.1.38",
    dry.run = FALSE
)

## update genome annotation
rex <- accmae[["ACC_Mutation-20160128"]]

## Translate build to "hg19"
tgenome <- vapply(genome(rex), translateBuild, character(1L))
genome(rex) <- tgenome

accmae[["ACC_Mutation-20160128"]] <- rex

simplifyTCGA(accmae)


waldronlab/TCGAmisc documentation built on Dec. 19, 2024, 2:10 p.m.