curatedTCGAData: Create a MultiAssayExperiment from specific assays and...

View source: R/curatedTCGAData.R

curatedTCGAData R Documentation

Create a MultiAssayExperiment from specific assays and cohorts

Description

curatedTCGAData assembles data on-the-fly from ExperimentHub to provide cohesive MultiAssayExperiment container objects. The user only needs to provide TCGA disease code(s) and assay types. We highly recommend the companion package TCGAutils, which was developed to work with TCGA data, specifically data from curatedTCGAData and some flat files.

Usage

curatedTCGAData(
  diseaseCode = "*",
  assays = "*",
  version,
  dry.run = TRUE,
  verbose = TRUE,
  ...
)

Arguments

diseaseCode

character() A vector of TCGA cancer cohort codes (e.g., COAD)

assays

character() A vector of TCGA assays, glob matches allowed; see below for more details

version

character(1) One of 1.1.38, 2.0.1, 2.1.0, or 2.1.1 indicating the data version to obtain from ExperimentHub. Version 2.1.1 includes various improvements as well as the addition of the RNASeq2Gene assay and subtype updates. See the version section for details.

dry.run

logical(1) Whether to return the dataset names before actual download (default TRUE)

verbose

logical(1) Whether to show the dataset currently being (down)loaded (default TRUE)

...

Additional arguments passed on to the ExperimentHub constructor

Details

This function checks the request against the resources available in ExperimentHub. Only the latest runDate ("2016-01-28") is supported. Set dry.run = FALSE to download the remote datasets and build an integrative MultiAssayExperiment object. For a list of valid diseaseCode values, see the curatedTCGAData-package help page.
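The two-step workflow described above can be sketched as follows. This is a minimal illustration, not part of the original help page: it assumes the curatedTCGAData package is installed and that ExperimentHub is reachable, and it reuses the ACC cohort, CNASNP assay, and version 2.0.1 from the Examples section. The have_pkg guard is an addition so that the sketch is skipped when the package is unavailable.

```r
# Sketch only: needs the curatedTCGAData package and network access to
# ExperimentHub. Cohort, assay, and version are taken from the Examples.
have_pkg <- requireNamespace("curatedTCGAData", quietly = TRUE)

if (have_pkg) {
    library(curatedTCGAData)

    # Step 1: dry.run = TRUE (the default) lists the matching resources
    # without downloading them.
    resources <- curatedTCGAData(
        diseaseCode = "ACC", assays = "CNASNP",
        version = "2.0.1", dry.run = TRUE
    )
    print(resources)

    # Step 2: dry.run = FALSE downloads the data and returns a
    # MultiAssayExperiment.
    acc <- curatedTCGAData(
        diseaseCode = "ACC", assays = "CNASNP",
        version = "2.0.1", dry.run = FALSE
    )
    print(acc)
}
```

Inspecting the dry-run table first is a cheap way to confirm that the cohort and assay patterns select what was intended before any data are downloaded.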

Value

A MultiAssayExperiment of the specified assays and cancer codes, or an informative data.frame of the matching resources when dry.run is TRUE

Available Assays

Below is a list of partial ExperimentList assay names and their respective descriptions. These assays can be entered as part of the assays argument in the main function. Partial glob matches are allowed, such as 'CN*' for the "CNASeq", "CNASNP", and "CNVSNP" assays. Credit: Ludwig G.


ExperimentList data types   Description
----------------------------------------------------------------------------
SummarizedExperiment*
  RNASeqGene                Gene expression values
  RNASeq2Gene               RSEM TPM gene expression values
  RNASeq2GeneNorm           Upper quartile log2 normalized RSEM TPM gene
                            expression values
  miRNAArray                Probe-level miRNA expression values
  miRNASeqGene              Gene-level log2 RPM miRNA expression values
  mRNAArray                 Unified gene-level mRNA expression values
  mRNAArray_huex            Gene-level mRNA expression values from Affymetrix
                            Human Exon Array
  mRNAArray_TX_g4502a       Gene-level mRNA expression values from Agilent
                            244K Array
  mRNAArray_TX_ht_hg_u133a  Gene-level mRNA expression values from Affymetrix
                            Human Genome U133 Array
  GISTIC_AllByGene          Gene-level GISTIC2 copy number values
  GISTIC_ThresholdedByGene  Gene-level GISTIC2 thresholded discrete copy
                            number values
  RPPAArray                 Reverse Phase Protein Array normalized protein
                            expression values
RangedSummarizedExperiment
  GISTIC_Peaks              GISTIC2 thresholded discrete copy number values
                            in recurrent peak regions
SummarizedExperiment with HDF5Array DelayedMatrix
  Methylation_methyl27      Probe-level methylation beta values from Illumina
                            HumanMethylation 27K BeadChip
  Methylation_methyl450     Probe-level methylation beta values from Infinium
                            HumanMethylation 450K BeadChip
RaggedExperiment
  CNASNP                    Segmented somatic Copy Number Alteration calls
                            from SNP array
  CNVSNP                    Segmented germline Copy Number Variant calls from
                            SNP Array
  CNASeq                    Segmented somatic Copy Number Alteration calls
                            from low pass DNA Sequencing
  Mutation*                 Somatic mutation calls
  CNACGH_CGH_hg_244a        Segmented somatic Copy Number Alteration calls
                            from CGH Agilent Microarray 244A
  CNACGH_CGH_hg_415k_g4124a Segmented somatic Copy Number Alteration calls
                            from CGH Agilent Microarray 415K
* All can be converted to RangedSummarizedExperiment (except RPPAArray) with
TCGAutils
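
To make the glob behavior concrete, the following base R sketch shows which of the assay names in the table a given pattern selects under ordinary glob semantics. It is illustrative only: match_assays is a hypothetical helper, not a function in curatedTCGAData, and the package's internal matching may differ in detail (for example, it may match against full resource names).

```r
# Assay names copied from the table above (a subset, for illustration).
assay_names <- c(
    "RNASeqGene", "RNASeq2Gene", "RNASeq2GeneNorm",
    "GISTIC_AllByGene", "GISTIC_ThresholdedByGene", "GISTIC_Peaks",
    "CNASNP", "CNVSNP", "CNASeq",
    "CNACGH_CGH_hg_244a", "CNACGH_CGH_hg_415k_g4124a"
)

# Hypothetical helper (not part of curatedTCGAData): return the names
# that a glob pattern selects.
match_assays <- function(pattern, choices = assay_names) {
    regex <- utils::glob2rx(pattern)  # translate the glob into a regex
    choices[grepl(regex, choices)]
}

cn_matches <- match_assays("CN*")
gistic_matches <- match_assays("GISTIC*")
exact_match <- match_assays("RNASeq2Gene")

print(cn_matches)
print(gistic_matches)
print(exact_match)
```

Under plain glob semantics, 'CN*' also selects the two CNACGH names whenever they are among the candidates, and a pattern without a wildcard such as "RNASeq2Gene" selects only the exact name, not "RNASeq2GeneNorm". Checking the dry.run output is the reliable way to see what a pattern selects for a given cohort.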

version

Version 2.1.1 provides a couple of corrections to the colData for ovarian cancer (OV) and skin cancer (SKCM). In these new data, the cancer subtype variables are fully available. One can obtain the mapping of columns to subtypes in the colData with the getSubtypeMap function in TCGAutils.

Version 2.1.0 provides gene-level log2 RPM miRNA expression values for miRNASeqGene data and log2 normalized RSEM values for RNASeq2GeneNorm assays. Previously, the data provided were read counts and normalized counts, respectively. See issue #53 on GitHub for additional details.

Version 2.0.1 includes various improvements, including an additional assay that provides RNASeq2Gene data as RSEM TPM gene expression values (issue #38). Additional changes include genomic information for RaggedExperiment type data objects, where '37' is now 'GRCh37', as reported in issue #40. Datasets (e.g., OV, GBM) that contain multiple assays that could be merged are now provided as merged assays (issue #27). We corrected an issue where mRNAArray assays were returning DataFrames instead of matrix type data (issue #31).

Version 1.1.38 provides the original run of curatedTCGAData and is kept for legacy reasons.

See Also

curatedTCGAData-package

Examples


curatedTCGAData(
    diseaseCode = c("GBM", "ACC"), assays = "CNASNP", version = "2.0.1"
)

curatedTCGAData("BRCA", "GISTIC*", "2.0.1")


waldronlab/curatedTCGAData documentation built on Feb. 7, 2024, 1:12 p.m.