R/data.R

#' TCGA breast invasive carcinoma (BRCA) gene-level nonsilent somatic mutation (wustl curated)
#'
#' @description
#' TCGA breast invasive carcinoma (BRCA) somatic mutation data. Sequencing data
#' are generated on an IlluminaGA system. The calls are generated at the Genome
#' Institute at Washington University Sequencing Center using the WashU pipeline
#' method.
#'
#' @details
#' "1" indicates that a non-silent somatic mutation (nonsense, missense,
#' frame-shift indels, splice site mutations, stop codon readthroughs, change of
#' start codon, inframe indels) was identified in the protein-coding region of a
#' gene, or that any mutation was identified in a non-coding gene. "0" indicates
#' that none of the above mutation calls were made in this gene for the specific
#' sample.
#'
#' @format An untidy dataframe with genes as rows and samples as columns.
#'
#' @source \url{https://xenabrowser.net/datapages/?dataset=TCGA.BRCA.sampleMap%2Fmutation_curated_wustl_gene&host=https%3A%2F%2Ftcga.xenahubs.net}
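#'
#' @examples
#' # A minimal sketch, not taken from the package itself. It builds a toy table
#' # shaped like the description above (genes as rows, samples as columns,
#' # values 0/1) and summarises it per gene. The gene and sample names are
#' # hypothetical, and storing gene identifiers as row names is an assumption
#' # about the layout, not a documented property of `mutations`.
#' toy <- data.frame(
#'   S1 = c(1, 0, 0),
#'   S2 = c(1, 1, 0),
#'   row.names = c("GENE_A", "GENE_B", "GENE_C")
#' )
#' # Number of samples carrying a non-silent mutation in each gene
#' mutated_per_gene <- rowSums(toy == 1)
#' # Fraction of samples mutated, per gene
#' mutation_freq <- rowMeans(toy == 1)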
"mutations"

#' TCGA breast invasive carcinoma (BRCA) copy number gistic2 thresholded estimate
#'
#' @description
#' TCGA breast invasive carcinoma (BRCA) thresholded gene-level copy number
#' variation (CNV) estimated using the GISTIC2 method.
#'
#' @details
#' Copy number profile was measured experimentally using whole genome microarray
#' at a TCGA genome characterization center. Subsequently, the GISTIC2 method was
#' applied using the TCGA FIREHOSE pipeline to produce gene-level copy number
#' estimates. GISTIC2 further thresholded the estimated values to -2, -1, 0, 1,
#' or 2, representing homozygous deletion, single copy deletion, diploid normal
#' copy, low-level copy number amplification, or high-level copy number
#' amplification, respectively. Genes are mapped onto the human genome
#' coordinates using the UCSC Xena HUGO probeMap. Reference for the GISTIC2
#' method: PMID:21527027.
#'
#' @format An untidy dataframe with genes as rows and samples as columns (25,777
#'   identifiers x 1,080 samples).
#'
#' @source \url{https://xenabrowser.net/datapages/?dataset=TCGA.BRCA.sampleMap%2FGistic2_CopyNumber_Gistic2_all_thresholded.by_genes&host=https%3A%2F%2Ftcga.xenahubs.net}
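#'
#' @examples
#' # A minimal sketch, not taken from the package itself. It shows how the
#' # thresholded values described in the details could be turned into readable
#' # labels. The `calls` vector is toy data, and the label strings paraphrase
#' # the details section rather than coming from the dataset.
#' calls <- c(0, 2, -1, 0, -2, 1)
#' cnv_labels <- c(
#'   "homozygous deletion",
#'   "single copy deletion",
#'   "diploid normal copy",
#'   "low-level amplification",
#'   "high-level amplification"
#' )
#' # Map -2..2 onto the five categories, keeping the ordering of the scale
#' cnv_status <- factor(calls, levels = -2:2, labels = cnv_labels)
#' # Tally how often each category occurs
#' status_counts <- table(cnv_status)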
"cnvs"

#' TCGA Breast Cancer (BRCA) Clinical Matrix
#'
#' @description
#' TCGA breast invasive carcinoma (BRCA) clinical matrix holding phenotypic data
#' for samples.
#'
#' @format A dataframe with 1247 samples and 203 variables.
#'
#' @source \url{https://xenabrowser.net/datapages/?dataset=TCGA.BRCA.sampleMap%2FBRCA_clinicalMatrix&host=https%3A%2F%2Ftcga.xenahubs.net}
"phenotypes"

#' Common samples across datasets
#'
#' @description
#' A vector of sample IDs that are common to all three datasets:
#' \code{\link{mutations}}, \code{\link{cnvs}}, and \code{\link{phenotypes}}.
#'
#' @format A character vector of length 963.
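#'
#' @examples
#' # A minimal sketch of one plausible way to find IDs shared by several
#' # datasets. The source does not state how `common_samples` was actually
#' # derived. The three ID vectors are toy data standing in for the sample IDs
#' # of the mutation, copy number, and clinical tables.
#' mut_ids <- c("S1", "S2", "S3")
#' cnv_ids <- c("S2", "S3", "S4")
#' pheno_ids <- c("S3", "S2", "S5")
#' # Successively intersect the ID sets, keeping only IDs present in all three
#' shared_ids <- Reduce(intersect, list(mut_ids, cnv_ids, pheno_ids))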
"common_samples"
shunsambongi/sambcdata documentation built on May 24, 2019, 5:05 a.m.