R/tcgaov-data.R

#' Subset of TCGA mRNA Ovarian serous cystadenocarcinoma data
#'
#' A dataset containing a subset of the TCGA mRNA Ovarian serous
#' cystadenocarcinoma data generated using Affymetrix HTHGU133a arrays.
#' Differences in gene expression profiles have led to the identification of
#' robust molecular subtypes of ovarian cancer; these are of biological and
#' clinical importance because they have been shown to correlate with overall
#' survival (Tothill et al., 2008). Improving prediction of survival time based
#' on gene expression signatures can lead to targeted therapeutic interventions
#' (Helland et al., 2011). The proposed ECLUST algorithm was applied to gene
#' expression data from 511 ovarian cancer patients profiled by the Affymetrix
#' Human Genome U133A 2.0 Array. The data were obtained from the TCGA Research
#' Network: http://cancergenome.nih.gov/ and downloaded via the TCGA2STAT R
#' library (Wan et al., 2015). Using the 881 signature genes from Helland et al.
#' (2011), we split subjects into two groups based on the results of that
#' paper, creating a “positive control” environment variable expected to have
#' a strong effect. Specifically, we defined this variable in our
#' framework as: E = 0 for subtypes C1 and C2 (n = 253), and E = 1 for subtypes
#' C4 and C5 (n = 258).
#'
#' @format A data.table and data.frame with 511 rows and 886 variables:
#'   \describe{
#'     \item{rn}{unique patient identifier (\code{character})}
#'     \item{subtype}{cancer subtype (1, 2, 4 or 5) as per Helland et al. 2011
#'     (\code{integer})}
#'     \item{E}{binary environment variable for the ECLUST method: E = 0 for
#'     subtypes 1 and 2 (n = 253), and E = 1 for subtypes 4 and 5 (n = 258)
#'     (\code{numeric})}
#'     \item{status}{vital status, 0 = alive, 1 = dead (\code{numeric})}
#'     \item{OS}{overall survival time (\code{numeric})}
#'     \item{columns 6:886}{gene expression data for 881 genes; column names are
#'     the gene names (\code{numeric})}
#'   }
#'
#' @source \url{http://www.liuzlab.org/TCGA2STAT/#import-gene-expression}
#' @source \url{http://gdac.broadinstitute.org/}
#' @source
#' \url{http://journals.plos.org/plosone/article/asset?unique&id=info:doi/10.1371/journal.pone.0018064.s015}
#'
#' @references  Richard W Tothill, Anna V Tinker, Joshy George, Robert Brown,
#'   Stephen B Fox, Stephen Lade, Daryl S Johnson, Melanie K Trivett, Dariush
#'   Etemadmoghadam, Bianca Locandro, et al. Novel molecular subtypes of serous
#'   and endometrioid ovarian cancer linked to clinical outcome. Clinical Cancer
#'   Research, 14(16):5198–5208, 2008.
#' @references Aslaug Helland, Michael S Anglesio, Joshy George, Prue A Cowin,
#'   Cameron N Johnstone, Colin M House, Karen E Sheppard, Dariush
#'   Etemadmoghadam, Nataliya Melnyk, Anil K Rustgi, et al. Deregulation of
#'   MYCN, LIN28B and LET7 in a molecular subtype of aggressive high-grade
#'   serous ovarian cancers. PLoS ONE, 6(4):e18064, 2011.
#' @examples
#' # using data.table syntax from the data.table package
#' tcgaov[1:5, 1:10, with = FALSE]
#' tcgaov[,table(subtype, E, useNA = "always")]
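#'
#' # A minimal sketch, not part of the original examples: summarise sample
#' # size, number of deaths and median overall survival within each level of
#' # the environment variable E. It assumes `status` and `OS` are coded as
#' # described under @format; na.rm = TRUE is only a precaution in case any
#' # values are missing.
#' tcgaov[, list(n = .N,
#'               deaths = sum(status, na.rm = TRUE),
#'               median_OS = median(OS, na.rm = TRUE)),
#'        by = E]
#'
#' # Another sketch under the same assumptions: pull the gene expression
#' # columns (6 to 886, as listed under @format) into a numeric matrix, the
#' # form most modelling functions expect. `X` is an illustrative name only.
#' X <- as.matrix(tcgaov[, 6:886, with = FALSE])
#' dim(X)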
"tcgaov"

eclust documentation built on May 1, 2019, 8:46 p.m.