nci60: NCI-60 cancer cell panel
In aalfons/robustHD: Robust Methods for High-Dimensional Data

nci60

R Documentation

Description

The data set is a pre-processed version of the NCI-60 cancer cell panel as used in Alfons, Croux & Gelper (2013). One observation was removed since all values in the gene expression data were missing.

Usage

data("nci60")

Format

Protein and gene expression data on 59 observations are stored in two separate matrices:

protein: a matrix containing protein expressions based on antibodies (162 columns), acquired via reverse-phase protein lysate arrays and log2 transformed.
gene: a matrix containing gene expression data (22283 columns), obtained with an Affymetrix HG-U133A chip and normalized with the GCRMA method.

In addition, meta information on the proteins, genes, and cancer cell lines is stored in three separate data frames:

proteinInfo: a data frame with 162 rows and the following 4 columns: Experiment (the name of the experiment for collecting the data), Probe (the name of the individual probe), Symbol (the symbol of the protein in Human Genome Organisation (HUGO) nomenclature), and ID (identifier of the protein per the National Center for Biotechnology Information (NCBI) Entrez database). The rows of this data frame correspond to the columns of the matrix protein.
geneInfo: a data frame with 22283 rows and the following 4 columns: Experiment (the name of the experiment for collecting the data), Probe (the name of the individual probe), Symbol (the symbol of the gene in Human Genome Organisation (HUGO) nomenclature), and ID (identifier of the gene per the National Center for Biotechnology Information (NCBI) Entrez database). The rows of this data frame correspond to the columns of the matrix gene.
cellLineInfo: a data frame with 59 rows and 15 columns containing various information on the cancer cell lines, such as tissue of origin and histology, or age and sex of the patient. The rows of this data frame correspond to the rows of the matrices protein and gene.
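
The correspondence between the matrices and the meta data frames described above can be checked directly. The following is a minimal sketch, assuming the robustHD package is installed and that `data("nci60")` places the five objects in the workspace as in the Usage section; the column index 92 is only an illustrative choice.

```r
# Sketch only: assumes the robustHD package is installed
library("robustHD")
data("nci60")

# dimensions of the two expression matrices (observations x variables)
dim(protein)
dim(gene)

# rows of the meta data frames should line up with the matrix dimensions
stopifnot(
  nrow(proteinInfo) == ncol(protein),   # one row per protein column
  nrow(geneInfo) == ncol(gene),         # one row per gene column
  nrow(cellLineInfo) == nrow(protein),  # one row per cell line
  nrow(cellLineInfo) == nrow(gene)
)

# look up the meta information belonging to one column of 'protein'
# (index 92 is an illustrative choice)
proteinInfo[92, c("Probe", "Symbol", "ID")]
```

Because the rows of proteinInfo and geneInfo are aligned with the columns of the matrices, any column subset of a matrix can be matched to its meta information with the same index vector.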

Source

The original data were downloaded from https://discover.nci.nih.gov/cellminer/ on 2012-01-27.

The exact version of the data used in Alfons, Croux & Gelper (2013) can be obtained from https://github.com/aalfons/nci60, together with the script for pre-processing. The data in package robustHD differ in two respects: the gene expression matrix is named gene, and three data frames with meta information on the proteins, genes, and cancer cell lines are included.

References

Reinhold, W.C., Sunshine, M., Liu, H., Varma, S., Kohn, K.W., Morris, J., Doroshow, J. and Pommier, Y. (2012) CellMiner: A Web-Based Suite of Genomic and Pharmacologic Tools to Explore Transcript and Drug Patterns in the NCI-60 Cell Line Set. Cancer Research, 72(14), 3499–3511. doi:10.1158/0008-5472.CAN-12-1370

Alfons, A., Croux, C. and Gelper, S. (2013) Sparse least trimmed squares regression for analyzing high-dimensional large data sets. The Annals of Applied Statistics, 7(1), 226–248. doi:10.1214/12-AOAS575

Examples


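# load package and data
library("robustHD")
data("nci60")
# define response variable (protein in column 92)
y <- protein[, 92]
# compute the robust correlation of each gene with the response
correlations <- apply(gene, 2, corHuber, y)
# keep the 100 genes with the largest absolute correlations
keep <- partialOrder(abs(correlations), 100, decreasing = TRUE)
X <- gene[, keep]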
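
The screening step in the example ranks the predictors by their absolute correlation with the response and keeps the highest-ranked ones. The same pattern can be sketched in base R on simulated data. This is an illustration only: Spearman rank correlation via `cor()` and a full sort via `order()` are swapped in for `corHuber()` and `partialOrder()`, and all object names and sizes (`X_sim`, `y_sim`, `n`, `p`, `k`) are assumptions made for the sketch.

```r
# Sketch only: simulated data, not the NCI-60 panel
set.seed(1)
n <- 59    # number of observations (chosen to match the panel size)
p <- 500   # number of candidate predictors (hypothetical)
k <- 10    # number of predictors to keep (hypothetical)

# simulated predictors; the response depends on the first two columns
X_sim <- matrix(rnorm(n * p), nrow = n, ncol = p)
y_sim <- X_sim[, 1] - X_sim[, 2] + rnorm(n, sd = 0.5)

# correlation of each predictor with the response
# (Spearman rank correlation stands in for corHuber() here)
cors <- apply(X_sim, 2, function(x) cor(x, y_sim, method = "spearman"))

# indices of the k predictors with the largest absolute correlations
keep_sim <- order(abs(cors), decreasing = TRUE)[seq_len(k)]

# reduced predictor matrix; drop = FALSE keeps the matrix structure
X_screened <- X_sim[, keep_sim, drop = FALSE]
dim(X_screened)
```

A full sort with `order()` is adequate at this size; a partial ordering that only identifies the top entries, as `partialOrder()` is used for in the example above, can be preferable when the number of predictors is large.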

aalfons/robustHD documentation built on July 3, 2024, 9:15 a.m.