leuk: Leukemia Microarray Data Set

leuk R Documentation

Leukemia Microarray Data Set

Description

Unprocessed data set (no batch correction performed) formed by combining two independently derived leukemia data sets: Golub et al. and Armstrong et al.

Usage

data(leuk)

Format

A data frame with 120 samples (columns) and 5145 ENSEMBL ID variables (rows).

Details

Both data sets were combined into a single data set with the following procedure:

  1. All probes of Armstrong et al. were converted to ENSEMBL IDs using biomaRt, while all probes of Golub et al. were converted to ENSEMBL IDs using hu6800.db.

  2. To ensure a one-to-one mapping between the probes and ENSEMBL IDs in both data sets, all probes with no ENSEMBL ID were removed. Probes with multiple ENSEMBL IDs were assigned the ENSEMBL ID with the smallest value (the ENSEMBL IDs were ordered using the default order function and all ENSEMBL IDs after the first were removed). Where several probes shared the same ENSEMBL ID, we took the median of their values. After this procedure, each data set consisted of unique ENSEMBL ID variables.

  3. To join both data sets without introducing null values or imputing data (the two data sets do not necessarily contain the same ENSEMBL IDs), we took the intersection of the ENSEMBL IDs of both data sets. This shared set forms the ENSEMBL IDs of the joined data set.

  4. Both data sets were joined along the shared set of ENSEMBL IDs.
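Steps 2 to 4 can be sketched in base R on toy data. This is only an illustration under stated assumptions, not the code used to build leuk: the probe names, ENSEMBL IDs, expression values, and the helper collapse_to_ensembl are all hypothetical, and the mapping tables stand in for the output of step 1.

```r
# Toy probe-to-ENSEMBL mappings (hypothetical IDs, standing in for step 1).
map_a <- data.frame(
  probe   = c("p1", "p1", "p2", "p3", "p4"),
  ensembl = c("ENSG02", "ENSG01", "ENSG01", NA, "ENSG03"),
  stringsAsFactors = FALSE
)
map_b <- data.frame(
  probe   = c("q1", "q2", "q3"),
  ensembl = c("ENSG01", "ENSG03", "ENSG04"),
  stringsAsFactors = FALSE
)

# Toy expression matrices: probes in rows, samples in columns.
expr_a <- matrix(c(1, 3, 5, 7,
                   2, 4, 6, 8),
                 nrow = 4,
                 dimnames = list(c("p1", "p2", "p3", "p4"), c("a1", "a2")))
expr_b <- matrix(c(10, 20, 30),
                 nrow = 3,
                 dimnames = list(c("q1", "q2", "q3"), "b1"))

# Hypothetical helper implementing step 2 for one data set.
collapse_to_ensembl <- function(expr, map) {
  # Drop probes with no ENSEMBL ID.
  map <- map[!is.na(map$ensembl), ]
  # For probes with several IDs, keep the first ID under order().
  map <- map[order(map$ensembl), ]
  map <- map[!duplicated(map$probe), ]
  # Keep only probes that are present in the expression matrix.
  map <- map[map$probe %in% rownames(expr), ]
  # Take the per-sample median of probes sharing the same ID.
  ids <- sort(unique(map$ensembl))
  out <- matrix(NA_real_, nrow = length(ids), ncol = ncol(expr),
                dimnames = list(ids, colnames(expr)))
  for (id in ids) {
    probes <- map$probe[map$ensembl == id]
    out[id, ] <- apply(expr[probes, , drop = FALSE], 2, median)
  }
  out
}

collapsed_a <- collapse_to_ensembl(expr_a, map_a)
collapsed_b <- collapse_to_ensembl(expr_b, map_b)

# Step 3: ENSEMBL IDs present in both data sets.
shared <- intersect(rownames(collapsed_a), rownames(collapsed_b))

# Step 4: join along the shared IDs (rows = ENSEMBL IDs, columns = samples).
joined <- as.data.frame(cbind(collapsed_a[shared, , drop = FALSE],
                              collapsed_b[shared, , drop = FALSE]))
print(joined)
```

In this toy case, probe p1 maps to two IDs and keeps only ENSG01, probe p3 is dropped for lack of an ID, and ENSG04 is excluded from the result because it appears in only one data set.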

References

Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999; 286:531-7.

Armstrong SA, Staunton JE, Silverman LB, Pieters R, den Boer ML, Minden MD, et al. MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nat Genet. 2002; 30:41-7.


lr98769/doppelgangerIdentifier documentation built on Aug. 2, 2022, 9:41 a.m.