all | R Documentation |
Unprocessed data set (no batch correction performed) formed by combining two independently derived ALL data sets, Ross et al. and Yeoh et al.
data(all)
A data frame with 75 samples (columns) and 8813 ENSEMBL ID variables (rows).
Both data sets were combined into a single data set with the following procedure:
All probes of both data sets were converted to ENSEMBL IDs using biomaRt.
To ensure a one-to-one mapping between the probes and ENSEMBL IDs in both data sets, all probes with no ENSEMBL ID were removed. Probes with multiple ENSEMBL IDs were assigned the ENSEMBL ID with the smallest value (the ENSEMBL IDs were ordered using the default order function and all ENSEMBL IDs after the first were removed). Where several probes shared the same ENSEMBL ID, we took the median of their values. After this procedure, both data sets consisted of unique ENSEMBL ID variables.
To join both data sets without any null values or data imputation (since both data sets may not contain the same ENSEMBL IDs), we took the intersection of ENSEMBL IDs between both data sets. This shared set of ENSEMBL IDs became the ENSEMBL IDs of the joined data set.
Both data sets were joined along the shared set of ENSEMBL IDs.
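The procedure above can be sketched in base R as follows. This is a minimal illustration on toy data, not the code used to build the data set: the helper collapse_to_ensembl, the probe names, the ENSEMBL IDs, and the expression values are all hypothetical, and the probe-to-ENSEMBL mapping is hard-coded here rather than retrieved with biomaRt.

```r
# Hypothetical helper: collapse a probe-level matrix to unique ENSEMBL IDs.
# expr: numeric matrix, probes in rows (rownames = probe IDs), samples in columns
# map:  data frame with columns 'probe' and 'ensembl' (one row per probe/ID pair)
collapse_to_ensembl <- function(expr, map) {
  map$probe   <- as.character(map$probe)
  map$ensembl <- as.character(map$ensembl)
  # Remove probes with no ENSEMBL ID and probes absent from the matrix.
  map <- map[!is.na(map$ensembl) & map$ensembl != "", , drop = FALSE]
  map <- map[map$probe %in% rownames(expr), , drop = FALSE]
  # For probes with several IDs, keep the first ID under order().
  map <- map[order(map$probe, map$ensembl), , drop = FALSE]
  map <- map[!duplicated(map$probe), , drop = FALSE]
  # Take the per-sample median over probes sharing the same ID.
  ids <- sort(unique(map$ensembl))
  out <- matrix(NA_real_, nrow = length(ids), ncol = ncol(expr),
                dimnames = list(ids, colnames(expr)))
  for (id in ids) {
    probes <- map$probe[map$ensembl == id]
    out[id, ] <- apply(expr[probes, , drop = FALSE], 2, median)
  }
  out
}

# Toy stand-ins for the two data sets (values are made up).
ross <- matrix(c(1, 3, 5, 7,
                 2, 4, 6, 8),
               nrow = 4,
               dimnames = list(c("p1", "p2", "p3", "p4"), c("r1", "r2")))
ross_map <- data.frame(
  probe   = c("p1", "p1", "p2", "p3", "p4"),
  ensembl = c("ENSG00000000002", "ENSG00000000001", "ENSG00000000001",
              NA, "ENSG00000000003"),
  stringsAsFactors = FALSE
)

yeoh <- matrix(c(10, 20, 30),
               nrow = 3,
               dimnames = list(c("q1", "q2", "q3"), "y1"))
yeoh_map <- data.frame(
  probe   = c("q1", "q2", "q3"),
  ensembl = c("ENSG00000000001", "ENSG00000000003", "ENSG00000000004"),
  stringsAsFactors = FALSE
)

ross_ens <- collapse_to_ensembl(ross, ross_map)
yeoh_ens <- collapse_to_ensembl(yeoh, yeoh_map)

# Keep only the ENSEMBL IDs present in both data sets, then join by column.
shared   <- intersect(rownames(ross_ens), rownames(yeoh_ens))
combined <- as.data.frame(cbind(ross_ens[shared, , drop = FALSE],
                                yeoh_ens[shared, , drop = FALSE]))
```

In this sketch, combined has ENSEMBL IDs in rows and samples from both toy data sets in columns, mirroring the layout described for the data frame. Taking the intersection is what avoids missing values: an ID found in only one data set (ENSG00000000004 here) is dropped rather than imputed.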
Ross ME, Mahfouz R, Onciu M, Liu H-C, Zhou X, Song G, et al. Gene expression profiling of pediatric acute myelogenous leukemia. Blood. 2004; 104:3679-87.
Yeoh E-J, Ross ME, Shurtleff SA, Williams WK, Patel D, Mahfouz R, et al. Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell. 2002; 1:133-43.