loadBreastEsets: Function to load breast cancer expression sets from the...
In bhklab/MetaGxBreast: Transcriptomic Breast Cancer Datasets

This function returns breast cancer datasets from the hub, together with a vector of patient IDs from those datasets that are most likely duplicates.

loadBreastEsets(
  loadString = "majority",
  removeDuplicates = TRUE,
  quantileCutoff = 0,
  rescale = FALSE,
  minNumberGenes = 0,
  minNumberEvents = 0,
  minSampleSize = 0,
  removeRetracted = TRUE,
  removeSubsets = TRUE,
  keepCommonOnly = FALSE,
  imputeMissing = FALSE
)

`loadString`	a character vector specifying which data will be loaded. The default is "majority", which loads 37 of the 39 datasets. The other option is to provide a character vector of the names of the datasets to load. The metabric and tcga datasets are loaded separately because they are very large, and loading them separately helps prevent memory allocation errors in R on Windows. These two datasets are also large enough to dominate statistical analyses, so it is best to analyze them separately from the 37 smaller datasets loaded with the string "majority".
`removeDuplicates`	remove patients with a Spearman correlation greater than or equal to 0.98 with other patient expression profiles (default TRUE)
`quantileCutoff`	a numeric between 0 and 1; genes with a standard deviation below this quantile are removed (default 0)
`rescale`	apply centering and scaling to the expression sets (default FALSE)
`minNumberGenes`	an integer; expression sets with fewer genes than this number are removed (default 0)
`minNumberEvents`	an integer specifying how many survival events a dataset must contain to be kept (default 0)
`minSampleSize`	an integer specifying the minimum number of patients required in an eset (default 0)
`removeRetracted`	remove datasets from retracted papers (default TRUE, currently just PMID17290060 dataset)
`removeSubsets`	remove datasets that are a subset of other datasets (default TRUE, currently just PMID19318476)
`keepCommonOnly`	remove probes not common to all datasets (default FALSE)
`imputeMissing`	remove patients from datasets with missing expression values (default FALSE)
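
To make the filtering arguments more concrete, the base-R sketch below mimics what `quantileCutoff`, `rescale` and `keepCommonOnly` describe, using small synthetic matrices (genes in rows, patients in columns) as stand-ins for expression sets. The helper names `filterBySdQuantile`, `rescaleGenes` and `keepCommon` are hypothetical, and this is not the package's internal code. In particular, the choice of a non-strict comparison at the cutoff and of per-gene scaling are assumptions.

```r
# Illustrative stand-ins only; not the MetaGxBreast internals.
set.seed(1)

# Toy "datasets": genes in rows, patients in columns.
exprA <- matrix(rnorm(5 * 6), nrow = 5,
                dimnames = list(paste0("gene", 1:5), paste0("A", 1:6)))
exprB <- matrix(rnorm(4 * 6), nrow = 4,
                dimnames = list(paste0("gene", 2:5), paste0("B", 1:6)))

# quantileCutoff: drop genes whose standard deviation falls below the
# given quantile of all per-gene standard deviations (assumed: keep >=).
filterBySdQuantile <- function(expr, quantileCutoff = 0) {
  geneSd <- apply(expr, 1, sd)
  expr[geneSd >= quantile(geneSd, probs = quantileCutoff), , drop = FALSE]
}

# rescale: center and scale each gene across patients (assumed per gene).
rescaleGenes <- function(expr) {
  t(scale(t(expr)))
}

# keepCommonOnly: keep only the features present in every dataset.
keepCommon <- function(exprList) {
  common <- Reduce(intersect, lapply(exprList, rownames))
  lapply(exprList, function(expr) expr[common, , drop = FALSE])
}

filteredA  <- filterBySdQuantile(exprA, quantileCutoff = 0.5)
scaledA    <- rescaleGenes(exprA)
commonSets <- keepCommon(list(A = exprA, B = exprB))
```

With the default `quantileCutoff = 0` the filter in this sketch keeps every gene, which matches the idea that the defaults leave the data unfiltered.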

A list with 2 elements. The first element, named esets, contains the datasets. The second element, named duplicates, contains a vector of patient IDs for the duplicate patients (those with a Spearman correlation greater than or equal to 0.98 with other patient expression profiles).
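
As a rough illustration of the duplicate criterion, the base-R sketch below flags patients whose expression profile has a Spearman correlation of at least 0.98 with another patient's profile, then assembles a list shaped like the value described here. A plain matrix stands in for an expression set and the helper `findDuplicates` is hypothetical. Which member of a correlated pair the package reports is not stated in this page, so the sketch arbitrarily reports the later one.

```r
# Illustrative stand-in only; not the MetaGxBreast internals.
set.seed(42)

# Toy expression matrix: 200 genes (rows) by 5 patients (columns).
expr <- matrix(rnorm(200 * 5), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("P", 1:5)))

# Make P5 a near copy of P1 so that this pair exceeds the cutoff.
expr[, "P5"] <- expr[, "P1"] + rnorm(200, sd = 0.01)

findDuplicates <- function(expr, cutoff = 0.98) {
  # Pairwise Spearman correlation between patient profiles (columns).
  rho <- cor(expr, method = "spearman")
  # Keep the lower triangle only, so each pair is considered once.
  rho[upper.tri(rho, diag = TRUE)] <- NA
  hits <- which(rho >= cutoff, arr.ind = TRUE)
  # Report the later patient of each highly correlated pair.
  unique(rownames(rho)[hits[, "row"]])
}

dups <- findDuplicates(expr)

# A list with the same two element names as the documented return value.
result <- list(
  esets = list(toy = expr[, !colnames(expr) %in% dups, drop = FALSE]),
  duplicates = dups
)
```

In this toy setup, dropping the flagged columns plays the role that `removeDuplicates = TRUE` plays in the real function, while `result$duplicates` keeps a record of which patient IDs were flagged.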

## Load a subset of datasets by name; use the default loadString = "majority"
## instead if you want the 37 smaller datasets
esetsAndDups <- loadBreastEsets(loadString = c("CAL", "DFHCC", "DFHCC2",
    "DFHCC3", "DUKE", "DUKE2", "EMC2"))