virtualArrayExpressionSets: Combine different ExpressionSets into one



This function selects all ExpressionSets in the current environment and builds a single new ExpressionSet from the raw data they contain. First, the expression values are annotated with the selected identifiers, which are pulled from Bioconductor annotation packages. Then rows targeting the same gene are collapsed by the specified function. Next, compatible rows of the expression matrices are merged. Finally, batch effects resulting from different platforms or labs can be removed in a supervised or non-supervised mode.
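The collapse and merge steps described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper `collapse_by_id()`, the toy matrices and the symbol vectors are hypothetical, and "compatible rows" is assumed here to mean identifiers shared by all datasets.

```r
# Toy expression matrices (probes x samples). The symbol vectors stand in
# for the identifiers that would be looked up in an annotation package.
expr_a <- matrix(c(1, 3, 5, 7,
                   2, 4, 6, 8), nrow = 4,
                 dimnames = list(paste0("pa", 1:4), c("a1", "a2")))
symbols_a <- c("TP53", "TP53", "MYC", "EGFR")

expr_b <- matrix(c(10, 20, 30,
                   11, 21, 31), nrow = 3,
                 dimnames = list(paste0("pb", 1:3), c("b1", "b2")))
symbols_b <- c("MYC", "TP53", "BRCA1")

# Hypothetical helper: collapse all rows that share an identifier,
# applying `fun` (median by default) per sample column.
collapse_by_id <- function(expr, ids, fun = median) {
  groups <- split(seq_len(nrow(expr)), ids)
  do.call(rbind, lapply(groups, function(rows) {
    apply(expr[rows, , drop = FALSE], 2, fun)
  }))
}

collapsed_a <- collapse_by_id(expr_a, symbols_a)
collapsed_b <- collapse_by_id(expr_b, symbols_b)

# Assumption: only identifiers present in every dataset are merged.
shared <- intersect(rownames(collapsed_a), rownames(collapsed_b))
combined <- cbind(collapsed_a[shared, , drop = FALSE],
                  collapsed_b[shared, , drop = FALSE])
combined
```

Passing a different function as `fun`, for example `mean`, changes how duplicated identifiers are summarised, which mirrors the role of the `collapse_fun` argument.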


virtualArrayExpressionSets(all_expression_sets = FALSE, identifier = "SYMBOL", covars = "Batch", collapse_fun = median, removeBatcheffect = "EB", sampleinfo = FALSE, parallel = "BiocParallel", ...)



all_expression_sets: Logical or a character vector. If FALSE, virtualArray tries to catch all ExpressionSets in the current environment. If set to a character vector holding the names of ExpressionSets, these are used instead of all available ones.


identifier: Annotation identifier by which the expression values are combined. Defaults to "SYMBOL".


covars: Character vector of length 1 or longer. Selects the non-supervised (length 1) or supervised (length > 1) mode. The default is "Batch". See Details for more information.


collapse_fun: The function used to collapse expression values targeting the same gene/identifier. Defaults to median.


removeBatcheffect: Logical or character vector. FALSE yields just a combined ExpressionSet; you will then have to use other functions to remove the batch effects. Set it to "EB", "GQ", "MRS", "QD", "NORDI" or "MC" to remove batch effects using empirical Bayes methods, gene quantiles, median rank scores, quantile discretization, normal discretization or mean centering, respectively.


sampleinfo: Selects how the information on the relationships between batches and samples/datasets is supplied. The default (FALSE) uses a sample_info data.frame that is generated on the fly from the pData slots of the supplied ExpressionSets, with an additional "Batch" column. If you run in non-interactive mode, you can specify a data.frame to be used as the input "sample_info". Another option is to hand over a file name, so that a preconfigured text file is fed into the procedure. Finally, "create" writes a text file "sample_info.txt" on the fly, which you can then set up manually.


parallel: A character string or a logical selecting which package to use for parallel processing. Defaults to "BiocParallel", but can also be "multicore", "none" or FALSE.


...: Can be used to pass parameters on to underlying functions.
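To give a feel for what removing a batch effect means, the following base R sketch illustrates mean centering, the idea behind the "MC" option of removeBatcheffect. It is a simplified illustration rather than the package's own code; the helper `mean_center()` and the toy data are hypothetical.

```r
# Toy combined matrix: 2 genes x 6 samples, where batch "B" carries a
# constant offset relative to batch "A".
expr <- matrix(c(1, 2, 3, 11, 12, 13,
                 4, 5, 6, 24, 25, 26),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("MYC", "TP53"), paste0("s", 1:6)))
batch <- c("A", "A", "A", "B", "B", "B")

# Hypothetical helper: within each batch, subtract every gene's batch mean
# so that each gene has mean zero in each batch.
mean_center <- function(expr, batch) {
  centered <- expr
  for (b in unique(batch)) {
    cols <- batch == b
    block <- expr[, cols, drop = FALSE]
    # rowMeans() has one entry per gene; it is recycled down each column.
    centered[, cols] <- block - rowMeans(block)
  }
  centered
}

centered <- mean_center(expr, batch)
centered
```

After centering, the two batches are on a common scale per gene, at the cost of discarding any genuine difference in mean expression between batches.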


The "covars" argument determines the mode of batch effect removal. It refers to the columns of the sample_info data.frame, which contains information about all ExpressionSets, their samples and the relations between them. The default value "Batch" uses only the different ExpressionSets for batch effect removal; this is referred to as the non-supervised mode. The supervised mode is selected by using a character vector of length > 1, e.g. c("Batch","celltype"). In this case a column "celltype" must be common to the pData slots of all datasets before the function is invoked. The default name of the batch column ("Batch") can even be replaced to match another column.

The sample_info data.frame is generated on the fly from the pData slots of the supplied ExpressionSets; during this procedure a "Batch" column is generated. All columns are preserved and common ones are joined. These can be used as additional covariates during batch effect removal. In this case the sample_info data.frame has to be modified manually to contain more information on the batches in additional columns.

Please note that during computation you will be notified that "sample_info.txt" has been written to your current working directory for you to modify and save. If you do so, please select "y" to use the additional columns. Also note that you cannot provide a covariate that occurs in only one batch; the procedure will fail if you do.
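Before running the supervised mode, it can help to check how a covariate is spread across batches. The sketch below uses one reading of the caveat above, namely that every level of a covariate should occur in at least two batches. The sample_info contents and the helper `levels_in_one_batch()` are hypothetical and not part of virtualArray.

```r
# Toy sample_info table with the generated "Batch" column and one
# additional covariate, as it might look after manual editing.
sample_info <- data.frame(
  Sample   = c("a1", "a2", "b1", "b2", "b3"),
  Batch    = c("A", "A", "B", "B", "B"),
  celltype = c("ESC", "fibroblast", "ESC", "fibroblast", "iPSC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the covariate levels that occur in fewer
# than two batches and are therefore confounded with the batch.
levels_in_one_batch <- function(sample_info, covariate, batch = "Batch") {
  counts <- table(sample_info[[covariate]], sample_info[[batch]])
  n_batches <- rowSums(counts > 0)
  names(n_batches)[n_batches < 2]
}

problem_levels <- levels_in_one_batch(sample_info, "celltype")
problem_levels
```

An empty result suggests the covariate is usable under this reading; any level reported here exists in a single batch only, so its effect cannot be separated from that batch.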


A new ExpressionSet is returned that combines all ExpressionSets from the current environment.


Andreas Heider (2011)

See Also

virtualArray-package, virtualArray.ExpressionSet, virtualArrayCompile, normalize.ExpressionSet.nordi, normalize.ExpressionSet.mrs, normalize.ExpressionSet.qd


# Due to the flexibility of this function and the time
# it takes to get meaningful results, please see the
# vignette for a comprehensive example covering
# several modes of usage. Thanks.

virtualArray documentation built on Sept. 12, 2016, 6:10 a.m.