generate_gCMAP_NChannelSet: Generate a perturbation profile library from expression sets...
In gCMAP: Tools for Connectivity Map-like analyses


When provided with a list of ExpressionSet or CountDataSet objects, the function compares control and perturbation samples separately within each set. Processing RNAseq count data requires the suggested Bioconductor package 'DESeq' to be installed. For CountDataSets, a moderated log2 fold change is calculated for each set after a variance-stabilizing transformation has been applied globally across all CountDataSets in the list.
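The per-set comparison described above can be sketched in base R. This is only an illustration under stated assumptions, not gCMAP's implementation: `compare_one_set` is a hypothetical helper, the input is assumed to be a log2-scale matrix with a group label per sample, an ordinary Welch t-test stands in for the limma moderated test, and deriving a signed z-score from the p-value is an assumption made here for illustration.

```r
# Hypothetical helper (not part of gCMAP): compare perturbation vs. control
# samples within a single set, gene by gene.
compare_one_set <- function(expr, group,
                            control = "control", perturb = "perturbation") {
  ctrl <- expr[, group == control, drop = FALSE]
  pert <- expr[, group == perturb, drop = FALSE]
  # log2 fold change, assuming `expr` is already on the log2 scale
  log_fc <- rowMeans(pert) - rowMeans(ctrl)
  # ordinary Welch t-test per gene (a stand-in for a moderated test)
  p <- vapply(seq_len(nrow(expr)), function(i) {
    t.test(pert[i, ], ctrl[i, ])$p.value
  }, numeric(1))
  # signed z-score: magnitude from the two-sided p-value, sign from log_fc
  z <- sign(log_fc) * qnorm(p / 2, lower.tail = FALSE)
  data.frame(log_fc = log_fc, p = p, z = z, row.names = rownames(expr))
}

# Three simulated sets, each with 3 control and 3 perturbation samples
set.seed(1)
sets <- lapply(1:3, function(i) {
  m <- matrix(rnorm(50 * 6), nrow = 50,
              dimnames = list(paste0("gene_", 1:50), paste0("s", 1:6)))
  list(expr = m, group = rep(c("control", "perturbation"), each = 3))
})
names(sets) <- paste("Instance", 1:3, sep = ".")

# Each set is processed independently of the others
results <- lapply(sets, function(s) compare_one_set(s$expr, s$group))

# Collect one column per instance, one row per gene
z_matrix <- sapply(results, function(r) r$z)
rownames(z_matrix) <- rownames(results[[1]])
dim(z_matrix)
```

Each set contributes exactly one column of scores, which mirrors how each element of `data.list` becomes one instance in the returned object.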

generate_gCMAP_NChannelSet(
                           data.list,
                           uids=1:length(data.list),
                           sample.annotation=NULL,
                           platform.annotation="",
                           control_perturb_col="cmap",
                           control="control",
                           perturb="perturbation",
                           limma=TRUE,
                           limma.index=2,
                           big.matrix=NULL,
                           center.z="peak",
                           center.log_fc="none",
                           report.center=FALSE
                           )

`data.list`	List of `ExpressionSet` or `CountDataSet` objects. Each element includes all array / RNAseq data for a single instance, plus metadata on which samples are perturbation and control.
`uids`	Vector of unique identifiers for the instances in `data.list`
`sample.annotation`	An optional `data.frame` of additional annotation for instances. Each row corresponds to one instance, in the same order as `data.list`. It is not used for the control/perturbation comparisons; it is simply attached to the `NChannelSet` for future reference.
`platform.annotation`	The name of the platform as used by the annotation package.
`control_perturb_col`	See `pairwise_compare`.
`control`	See `pairwise_compare`.
`perturb`	See `pairwise_compare`.
`limma`	Logical: use the limma package to perform moderated t-tests instead of a standard t-test? (Default: TRUE)
`limma.index`	Integer specifying the index of the parameter estimate for which to extract t and other statistics. The default corresponds to a two-class comparison with the standard parameterization. The function assumes that there was no missing data, so that the tests for all genes were performed on the same sample size.
`big.matrix`	Character string providing the path and filename to store the NChannelSets on disk instead of in memory. If 'NULL' (default), an NChannelSet is returned. If not 'NULL', the bigmemoryExtras package will create (or overwrite!) three binary files for each channel of the NChannelSet at the location provided as 'big.matrix', distinguishing files for the different channels by their suffixes. To load the NChannelSet into a different R session, the binary files must be accessible.
`center.z`	One of 'none', 'peak', 'mean', 'median', selecting whether / how to center the z-scores for each experiment. Option 'peak' (default) will center on the peak of the z-score kernel density. Options 'mean' and 'median' will center on their respective values instead.
`center.log_fc`	One of 'none', 'peak', 'mean', 'median', selecting whether / how to center the log2 fold-change distribution for each experiment. Option 'none' (default) applies no centering. Option 'peak' will center on the peak of the log2 fold-change kernel density. Options 'mean' and 'median' will center on their respective values instead.
`report.center`	Logical: should the z-score / log2 fold change corrections and the median absolute deviation of the respective distribution about zero be included in the pData slot of the returned NChannelSet? (Default: FALSE)
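To make the centering options concrete, here is a minimal base R sketch of what 'none', 'peak', 'mean' and 'median' centering could look like, together with the median absolute deviation about zero. `center_shift` is a hypothetical helper written for this illustration, and the use of `density()` with its default bandwidth is an assumption; gCMAP's internal code may differ.

```r
# Hypothetical helper (not gCMAP's internal code): estimate the shift that
# centers a score vector on zero, using one of the documented options.
center_shift <- function(x, method = c("none", "peak", "mean", "median")) {
  method <- match.arg(method)
  switch(method,
    none   = 0,
    mean   = mean(x),
    median = median(x),
    peak   = {
      d <- density(x)        # Gaussian kernel density estimate
      d$x[which.max(d$y)]    # location of the highest density
    })
}

set.seed(42)
# Simulated scores: most genes offset by 0.5, plus a minority of strong responders
z <- c(rnorm(900, mean = 0.5), rnorm(100, mean = 4))

shift    <- center_shift(z, "peak")
centered <- z - shift

# Median absolute deviation about zero (not about the median)
mad_zero <- median(abs(centered))

c(shift = shift, mad_about_zero = mad_zero)
```

In this simulation the strong responders pull the mean upward, while the density peak stays close to the bulk of the distribution, which is one reason a peak-based shift can be preferable when only a minority of genes respond.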

The function returns an NChannelSet with one channel for each of the columns returned by pairwise_compare. This object can be used directly (e.g., assayData(obj)$z), or specific channels can be converted to regular ExpressionSet objects (e.g., es <- channel(obj, "z")). In the latter case, the z-scores are accessed with exprs(es). If 'report.center' is TRUE, the pData slot of the NChannelSet contains columns reporting the shift applied to the z-score and / or log2 fold change columns to center the score distributions on zero, and the median absolute deviation of the shifted distribution about zero.

## list of ExpressionSets
data("sample.ExpressionSet") ## from Biobase

es.list <- list( sample.ExpressionSet[,1:4],
                 sample.ExpressionSet[,5:8],
                 sample.ExpressionSet[,9:12])
names(es.list) <- paste( "Instance", 1:3, sep=".")

de <- generate_gCMAP_NChannelSet(
    es.list,
    1:3,
    platform.annotation = annotation(es.list[[1]]),
    control_perturb_col="type",
    control="Control",
    perturb="Case") 

assayDataElementNames(de)
head( assayDataElement(de, "z") ) 

## Not run: 
## processing RNAseq data requires the suggested 'DESeq' 
## Bioconductor package.
require( DESeq )
set.seed( 123 )
## list of CountDataSets
cds.list <- lapply( 1:3, function(n) {
   cds <- makeExampleCountDataSet()
   featureNames(cds) <- paste("gene",1:10000, sep="_")
   cds
})

cde <- generate_gCMAP_NChannelSet(cds.list,
                           uids=1:3,
                           sample.annotation=NULL,
                           platform.annotation="Entrez",
                           control_perturb_col="condition",
                           control="A",
                           perturb="B")

assayDataElementNames(cde)

## End(Not run)