prepareData: Combining Two Studies into an Expression Set



Description

The function prepares two expression sets (ExpressionSet) and/or Affy batches (AffyBatch) to be passed on to the main function OrderedList. For each data set, one has to specify the phenodata variable that defines the grouping of samples into two distinct classes. The data sets are then merged into one ExpressionSet together with the rearranged phenodata. If the studies were done on different platforms but a subset of genes can be mapped from one chip to the other, this information can be provided via the mapping argument.

Please note that both data sets have to be pre-processed beforehand, either together or independently of each other. In addition, the gene expression values have to be on an additive scale, that is, a logarithmic or log-like scale.
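As a minimal sketch of the scale requirement, assuming raw intensities held in a plain matrix (the object raw and its values are made up for illustration and are not part of the package), a log2 transform turns multiplicative fold changes into additive differences:

```r
# Hypothetical raw intensities: 3 probes x 4 samples (illustrative values only).
raw <- matrix(c(100, 200, 400, 800,
                 50,  50, 100, 100,
                 16,  32,  64, 128),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("probe", 1:3), paste0("sample", 1:4)))

# On the log2 scale, a two-fold change becomes a difference of 1.
log_expr <- log2(raw)

# Difference between sample2 and sample1 for each probe.
fold_diff <- log_expr[, "sample2"] - log_expr[, "sample1"]
```

For an ExpressionSet, the same transform would presumably be applied to the matrix returned by exprs() from Biobase before calling prepareData; whether that is needed depends on how the data were pre-processed.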

Usage

prepareData(eset1, eset2, mapping = NULL)

Arguments

eset1

The first study. Each study is passed as a named list with five elements: data, name, var, out and paired; see Details below.

eset2

Same as eset1 for the second data set.

mapping

Data frame containing one named vector (column) for each study. Each vector holds probe IDs that match the row names of the corresponding expression set. The IDs are ordered identically across studies: the kth row of mapping gives the label of the kth gene in each study. If all studies were done on the same chip, no mapping is needed (default NULL).
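The layout of such a data frame can be sketched in base R. All probe IDs, column names and matrices below are hypothetical stand-ins chosen for illustration; they are not taken from OL.data, and the check shown is not something the package is known to perform in this form:

```r
# Hypothetical probe IDs for two platforms (made up for illustration).
ids_study1 <- c("1000_at", "1001_at", "1002_f_at")
ids_study2 <- c("A_23_P1", "A_23_P2", "A_23_P3")

# One column per study; row k pairs the IDs that measure the same gene.
# The column names are placeholders.
mapping <- data.frame(study1 = ids_study1,
                      study2 = ids_study2,
                      stringsAsFactors = FALSE)

# Stand-in expression matrices whose row names play the role of probe IDs.
expr1 <- matrix(rnorm(12), nrow = 4,
                dimnames = list(c(ids_study1, "9999_at"), NULL))
expr2 <- matrix(rnorm(9), nrow = 3,
                dimnames = list(rev(ids_study2), NULL))

# Every mapped ID should be found among the row names of its study.
all_found <- all(mapping$study1 %in% rownames(expr1)) &&
             all(mapping$study2 %in% rownames(expr2))

# Indexing each matrix by its mapping column aligns the genes row by row.
aligned1 <- expr1[mapping$study1, , drop = FALSE]
aligned2 <- expr2[mapping$study2, , drop = FALSE]
```

After this alignment, row k of aligned1 and row k of aligned2 refer to the same gene, which is the property the mapping argument is described as encoding.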

Details

Each study has to be stored in a list with five elements:

data Object of class ExpressionSet or AffyBatch.
name Character string with comparison label.
var Character string with phenodata variable. Based on this variable, the samples for the two-sample testing will be extracted.
out Vector of two character strings with the levels of var that define the two clinical classes. The order of the two levels must be identical for all studies. Ideally, the first entry corresponds to the bad and the second one to the good outcome level.
paired Logical: TRUE if samples are paired (e.g. two measurements per patient), FALSE if all samples are independent of each other. If the data are paired, the paired samples must appear in the same order within each condition, so that the first sample of one condition matches the first sample of the second condition, and so on.
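The five elements above can be checked before calling prepareData. The helper check_study below is hypothetical and not part of OrderedList; it is a minimal sketch that only verifies the presence and basic type of each element, with a plain matrix standing in for the ExpressionSet or AffyBatch so that it runs without Bioconductor:

```r
# Hypothetical helper (not part of OrderedList): checks that a study list
# has the five documented elements and that their basic types fit.
check_study <- function(study) {
  required <- c("data", "name", "var", "out", "paired")
  missing_el <- setdiff(required, names(study))
  if (length(missing_el) > 0) {
    stop("study is missing element(s): ", paste(missing_el, collapse = ", "))
  }
  stopifnot(is.character(study$name), length(study$name) == 1,
            is.character(study$var), length(study$var) == 1,
            is.character(study$out), length(study$out) == 2,
            is.logical(study$paired), length(study$paired) == 1)
  invisible(TRUE)
}

# 'data' would normally be an ExpressionSet or AffyBatch; a plain matrix
# stands in here for illustration only.
study <- list(data = matrix(0, nrow = 2, ncol = 4),
              name = "prostate",
              var = "outcome",
              out = c("Rec", "NRec"),
              paired = FALSE)

ok <- check_study(study)
```

A fuller check would also confirm that var is a column of the phenodata and that both entries of out occur among its levels; that step is left out here because it depends on the class of data.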

Value

An object of class ExpressionSet containing the joint data sets with appropriate phenodata.

Author(s)

Stefanie Scheid

References

Yang X, Bentink S, Scheid S, and Spang R (2006): Similarities of ordered gene lists, to appear in Journal of Bioinformatics and Computational Biology.

See Also

OL.data, OrderedList

Examples

data(OL.data)

### 'map' contains the appropriate mapping between 'breast' and 'prostate' IDs.
### Let's first concatenate two studies.
A <- prepareData(
                 list(data=OL.data$prostate,name="prostate",var="outcome",out=c("Rec","NRec"),paired=FALSE),
                 list(data=OL.data$breast,name="breast",var="Risk",out=c("high","low"),paired=FALSE),
                 mapping=OL.data$map
                 )

### We might want to examine the first 100 probes only.
B <- prepareData(
                 list(data=OL.data$prostate,name="prostate",var="outcome",out=c("Rec","NRec"),paired=FALSE),
                 list(data=OL.data$breast,name="breast",var="Risk",out=c("high","low"),paired=FALSE),
                 mapping=OL.data$map[1:100,]
                 )

Example output

Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colMeans, colSums, colnames, do.call,
    duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
    lapply, lengths, mapply, match, mget, order, paste, pmax, pmax.int,
    pmin, pmin.int, rank, rbind, rowMeans, rowSums, rownames, sapply,
    setdiff, sort, table, tapply, union, unique, unsplit, which,
    which.max, which.min

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: twilight
Loading required package: splines

OrderedList documentation built on Nov. 8, 2020, 5:41 p.m.