Convert to other classes

Share:

Description

Convert a SCESet object into other classes for entry into other analysis pipelines.

Usage

1
2
3
4
## S4 method for signature 'SCESet'
convertTo(x, type=c("edgeR", "DESeq2", "monocle"),
    fData.col=NULL, pData.col=NULL, ..., assay, 
    use.all.sf=TRUE, normalize=TRUE, subset.row=NULL, get.spikes=FALSE)

Arguments

x

A SCESet object.

type

A string specifying the analysis for which the object should be prepared.

fData.col

Any set of indices specifying which columns of fData(x) should be retained in the returned object.

pData.col

Any set of indices specifying which columns of pData(x) should be retained.

...

Other arguments to be passed to pipeline-specific constructors.

assay

A string specifying which assay of x should be put in the returned object.

use.all.sf

A logical scalar indicating whether multiple size factors should be used to generate the returned object.

normalize

A logical scalar specifying whether the assay values should be normalized for type="monocle".

subset.row

A logical, integer or character scalar indicating the rows of x to return.

get.spikes

A logical scalar specifying whether rows corresponding to spike-in transcripts should be returned.

Details

This function converts an SCESet object into various other classes in preparation for entry into other analysis pipelines, as specified by type. Gene- and cell-specific data fields can be retained in the output object by setting fData.col and pData.col, respectively. Other arguments can be passed to the relevant constructors through the ellipsis.

By default, for edgeR and DESeq2, assay is set to "counts" such that count data is stored in the output object. This is consistent with the required inputs to these analyses. Information about normalization is instead transmitted via size or normalization factors in the output object. For monocle, assay is ignored and counts are divided by the size factors to yield (roughly) log-normally distributed expression values. Values in assay can be used directly by setting normalize=FALSE.

In all cases, rows corresponding to spike-in transcripts are removed from the output object by default. As such, rows in the returned object may not correspond directly to rows in x. Users should consider this when retrieving analysis results from these pipelines, e.g., match on row names in x before comparing to other results. This behaviour can be turned off by setting get.spikes=TRUE, such that all rows are retrieved in the output object. Users can also set subset.row to extract specific rows, in which case get.spikes is ignored.

By default, different size factors for different rows (e.g., for spike-in sets) will be respected. For edgeR, an offset matrix will be constructed containing mean-centred log-size factors for each row. For DESeq2, a similar matrix will be constructed containing size factors scaled to have a geometric mean of unity. For monocle, counts for each row will be divided by the size factors for that row. This behaviour can be turned off with use.all.sf=FALSE, such that only sizeFactors(x) is used for normalization for all type. (For edgeR and DESeq2, the offset matrix is not generated if all rows correspond to sizeFactors(x), as this information is already stored in the object.)

Value

For type="edgeR", a DGEList object is returned containing the count matrix. Size factors are converted to normalization factors. Gene-specific fData is stored in the genes element, and cell-specific pData is stored in the samples element.

For type="DESeq2", a DESeqDataSet object is returned containing the count matrix and size factors. Additional gene- and cell-specific data is stored in the mcols and colData respectively.

For type="monocle", a CellDataSet object is returned containing the unlogged expression values. Additional gene- and cell-specific data is stored in the fData and pData respectively.

Author(s)

Aaron Lun

See Also

DGEList, DESeqDataSetFromMatrix, newCellDataSet

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
ncells <- 200
ngenes <- 1000
count.sizes <- rnbinom(ncells, mu=100, size=5)
dummy <- matrix(count.sizes, ncol=ncells, nrow=ngenes, byrow=TRUE)
rownames(dummy) <- paste0("X", seq_len(ngenes))

X <- newSCESet(countData=data.frame(dummy))
X <- calculateQCMetrics(X, list(Spike=rbinom(ngenes, 1, 0.5)==0L))
isSpike(X) <- "Spike"
sizeFactors(X) <- 2^rnorm(ncells)
X <- normalize(X)

fData(X)$SYMBOL <- paste0("X", seq_len(ngenes))
X$other <- sample(LETTERS, ncells, replace=TRUE)

convertTo(X, type="edgeR")
convertTo(X, type="DESeq2")
convertTo(X, type="monocle")