Construct the spike-in matrix

Description

Identify rows in the SCESet corresponding to spike-in transcripts, and retrieve a matrix of counts or normalized expression values for those rows.

Usage

## S4 method for signature 'SCESet'
isSpike(x, type=NULL)
## S4 method for signature 'SCESet'
spikes(x, assay="counts", type=NULL)

## S4 replacement method for signature 'SCESet,character'
isSpike(x) <- value
## S4 replacement method for signature 'SCESet,NULL'
isSpike(x) <- value

Arguments

x

A SCESet object whose rows include spike-in transcripts, annotated in the feature data (fData).

type

A character vector specifying which spike-in set(s) should be extracted.

assay

A string specifying whether counts ("counts", the default) or normalized expression values ("exprs") are to be extracted.

value

A character vector specifying which control sets are spike-ins. Alternatively a NULL value, to remove existing spike-in specifications.

Details

The isSpike method indicates which rows correspond to spike-ins, and the spikes method returns the expression values for those rows. If multiple spike-in sets are available, users can extract information for specific sets by supplying the names of those sets in type. (By default, type=NULL and rows from all spike-in sets are extracted.) If assay="exprs", x should already have been run through normalize, so that normalized expression values are available.
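The row selection described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the set names (ERCC, SIRV), the toy matrix, and the helper select_spike_rows are all hypothetical.

```r
# Hypothetical stand-in data: 6 rows (features) by 3 cells.
counts <- matrix(1:18, nrow = 6,
                 dimnames = list(paste0("g", 1:6), paste0("cell", 1:3)))

# One logical vector per spike-in set (set names are made up for illustration).
spike_sets <- list(
  ERCC = c(FALSE, FALSE, FALSE, TRUE, TRUE, FALSE),
  SIRV = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

# Mimics the documented behaviour of 'type': NULL means "all sets".
select_spike_rows <- function(sets, type = NULL) {
  if (is.null(type)) type <- names(sets)
  Reduce(`|`, sets[type])  # row is selected if it belongs to any requested set
}

all_rows  <- select_spike_rows(spike_sets)                 # rows 4, 5 and 6
ercc_rows <- select_spike_rows(spike_sets, type = "ERCC")  # rows 4 and 5

# Analogue of spikes(): one row per spike-in transcript, one column per cell.
all_spikes  <- counts[all_rows, , drop = FALSE]
ercc_spikes <- counts[ercc_rows, , drop = FALSE]
```

Using drop=FALSE keeps the result a matrix even when only one spike-in row is selected.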

To specify rows as corresponding to spike-ins, we assume that calculateQCMetrics has already been applied to x. Specifically, we assume that spike-ins represent a subset of the control sets supplied as feature_controls in calculateQCMetrics. Users can assign a character vector to isSpike(x)<- containing the names of the control sets that are spike-ins. This will automatically construct a logical vector containing rows from all specified sets, for later retrieval with isSpike(x).

Setting isSpike(x)<-NULL will clear all existing spike-in information in x. Note that direct assignment of a logical vector to isSpike(x)<- is no longer permitted. This is because the names of the spike-in sets are necessary for downstream processing, but will not be included if isSpike(x) is set directly.
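The assignment rules in the preceding paragraphs can be illustrated with a small base R sketch. The function set_spike_sets and its return structure are hypothetical and are not part of the package; the check for unknown set names is an assumption added for illustration.

```r
# Hypothetical setter that mirrors the documented rules for isSpike(x) <- value.
# 'control_sets' is a named list of logical vectors, one per control set.
set_spike_sets <- function(control_sets, value) {
  if (is.null(value)) {
    # NULL clears all existing spike-in information.
    return(list(is_spike = NULL, spike_names = character(0)))
  }
  if (!is.character(value)) {
    # Logical vectors are rejected because they carry no set names.
    stop("'value' must be a character vector of control set names, or NULL")
  }
  missing_sets <- setdiff(value, names(control_sets))
  if (length(missing_sets) > 0) {
    # Assumption: unknown names are treated as an error in this sketch.
    stop("unknown control set(s): ", paste(missing_sets, collapse = ", "))
  }
  # A row is a spike-in if it belongs to any of the named sets.
  list(is_spike = Reduce(`|`, control_sets[value]), spike_names = value)
}

control_sets <- list(Spike = c(FALSE, TRUE, TRUE), Mito = c(TRUE, FALSE, FALSE))

res      <- set_spike_sets(control_sets, "Spike")  # names a control set
cleared  <- set_spike_sets(control_sets, NULL)     # clears spike-in information
rejected <- tryCatch(
  set_spike_sets(control_sets, c(FALSE, TRUE, TRUE)),  # direct logical assignment
  error = function(e) conditionMessage(e)
)
```

Keeping the set names alongside the logical vector is the point of the design: downstream steps can then tell which set each spike-in row came from.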

Value

For spikes, a numeric matrix of counts or normalized expression values, with one column per cell and one row per spike-in transcript.

For isSpike, a logical vector indicating which rows are spike-ins (or NULL, if this information was not stored in x).

For isSpike<-, x is modified to store a spike-specifying vector in fData(x)$is_feature_spike. A logical vector indicating which controls are spike-ins is also stored in the featureControlInfo slot of x.

Note on overlapping sets

While it is possible to declare overlapping sets as the spike-in sets with isSpike(x)<-, this is not advisable. This is because some downstream operations assume that each row belongs to only one set (i.e., one of the spike-in sets, or the set of endogenous genes). For example, normalization will use size factors from only one of the sets, so correspondence to multiple sets will not be honoured. A warning will thus be raised if overlapping sets are specified in value.
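A check for overlapping sets might look like the following base R sketch. The helper check_spike_overlap and the set names are hypothetical, and the warning text is illustrative rather than the message the package emits.

```r
# Hypothetical helper: warn if any row belongs to more than one spike-in set.
check_spike_overlap <- function(sets) {
  # Count, for each row, how many sets claim it.
  membership <- Reduce(`+`, lapply(sets, as.integer))
  overlapping <- which(membership > 1L)
  if (length(overlapping) > 0) {
    warning("overlapping spike-in sets at row(s): ",
            paste(overlapping, collapse = ", "))
  }
  invisible(overlapping)
}

# Row 2 is claimed by both sets, so a warning is raised.
overlapping_sets <- list(
  ERCC = c(TRUE, TRUE, FALSE, FALSE),
  SIRV = c(FALSE, TRUE, TRUE, FALSE)
)
msg <- tryCatch(check_spike_overlap(overlapping_sets),
                warning = function(w) conditionMessage(w))

# Disjoint sets pass silently and return an empty index vector.
disjoint_sets <- list(A = c(TRUE, FALSE), B = c(FALSE, TRUE))
none <- check_spike_overlap(disjoint_sets)
```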

Author(s)

Aaron Lun

See Also

normalize, calculateQCMetrics, SCESet

Examples

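# Assumes the scran and scater packages are attached.
# Mock up counts for 1000 endogenous genes and 100 spike-in transcripts.
set.seed(100)
popsize <- 10
ngenes <- 1000
nspikes <- 100
all.facs <- 2^rnorm(popsize, sd=0.5)
counts <- matrix(rnbinom(ngenes*popsize, mu=10*all.facs, size=1), ncol=popsize, byrow=TRUE)
# Named spike.counts so that it does not mask the spikes() function.
spike.counts <- matrix(rnbinom(nspikes*popsize, mu=10*all.facs, size=0.5), ncol=popsize, byrow=TRUE)

# Spike-in rows are appended after the endogenous genes.
combined <- rbind(counts, spike.counts)
colnames(combined) <- seq_len(popsize)
rownames(combined) <- seq_len(nrow(combined))
y <- newSCESet(countData=combined)

# Declare the last 100 rows as a control set, then mark that set as spike-ins.
y <- calculateQCMetrics(y, list(IAmASpike=rep(c(FALSE, TRUE), c(ngenes, nspikes))))
isSpike(y) <- "IAmASpike"

# Normalize so that assay="exprs" is available.
y <- computeSpikeFactors(y)
y <- normalize(y)
spikes(y)[1:10,]
spikes(y, assay="exprs")[1:10,]
isSpike(y)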