Construct the spike-in matrix
Identify rows in the SCESet corresponding to spike-in transcripts, and retrieve a matrix of counts or normalized expression values for those rows.
A SCESet object with spike-in data in the
A character vector specifying which spike-in set(s) should be extracted.
A string specifying whether counts or normalized expression values are to be extracted.
A character vector specifying which control sets are spike-ins.
The isSpike and spikes methods indicate which rows correspond to spike-ins and return their expression values, respectively.
If multiple spike-in sets are available, users can extract information for specific sets by supplying the names of the set in
(By default, rows from all spike-in sets are extracted when
assay="exprs", users should have run
To specify rows as corresponding to spike-ins, we assume that
calculateQCMetrics has already been applied to
Specifically, we assume that spike-ins represent a subset of the control sets supplied as
Users can assign a character vector to
isSpike(x)<- containing the names of the control sets that are spike-ins.
This will automatically construct a logical vector containing rows from all specified sets, for later retrieval with
isSpike(x)<-NULL will clear all existing spike-in information in
Note that direct assignment of a logical vector to
isSpike(x)<- is no longer permitted.
This is because the names of the spike-in sets are necessary for downstream processing, but will not be included if
isSpike(x) is set directly.
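The relationship between named control sets and the resulting logical vector can be pictured with a small base R sketch. This is not the package's implementation and does not use the SCESet class; the names `controls`, `ERCC`, `SIRV`, `Mito` and `spike.sets` are hypothetical and chosen only for illustration.

```r
# Hypothetical control sets, each a logical vector over the same six rows.
# Base R sketch only; none of these names come from the package.
controls <- list(
    ERCC=c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
    SIRV=c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE),
    Mito=c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE)
)

# Names of the control sets that are declared as spike-ins.
spike.sets <- c("ERCC", "SIRV")

# A row is marked as a spike-in if it belongs to any of the named sets.
is.spike <- Reduce(`|`, controls[spike.sets])

# Per-set flag recording which control sets are spike-ins, keeping the set names.
set.is.spike <- names(controls) %in% spike.sets
names(set.is.spike) <- names(controls)
```

Keeping the set names alongside the row-level vector is what a bare logical vector cannot provide, which is consistent with the restriction on direct assignment described above.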
spikes, a numeric matrix of counts or normalized expression values, with one column per cell and one row per spike-in transcript.
isSpike, a logical vector indicating which rows are spike-ins (or
NULL, if this information was not stored in
x is modified to store a spike-specifying vector in
A logical vector indicating which controls are spike-ins is also stored in the
featureControlInfo slot of
Note on overlapping sets
While it is possible to declare overlapping sets as the spike-in sets with
isSpike(x)<-, this is not advisable.
This is because some downstream operations assume that each row belongs to only one set (i.e., one of the spike-in sets, or the set of endogenous genes).
For example, normalization will use size factors from only one of the sets, so correspondence to multiple sets will not be honoured.
A warning will thus be raised if overlapping sets are specified in
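As a rough illustration of what "overlapping" means here, the following base R sketch flags rows that belong to more than one set. It is an assumption about how such a check could be written, not the package's own code; `SetA`, `SetB` and the other variable names are hypothetical.

```r
# Two hypothetical spike-in sets over the same four rows; row 2 is in both.
controls <- list(
    SetA=c(TRUE, TRUE, FALSE, FALSE),
    SetB=c(FALSE, TRUE, TRUE, FALSE)
)

# One column per set, one row per feature.
membership <- do.call(cbind, controls)

# Count how many sets each row belongs to and flag rows in more than one.
n.sets <- rowSums(membership)
overlapping <- n.sets > 1
has.overlap <- any(overlapping)
```

Rows flagged in `overlapping` are the ones for which downstream steps that expect a single set per row would be ambiguous.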
# Simulate counts for endogenous genes and for spike-in transcripts.
set.seed(100)
popsize <- 10
ngenes <- 1000
all.facs <- 2^rnorm(popsize, sd=0.5)
counts <- matrix(rnbinom(ngenes*popsize, mu=10*all.facs, size=1), ncol=popsize, byrow=TRUE)
spikes <- matrix(rnbinom(100*popsize, mu=10*all.facs, size=0.5), ncol=popsize, byrow=TRUE)

# Combine both into one matrix, with the 100 spike-in rows last.
combined <- rbind(counts, spikes)
colnames(combined) <- seq_len(popsize)
rownames(combined) <- seq_len(nrow(combined))
y <- newSCESet(countData=combined)

# Register the control set, then declare it as a spike-in set by name.
y <- calculateQCMetrics(y, list(IAmASpike=rep(c(FALSE, TRUE), c(ngenes, 100))))
isSpike(y) <- "IAmASpike"

# Compute spike-in size factors and normalize before extracting "exprs".
y <- computeSpikeFactors(y)
y <- normalize(y)

spikes(y)[1:10,]
spikes(y, assay="exprs")[1:10,]
isSpike(y)
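The extraction step in the example above can be pictured, in plain base R and without any SCESet object, as subsetting a count matrix by a logical vector. This is only a sketch under that assumption; the sizes and the names `nspikes`, `is.spike` and `spike.mat` are made up for illustration.

```r
# Base R sketch only; does not use the SCESet class or its methods.
set.seed(100)
popsize <- 5
ngenes <- 20
nspikes <- 4

# Simulated counts with the spike-in rows placed last.
counts <- matrix(rnbinom((ngenes + nspikes) * popsize, mu=10, size=1), ncol=popsize)
rownames(counts) <- seq_len(nrow(counts))
colnames(counts) <- seq_len(popsize)

# Logical vector marking the last 'nspikes' rows as spike-ins.
is.spike <- rep(c(FALSE, TRUE), c(ngenes, nspikes))

# Subsetting by the logical vector keeps one row per spike-in transcript
# and one column per cell; drop=FALSE preserves the matrix shape.
spike.mat <- counts[is.spike, , drop=FALSE]
dim(spike.mat)
```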