Normalization with spike-in counts

Description

Compute size factors based on the coverage of spike-in transcripts.

Usage

## S4 method for signature 'SCESet'
computeSpikeFactors(x, type=NULL, sf.out=FALSE, general.use=TRUE)

Arguments

x

A SCESet object containing rows corresponding to spike-in transcripts.

type

A character vector specifying which spike-in sets to use.

sf.out

A logical scalar indicating whether only size factors should be returned.

general.use

A logical scalar indicating whether the size factors should be stored for general use by all genes.

Details

The size factor for each cell is defined as the sum of all spike-in counts in that cell. This is equivalent to normalizing to equalize spike-in coverage between cells. Size factors are scaled so that their mean across all cells is unity, which standardizes them and makes different sets of size factors directly comparable.
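The calculation described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal code: the matrix spike.counts is a hypothetical toy data set, and no SCESet object is involved.

```r
# Base-R sketch of the spike-in size factor calculation (illustrative only).
set.seed(1)
ncells <- 5

# Hypothetical toy data: 20 spike-in transcripts (rows) by 5 cells (columns).
spike.counts <- matrix(rnbinom(20 * ncells, mu=10, size=0.5), ncol=ncells)

# Size factor per cell: the total spike-in count in that cell.
totals <- colSums(spike.counts)

# Scale so that the mean size factor across cells is 1.
sf <- totals / mean(totals)

# Dividing each cell (column) by its size factor equalizes spike-in coverage.
normalized <- t(t(spike.counts) / sf)
colSums(normalized)
```

After scaling, every column of normalized has the same total, equal to the mean of the original spike-in totals.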

Spike-in counts are assumed to be stored in the rows specified by isSpike(x). This specification should have been performed by supplying the names of the spike-in sets – see ?isSpike<- for more details. By default, if multiple spike-in sets are available, all of them will be used to compute the size factors. The function can be restricted to a subset of the spike-ins by specifying the names of the desired spike-in sets in type.
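The effect of restricting the calculation to a subset of spike-in sets can be illustrated with another base-R sketch. Everything here is an assumption made for illustration: toySpikeFactors is a hypothetical helper, the set names "SetA" and "SetB" are invented, and the spike-in sets are represented as plain logical vectors over rows rather than through isSpike(x).

```r
# Base-R sketch of selecting spike-in sets via 'type' (illustrative only).
set.seed(2)
ncells <- 4
counts <- matrix(rpois(30 * ncells, lambda=5), ncol=ncells)

# Hypothetical spike-in sets: rows 1-10 are "SetA", rows 11-20 are "SetB",
# and rows 21-30 are endogenous genes.
spike.sets <- list(
    SetA=rep(c(TRUE, FALSE, FALSE), c(10, 10, 10)),
    SetB=rep(c(FALSE, TRUE, FALSE), c(10, 10, 10))
)

# Hypothetical helper mimicking the documented behaviour of 'type'.
toySpikeFactors <- function(counts, spike.sets, type=NULL) {
    # type=NULL means that all available spike-in sets are used.
    if (is.null(type)) type <- names(spike.sets)
    # Keep a row if it belongs to any of the requested sets.
    keep <- Reduce(`|`, spike.sets[type])
    sf <- colSums(counts[keep, , drop=FALSE])
    sf / mean(sf)
}

sf.all <- toySpikeFactors(counts, spike.sets)            # both sets
sf.a <- toySpikeFactors(counts, spike.sets, type="SetA") # SetA only
```

In this sketch, sf.all is derived from rows 1 to 20 and sf.a from rows 1 to 10 only; both vectors have a mean of 1.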

By default, the function will store several copies of the same size factors in the output object. One copy will be stored in sizeFactors(x) for normalization of all genes – this can be disabled by setting general.use=FALSE. One copy will also be stored in sizeFactors(x, type=s), where s is the name of a spike-in set in type. (If type=NULL, a copy is stored for every spike-in set, as all of them would be used to compute the size factors.) Separate storage allows spike-in-specific normalization in normalize,SCESet-method.

Value

If sf.out=TRUE, a numeric vector of size factors is returned directly.

Otherwise, an object of the same class as x is returned, containing size factors for all cells. A copy of the vector is stored for each spike-in set that was used to compute the size factors. If general.use=TRUE, a copy is also stored for use by non-spike-in genes.

Author(s)

Aaron Lun

See Also

SCESet

Examples

# Setting up an example.
set.seed(100)
popsize <- 200
ngenes <- 1000
all.facs <- 2^rnorm(popsize, sd=0.5)
counts <- matrix(rnbinom(ngenes*popsize, mu=all.facs*10, size=1), ncol=popsize, byrow=TRUE)
spikes <- matrix(rnbinom(100*popsize, mu=all.facs*10, size=0.5), ncol=popsize, byrow=TRUE)

combined <- rbind(counts, spikes)
colnames(combined) <- seq_len(popsize)
rownames(combined) <- seq_len(nrow(combined))
y <- newSCESet(countData=combined)
y <- calculateQCMetrics(y, list(IAmASpike=rep(c(FALSE, TRUE), c(ngenes, 100))))
isSpike(y) <- "IAmASpike"

# Computing and storing size factors from the spike-in counts.
y <- computeSpikeFactors(y)
sizeFactors(y)
sizeFactors(y, type="IAmASpike")

# general.use=FALSE does not modify general size factors
y2 <- y
sizeFactors(y2) <- 1
sizeFactors(y2, type="IAmASpike") <- 1
y2 <- computeSpikeFactors(y2, general.use=FALSE)
sizeFactors(y2)
sizeFactors(y2, type="IAmASpike")