Description Usage Arguments Value Author(s) Examples
This function generates simulated data, for evaluation purposes. We start with an underlying multinomial population with entries proportional to rank^-power distribution, where power is fixed. Next, we draw multinomially from this distribution, a fixed number of cells, to generate each desired replicate. Then, this distribution is subject to log-normal error, and subsequently scaled up to the expected number of reads. To round into integers, the expected number of reads for each clone is finally pushed through a poisson process to generate integer read counts. The poisson "rounding" process is why the resulting read counts are not exactly as specified.
1 2 3 4 5 6 7 8 9 10 | generate.clonal.data(
n = 2e+07,
num.cells.taken.vector = c(2000, 5000, 10000, 20000, 50000, 50000),
read.count.per.replicate.vector = rep(20000, length(num.cells.taken.vector)),
clonal.distribution.power = -sqrt(2),
pcr.noise.type = 'pareto',
pcr.pareto.location = 1,
pcr.pareto.shape = 1,
pcr.lognormal.meanlog = 0,
pcr.lognormal.sdlog = 1)
|
n |
The true number of distinct clones in the underlying assemblage |
num.cells.taken.vector |
A vector specifying the number of cells taken in each independent biological replicate |
read.count.per.replicate.vector |
A vector of the same length as num.cells.taken.vector, specifying the number of reads generated from each biological replicate, of the same corresponding indices |
clonal.distribution.power |
The true underlying clonal multinomial distribution is proportional to (1:n)^-clonal.distribution.power |
pcr.noise.type |
A string denoting the type of PCR noise: either 'pareto' (default), or 'lognormal'. The package author Yi Liu has found anecdotally and empirically that pareto distributions model sequencing amplification bonanzas much better than lognormal distributions. |
pcr.pareto.location |
The location parameter for the pareto distribution; matters only if the noise type is pareto. |
pcr.pareto.shape |
The shape parameter for the pareto distribution; matters only if the noise type is pareto. |
pcr.lognormal.meanlog |
The meanlog parameter for the lognormal distribution; matters only if the nosie type is lognormal |
pcr.lognormal.sdlog |
The sdlog parameter for the lognormal distribution; matters only if the nosie type is lognormal |
read.count.matrix |
This is a matrix of simulated counts, with rows corresponding to clones (classes, or species), and columns corresponding to biological replicates |
true.clone.prob |
This is the underlying simulated assemblage multinomial distribution used to generate read.count.matrix |
true.clonality |
This is the true clonality score of the underlying simulated assemblage |
Yi Liu (liuyipei@stanford.edu / liu.yi.pei@gmail.com)
1 2 3 4 5 | my.data <- generate.clonal.data(n=2e3)
# n ~ 2e7 is more appropriate for a realistic B cell repertoire
my.lymphclon.results <- infer.clonality(my.data$read.count.matrix)
# a consistently improved estimate of clonality (the squared
# 2-norm of the underlying multinomial distribution)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.