generate.clonal.data: generate.clonal.data (part of lymphclon package)

Description Usage Arguments Value Author(s) Examples

Description

This function generates simulated data, for evaluation purposes. We start with an underlying multinomial population with entries proportional to rank^-power distribution, where power is fixed. Next, we draw multinomially from this distribution, a fixed number of cells, to generate each desired replicate. Then, this distribution is subject to log-normal error, and subsequently scaled up to the expected number of reads. To round into integers, the expected number of reads for each clone is finally pushed through a poisson process to generate integer read counts. The poisson "rounding" process is why the resulting read counts are not exactly as specified.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
generate.clonal.data(
  n = 2e+07, 
  num.cells.taken.vector = c(2000, 5000, 10000, 20000, 50000, 50000), 
  read.count.per.replicate.vector = rep(20000, length(num.cells.taken.vector)), 
  clonal.distribution.power = -sqrt(2),
  pcr.noise.type = 'pareto',
  pcr.pareto.location = 1,
  pcr.pareto.shape = 1,
  pcr.lognormal.meanlog = 0,
  pcr.lognormal.sdlog = 1)

Arguments

n

The true number of distinct clones in the underlying assemblage

num.cells.taken.vector

A vector specifying the number of cells taken in each independent biological replicate

read.count.per.replicate.vector

A vector of the same length as num.cells.taken.vector, specifying the number of reads generated from each biological replicate, of the same corresponding indices

clonal.distribution.power

The true underlying clonal multinomial distribution is proportional to (1:n)^-clonal.distribution.power

pcr.noise.type

A string denoting the type of PCR noise: either 'pareto' (default), or 'lognormal'. The package author Yi Liu has found anecdotally and empirically that pareto distributions model sequencing amplification bonanzas much better than lognormal distributions.

pcr.pareto.location

The location parameter for the pareto distribution; matters only if the noise type is pareto.

pcr.pareto.shape

The shape parameter for the pareto distribution; matters only if the noise type is pareto.

pcr.lognormal.meanlog

The meanlog parameter for the lognormal distribution; matters only if the nosie type is lognormal

pcr.lognormal.sdlog

The sdlog parameter for the lognormal distribution; matters only if the nosie type is lognormal

Value

read.count.matrix

This is a matrix of simulated counts, with rows corresponding to clones (classes, or species), and columns corresponding to biological replicates

true.clone.prob

This is the underlying simulated assemblage multinomial distribution used to generate read.count.matrix

true.clonality

This is the true clonality score of the underlying simulated assemblage

Author(s)

Yi Liu (liuyipei@stanford.edu / liu.yi.pei@gmail.com)

Examples

1
2
3
4
5
my.data <- generate.clonal.data(n=2e3) 
# n ~ 2e7 is more appropriate for a realistic B cell repertoire
my.lymphclon.results <- infer.clonality(my.data$read.count.matrix)
# a consistently improved estimate of clonality (the squared 
# 2-norm of the underlying multinomial distribution)

Example output



lymphclon documentation built on May 2, 2019, 7:23 a.m.