04-simulate: Simulating Tumor Clones
In CloneSeeker: Seeking and Finding Clones in Copy Number and Sequencing Data

Simulating Tumor Clones

Description

Simulating copy number segmentation data and sequencing mutation data for tumors composed of multiple clones.

Usage

generateTumorData(tumor, snps.seq, snps.cgh, mu, sigma.reads,
                  sigma0.lrr, sigma0.baf, density.sigma)
plotTumorData(tumor, data)
tumorGen(...)
dataGen(tumor, ...)

Arguments

`tumor`	an object of the `Tumor` class.
`snps.seq`	an integer; the total number of germline variants and somatic mutations to simulate in the tumor genome.
`snps.cgh`	an integer; the number of single nucleotide polymorphisms (SNPs) to simulate as measurements made to estimate copy number.
`mu`	an integer; the average read depth of a simulated sequencing study giving rise to mutations.
`sigma.reads`	a real number; the standard deviation of the number of simulated sequencing reads per base.
`sigma0.lrr`	a real number; the standard deviation of the simulated per-SNP log R ratio (LRR) for assessing copy number.
`sigma0.baf`	a real number; the standard deviation of the simulated B allele frequency (BAF) for assessing copy number.
`density.sigma`	a real number; the standard deviation of a beta distribution used to simulate the number of SNP markers per copy number segment.
`data`	a list containing two data frames, `cn.data` and `seq.data`, as produced by `generateTumorData`.
`...`	additional arguments; `tumorGen` passes them to `Tumor`, and `dataGen` passes them to `generateTumorData`.
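
To give a feel for how `mu` and `sigma.reads` shape the sequencing data, here is a minimal base R sketch. It assumes a simple model (normally distributed depth, binomially distributed variant reads) that is not confirmed to be what generateTumorData does internally; `n` and `trueVAF` are hypothetical names introduced only for illustration.

```r
set.seed(42)
n <- 1000          # number of simulated variants (hypothetical)
mu <- 70           # average read depth
sigma.reads <- 25  # standard deviation of the read depth
trueVAF <- 0.3     # hypothetical underlying variant allele fraction

# Assumed model: normally distributed depth, truncated at one read.
totalCounts <- pmax(1, round(rnorm(n, mean = mu, sd = sigma.reads)))
# Variant reads are drawn binomially from the total depth.
varCounts <- rbinom(n, size = totalCounts, prob = trueVAF)
refCounts <- totalCounts - varCounts

# Observed allele frequencies scatter around trueVAF; lower depth means more scatter.
summary(varCounts / totalCounts)
```

Under this assumed model, raising `sigma.reads` relative to `mu` produces more low-depth sites and therefore noisier allele frequencies.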

Details

Copy number and mutation data are simulated essentially independently. Each simulation starts with a single "normal" genome, and CNVs and/or mutations are randomly generated for each new "branch" or subclone. (The number of subclones depends on the input parameters.) Each successive branch is randomly determined to descend from one of the existing clones, and therefore contains both the aberrations belonging to its parent clone and the novel aberrations assigned to it. Depending on input parameters, the algorithm can also randomly select some clones for extinction in the process of generating the heterogeneous tumor, to yield a more realistic population structure.
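
The branching scheme described above can be sketched in a few lines of base R. This is an illustration of the idea only, not CloneSeeker's internal code: `simulateLineage` and its arguments are hypothetical, aberrations are reduced to text labels, and clone extinction is left out.

```r
# Illustrative sketch only; not CloneSeeker's internal algorithm.
# simulateLineage and its arguments are hypothetical names.
simulateLineage <- function(nClones, nNewPerClone = 2, seed = 1) {
  set.seed(seed)
  # Clone 1 is the "normal" genome with no aberrations.
  clones <- list(list(parent = NA_integer_, aberrations = character(0)))
  for (k in seq_len(nClones - 1) + 1) {
    # Pick the parent uniformly at random among the existing clones.
    parent <- sample.int(length(clones), 1)
    # Novel aberrations get labels unique to this clone.
    novel <- paste0("ab", k, ".", seq_len(nNewPerClone))
    # The child inherits the parent's aberrations and adds its own.
    clones[[k]] <- list(parent = parent,
                        aberrations = c(clones[[parent]]$aberrations, novel))
  }
  clones
}

lineage <- simulateLineage(4)
str(lineage[[4]])
```

Because each child copies its parent's aberration list before adding new labels, every clone carries the full history of its ancestors.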

Note that tumorGen (an alias for Tumor that returns a list instead of a Tumor object) and dataGen (an alias for generateTumorData) are DEPRECATED.

Value

The generateTumorData function returns a list with two components, cn.data and seq.data. Each component is normally a data frame. In some cases, however, one of them may have zero rows or may be returned as NA, so check both before using them.
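
Since either component can be empty or NA, downstream code may want a guard before touching the columns. Here is a minimal sketch in base R; `hasRows` is a hypothetical helper rather than part of CloneSeeker, and `mock` is a hand-built stand-in for a real return value.

```r
# hasRows is a hypothetical helper, not part of CloneSeeker.
hasRows <- function(x) is.data.frame(x) && nrow(x) > 0

# Mock object shaped like the documented return value, with seq.data missing.
mock <- list(cn.data = data.frame(chr = 1:2, LRR = c(0.01, -0.30)),
             seq.data = NA)

# One logical flag per component: TRUE if it is a non-empty data frame.
usable <- vapply(mock, hasRows, logical(1))
print(usable)
```

The same check applies unchanged to the list returned by generateTumorData.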

The cn.data component contains seven columns:

chr: the chromosome number;
seq: a unique segment identifier;
LRR: simulated segment-wise log ratios;
BAF: simulated segment-wise B allele frequencies;
X and Y: simulated intensities for two separate alleles/haplotypes per segment; and
markers: the simulated number of SNPs per segment.
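
As one way to use the markers column, the following sketch computes a marker-weighted mean LRR per chromosome. The data frame is a hand-made mock with the seven documented columns; its values are invented for illustration and do not come from generateTumorData.

```r
# Mock data with the seven documented columns; values are invented.
cn <- data.frame(chr = c(1, 1, 2),
                 seq = 1:3,
                 LRR = c(0.00, 0.20, -0.10),
                 BAF = c(0.50, 0.33, 0.50),
                 X = c(1.0, 2.0, 0.8),
                 Y = c(1.0, 1.0, 0.8),
                 markers = c(100, 300, 200))

# Marker-weighted mean LRR per chromosome, so that segments
# supported by more SNPs contribute more to the summary.
weighted <- sapply(split(cn, cn$chr),
                   function(d) weighted.mean(d$LRR, d$markers))
print(weighted)
```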

The seq.data component contains eight columns:

chr: the chromosome number;
seq: a unique "segment" identifier;
mut.id: a unique mutation identifier;
refCounts and varCounts: the simulated numbers of reference and variant counts per mutation;
VAF: the simulated variant allele frequency;
totalCounts: the simulated total number of read counts; and
status: a character string (which should probably be a factor) indicating whether a variant should be viewed as somatic or germline.
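
The sketch below shows one way to work with these columns, using a hand-made mock data frame rather than real output. It rests on two assumptions that the text above does not state outright: that totalCounts equals refCounts plus varCounts with VAF equal to varCounts divided by totalCounts, and that status takes the values "germline" and "somatic".

```r
# Mock data with the documented columns; values are invented.
sq <- data.frame(chr = c(1, 1, 2, 3),
                 seq = c(1, 2, 3, 4),
                 mut.id = 1:4,
                 refCounts = c(35, 60, 40, 50),
                 varCounts = c(35, 20, 40, 10),
                 status = c("germline", "somatic", "germline", "somatic"),
                 stringsAsFactors = FALSE)

# Assumption: totalCounts = refCounts + varCounts, VAF = varCounts / totalCounts.
sq$totalCounts <- sq$refCounts + sq$varCounts
sq$VAF <- sq$varCounts / sq$totalCounts

# Convert status to a factor; the level names are assumed.
sq$status <- factor(sq$status, levels = c("germline", "somatic"))

# Mean VAF within each status group.
meanVAF <- tapply(sq$VAF, sq$status, mean)
print(meanVAF)
```

Converting status to a factor up front keeps group summaries such as `tapply` and `table` in a fixed level order.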

The plotTumorData function invisibly returns its data argument.

Author(s)

Kevin R. Coombes <krc@silicovore.com>, Mark Zucker <zucker.64@buckeyemail.osu.edu>

Examples

psis <- c(0.6, 0.3, 0.1) # three clones
# create tumor with copy number but no mutation data
tumor <- Tumor(psis, rounds = 400, nu = 0, pcnv = 1, norm.contam = FALSE)
# simulate the dataset
dataset <- generateTumorData(tumor, 10000, 600000, 70, 25, 0.15, 0.03, 0.1)
# plot it
plotTumorData(tumor, dataset)

CloneSeeker documentation built on April 11, 2025, 5:42 p.m.