Simulating Clones (R Documentation)
Simulating copy number segmentation data and sequencing mutation data for tumors composed of multiple clones.
generateTumorData(tumor, snps.seq, snps.cgh, mu, sigma.reads,
sigma0.lrr, sigma0.baf, density.sigma)
plotTumorData(tumor, data)
tumorGen(...)
dataGen(tumor, ...)
| Argument | Description |
|---|---|
| tumor | an object of the Tumor class. |
| snps.seq | an integer; the total number of germline variants and somatic mutations to simulate in the tumor genome. |
| snps.cgh | an integer; the number of single nucleotide polymorphisms (SNPs) to simulate as markers for estimating copy number. |
| mu | an integer; the mean read depth of the simulated sequencing study used to detect mutations. |
| sigma.reads | a real number; the standard deviation of the number of simulated sequencing reads per base. |
| sigma0.lrr | a real number; the standard deviation of the simulated per-SNP log R ratio (LRR) used to assess copy number. |
| sigma0.baf | a real number; the standard deviation of the simulated B allele frequency (BAF) used to assess copy number. |
| density.sigma | a real number; the standard deviation of a beta distribution used to simulate the number of SNP markers per copy number segment. |
| data | a list containing two data frames, cn.data and seq.data, such as the value returned by generateTumorData. |
| ... | additional arguments passed on to Tumor (for tumorGen) or to generateTumorData (for dataGen). |
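The density.sigma argument specifies only a standard deviation, but a beta distribution needs two shape parameters. A standard way to obtain them is the method of moments, given a mean. The sketch below uses a hypothetical helper, betaShapes(), and an assumed mean of 0.5; neither comes from this documentation, and the package may parameterize the distribution differently.

```r
# Hypothetical helper (not part of the package): convert a mean m and a
# standard deviation s into beta shape parameters by the method of moments.
betaShapes <- function(m, s) {
  v <- s^2
  if (m <= 0 || m >= 1) stop("mean must lie strictly between 0 and 1")
  if (v >= m * (1 - m)) stop("standard deviation too large for this mean")
  total <- m * (1 - m) / v - 1  # equals shape1 + shape2
  c(shape1 = m * total, shape2 = (1 - m) * total)
}

# Assumed mean of 0.5 combined with density.sigma = 0.1
shp <- betaShapes(0.5, 0.1)  # shape1 = 12, shape2 = 12

# Sanity check: draws from this beta have a sample sd close to 0.1
set.seed(1)
x <- rbeta(1e5, shp[["shape1"]], shp[["shape2"]])
sd(x)
```

The helper refuses impossible combinations, because a beta distribution with mean m cannot have a variance of m(1 - m) or more.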
Copy number and mutation data are simulated essentially independently. Each simulation starts with a single "normal" genome, and CNVs and/or mutations are randomly generated for each new "branch", or subclone; the number of subclones depends on the input parameters. Each successive branch is randomly assigned to descend from one of the existing clones, so it carries both the aberrations of its parent clone and the novel aberrations generated for it. Depending on the input parameters, the algorithm can also randomly select some clones for extinction while generating the heterogeneous tumor, which yields a more realistic population structure.
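The branching scheme described above can be sketched in base R. This is a hypothetical illustration, not the package's implementation: simulateBranches() and its arguments newPerBranch and pExtinct are invented here, aberrations are plain integer IDs, parents are chosen uniformly at random, and extinction is modeled as an independent coin flip per subclone.

```r
# Hypothetical sketch of the branching scheme; not the package's code.
# Aberrations (CNVs or mutations) are represented by integer IDs.
simulateBranches <- function(nBranches, newPerBranch = 3L, pExtinct = 0) {
  clones <- list(integer(0))  # clone 1 is the "normal" genome
  parent <- NA_integer_       # the normal clone has no parent
  nextId <- 1L
  for (k in seq_len(nBranches)) {
    # choose a parent uniformly at random among the clones created so far
    p <- sample.int(length(clones), 1)
    novel <- seq.int(nextId, length.out = newPerBranch)
    nextId <- nextId + newPerBranch
    # the new branch inherits its parent's aberrations and adds its own
    clones[[length(clones) + 1]] <- c(clones[[p]], novel)
    parent <- c(parent, p)
  }
  # flag extinct subclones; the normal clone is always kept
  alive <- c(TRUE, runif(nBranches) >= pExtinct)
  list(aberrations = clones, parent = parent, alive = alive)
}

set.seed(42)
tree <- simulateBranches(nBranches = 5, newPerBranch = 2L, pExtinct = 0.2)
```

Extinct clones are only flagged rather than deleted, so a clone that later dies out may still have been chosen as a parent, and its descendants keep the aberrations it passed on.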
Note that tumorGen (an alias for Tumor that returns a list instead of a Tumor object) and dataGen (an alias for generateTumorData) are DEPRECATED.
The generateTumorData function returns a list with two components, cn.data and seq.data, each of which is a data frame. In some cases, one of these components may have zero rows or may be NA instead of a data frame.
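Because either component may be empty or NA, downstream code should check a component before using it. A minimal sketch, using a hypothetical helper hasRows() and a hand-built stand-in for the returned list (not real simulator output):

```r
# Hypothetical helper: TRUE only for a data frame with at least one row.
hasRows <- function(x) is.data.frame(x) && nrow(x) > 0

# Stand-in for a generateTumorData() result whose seq.data came back as NA
dataset <- list(cn.data = data.frame(chr = 1, LRR = 0.1, BAF = 0.5),
                seq.data = NA)

usable <- Filter(hasRows, dataset)  # keeps only the non-empty data frames
names(usable)                       # "cn.data"
```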
The cn.data component contains seven columns:
chr: the chromosome number;
seq: a unique segment identifier;
LRR: simulated segment-wise log ratios;
BAF: simulated segment-wise B allele frequencies;
X and Y: simulated intensities for two separate alleles/haplotypes per segment; and
markers: the simulated number of SNPs per segment.
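The LRR and BAF columns summarize the same allele-specific signal as X and Y. The sketch below shows one common convention for deriving them, under stated assumptions: X and Y are in copy-number units (a normal diploid segment has X = Y = 1), LRR uses base-2 logarithms, and BAF is the fraction of the signal coming from Y. The helper lrrBaf() is hypothetical, the package's exact transformation may differ, and no measurement noise (the role played by sigma0.lrr and sigma0.baf) is added.

```r
# Hypothetical helper: segment-wise LRR and BAF from allele intensities,
# assuming copy-number units where a normal segment has X = Y = 1.
lrrBaf <- function(X, Y) {
  total <- X + Y
  data.frame(X = X, Y = Y,
             LRR = log2(total / 2),  # log ratio relative to two copies
             BAF = Y / total)        # fraction of signal from the Y allele
}

# normal, single-copy gain, single-copy loss, balanced gain
segs <- lrrBaf(X = c(1, 1, 0, 2), Y = c(1, 2, 1, 2))
```

Under these assumptions a normal segment gives LRR = 0 and BAF = 0.5, while a single-copy loss gives LRR = -1 and BAF = 1 (loss of heterozygosity).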
The seq.data component contains eight columns:
chr: the chromosome number;
seq: a unique "segment" identifier;
mut.id: a unique mutation identifier;
refCounts and varCounts: the simulated numbers of reference and variant read counts per mutation;
VAF: the simulated variant allele frequency;
totalCounts: the simulated total number of read counts; and
status: a character vector (which should probably be a factor) indicating whether each variant should be viewed as somatic or germline.
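The read-count columns can be illustrated with a small simulation. This is a sketch of one plausible model, not the package's method: simulateCounts() is a hypothetical helper, the read depth is assumed to be a rounded normal draw with mean mu and standard deviation sigma.reads (defaults taken from the example below), and variant reads are assumed to be binomial given a true allele frequency (for instance 0.5 for a heterozygous germline variant in a diploid region).

```r
# Hypothetical sketch: simulate per-mutation read counts for seq.data-like
# columns. The depth and count models are assumptions.
simulateCounts <- function(trueVAF, mu = 70, sigma.reads = 25) {
  n <- length(trueVAF)
  # read depth: rounded normal draw, floored at 1 so VAF is always defined
  totalCounts <- pmax(1L, as.integer(round(rnorm(n, mean = mu, sd = sigma.reads))))
  # each read independently supports the variant with probability trueVAF
  varCounts <- rbinom(n, size = totalCounts, prob = trueVAF)
  data.frame(mut.id = seq_len(n),
             refCounts = totalCounts - varCounts,
             varCounts = varCounts,
             VAF = varCounts / totalCounts,
             totalCounts = totalCounts)
}

set.seed(123)
sim <- simulateCounts(trueVAF = c(0.5, 0.5, 0.3, 0.05))
```

In this sketch the columns satisfy totalCounts = refCounts + varCounts and VAF = varCounts / totalCounts by construction.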
The plotTumorData function invisibly returns its data argument.
Kevin R. Coombes <krc@silicovore.com>, Mark Zucker <zucker.64@buckeyemail.osu.edu>
psis <- c(0.6, 0.3, 0.1) # fractions of three clones
# create a tumor with copy number but no mutation data
tumor <- Tumor(psis, rounds = 400, nu = 0, pcnv = 1, norm.contam = FALSE)
# simulate the dataset (arguments named to match the usage above)
dataset <- generateTumorData(tumor, snps.seq = 10000, snps.cgh = 600000,
                             mu = 70, sigma.reads = 25, sigma0.lrr = 0.15,
                             sigma0.baf = 0.03, density.sigma = 0.1)
# plot it
plotTumorData(tumor, dataset)