sim.structure: A function for simulating aCGH data and the corresponding...

Description

This simulation scheme operates in two stages. First, the layout of the clones is simulated; then a modified version of the scheme developed by Willenbrock et al. (2005) is used to generate the \log_2 ratios. For each simulated clone layout, 20 sets of simulated \log_2 ratios are generated, each from one of five templates. The simulation also takes account of the cellularity of the test sample.

Usage

simulateData(nArrays = 20, chrominfo = NULL, prb.short.tiled = 0.5,
                 prb.long.tiled = 0.5, non.tiled.lower.res = 0.9,
                 non.tiled.upper.res = 1.1, length.clone.lower = 0.05,
                 length.clone.upper = 0.2, tiled.lower.res = -0.05,
                 tiled.upper.res = 0, sd = NULL, output = FALSE,
                 prb.proportion.tiled = c(0.2, 0.2, 0.2, 0.2, 0.2),
                 zerolengthnontiled = NULL, zerolengthtiled = NULL,
                 nonzerolengthnontiled = NULL, nonzerolengthtiled =
                 NULL, seed = 1)
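
A minimal call using the defaults shown above might look like the following sketch. It assumes the snapCGH package is installed and that the default `chrominfo` resolves once the package is attached; the guard simply skips the call otherwise. The variable name `sim` is illustrative.

```r
# Run the simulation only if snapCGH is available (assumption: it is
# installed from Bioconductor); otherwise leave `sim` as NULL.
if (requireNamespace("snapCGH", quietly = TRUE)) {
  suppressPackageStartupMessages(library(snapCGH))
  # 20 arrays with a fixed seed so the run can be repeated
  sim <- simulateData(nArrays = 20, seed = 1)
  # Inspect the top-level elements of the returned list
  str(sim, max.level = 1)
} else {
  sim <- NULL
}
```

Setting `output = TRUE` instead would, per the argument description, also write the results to txt files in the working directory.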

Arguments

nArrays

The number of arrays we want to simulate

chrominfo

The information about chromosome length and centromere location to be used. Defaults to the information provided in the aCGH package by Jane Fridlyand and Peter Dimitrov.

prb.short.tiled

The probability of a tiled region on the short arm of the simulated chromosome (defaults to 0.5).

prb.long.tiled

The probability of a tiled region on the long arm of the simulated chromosome (defaults to 0.5).

non.tiled.lower.res

The lower limit for the distance (in Mbs) between adjacent clones in non-tiled regions of the genome (defaults to 0.9Mb).

non.tiled.upper.res

The upper limit for the distance (in Mbs) between adjacent clones in non-tiled regions of the genome (defaults to 1.1Mb).

length.clone.lower

The lower limit for the length (in Mbs) of a clone (this defaults to 0.05Mb).

length.clone.upper

The upper limit for the length (in Mbs) of a clone (this defaults to 0.2Mb).

tiled.lower.res

The lower limit for the distance (in Mbs) between adjacent clones in tiled regions of the genome (defaults to -0.05Mb).

tiled.upper.res

The upper limit for the distance (in Mbs) between adjacent clones in tiled regions of the genome (defaults to 0Mb).
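
To illustrate how the clone length and spacing limits interact, here is a base-R sketch that lays out clones along a single region. It is not the package's code: the helper `layout_clones` is hypothetical, the uniform sampling of lengths and distances is an assumption, and "distance" is taken to mean the gap from the end of one clone to the start of the next, so that a negative value corresponds to overlapping clones.

```r
# Hypothetical helper, not part of snapCGH: place clones along a region of
# `region_length` Mb, drawing lengths and gaps uniformly (an assumption).
layout_clones <- function(region_length, gap_lower, gap_upper,
                          len_lower = 0.05, len_upper = 0.2) {
  starts <- numeric(0)
  ends <- numeric(0)
  pos <- 0
  repeat {
    len <- runif(1, len_lower, len_upper)
    if (pos + len > region_length) break   # clone would overrun the region
    starts <- c(starts, pos)
    ends <- c(ends, pos + len)
    # Next clone starts after a gap; a negative gap means overlap (tiling)
    pos <- pos + len + runif(1, gap_lower, gap_upper)
  }
  data.frame(start = starts, end = ends, mid = (starts + ends) / 2)
}

set.seed(1)
# Non-tiled region: roughly one clone per Mb
non_tiled <- layout_clones(10, gap_lower = 0.9, gap_upper = 1.1)
# Tiled region: adjacent clones touch or overlap by up to 0.05 Mb
tiled <- layout_clones(2, gap_lower = -0.05, gap_upper = 0)
```

With the default limits, the overlap in a tiled region can never exceed the shortest possible clone, so clone start positions still increase along the region.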

sd

The standard deviation of the simulated data in each of the states. Defaults to being randomly sampled between 0.1 and 0.2.
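
As a rough illustration of the default, the sketch below draws a standard deviation between 0.1 and 0.2 and uses it to add noise around a set of state means. The uniform draw, the Gaussian noise model and the names `sd_sim`, `state_means` and `log2_ratios` are assumptions made for this example, not the package's internals.

```r
set.seed(1)
# Assumed: when sd is not supplied, it is drawn uniformly from (0.1, 0.2)
sd_sim <- runif(1, min = 0.1, max = 0.2)
# Hypothetical true log2 means for six clones:
# normal, normal, gain, gain, loss, normal
state_means <- c(0, 0, 0.58, 0.58, -1, 0)
# Simulated log2 ratios: state mean plus Gaussian noise
log2_ratios <- rnorm(length(state_means), mean = state_means, sd = sd_sim)
```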

output

A logical variable which is TRUE if you want the output to be written to txt files in the present working directory. Defaults to FALSE.

prb.proportion.tiled

Given that an arm of a chromosome contains a tiled region, this variable (a vector of length 5) gives the probabilities that 20, 30, 40, 50 or 100% of the chromosome is tiled. It defaults to (0.2, 0.2, 0.2, 0.2, 0.2).
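
The way `prb.short.tiled` and `prb.proportion.tiled` combine can be sketched in base R as a two-step draw: first decide whether the arm has a tiled region, then pick the tiled proportion. This is an illustration of the description above rather than the package's implementation; `proportions`, `has_tiled` and `tiled_fraction` are hypothetical names.

```r
set.seed(1)
# The five possible tiled proportions named in the argument description
proportions <- c(0.2, 0.3, 0.4, 0.5, 1.0)
prb.proportion.tiled <- c(0.2, 0.2, 0.2, 0.2, 0.2)
prb.short.tiled <- 0.5

# Step 1: does the short arm contain a tiled region?
has_tiled <- runif(1) < prb.short.tiled
# Step 2: if so, draw the proportion tiled; otherwise nothing is tiled
tiled_fraction <- if (has_tiled) {
  sample(proportions, size = 1, prob = prb.proportion.tiled)
} else {
  0
}
```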

zerolengthnontiled

The empirical distribution for regions of the genome which are non-tiled and contain no copy number gains or losses. Defaults to zero.length.distr.non.tiled

zerolengthtiled

The empirical distribution for regions of the genome which are tiled and contain no copy number gains or losses. Defaults to zero.length.distr.tiled

nonzerolengthnontiled

The empirical distribution for regions of the genome which are non-tiled and contain copy number gains or losses. Defaults to non.zero.length.distr.non.tiled

nonzerolengthtiled

The empirical distribution for regions of the genome which are tiled and contain copy number gains or losses. Defaults to non.zero.length.distr.tiled

seed

Seed value allowing simulation to be reproduced if the same seed value is set.

Details

For more details see the article by Marioni and Thorne published in Bioinformatics.

Value

The function returns a list containing the following elements.

clones

Gives the start, end and midpoint of the simulated clones.

class.output

A list giving the true underlying state to which each clone is assigned in each of the twenty simulations associated with each clone layout.

class.matrix

Defines the true underlying state to which each clone is assigned in each of the five classes.

classes

Indicates which of the five class outputs was used to simulate the \log_2 ratios.

datamatrix

A matrix containing twenty columns each of which contains the simulated \log_2 ratios associated with each of the simulations for a particular clone layout.

samples

Gives information about the cellularity associated with each of the samples.
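
To show why cellularity matters for the simulated ratios, the sketch below uses one common mixing model: a fraction of the cells carries the altered copy number and the remainder is diploid normal tissue. The formula and the helper `expected_log2` are assumptions for illustration; the article cited under Details describes the scheme the package actually uses.

```r
# Hypothetical helper: expected log2 ratio for copy number `cn` when a
# fraction `cellularity` of cells is tumour and the rest is diploid normal.
expected_log2 <- function(cn, cellularity) {
  log2((cellularity * cn + (1 - cellularity) * 2) / 2)
}

pure_gain <- expected_log2(3, 1)      # single-copy gain, pure tumour sample
mixed_gain <- expected_log2(3, 0.5)   # same gain at 50% cellularity
normal <- expected_log2(2, 0.3)       # two copies: unaffected by cellularity
```

Under this model, lower cellularity pulls gains and losses towards zero, which makes aberrations harder to separate from noise.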

Author(s)

John Marioni and Natalie Thorne

References

See the relevant article in Bioinformatics or the following website: www.damtp.cam.ac.uk/user/jcm68


snapCGH documentation built on Nov. 8, 2020, 5:31 p.m.