simAllopoly: Generate Simulated Datasets


simAllopoly    R Documentation

Generate Simulated Datasets

Description

Given the number of subgenomes, the ploidy of each subgenome, and optionally, allele frequencies, simAllopoly will generate a "genambig" object containing simulated data for one locus.

Usage

simAllopoly(ploidy = c(2, 2), n.alleles = c(4, 4), n.homoplasy = 0,
            n.null.alleles=rep(0, length(ploidy)), alleles = NULL,
            freq = NULL, meiotic.error.rate=0, nSam = 100, locname = "L1")

Arguments

ploidy

A vector of integers, with one value for each subgenome, indicating the ploidy of that subgenome. For example, c(2,2) indicates an allotetraploid. An allohexaploid, with three diploid subgenomes, would be coded as c(2,2,2).

n.alleles

A vector, in the same format as ploidy, indicating how many unique alleles there are for each isolocus. Ignored if alleles is provided.

n.homoplasy

A single value indicating how many homoplasious alleles there are. Ignored if alleles is provided. This value should not be greater than any value in n.alleles. If freq is provided, the frequency or frequencies at the end of each vector will be the frequencies of homoplasious alleles.

n.null.alleles

A vector, in the same format as ploidy and n.alleles, indicating how many null alleles there are for each isolocus. Ignored if alleles is provided. These values should not be greater than the corresponding values in n.alleles. If freq is provided, the frequency or frequencies at the beginning of each vector will be the null allele frequencies.

alleles

Optional. A list of vectors of allele names (which are usually expressed as integers, but can also be character strings if desired). Each element of the list contains the allele names for the corresponding isolocus. Zero (for numeric alleles) or "N" (for character alleles) indicates a null allele. Allele names that are identical between isoloci will be treated as homoplasious. If this argument is not provided, alleles will be named as described in “Details”.

freq

Optional. A list of vectors of allele frequencies, with one vector per isolocus. If alleles is provided, all of the vectors must match in length between the two lists. Otherwise, the lengths of the vectors must match the values in n.alleles. If freq is not provided, it will be randomly generated.

meiotic.error.rate

A single value ranging from 0 to 0.5, giving the probability that a gamete contains a meiotic error involving this locus. See “Details”.

nSam

A single value indicating the number of samples to generate.

locname

The name for the locus.
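
The ordering of frequencies described under n.homoplasy, n.null.alleles and freq can be illustrated with a small base R sketch. The helper label_freq is hypothetical and is not part of polysat; it only labels each position of a user-supplied freq list, assuming that null alleles come first, homoplasious alleles come last, and both count towards the length of each vector.

```r
# Hypothetical helper (not part of polysat): label each position of a
# freq list according to the argument descriptions above.
label_freq <- function(freq, n.null.alleles = rep(0, length(freq)),
                       n.homoplasy = 0) {
  lapply(seq_along(freq), function(i) {
    n <- length(freq[[i]])
    # Assumption: nulls and homoplasious alleles are included in the
    # total allele count for the isolocus.
    n.ordinary <- n - n.null.alleles[i] - n.homoplasy
    stopifnot(n.ordinary >= 0)
    role <- c(rep("null", n.null.alleles[i]),          # beginning of vector
              rep("ordinary", n.ordinary),
              rep("homoplasious", n.homoplasy))        # end of vector
    data.frame(role = role, freq = freq[[i]], stringsAsFactors = FALSE)
  })
}

# Same frequencies as the null-allele example in "Examples"
freq <- list(c(0.5, 0.1, 0.1, 0.3), c(0.25, 0.25, 0.4, 0.1))
labelled <- label_freq(freq, n.null.alleles = c(1, 0))
print(labelled[[1]])
```

Under these assumptions, the first frequency of the first isolocus (0.5) belongs to the null allele, and the second isolocus has no null allele.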

Details

If alleles=NULL, allele names will be generated in the format A-1, A-2, B-1, B-2 etc., where A and B refer to separate subgenomes. Homoplasious alleles will be named H-1, H-2, etc.
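
The naming format can be reproduced with a few lines of base R. This is only a sketch of the format described above, not the code used inside simAllopoly; the function name sketch_allele_names is hypothetical, null alleles are left out, and homoplasious alleles are assumed to count towards n.alleles.

```r
# Hypothetical helper (not part of polysat): build allele names in the
# A-1, A-2, B-1, ... format, with homoplasious alleles named H-1, H-2, ...
sketch_allele_names <- function(n.alleles = c(4, 4), n.homoplasy = 0) {
  lapply(seq_along(n.alleles), function(i) {
    # Assumption: homoplasious alleles are included in n.alleles
    n.unique <- n.alleles[i] - n.homoplasy
    # sprintf returns character(0) for a zero-length sequence, so no
    # stray "H-" name is created when n.homoplasy is 0
    c(sprintf("%s-%d", LETTERS[i], seq_len(n.unique)),
      sprintf("H-%d", seq_len(n.homoplasy)))
  })
}

allele.names <- sketch_allele_names(n.alleles = c(4, 4), n.homoplasy = 1)
print(allele.names)
```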

Meiotic errors, as simulated by simAllopoly, always result in balanced aneuploidy, i.e. one copy of an isolocus will be replaced by an additional copy of a different isolocus. This is simulated on a per-gamete basis, so each gamete can have a maximum of one meiotic error per locus, but an individual could potentially be derived from two error-containing gametes. Note that in homozygotes and partial heterozygotes, it may not be possible to detect aneuploidy by examining the genotype; this phenomenon lowers the apparent rate of aneuploidy in the dataset.
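
The per-gamete logic can be sketched in base R as follows. This is an illustration of balanced aneuploidy as described above, not the implementation used by simAllopoly; the function name sketch_gamete_copies is hypothetical, and the choice of which isolocus loses or gains a copy is assumed to be uniformly random.

```r
# Hypothetical helper (not part of polysat): number of copies that one
# gamete carries from each isolocus.
sketch_gamete_copies <- function(ploidy = c(2, 2), meiotic.error.rate = 0) {
  copies <- ploidy / 2                       # error-free gamete
  if (length(ploidy) > 1 && runif(1) < meiotic.error.rate) {
    # At most one error per gamete: one isolocus loses a copy ...
    loser <- sample.int(length(ploidy), 1)
    # ... and a different isolocus gains one (index into the vector to
    # avoid sample() treating a single number as 1:n)
    others <- setdiff(seq_along(ploidy), loser)
    gainer <- others[sample.int(length(others), 1)]
    copies[loser] <- copies[loser] - 1
    copies[gainer] <- copies[gainer] + 1
  }
  copies
}

set.seed(42)
gametes <- replicate(1000, sketch_gamete_copies(c(2, 2), 0.1))
# Tabulate gamete types, e.g. "1+1" for an error-free gamete
print(table(apply(gametes, 2, paste, collapse = "+")))
```

The total number of copies in every gamete stays at half the total ploidy, which is what makes the aneuploidy balanced. An individual is formed from two such gametes, so it can carry zero, one, or two errors.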

If alleles are provided by the user with the alleles argument, zero (for sets of numeric alleles) or "N" (for sets of character alleles) indicates a null allele. The null allele will be simulated at the frequency specified, but will not be shown in the output dataset. Genotypes with no non-null alleles are recorded as missing.
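
The effect of null alleles on the recorded genotype can be sketched in base R. The function observed_genotype is hypothetical and is not part of polysat; it assumes numeric alleles, that the output lists each visible allele once without copy number, and that -9 is the missing-data code.

```r
# Hypothetical helper (not part of polysat): convert the alleles carried
# by an individual into the genotype that would be observed.
observed_genotype <- function(true.alleles, null = 0, missing = -9) {
  # Null alleles are simulated but never shown
  visible <- unique(true.alleles[true.alleles != null])
  # A genotype with no non-null alleles is recorded as missing
  if (length(visible) == 0) missing else sort(visible)
}

print(observed_genotype(c(0, 120, 120, 130)))  # null allele hidden
print(observed_genotype(c(0, 0, 0, 0)))        # only nulls: missing
```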

Value

A "genambig" object.

Note

Unlike the code supplied in the file extdata/simgen.R, all genotypes in a dataset generated by this function will be of the same ploidy.

Author(s)

Lindsay V. Clark

References

Clark, L. V. and Drauch Schreier, A. (2017) Resolving microsatellite genotype ambiguity in populations of allopolyploid and diploidized autopolyploid organisms using negative correlations between alleles. Molecular Ecology Resources, 17, 1090–1103. DOI: 10.1111/1755-0998.12639.

See Also

alleleCorrelations, catalanAlleles, simgen

Examples

# Generate an allotetraploid dataset with no homoplasy.
# One isolocus has five alleles, while the other has eight.
test <- simAllopoly(n.alleles=c(5,8))

# Generate an allo-octoploid dataset with two tetraploid subgenomes, ten
# alleles per subgenome, including one homoplasious allele.
test2 <- simAllopoly(ploidy=c(4,4), n.alleles=c(10,10), n.homoplasy=1)

# Generate an allotetraploid dataset, and manually define allele names
# and frequencies.
test3 <- simAllopoly(alleles=list(c(120,124,126),c(130,134,138,140)),
                     freq=list(c(0.4,0.3,0.3),c(0.25,0.25,0.25,0.25)))

# Generate an autotetraploid dataset with seven alleles
test4 <- simAllopoly(ploidy=4, n.alleles=7)

# Generate an allotetraploid dataset with a null allele at high frequency
test5 <- simAllopoly(n.null.alleles=c(1,0),
                     freq=list(c(0.5,0.1,0.1,0.3), c(0.25,0.25,0.4,0.1)))

polysat documentation built on Aug. 23, 2022, 5:07 p.m.