glSim: Simulation of simple genlight objects

Description

The function glSim simulates simple SNP data with the possibility of contrasted structures between two groups as well as background ancestral population structure. Returned objects are instances of the class genlight.

Usage

glSim(n.ind, n.snp.nonstruc, n.snp.struc = 0, grp.size = c(0.5, 0.5), k = NULL,
                    pop.freq = NULL, ploidy = 1, alpha = 0, parallel = FALSE,
                    LD = TRUE, block.minsize = 10, block.maxsize = 1000, theta = NULL,
                    sort.pop = FALSE, ...)

Arguments

n.ind

an integer indicating the number of individuals to be simulated.

n.snp.nonstruc

an integer indicating the number of non-structured SNPs to be simulated; for these SNPs, all individuals are drawn from the same binomial distribution.

n.snp.struc

an integer indicating the number of structured SNPs to be simulated; for these SNPs, different binomial distributions are used for the two simulated groups; frequencies of the derived alleles in groups A and B are built to differ (see details).

grp.size

a vector of length 2 specifying the proportions of the two phenotypic groups (must sum to 1). By default, both groups have the same size.

k

an integer specifying the number of ancestral populations to be generated.

pop.freq

a vector of length k specifying the proportions of the k ancestral populations (must sum to 1). If pop.freq is NULL (the default) and k is not NULL, pop.freq is obtained by random sampling into k population groups.

ploidy

an integer indicating the ploidy of the simulated genotypes.

alpha

asymmetry parameter: a numeric value between 0 and 0.5, used to enforce allelic differences between the groups. Differences between groups are strongest when alpha = 0.5 and weakest when alpha = 0 (see details).

parallel

a logical indicating whether multiple cores should be used in generating the simulated data (TRUE). This option can reduce the amount of computational time required to simulate the data, but is not supported on Windows.

LD

a logical indicating whether loci should display linkage disequilibrium (TRUE, default) or be generated independently (FALSE). When set to TRUE, data are generated by blocks of correlated SNPs (see details).

block.minsize

an optional integer indicating the minimum number of SNPs to be handled at a time during the simulation of linked SNPs (when LD=TRUE). Increasing the minimum block size will increase the RAM requirement but decrease the amount of computational time required to simulate the genotypes.

block.maxsize

an optional integer indicating the maximum number of SNPs to be handled at a time during the simulation of linked SNPs. Note: if LD blocks of equal size are desired, set block.minsize = block.maxsize.

theta

an optional numeric value between 0 and 0.5 specifying the extent to which linkage should be diluted. Linkage is strongest when theta = 0 and weakest when theta = 0.5.

sort.pop

a logical specifying whether individuals should be ordered by ancestral population (sort.pop=TRUE) or phenotypic population (sort.pop=FALSE).

...

arguments to be passed to the genlight constructor.

Details

=== Allele frequencies in contrasted groups ===

When n.snp.struc is greater than 0, some SNPs are simulated so as to differ between groups (denoted 'A' and 'B'). Different patterns between groups are achieved by using different frequencies of the second allele for A and B, denoted p_A and p_B. For a given SNP, p_A is drawn from a uniform distribution between 0 and (0.5 - alpha). p_B is then computed as 1 - p_A. Therefore, differences between groups are mild for alpha = 0, and total for alpha = 0.5.
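The sampling scheme described above can be sketched in base R. This is only an illustration of the stated rule (p_A uniform on [0, 0.5 - alpha], p_B = 1 - p_A, binomial draws within each group), not the code used inside glSim. The object names (p.A, geno.A, freq.A, ...) and the placement of group A in the first rows are assumptions made for the example.

```r
# Illustrative sketch only; not the code used inside glSim.
set.seed(1)

n.ind <- 100        # number of individuals
n.snp.struc <- 50   # number of structured SNPs
ploidy <- 2
alpha <- 0.4        # asymmetry parameter, between 0 and 0.5
grp.size <- c(0.5, 0.5)

# Group sizes; group A is placed first (an assumption of this sketch).
n.A <- round(n.ind * grp.size[1])
n.B <- n.ind - n.A

# One frequency of the second allele per SNP and per group.
p.A <- runif(n.snp.struc, min = 0, max = 0.5 - alpha)
p.B <- 1 - p.A

# Individuals in rows, SNPs in columns; entries count second alleles.
# matrix() fills column by column, so rep(p, each = n) gives every
# individual in a column the frequency of that SNP.
geno.A <- matrix(rbinom(n.A * n.snp.struc, size = ploidy,
                        prob = rep(p.A, each = n.A)),
                 nrow = n.A)
geno.B <- matrix(rbinom(n.B * n.snp.struc, size = ploidy,
                        prob = rep(p.B, each = n.B)),
                 nrow = n.B)
geno <- rbind(geno.A, geno.B)

# Observed frequency of the second allele per SNP in each group.
freq.A <- colMeans(geno.A) / ploidy
freq.B <- colMeans(geno.B) / ploidy
```

With alpha close to 0.5, freq.A stays near 0 and freq.B near 1; with alpha = 0, p_A ranges over [0, 0.5], so some SNPs differ only slightly between the groups.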

=== Linked or independent loci ===

Independent loci (LD=FALSE) are simulated using the standard binomial distribution, with randomly generated allele frequencies. Linked loci (LD=TRUE) are trickier to simulate, as we need to generate discrete variables with a pre-defined correlation structure.

Here, we first generate deviates from multivariate normal distributions with randomly generated correlation structures. These variables are then discretized using the quantiles of the distribution. Further improvement of the procedure will aim at i) specifying the strength of the correlations between blocks of alleles and ii) enforcing contrasted structures between groups.
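The general idea of discretizing correlated normal deviates can be sketched in base R for a single block of SNPs. This is only an illustration, not the code used inside glSim. The way the random correlation matrix is built, the range of the allele frequencies, and the use of binomial quantiles for the discretization step are all assumptions made for the example.

```r
# Illustrative sketch only; not the code used inside glSim.
set.seed(2)

n.ind <- 200
block.size <- 20   # number of SNPs in one linked block
ploidy <- 2

# Random correlation matrix: rescale a random positive-definite matrix
# so that its diagonal equals 1 (assumed construction).
A <- matrix(rnorm(2 * block.size * block.size), ncol = block.size)
R <- cov2cor(crossprod(A))

# Correlated standard normal deviates (rows = individuals, columns = SNPs).
# chol(R) returns an upper-triangular C with t(C) %*% C equal to R,
# so the columns of Z have correlation matrix R.
Z <- matrix(rnorm(n.ind * block.size), nrow = n.ind) %*% chol(R)

# Randomly generated frequency of the second allele for each SNP.
freq <- runif(block.size, min = 0.1, max = 0.9)

# Discretize: map each deviate to (0, 1) with the normal CDF, then to an
# allele count with the binomial quantile function for that SNP.
unif <- pnorm(Z)
geno <- matrix(qbinom(unif, size = ploidy, prob = rep(freq, each = n.ind)),
               nrow = n.ind)
```

Each column of geno keeps a binomial marginal distribution with the chosen frequency, while the correlation between columns is inherited (in attenuated form) from R.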

Value

A genlight object.

Author(s)

Caitlin Collins caitlin.collins12@imperial.ac.uk, Thibaut Jombart t.jombart@imperial.ac.uk

See Also

- genlight: class of object for storing massive binary SNP data.

- glPlot: plotting genlight objects.

- glPca: PCA for genlight objects.

Examples

## Not run: 
## no structure
x <- glSim(100, 1e3, ploidy=2)
plot(x)

## 1,000 non-structured SNPs, 100 structured SNPs
x <- glSim(100, 1e3, n.snp.struc=100, ploidy=2)
plot(x)

## 1,000 non-structured SNPs, 100 structured SNPs, ploidy=4
x <- glSim(100, 1e3, n.snp.struc=100, ploidy=4)
plot(x)

## same thing with ploidy=2, stronger differences between groups
x <- glSim(100, 1e3, n.snp.struc=100, ploidy=2, alpha=0.4)
plot(x)

## same thing, loci with LD structures
x <- glSim(100, 1e3, n.snp.struc=100, ploidy=2, alpha=0.4, LD=TRUE, block.minsize=100)
plot(x)

## End(Not run)

adegenet documentation built on Feb. 16, 2023, 6 p.m.