View source: R/DAPC_adegenet.R
sim.vcf | R Documentation |
Create a simulated SNP dataset from an input VCF file, from an object of class vcfR, or by specifying characteristics of the simulated dataset with the function arguments. Most of the work is done by a call to the function 'glSim' from the 'adegenet' package. Note: At present I do not recommend setting LD=TRUE (or LD=NULL if the input VCF includes linked SNPs), because the CHROM and POS columns of the output VCF will not reflect the presence of linkage. Check back for updates.
sim.vcf(
x = NULL,
save.as = NULL,
RA.probs = NULL,
n.ind = NULL,
n.snps = NULL,
snp.str = 0,
ploidy = NULL,
K = 2,
include.missing = FALSE,
fMD = NULL,
pMDi = NULL,
LD = FALSE,
block.minsize = 10,
block.maxsize = NULL,
use.maxLDx = FALSE,
sim.coords = FALSE,
regionsize = 0.7,
wnd = c(-180, 180, -90, 90),
interactions = c(0, 1),
over.land = TRUE,
...
)
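A call that uses only argument names from the usage above might look like the following. This is an untested sketch: it assumes that sim.vcf is provided by the 'misc.wrappers' package (the package named in the description of '...'), and the values chosen for n.ind, n.snps, and ploidy are arbitrary.

```r
# Hypothetical argument values; only argument names from the usage are used.
sim.args <- list(
  x      = NULL,   # no input VCF, so RA.probs should be coerced to "equal"
  n.ind  = 20,     # arbitrary number of individuals
  n.snps = 500,    # arbitrary number of SNPs
  ploidy = 2,
  K      = 2,      # number of populations (must be >= 2)
  LD     = FALSE   # LD=TRUE is not recommended at present (see description)
)

# Assumption: sim.vcf is available from the 'misc.wrappers' package.
if (requireNamespace("misc.wrappers", quietly = TRUE)) {
  sim <- do.call(misc.wrappers::sim.vcf, sim.args)
  print(class(sim))  # documented return value is an object of class vcfR
}
```

The call is guarded with requireNamespace() so that the snippet does nothing when the package is not installed.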
x |
Path to input VCF or a vcfR object. Default NULL. Used to extract the probabilities that a site has a particular pair of Reference and Alternative nucleotides. If NULL, these probabilities are determined by 'RA.probs'. |
save.as |
NULL or a character string with the path where the output VCF containing the simulated dataset should be saved. |
RA.probs |
Either NULL (the default), one of the character strings "equal" or "empirical", or a numerical vector with the probability of each combination of reference and alternative allele. Numerical vectors must have length 12 and sum to 1; if the elements of the vector are unnamed, the assumed names are c("AC","AG","AT","CA","CG","CT","GA","GC","GT","TA","TC","TG"), with the first character specifying the reference allele and the second the alternative allele. If RA.probs is NULL, it is coerced to "empirical" if 'x' is non-NULL, or to "equal" if 'x' is NULL. Setting RA.probs to "equal" is equivalent to the vector produced by sapply(c("AC","AG","AT","CA","CG","CT","GA","GC","GT","TA","TC","TG"),assign,(1/12)). If RA.probs="empirical", the frequencies of each Reference:Alternative allele combination are calculated for the input dataset and used as sampling probabilities for the simulated dataset. |
n.ind |
Number of individuals to include in simulated dataset. Default NULL, in which case the number of simulated individuals will match the number of individuals in 'x'. |
n.snps |
Number of SNPs. Default NULL. If n.snp.nonstruc is NULL and x is non-NULL, n.snp.nonstruc will equal half the number of SNPs in the input dataset. |
snp.str |
Fraction of SNPs that are structured (possibly within populations). Default 0. Setting this above 0 seems to generate at least twice as many simulated populations as specified with K. |
ploidy |
Number indicating the ploidy of individuals. Probably limited to 1 or 2. |
K |
Number of populations (>=2). Default 2. |
include.missing |
Logical indicating whether or not missing data should be included in the simulated dataset. |
fMD |
NULL or a number in the range (0,1] indicating the fraction of genotypes that should be missing in the simulated dataset. If NULL (the default) and 'include.missing'=TRUE and 'x' non-NULL, the fraction of genotypes missing in the simulated dataset will be approximately equal to the fraction of missing genotypes in the input dataset. |
pMDi |
NULL (the default) or a numerical vector with the proportion of missing data that should be attributed to each simulated individual. If NULL, ignored unless 'x' is non-NULL and 'include.missing'=TRUE, in which case pMDi values will be sampled from missing data proportions of the input VCF, producing similar missing data structure for input and simulated datasets. |
LD |
NULL or a logical (TRUE or FALSE) indicating whether or not SNPs should be simulated under linkage disequilibrium. Default FALSE. If LD=NULL (which will be the default in future versions) and 'x' is non-NULL, LD is coerced to FALSE if all sites (positions) of the input dataset are unlinked (on different blocks/chromosomes); otherwise LD is coerced to TRUE. |
block.minsize |
Number indicating the minimum size of linkage blocks. Ignored if 'LD' is FALSE or coerced to FALSE. Default 10. |
block.maxsize |
Number indicating the maximum size of linkage blocks. Ignored if 'LD' is FALSE or coerced to FALSE, or if 'use.maxLDx' is TRUE and 'x' is non-NULL (in which case the max block size is equal to the max position of any SNP in a linkage block). Default 1000. |
use.maxLDx |
Logical indicating whether or not the maximum linkage block size should be set to the max position of any SNP within a linkage block of the input data. Default FALSE, in which case the max linkage block size is the value of 'block.maxsize'. |
sim.coords |
Logical indicating whether simulated coordinates (sample localities) should be produced for each simulated individual. Default FALSE. |
regionsize |
Number between 0 and 1 specifying the fraction of the possible sampling area (the window region set by the 'wnd' argument) in which points may be distributed. Default 0.25 according to this description, although the usage above shows regionsize = 0.7. When 'wnd' is the default (entire Earth), 'regionsize' is 0.25, and 'over.land' is FALSE, the minimum convex hull of all sampled points is expected to cover 1/4 of Earth. |
wnd |
Either a character string or vector describing one or more regions of Earth, or a length four numerical vector specifying the bounding box (longitude and latitude ranges) for the region where sampling is allowed. Default is to include all of Earth c(-180,180,-90,90). |
interactions |
Numeric vector with the minimum and maximum amount of overlap between each pair of groups (aka populations), calculated as (intersect area)/(minimum of non-intersected area for each group). Default c(0,1) allows for all possible scenarios. Examples: c(0,0) specifies that groups must be allopatric; c(1,1) requires complete overlap of groups, which is not realistic given the stochasticity determining region sizes; c(0.5,1) requires that each pair of groups overlaps by at least half; c(0.2,0.25) specifies a small contact zone. |
over.land |
and 'interactions' |
... |
Additional arguments passed to 'adegenet::glSim' to control SNP simulation, or to 'misc.wrappers::rcoords' to further control the simulation of geographic localities when 'sim.coords' is TRUE. Possible 'glSim' arguments include 'grp.size', 'pop.freq', 'alpha', 'parallel', and 'theta' (see ?adegenet::glSim for details). |
n.snp.nonstruc |
Number of nonstructured SNPs. If n.snp.nonstruc is NULL and x is non-NULL, n.snp.nonstruc will equal half the number of SNPs in the input dataset. |
n.snp.struc |
Number of structured SNPs. If n.snp.struc is NULL and x is non-NULL, 'n.snp.struc' will equal the number of SNPs in the input dataset minus the value of 'n.snp.nonstruc'. Meaningless if K = 1. |
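As an illustration of the 'RA.probs' argument described above, the following base R sketch builds two valid probability vectors: the "equal" case and a custom vector. The 2:1 weighting of transitions over transversions is a made-up example, not a value used by the package.

```r
# The 12 Reference:Alternative combinations, in the order given for RA.probs.
ra.names <- c("AC","AG","AT","CA","CG","CT","GA","GC","GT","TA","TC","TG")

# Equivalent of RA.probs = "equal": every combination has probability 1/12.
RA.equal <- setNames(rep(1/12, 12), ra.names)

# Hypothetical custom vector: transitions weighted twice as much as transversions.
transitions <- c("AG", "GA", "CT", "TC")
w <- ifelse(ra.names %in% transitions, 2, 1)
RA.custom <- setNames(w / sum(w), ra.names)

# Check the documented requirements: length 12 and sum to 1.
stopifnot(length(RA.custom) == 12, isTRUE(all.equal(sum(RA.custom), 1)))
```

Either vector could then be supplied as the RA.probs argument.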
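The relationship between 'fMD' and 'pMDi' described above can be sketched in base R. This is not the package's implementation; it assumes that 'pMDi' gives each individual's share of all missing genotypes, and the genotype matrix and numbers are invented for illustration.

```r
set.seed(1)
n.ind  <- 6
n.snps <- 50

# Toy genotype matrix (rows = individuals, columns = SNPs), coded 0/1/2.
geno <- matrix(sample(0:2, n.ind * n.snps, replace = TRUE), nrow = n.ind)

fMD  <- 0.1                               # fraction of all genotypes to set missing
pMDi <- c(0.4, 0.2, 0.2, 0.1, 0.1, 0)     # assumed: share of missing data per individual

n.missing <- round(fMD * length(geno))    # total number of missing genotypes
n.per.ind <- round(pMDi * n.missing)      # missing genotypes assigned to each individual

# Blank out randomly chosen sites for each individual.
for (i in seq_len(n.ind)) {
  geno[i, sample(n.snps, n.per.ind[i])] <- NA
}

rowSums(is.na(geno))   # missing genotypes per individual
mean(is.na(geno))      # overall fraction missing
```

With these values the per-individual counts follow 'pMDi' and the overall missing fraction matches 'fMD'.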
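The overlap measure used by 'interactions' can be made concrete with a small base R sketch. It assumes that the denominator is the area of the smaller of the two groups and represents each group's range as a rectangle c(xmin, xmax, ymin, ymax); both are simplifications chosen for illustration, and the helper names are hypothetical.

```r
# Area of a rectangle given as c(xmin, xmax, ymin, ymax).
rect_area <- function(r) (r[2] - r[1]) * (r[4] - r[3])

# Overlap index: intersect area divided by the area of the smaller rectangle.
overlap_index <- function(a, b) {
  w <- min(a[2], b[2]) - max(a[1], b[1])   # width of the intersection
  h <- min(a[4], b[4]) - max(a[3], b[3])   # height of the intersection
  inter <- max(w, 0) * max(h, 0)           # zero when the rectangles are disjoint
  inter / min(rect_area(a), rect_area(b))
}

a <- c(0, 10, 0, 10)
b <- c(5, 15, 0, 10)    # shares half of its area with 'a'
d <- c(20, 30, 0, 10)   # disjoint from 'a'

overlap_index(a, b)   # half overlap
overlap_index(a, d)   # allopatric
overlap_index(a, a)   # complete overlap
```

Under this reading, interactions = c(0, 0) would accept only pairs like 'a' and 'd', while interactions = c(0.5, 1) would accept 'a' and 'b'.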
An object with class vcfR (see the 'vcfR' package for details regarding this class).
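Because most of the simulation is delegated to 'glSim', it can help to look at that function on its own. The sketch below calls adegenet::glSim directly with arbitrary sizes; it shows only the underlying simulation step, not what sim.vcf does with the result, and it runs only if the 'adegenet' package is installed.

```r
# Direct call to the underlying simulator; all sizes are arbitrary.
if (requireNamespace("adegenet", quietly = TRUE)) {
  gl <- adegenet::glSim(
    n.ind          = 20,    # number of individuals
    n.snp.nonstruc = 100,   # SNPs without population structure
    n.snp.struc    = 100,   # SNPs with population structure
    ploidy         = 2,
    LD             = FALSE, # unlinked SNPs
    parallel       = FALSE
  )
  print(gl)                 # glSim returns a genlight object
}
```

Arguments such as 'grp.size', 'pop.freq', 'alpha', and 'theta' are documented in ?adegenet::glSim.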