fsc.bootstraps | R Documentation |
Once fsc has estimated the parameter values, this function uses the *_maxL.par file to simulate nSim datasets with the model parameters. It then re-estimates the parameter values from these simulated data, and confidence intervals are finally computed using the R package boot.
fsc.bootstraps(
dir.in,
nLoci = 10000,
nSim = 100,
maf = TRUE,
ncpu = 0,
nBatches = NULL,
fsc.cmd = "fsc2702",
fsc.path = "path",
par.indLoci = "200000 0",
par.nBlocks = 1,
par.data = "DNA 100 0 2.5e-8 0.33",
n = 1e+05,
L = 50,
nBoot = 1000,
conf = 0.95,
boot.type = "perc"
)
dir.in |
The directory where the analysis was conducted |
nLoci |
The number of polymorphic loci to retain |
nSim |
The number of datasets that need to be simulated |
maf |
Whether a MAF SFS (default) or a derived SFS is provided (if
|
ncpu |
The number of CPUs (threads) to use in the analysis. Automatically
handled if |
nBatches |
The number of batches (-B option) |
fsc.cmd |
The command used to call fsc (which may differ depending on the version installed) |
fsc.path |
The path where fsc is installed or |
par.indLoci |
The two integer values to use for the number of independent loci in the .par file that is used to run the simulations. |
par.nBlocks |
The number of linkage blocks in the .par file that is used to run the simulations. |
par.data |
The string to be used in the 'per Block: data type, num loci,
rec. rate and mut rate + optional parameters' line of the .par file that is
used to run the simulations. The default value is |
n |
The number of coalescent simulations to approximate the expected SFS (-n option). This should be larger than 100,000. |
L |
The number of optimization cycles (-L option). It should be >50 |
nBoot |
The number of bootstrap replicates |
conf |
The confidence level to compute the confidence intervals |
boot.type |
The method to be used to compute the confidence intervals.
See |
It also uses the analyses from the simulated data to build an empirical
cumulative distribution function of the Composite Likelihood Ratio (CLR), which
provides a statistical test for the fit of the model (see Excoffier et al. 2013 for
details). That is, from the simulated data, it is possible to estimate the
probability that a random value from the null distribution is smaller or
greater than the observed CLR. It is important to note that the probability
values (P.Rand.less.Obs and P.Rand.gt.Obs) are constructed from
the simulated datasets, so a large enough number of simulations needs to be
run for these to be reliable. Bootstrapped percentiles are also reported (see
the first item of the list returned as results), if this approach is preferred.
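The CLR test described above can be illustrated with a minimal base R sketch. The objects clr.sim and clr.obs are hypothetical stand-ins (random numbers, not fsc output), and the code is not taken from the function itself; it only shows how an empirical cumulative distribution function yields the two probabilities.

```r
# Hypothetical CLR values; in practice these would come from the analyses
# of the simulated datasets. Random numbers are used here as stand-ins.
set.seed(42)
clr.sim <- rchisq(100, df = 5)  # stand-in for the null CLR distribution (nSim = 100)
clr.obs <- 7.5                  # stand-in for the observed CLR

# Empirical cumulative distribution function of the simulated CLR values
clr.ecdf <- ecdf(clr.sim)

# Proportion of simulated values at or below the observed CLR,
# and the complementary proportion above it
p.less <- clr.ecdf(clr.obs)
p.gt <- 1 - p.less

print(c(P.Rand.less.Obs = p.less, P.Rand.gt.Obs = p.gt))
```

With nSim simulated datasets, these probabilities can only change in steps of 1/nSim, which is one reason a large number of simulations is needed for them to be reliable.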
It is important that enough sites are simulated to ensure that sufficient
polymorphic loci are present in the simulated data. It is better to simulate
an excess of sites and retain only those needed using nLoci.
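As a toy illustration of simulating an excess of sites and retaining nLoci polymorphic ones, the sketch below draws hypothetical allele counts and subsamples them. The variable names (n.sites, n.chrom, counts) and the count distribution are assumptions made for the example; this is not how the function processes fsc output internally.

```r
set.seed(1)
n.sites <- 50000  # hypothetical number of simulated sites (an excess)
n.chrom <- 20     # hypothetical number of sampled chromosomes
nLoci <- 10000    # number of polymorphic loci to retain

# Hypothetical derived-allele counts per site: most sites are monomorphic (count 0),
# the remainder are spread evenly over counts 1..n.chrom (arbitrary choice)
counts <- sample(0:n.chrom, n.sites, replace = TRUE,
                 prob = c(0.7, rep(0.3 / n.chrom, n.chrom)))

# A site is polymorphic in the sample if both alleles are observed
polymorphic <- which(counts > 0 & counts < n.chrom)

# Guard against simulating too few sites
if (length(polymorphic) < nLoci) {
  stop("Too few polymorphic sites: simulate more sites")
}

# Retain a random subset of exactly nLoci polymorphic sites
keep <- sort(sample(polymorphic, nLoci))
length(keep)
```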
Initial values for estimating parameters from the simulated datasets are passed using the .pv file, so that fewer replicates need to be run.
For some reason, which is a mystery to me, sometimes there is a need to 'print' to screen twice to get the first element of the list to actually be visible on the screen.
A list with the following elements
Bootstr.stats: Descriptive statistics from bootstraps (median, lower and upper limits), and the initial estimated parameters
P.Rand.less.Obs: The probability that a random value from the null Composite Likelihood Ratio distribution is less than the observed CLR
P.Rand.gt.Obs: The probability that a random value from the null Composite Likelihood Ratio distribution is greater than the observed CLR
Sim: The estimates from the simulated data
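For orientation, the kind of summary reported in Bootstr.stats can be sketched with the boot package directly. The vector est below is hypothetical (random numbers standing in for one parameter estimated from nSim simulated datasets), and the choice of the median as the bootstrapped statistic is an assumption made for this example, not a description of the function's internals.

```r
library(boot)

set.seed(1)
# Hypothetical estimates of one parameter from nSim = 100 simulated datasets
est <- rnorm(100, mean = 5000, sd = 300)

# Statistic passed to boot(): the median of the resampled estimates
med <- function(x, i) median(x[i])

# nBoot = 1000 bootstrap replicates
b <- boot(est, statistic = med, R = 1000)

# Percentile confidence interval (conf = 0.95, boot.type = "perc")
ci <- boot.ci(b, conf = 0.95, type = "perc")

# Columns 4 and 5 of the 'percent' element hold the lower and upper limits
ci.limits <- ci$percent[4:5]
print(c(Median = median(est), Lower = ci.limits[1], Upper = ci.limits[2]))
```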
Excoffier L., Dupanloup I., Huerta-Sánchez E., Sousa V. C. and Foll M. (2013) Robust demographic inference from genomic and SNP data. PLoS Genetics 9(10)