In carlopacioni/amplicR: An R package to process amplicon data

fsc.bootstraps

R Documentation

Compute bootstrap confidence intervals for estimated parameters with fsc

Description

Once fsc has estimated the parameter values, this function uses the *_maxL.par file to simulate nSim datasets under the estimated model parameters. It then re-estimates the parameter values from each simulated dataset and returns confidence intervals computed with the R package boot.

Usage

fsc.bootstraps(
  dir.in,
  nLoci = 10000,
  nSim = 100,
  maf = TRUE,
  ncpu = 0,
  nBatches = NULL,
  fsc.cmd = "fsc2702",
  fsc.path = "path",
  par.indLoci = "200000 0",
  par.nBlocks = 1,
  par.data = "DNA 100 0 2.5e-8 0.33",
  n = 1e+05,
  L = 50,
  nBoot = 1000,
  conf = 0.95,
  boot.type = "perc"
)

Arguments

`dir.in`	The directory where the analysis was conducted
`nLoci`	The number of polymorphic loci to retain
`nSim`	The number of datasets that need to be simulated
`maf`	Whether a MAF SFS (default) or a derived SFS is provided (if `FALSE`)
`ncpu`	The number of CPUs (threads) to use in the analysis. Handled automatically if `ncpu=0` (default)
`nBatches`	The number of batches (-B option)
`fsc.cmd`	The command to use to call fsc (that may be different depending on the version installed)
`fsc.path`	The path where fsc is installed or `"path"` if it is in the PATH (that means that it can be called regardless of the working directory)
`par.indLoci`	The two integer values to use for the number of independent loci in the .par file that is used to run the simulations.
`par.nBlocks`	The number of linkage blocks in the .par file that is used to run the simulations.
`par.data`	The string to use in the 'per Block: data type, num loci, rec. rate and mut rate + optional parameters' line of the .par file that is used to run the simulations. The default value is `par.data="DNA 100 0 2.5e-8 0.33"`
`n`	The number of coalescent simulations to approximate the expected SFS (-n option). This should be larger than 100,000.
`L`	The number of optimization cycles (-L option). It should be >50
`nBoot`	The number of bootstrap replicates
`conf`	The confidence level to compute the confidence intervals
`boot.type`	The method to be used to compute the confidence intervals. See `?boot.ci` for details. By default, percentile intervals are computed.
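
As a rough illustration of how `nBoot`, `conf` and `boot.type` map onto the boot package, the sketch below computes a percentile interval for one parameter. It is not the code used inside `fsc.bootstraps`: the vector `sim_estimates` is synthetic, and bootstrapping the median is an assumption based on the statistics listed under Value.

```r
library(boot)

set.seed(42)
# Synthetic stand-in for one parameter re-estimated from 100 simulated datasets
sim_estimates <- rnorm(100, mean = 5000, sd = 400)

# boot() calls the statistic with the data and a vector of resampled indices
median_stat <- function(x, idx) median(x[idx])

# R plays the role of nBoot
b <- boot(data = sim_estimates, statistic = median_stat, R = 1000)

# conf and type play the roles of conf and boot.type
ci <- boot.ci(b, conf = 0.95, type = "perc")

# ci$percent holds: conf level, two order-statistic positions, lower and upper limit
ci_limits <- unname(ci$percent[1, 4:5])
print(ci_limits)
```

Other interval types accepted by `boot.ci` (for example `"basic"` or `"norm"`) are stored under different element names of the returned object, so the extraction step would need to change accordingly.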

Details

It also uses the analyses from the simulated data to build an empirical cumulative distribution function of the Composite Likelihood Ratio (CLR), which provides a statistical test for the fit of the model (see Excoffier et al. 2013 for details). That is, from the simulated data, it is possible to estimate the probability that a random value from the null distribution is smaller or greater than the observed CLR. Note that the probability values (P.Rand.less.Obs and P.Rand.gt.Obs) are constructed from the simulated datasets, so a large enough number of simulations needs to be run for these to be reliable. Bootstrapped percentiles are also reported (see the first item of the list returned as results), if this approach is preferred.
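
The idea behind the two probability values can be sketched with base R. This is a minimal illustration, not the package's implementation: `clr_sim` and `clr_obs` are hypothetical names, and the values are synthetic draws rather than CLRs from fsc.

```r
set.seed(1)
# Synthetic CLR values standing in for those from 100 simulated datasets
clr_sim <- rchisq(100, df = 5)
# Hypothetical observed CLR
clr_obs <- 7.2

# Empirical cumulative distribution function of the null CLR values
clr_ecdf <- ecdf(clr_sim)

# Proportion of null values strictly smaller than the observed CLR
p_less <- mean(clr_sim < clr_obs)
# Proportion of null values strictly greater than the observed CLR,
# which equals 1 - ECDF(observed) because ecdf() returns P(X <= x)
p_gt <- 1 - clr_ecdf(clr_obs)

print(c(P.Rand.less.Obs = p_less, P.Rand.gt.Obs = p_gt))
```

With 100 simulated datasets these proportions can only change in steps of 0.01, which is why a larger number of simulations gives more reliable probabilities.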

It is important that enough sites are simulated to ensure that sufficient polymorphic loci are present in the simulated data. It is better to simulate an excess of sites and retain those needed using nLoci.
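
The reasoning can be illustrated with a toy example: simulate more sites than needed, identify the polymorphic ones, and keep `nLoci` of them. This is only a sketch under stated assumptions. The allele matrix is synthetic, the object names are hypothetical, and random subsampling is an assumption about how loci might be retained, not a description of what `fsc.bootstraps` does internally.

```r
set.seed(7)
n_sites   <- 5000  # hypothetical number of simulated sites (kept small here)
n_samples <- 20    # hypothetical number of sampled chromosomes
nLoci     <- 100   # number of polymorphic loci to retain

# Synthetic 0/1 allele matrix: rows are chromosomes, columns are sites.
# About 90% of sites get a derived-allele frequency of 0 (monomorphic).
derived_freq <- ifelse(runif(n_sites) < 0.1, runif(n_sites), 0)
alleles <- matrix(
  rbinom(n_samples * n_sites, 1, rep(derived_freq, each = n_samples)),
  nrow = n_samples
)

# A site is polymorphic if both alleles are present in the sample
derived_counts <- colSums(alleles)
polymorphic <- which(derived_counts > 0 & derived_counts < n_samples)

# Fail early if the simulated excess was not large enough
if (length(polymorphic) < nLoci) {
  stop("Too few polymorphic sites: simulate more sites")
}

# Retain nLoci polymorphic sites, keeping their original order
keep <- sort(sample(polymorphic, nLoci))
retained <- alleles[, keep, drop = FALSE]
dim(retained)
```

If too few sites are simulated, the check fails instead of silently returning fewer loci than requested, which is the situation the advice above is meant to avoid.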

Initial values for estimating parameters from the simulated datasets are passed using the .pv file, so that a reduced number of replicates needs to be run.

For reasons that remain unclear, it is sometimes necessary to print the result to screen twice before the first element of the list becomes visible.

Value

A list with the following elements

Bootstr.stats: Descriptive statistics from bootstraps (median, lower and upper limit), and the initial estimated parameters
P.Rand.less.Obs: The probability that a random value from the null Composite Likelihood Ratio distribution is less than the observed CLR
P.Rand.gt.Obs: The probability that a random value from the null Composite Likelihood Ratio distribution is greater than the observed CLR
Sim: The estimates from the simulated data

References

Excoffier L., Dupanloup I., Huerta-Sánchez E., Sousa V. C. and Foll M. (2013) Robust demographic inference from genomic and SNP data. PLoS Genetics 9(10).

carlopacioni/amplicR documentation built on Aug. 19, 2023, 7:59 p.m.