HAC.simrep: Run a simulation of haplotype accumulation curves for...

View source: R/HAC.simrep.R

Description

Runs the HACSim algorithm by calling HAC.sim repeatedly, iteratively extrapolating haplotype accumulation curves to determine likely specimen sample sizes for hypothetical or real species.

The algorithm iteratively calculates the "Measures of Sampling Closeness" from H_i, which is determined stochastically by sampling from probs, the observed species' haplotype frequency distribution vector.

As the algorithm proceeds, H_i approaches H^* asymptotically (and hence N_i converges to N^*), although H_i will likely fluctuate randomly from one iteration to the next. The estimates of N^* found at each iteration, however, increase monotonically.
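
The convergence behaviour described above can be illustrated with a short base R sketch. This is a simplified illustration under stated assumptions, not the HACSim implementation: the helper `mean_haplotypes`, sampling with replacement directly from probs, the update rule N_(i+1) = ceiling(N_i * H^* / H_i), and the stopping test H_i / H^* >= p are all assumptions made for this example, using only the symbols defined in this section and the cutoff p described under Value.

```r
# Hypothetical helper (not part of HACSim): mean number of distinct
# haplotypes observed when N individuals are drawn according to probs,
# averaged over `perms` random draws.
mean_haplotypes <- function(N, probs, perms = 1000) {
  Hstar <- length(probs)
  counts <- replicate(perms, {
    draws <- sample.int(Hstar, size = N, replace = TRUE, prob = probs)
    length(unique(draws))
  })
  mean(counts)
}

set.seed(1)
Hstar <- 10                      # total number of haplotypes (H^*)
probs <- rep(1 / Hstar, Hstar)   # equal haplotype frequency distribution
p     <- 0.95                    # assumed recovery cutoff
N_i   <- 5                       # assumed starting sample size
N_hist <- numeric(0)             # sample size used at each iteration

for (i in 1:20) {
  N_hist <- c(N_hist, N_i)
  H_i <- mean_haplotypes(N_i, probs)
  cat(sprintf("iteration %d: N_i = %d, H_i = %.2f, H_i / H* = %.3f\n",
              i, N_i, H_i, H_i / Hstar))
  if (H_i / Hstar >= p) break
  # Assumed update: scale the sample size by the unrecovered fraction.
  # Since H* / H_i >= 1, the estimate never decreases.
  N_i <- ceiling(N_i * Hstar / H_i)
}
```

In this sketch H_i fluctuates from run to run because it is a Monte Carlo average, while the sequence of sample sizes stored in `N_hist` never decreases, mirroring the behaviour described above.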

Usage

HAC.simrep(HACSObject)

Arguments

HACSObject

Object containing the desired simulation parameters, as created by HACHypothetical(...) or HACReal(...).

Value

Iteration results are printed to the console and graphs are displayed in the plot window. Plots depict haplotype accumulation, with shaded confidence intervals for the mean number of haplotypes found. Dashed lines mark the endpoint of the curve and reflect haplotype recovery for a user-defined cutoff (default p = 0.95, i.e., 95% haplotype diversity). Output from the first iteration reflects current sampling depth and is useful for judging the levels of haplotype diversity and recovery in observed intraspecific sequence datasets. The required sample size is displayed in the second-to-last iteration; all other information corresponding to the extrapolated sample size can be found in the last iteration. Iteration results can optionally be saved to a CSV file. Subsampled DNA sequences are automatically saved to a FASTA file.

Note

When simulating real species via HACReal(...), a pop-up window will appear prompting the user to select an intraspecific FASTA file of aligned/trimmed DNA sequences. The alignment must not contain missing or ambiguous nucleotides (i.e., it should only contain A, C, G or T); otherwise, haplotype diversity may be overestimated. Excluding sequences or alignment sites with missing/ambiguous data is an option.
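
Before selecting a FASTA file, it may help to check that every sequence contains only A, C, G or T. The following base R sketch shows one way to do so. The function `find_non_acgt` is a hypothetical helper written for this example and is not part of HACSim; it assumes a well-formed FASTA file in which every header line starts with ">" and is followed by at least one sequence line.

```r
# Hypothetical helper (not part of HACSim): return the names of FASTA
# records whose sequence contains anything other than A, C, G or T
# (for example gaps, N, or IUPAC ambiguity codes).
find_non_acgt <- function(fasta_lines) {
  is_header <- startsWith(fasta_lines, ">")
  headers   <- sub("^>", "", fasta_lines[is_header])
  # Assign each sequence line to the most recent header above it
  record_id <- cumsum(is_header)
  seqs <- vapply(
    split(fasta_lines[!is_header], record_id[!is_header]),
    paste, character(1), collapse = ""
  )
  names(seqs) <- headers[as.integer(names(seqs))]
  bad <- grepl("[^ACGT]", toupper(seqs))
  names(seqs)[bad]
}

# Small in-memory example; seq2 contains an N and seq3 contains a gap
fasta_lines <- c(
  ">seq1", "ACGTAC", "GTAC",
  ">seq2", "ACGTNC", "GTAC",
  ">seq3", "AC-TACGTAC"
)
find_non_acgt(fasta_lines)
```

For a file on disk, the same check could be run as `find_non_acgt(readLines("my_alignment.fas"))`, where the file name is a placeholder.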

Examples

  library(HACSim)

  # Each constructor call below overwrites HACSObj; only the most recent
  # object is passed to HAC.simrep().

  ## Simulate hypothetical species ##

  N <- 100 # total number of sampled individuals
  Hstar <- 10 # total number of haplotypes
  probs <- rep(1/Hstar, Hstar) # equal haplotype frequency distribution

  # outputs a CSV file called "output.csv"
  HACSObj <- HACHypothetical(N = N, Hstar = Hstar, probs = probs,
                             filename = "output")

  ## Simulate hypothetical species - subsampling ##
  HACSObj <- HACHypothetical(N = N, Hstar = Hstar, probs = probs,
                             perms = 1000, p = 0.95, subsample = TRUE,
                             prop = 0.25, conf.level = 0.95,
                             filename = "output")

  ## Simulate hypothetical species and all parameters changed - subsampling ##
  HACSObj <- HACHypothetical(N = N, Hstar = Hstar, probs = probs,
                             perms = 10000, p = 0.90, subsample = TRUE,
                             prop = 0.15, conf.level = 0.95,
                             filename = "output")

  HAC.simrep(HACSObj) # runs a simulation

  ## Simulate real species ##

  # outputs a CSV file called "output.csv"
  HACSObj <- HACReal(filename = "output")

  ## Simulate real species - subsampling ##
  HACSObj <- HACReal(subsample = TRUE, prop = 0.15, conf.level = 0.95,
                     filename = "output")

  ## Simulate real species and all parameters changed - subsampling ##
  HACSObj <- HACReal(perms = 10000, p = 0.90, subsample = TRUE,
                     prop = 0.15, conf.level = 0.99, filename = "output")

  # user prompted to select appropriate FASTA file
  HAC.simrep(HACSObj)


jphill01/HACSim.R documentation built on Jan. 7, 2021, 3:04 a.m.