simFS: Simulate sequencing data for full-sib families.

simFS R Documentation

Simulate sequencing data for full-sib families.

Description

Simulate genotypes and sequencing depth for the progeny of a full-sib family.

Usage

simFS(
  rVec_f,
  rVec_m = rVec_f,
  epsilon = 0,
  config,
  nInd = 100,
  meanDepth = 5,
  thres = NULL,
  rd_dist = "Neg_Binom",
  seed1 = 1,
  seed2 = 1,
  MNIF = 1
)

Arguments

rVec_f, rVec_m

Numeric vector of true paternal and maternal recombination fractions (in the interval [0,0.5]). Currently, only a single value is allowed, so the recombination fraction is the same across all loci.

epsilon

Numeric value of the sequencing error rate.

config

Nested list containing the config vector for each family and chromosome. See Details for how to specify this correctly.

nInd

Positive integer vector giving the number of individuals in each family of the simulated data. The length of the vector gives the number of families in the simulated data set.

meanDepth

Positive numeric value for the mean depth of the read depth distribution.

thres

Numeric value for the read depth threshold. Genotype calls with a read depth less than the threshold are set to missing.

rd_dist

Character value for the distribution from which the read depths are simulated. Currently, only negative binomial ("Neg_Binom") and Poisson ("Pois") are implemented.

seed1

Numeric value. Random seed used for the simulation of the parental phase (or OPGP).

seed2

Numeric value. Random seed used for the simulation of the data sets.
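To make the meanDepth, rd_dist and thres arguments concrete, the following base R sketch draws read depths from either distribution and masks low-depth calls. It does not call simFS. The helper name sim_depth and the negative binomial dispersion (size = 2) are assumptions made for illustration only and may differ from what the package does internally.

```r
# Hypothetical helper (not part of GUSMap): draw n read depths with a given mean.
# The negative binomial dispersion 'size' is an assumed value for illustration.
sim_depth <- function(n, meanDepth = 5, rd_dist = "Neg_Binom", size = 2) {
  switch(rd_dist,
    Neg_Binom = rnbinom(n, mu = meanDepth, size = size),
    Pois      = rpois(n, lambda = meanDepth),
    stop("rd_dist must be 'Neg_Binom' or 'Pois'")
  )
}

set.seed(1)
depth <- sim_depth(1000, meanDepth = 5, rd_dist = "Pois")

# Calls with a read depth below the threshold are treated as missing (NA here).
thres <- 3
masked <- ifelse(depth < thres, NA_integer_, depth)
```

Both distributions share the same mean in this sketch; the negative binomial simply allows more spread in depth around that mean.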

Details

The simFS function simulates sequencing data for the progeny of a full-sib family according to the simulation parameters, assuming no interference. The simulation proceeds as follows:

  • Inheritance vectors are simulated according to the true recombination fractions, assuming no interference.

  • The inheritance vectors are converted to genotypes for the specified OPGP vector (which is simulated based on the config vector).

  • Read depths are generated by simulating realizations from the read depth distribution. The alleles of the true genotype are then sampled with equal probability and with replacement until a sample size corresponding to the read depth is obtained, where a sequencing error (e.g. a reference allele called as alternate or vice versa) is simulated to occur with probability epsilon. The mean read depth is assumed to be equal across loci.
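The first and third steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code simFS runs: the helper names sim_inheritance and sim_ref_reads are hypothetical, the inheritance vector is coded 0/1 for grandparental origin, and the genotype is coded as the number of reference alleles.

```r
# Hypothetical helper: one parental inheritance vector over nSnps loci.
# Without interference, recombination events in adjacent intervals are
# independent, so the grandparental origin (0 or 1) switches between
# neighbouring loci with probability r.
sim_inheritance <- function(nSnps, r) {
  start <- rbinom(1, 1, 0.5)
  switches <- rbinom(nSnps - 1, 1, r)
  (start + c(0, cumsum(switches))) %% 2
}

# Hypothetical helper: sample reads for one genotype call.
# geno  = number of reference alleles in the true genotype (0, 1 or 2)
# depth = number of reads to draw
# Returns the number of reads observed as the reference allele.
sim_ref_reads <- function(geno, depth, epsilon = 0) {
  # each read samples one of the two alleles with equal probability
  is_ref <- rbinom(depth, 1, geno / 2) == 1
  # a sequencing error flips the observed allele
  err <- rbinom(depth, 1, epsilon) == 1
  sum(xor(is_ref, err))
}

set.seed(1)
iv <- sim_inheritance(nSnps = 12, r = 0.001)
depth <- rpois(1, lambda = 5)
ref <- sim_ref_reads(geno = 1, depth = depth, epsilon = 0.01)
alt <- depth - ref
```

With epsilon = 0 a homozygous genotype can only yield reads of its own allele, whereas a heterozygote at low depth can still appear homozygous by chance.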

The simulation can be performed with multiple families and/or multiple chromosomes.

Specifying the config argument correctly is crucial: it must be a nested list. The number of elements of the list at the top level gives the number of families (which must be the same as the length of the nInd argument) and the number of elements of the list at the second level gives the number of chromosomes. The elements at the bottom level of the list must be vectors of integers 1 to 9, where:

  • 1 = both-informative (ABxAB)

  • 2 = paternal-informative A (ABxAA)

  • 3 = paternal-informative B (ABxBB)

  • 4 = maternal-informative (AAxAB)

  • 5 = maternal-informative (BBxAB)

  • 6 = uninformative (AAxAA)

  • 7 = uninformative (AAxBB)

  • 8 = uninformative (BBxAA)

  • 9 = uninformative (BBxBB)

The list must be set up such that the length of each chromosome within each family is the same (see the examples for an idea of how to set this up).
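The shape requirements on config can be checked before calling simFS with a small helper such as the one sketched below. The function check_config is hypothetical and not part of GUSMap. It reads the final requirement as "chromosome k has the same number of loci in every family", which is an interpretation of the text above rather than a documented rule.

```r
# Hypothetical helper (not part of GUSMap): validate a config list.
check_config <- function(config, nInd) {
  # top level: one element per family, matching length(nInd)
  stopifnot(is.list(config), length(config) == length(nInd))
  # second level: each family is itself a list of chromosomes
  stopifnot(all(vapply(config, is.list, logical(1))))
  # every family has the same number of chromosomes
  stopifnot(length(unique(lengths(config))) == 1)
  # bottom level: segregation codes are integers 1 to 9
  codes <- unlist(config)
  stopifnot(is.numeric(codes), all(codes %in% 1:9))
  # assumed reading: chromosome k has the same length in every family
  ref_len <- lengths(config[[1]])
  same <- vapply(config, function(fam) identical(lengths(fam), ref_len), logical(1))
  stopifnot(all(same))
  invisible(TRUE)
}

# One family, one chromosome with 12 loci
config <- list(list(c(2, 1, 1, 4, 2, 4, 1, 1, 4, 1, 2, 1)))
check_config(config, nInd = 50)
```

Failing early with stopifnot keeps a mis-nested list from reaching the simulation.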

Value

An FS object containing the simulated data.

Author(s)

Timothy P. Bilton

See Also

FS

Examples


## simulate a single full sib family with one chromosome 
config <- list(list(c(2,1,1,4,2,4,1,1,4,1,2,1)))
F1data <- simFS(0.001, config=config, nInd=50, meanDepth=5)
## to look at the simulated data
F1data

## Simulate multiple families and chromosomes
config <- list(
  replicate(2, sample(c(1,2,4), size=10, replace=TRUE, prob=c(1,2,2)), simplify = FALSE),
  replicate(2, sample(c(1,2,4), size=10, replace=TRUE, prob=c(1,2,2)), simplify = FALSE)
)
F1data <- simFS(0.001, config=config, nInd=c(50,45), meanDepth=5)


tpbilton/GUSMap documentation built on Feb. 22, 2025, 12:27 p.m.