simFS | R Documentation |
Simulate genotypes and sequencing depth for the progeny of a full-sib family
simFS(
rVec_f,
rVec_m = rVec_f,
epsilon = 0,
config,
nInd = 100,
meanDepth = 5,
thres = NULL,
rd_dist = "Neg_Binom",
seed1 = 1,
seed2 = 1,
MNIF = 1
)
rVec_f , rVec_m |
Numeric vector of the true paternal and maternal recombination fractions (in the interval [0,0.5]). Currently, only a single value is allowed, so the recombination fraction is the same across all loci. |
epsilon |
Numeric value of the sequencing error rate. |
config |
Nested list containing the config vector for each family and chromosome. See details on how to specify this correctly. |
nInd |
Positive integer vector for the number of individuals in each family for the simulated data. The length of the vector gives the number of families in the simulated data set. |
meanDepth |
Positive numeric value for the mean depth of the read depth distribution. |
thres |
Numeric value for the read depth threshold. Genotype calls with a read depth less than the threshold are set to missing. |
rd_dist |
Character value for the distribution from which the read depths are simulated. Currently, only negative binomial ( |
seed1 |
Numeric value. Random seed used for the simulation of the parental phase (or OPGP). |
seed2 |
Numeric value. Random seed used for the simulation of the data sets. |
The simFS function simulates sequencing data for the progeny of a full-sib family according to the simulation parameters, assuming no interference. The simulation proceeds as follows:
1. Inheritance vectors are simulated according to the true recombination fractions, assuming no interference.
2. The inheritance vectors are converted to genotypes for the specified OPGP vector (which is simulated based on the config vector).
3. Simulated read depths are generated by simulating realizations from the read depth distribution. The alleles of the true genotype are then sampled with equal probability and with replacement until a sample size corresponding to the read depth is obtained, where a sequencing error (e.g. a reference allele called as alternate and vice versa) is simulated to occur with probability epsilon. The mean read depth is assumed to be equal for each locus.
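The first and third steps above can be sketched in base R. This is only an illustrative sketch, not the code used inside simFS: the helper names sim_inheritance and sim_reads are hypothetical, the genotype is coded as the number of reference alleles, and the negative binomial dispersion (nb_size) is an assumed value that the documentation does not specify.

```r
# Hypothetical helpers for illustration only; not part of the package.

# Step 1: one parental inheritance vector over nSnps loci. With no
# interference, a recombination occurs independently in each interval
# between adjacent loci with probability r.
sim_inheritance <- function(nSnps, r) {
  start <- rbinom(1, 1, 0.5)              # which parental haplotype comes first
  switches <- rbinom(nSnps - 1, 1, r)     # 1 = recombination in that interval
  (start + c(0, cumsum(switches))) %% 2   # haplotype (0 or 1) at each locus
}

# Step 3: read counts for a vector of true genotypes at one locus.
# geno = number of reference alleles (0, 1 or 2) for each individual.
sim_reads <- function(geno, meanDepth = 5, epsilon = 0, nb_size = 2) {
  n <- length(geno)
  # Read depths from a negative binomial with mean meanDepth
  # (nb_size is an assumed dispersion parameter)
  depth <- rnbinom(n, mu = meanDepth, size = nb_size)
  # Each read samples one of the two alleles with equal probability, so the
  # number of reads taken from a reference allele is Binomial(depth, geno / 2)
  true_ref <- rbinom(n, size = depth, prob = geno / 2)
  # Each read is miscalled independently with probability epsilon
  ref_kept <- rbinom(n, size = true_ref, prob = 1 - epsilon)
  alt_flipped <- rbinom(n, size = depth - true_ref, prob = epsilon)
  ref <- ref_kept + alt_flipped
  list(depth = depth, ref = ref, alt = depth - ref)
}

set.seed(1)
iv <- sim_inheritance(nSnps = 12, r = 0.001)
geno <- sample(0:2, 50, replace = TRUE, prob = c(0.25, 0.5, 0.25))
reads <- sim_reads(geno, meanDepth = 5, epsilon = 0)
```

With epsilon = 0, homozygous individuals only ever show reads for their own allele, while heterozygous individuals at low depth can still appear homozygous by chance.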
The simulation can be performed with multiple families and/or multiple chromosomes.
Specifying the config argument correctly is crucial: it must be a nested list.
The number of elements at the top level of the list gives the number of families (which must be the same as the length of the nInd argument), and
the number of elements at the second level of the list gives the number of chromosomes.
The elements at the bottom level of the list must be vectors of integers 1 to 9, where 1 = both-informative (ABxAB),
2 = paternal-informative A (ABxAA), 3 = paternal-informative B (ABxBB),
4 = maternal-informative (AAxAB), 5 = maternal-informative (BBxAB),
6 = uninformative (AAxAA), 7 = uninformative (AAxBB), 8 = uninformative (BBxAA)
and 9 = uninformative (BBxBB). The list must be set up such that the length of each
chromosome within each family is the same (see the examples for an idea of how to set this up).
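The structure described above can be checked before calling simFS. The function below is a hypothetical helper, not part of the package, and it assumes one particular reading of the length requirement: that every family has the same number of chromosomes and that a given chromosome has the same number of loci in every family.

```r
# Hypothetical helper for illustration only; not part of the package.
# Returns TRUE if config looks like a valid nested list for the given nInd.
check_config <- function(config, nInd) {
  # Top level: one element per family, matching the length of nInd
  if (!is.list(config) || length(config) != length(nInd)) return(FALSE)
  # Second level: each family is itself a list of chromosomes
  if (!all(vapply(config, is.list, logical(1)))) return(FALSE)
  nChrom <- lengths(config)
  if (length(unique(nChrom)) != 1) return(FALSE)
  # Assumed requirement: a chromosome has the same number of loci in each family
  for (chr in seq_len(nChrom[1])) {
    nLoci <- vapply(config, function(fam) length(fam[[chr]]), integer(1))
    if (length(unique(nLoci)) != 1) return(FALSE)
  }
  # Bottom level: segregation types coded as integers 1 to 9
  all(unlist(config) %in% 1:9)
}

# Two families, two chromosomes (3 and 4 loci respectively)
config_ok <- list(list(c(2, 1, 4), c(1, 1, 2, 4)),
                  list(c(4, 1, 2), c(2, 4, 1, 1)))
# Chromosome 1 has a different number of loci in each family
config_bad <- list(list(c(2, 1, 4)), list(c(4, 1)))

ok <- check_config(config_ok, nInd = c(50, 45))
wrong_nInd <- check_config(config_ok, nInd = 50)
bad <- check_config(config_bad, nInd = c(50, 45))
```

Here only the first call passes: the second fails because nInd has one entry for two families, and the third because the chromosome lengths differ between families.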
An FS object containing the simulated data.
Timothy P. Bilton
FS
## simulate a single full sib family with one chromosome
config <- list(list(c(2,1,1,4,2,4,1,1,4,1,2,1)))
F1data <- simFS(0.001, config=config, nInd=50, meanDepth=5)
## to look at the simulated data
F1data
## Simulate multiple families and chromosomes
config <- list(replicate(2, sample(c(1,2,4), size=10, replace=TRUE, prob=c(1,2,2)), simplify = FALSE),
replicate(2, sample(c(1,2,4), size=10, replace=TRUE, prob=c(1,2,2)), simplify = FALSE))
F1data <- simFS(0.001, config=config, nInd=c(50,45), meanDepth=5)