View source: R/sim_StudySeqFunctions.R
Simulate single-nucleotide variant (SNV) data for a sample of pedigrees.
ped_files |
Data frame. A data frame of pedigrees for which to simulate sequence data, see details. |
SNV_data |
SNVdata. An object of class |
affected_only |
Logical. When |
remove_wild |
Logical. When |
pos_in_bp |
Logical. This argument indicates if the positions in |
gamma_params |
Numeric list of length 2. The respective shape and rate parameters of the gamma distribution used to simulate distance between chiasmata. By default, |
burn_in |
Numeric. The "burn-in" distance in centiMorgan, as defined by Voorrips and Maliepaard (2012), which is required before simulating the location of the first chiasma with interference. By default, |
SNV_map |
This argument has been deprecated. Users now supply objects of class |
haplos |
This argument has been deprecated. Users now supply objects of class |
The sim_RVstudy function is used to simulate single-nucleotide variant (SNV) data for a sample of pedigrees. Please note: this function is NOT appropriate for users who wish to simulate genotype conditional on phenotype. Instead, sim_RVstudy employs the following algorithm.
1. For each pedigree, we sample a single causal rare variant (cRV) from a pool of SNVs specified by the user.
2. Upon identifying the familial cRV, we sample founder haplotypes from haplotype data conditional on the founder's cRV status at the familial cRV locus.
3. Proceeding forward in time, from founders to more recent generations, for each parent/offspring pair we:
   (a) simulate recombination and formation of gametes, according to the model proposed by Voorrips and Maliepaard (2012), and then
   (b) perform a conditional gene drop to model inheritance of the cRV.
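The recombination step can be illustrated with a minimal sketch of how chiasma positions might be placed along one chromosome when the distances between them are gamma distributed and a burn-in distance is used. This is not the package's implementation: the function name sim_chiasmata_cM, its arguments, and every numeric value below are hypothetical and chosen only for illustration (they are not the defaults of gamma_params or burn_in).

```r
# Illustrative sketch only; NOT the SimRVSequences implementation.
# The function name, arguments, and numeric values are hypothetical.
sim_chiasmata_cM <- function(chrom_len_cM, shape, rate, burn_in_cM) {
  # Start upstream of the chromosome so that the first chiasma that lands
  # on the chromosome is not tied to an arbitrary starting point.
  pos <- -burn_in_cM
  chiasmata <- numeric(0)
  repeat {
    # Distance to the next chiasma is gamma distributed.
    pos <- pos + rgamma(1, shape = shape, rate = rate)
    if (pos > chrom_len_cM) break                  # ran off the chromosome end
    if (pos >= 0) chiasmata <- c(chiasmata, pos)   # keep on-chromosome positions
  }
  chiasmata
}

set.seed(1)
# Assumed values: a 200 cM chromosome and a mean inter-chiasma distance of
# shape / rate = 50 cM.
chias <- sim_chiasmata_cM(chrom_len_cM = 200, shape = 2.6,
                          rate = 2.6 / 50, burn_in_cM = 1000)
chias
```

Positions generated during the burn-in are discarded; only those falling between 0 and the chromosome length are returned, in increasing order.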
It is important to note that due to the forwards-in-time algorithm used by sim_RVstudy, certain types of inbreeding and/or loops cannot be accommodated. Please see examples.
For a detailed description of the model employed by sim_RVstudy, please refer to section 6 of the vignette.
The data frame of pedigrees, ped_files, supplied to sim_RVstudy must contain the variables:
name | type | description |
FamID | numeric | family identification number |
ID | numeric | individual identification number |
sex | numeric | sex identification variable: sex = 0 for males, and sex = 1 for females. |
dadID | numeric | identification number of father |
momID | numeric | identification number of mother |
affected | logical | disease status indicator: set affected = TRUE if individual has disease. |
DA1 | numeric | paternally inherited allele at the cRV locus: DA1 = 1 if the cRV is inherited, and 0 otherwise. |
DA2 | numeric | maternally inherited allele at the cRV locus: DA2 = 1 if the cRV is inherited, and 0 otherwise. |
If ped_files does not contain the variables DA1 and DA2, the pedigrees are assumed to be fully sporadic. Hence, the supplied pedigrees will not segregate any of the SNVs in the user-specified pool of cRVs.
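As a rough illustration of the layout described above, the following sketch builds a toy three-person pedigree and checks which of the listed variables are present. The helper check_ped_vars and the data in toy_ped are hypothetical and are not part of SimRVSequences; coding founders' parent IDs as NA is also an assumption made for this example.

```r
# Hypothetical helper and toy data; not part of SimRVSequences.
required_vars <- c("FamID", "ID", "sex", "dadID", "momID", "affected")
crv_vars <- c("DA1", "DA2")

check_ped_vars <- function(ped) {
  missing_vars <- setdiff(required_vars, names(ped))
  if (length(missing_vars) > 0) {
    stop("ped is missing: ", paste(missing_vars, collapse = ", "))
  }
  # TRUE when both cRV allele columns are present; FALSE corresponds to the
  # "fully sporadic" case described in the text.
  all(crv_vars %in% names(ped))
}

# Father (ID 1), mother (ID 2), and son (ID 3); the mother carries the cRV
# and the son inherits it maternally (DA2 = 1).
toy_ped <- data.frame(
  FamID    = c(1, 1, 1),
  ID       = c(1, 2, 3),
  sex      = c(0, 1, 0),          # 0 = male, 1 = female
  dadID    = c(NA, NA, 1),        # assumption: founders have NA parents
  momID    = c(NA, NA, 2),
  affected = c(FALSE, TRUE, TRUE),
  DA1      = c(0, 1, 0),          # paternally inherited allele
  DA2      = c(0, 0, 1)           # maternally inherited allele
)

has_crv_cols <- check_ped_vars(toy_ped)
sporadic_has_crv_cols <- check_ped_vars(toy_ped[, required_vars])
```

In practice, pedigrees are usually generated with a dedicated simulator rather than typed by hand; the toy data frame is only meant to make the column layout concrete.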
Pedigrees simulated by the sim_RVped and sim_ped functions of the SimRVPedigree package are properly formatted for the sim_RVstudy function. That is, the pedigrees generated by these functions contain all of the variables required for ped_files (including DA1 and DA2).
The data frame SNV_map catalogs the SNVs in haplos. The variables in SNV_map must be formatted as follows:
name | type | description |
colID | numeric | associates the rows in SNV_map to the columns of haplos |
chrom | numeric | the chromosome that the SNV resides on |
position | numeric | the position of the SNV in base pairs when argument pos_in_bp = TRUE or centiMorgan when pos_in_bp = FALSE |
marker | character | (Optional) a unique character identifier for the SNV. If missing this variable will be created from chrom and position. |
pathwaySNV | logical | (Optional) identifies SNVs located within the pathway of interest as TRUE |
is_CRV | logical | identifies causal rare variants (cRVs) as TRUE . |
Please note that when the variable is_CRV is missing from SNV_map, we sample a single SNV to be the causal rare variant for all pedigrees in the study, which is identified in the returned famStudy object.
An object of class famStudy. Objects of class famStudy are lists that include the following named items:
|
A data frame containing the sample of pedigrees for which sequence data was simulated. |
|
A sparse matrix that contains the simulated haplotypes for each pedigree member in |
|
A data frame that maps the haplotypes (i.e. rows) in |
|
A data frame cataloging the SNVs in |
Objects of class famStudy are discussed in detail in section 5.2 of the vignette.
Roeland E. Voorrips and Chris A. Maliepaard. (2012). The simulation of meiosis in diploid and tetraploid organisms using various genetic models. BMC Bioinformatics, 13:248.
Christina Nieuwoudt, Angela Brooks-Wilson, and Jinko Graham. (2019). SimRVSequences: an R package to simulate genetic sequence data for pedigrees. <doi:10.1101/534552>.
sim_RVped, read_slim, summary.famStudy
library(SimRVSequences)
#load pedigree, haplotype, and mutation data
data(study_peds)
data(EXmuts)
data(EXhaps)
# create variable 'is_CRV' in EXmuts. This variable identifies the pool of
# causal rare variants from which to sample familial cRVs.
EXmuts$is_CRV = FALSE
EXmuts$is_CRV[c(26, 139, 223, 228, 472)] = TRUE
# create object of class SNVdata
my_SNVdata <- SNVdata(Haplotypes = EXhaps,
                      Mutations = EXmuts)
#supply required inputs to the sim_RVstudy function
seqDat = sim_RVstudy(ped_files = study_peds,
                     SNV_data = my_SNVdata)
# Inbreeding examples
# Due to the forward-in-time model used by sim_RVstudy certain types of
# inbreeding and/or loops *may* cause fatal errors when using sim_RVstudy.
# The following examples demonstrate: (1) inbreeding that can be accommodated
# under this model, and (2) when this limitation is problematic.
# Create inbreeding in one family of study_peds (the family with FamID 3)
imb_ped1 <- study_peds[study_peds$FamID == 3, ]
imb_ped1[imb_ped1$ID == 18, c("momID")] = 7
plot(imb_ped1)
# Notice that this instance of inbreeding can be accommodated by our model.
seqDat = sim_RVstudy(ped_files = imb_ped1,
                     SNV_data = my_SNVdata)
# Create a different type of inbreeding in the same family of study_peds
imb_ped2 <- study_peds[study_peds$FamID == 3, ]
imb_ped2[imb_ped2$ID == 8, c("momID")] = 18
plot(imb_ped2)
# Notice that inbreeding in imb_ped2 will cause a fatal
# error when the sim_RVstudy function is executed
## Not run:
seqDat = sim_RVstudy(ped_files = imb_ped2,
                     SNV_data = my_SNVdata)
## End(Not run)