View source: R/sim.parent.assign.fun.R
This function adopts a stochastic simulation approach to determine the proportion of correct assignments and, for maximum likelihood approaches, the critical delta LOD values. For each repetition, 'snp.dat.indiv', 'snp.dat.pools', 'snp.param.indiv', 'snp.param.pools', 'fam.set.combns' and 'fam.set.combns.by.pool' data frames for one pooled DNA sample are generated from the user-defined 'ped', 'map', 'true.snp.param.indiv' and 'sim.fam.sets' data frames. Parentage is then assigned for each simulated pool using parent.assign.fun.
sim.parent.assign.fun(
n_repetitions,
ped,
map,
missing.parents = NULL,
true.snp.param.indiv,
sim.fam.sets = NULL,
method,
beta.min.ss = FALSE,
discrete.method = "geno.probs",
threshold.indiv = NULL,
threshold.pools = NULL,
n.in.pools,
min.intensity = 0,
snp.error.assumed = NULL,
snp.error.underlying = NULL,
min.sd = 0,
fams,
skip.checks = FALSE
)
n_repetitions |
is an integer variable defining the number of repetitions in the simulation |
ped |
is a data frame and a conventional pedigree file with one additional column. It must include all SAMPLE_IDs used to construct snp.param.indiv and all possible parents of pooled samples. It contains the following headings (class in parentheses):
|
map |
is a data frame containing a genetic map that identifies the position of each SNP.
|
missing.parents |
is a vector identifying parents with no SNP data (i.e. known missing parents). Samples/individuals in missing.parents must be present as a SIRE_ID or a DAM_ID in ped |
true.snp.param.indiv |
is a data frame detailing the assumed SNP parameters of the population with the following headings (class in parentheses):
|
sim.fam.sets |
is a data frame (default = NULL). Note: if sim.fam.sets = NULL (see the example below with n.in.pools = 8), FAMILY_ID is taken from 'fams' and duplicated n.in.pools times, with FAM_SET_ID = 1 for the first duplication of FAMILY_IDs, 2 for the second and so on, and PROBABILITY = NA. It contains the following headings (class in parentheses):
|
method |
is a vector of methods to be implemented (e.g. c("Quantitative", "Discrete", "Exclusion", "Least_squares")) |
beta.min.ss |
is a logical variable applicable to the "Least_squares" method only (default = FALSE). If TRUE, the sum of squares is computed for every parental combination and the combination with the minimum value is identified. Refer to Hamilton 2020. |
discrete.method |
is a character variable applicable to the "Discrete" or "Exclusion" methods only (default = "geno.probs"). It must equal either:
|
threshold.indiv |
is a numeric variable between 0 and 1 inclusive applicable to the "Discrete" or "Exclusion" methods only when discrete.method = "geno.probs" (default = NULL). The discrete genotype is set to the most likely genotype in the quantitative ordered genotype probability matrix Gij if its probability is greater than threshold.indiv (or threshold.indiv / 2 for the two heterozygous genotypes). Otherwise the genotype is deemed missing (refer to the left-hand side of page 5 of Henshall et al. 2014) |
threshold.pools |
is a numeric variable between 0 and 1 inclusive applicable to the "Discrete" or "Exclusion" methods only when discrete.method = "geno.probs" (default = NULL). Equivalent to threshold.indiv for pooled DNA samples. |
n.in.pools |
is an integer variable representing the number of individuals that contributed DNA to each sample in snp.dat.pools |
min.intensity |
is a numeric variable (default = 0). If the square root of the sum of INTENSITY_A squared and INTENSITY_B squared in snp.dat.indiv or snp.dat.pools is less than min.intensity then this record is excluded. That is, observations that fall into an arc with a radius equal to min.intensity in the lower left of signal intensity scatter plots are excluded. |
snp.error.assumed |
Must be one of (default = NULL):
|
fams |
is a data frame with the following headings (class in parentheses):
|
skip.checks |
is a logical variable (default = FALSE). If TRUE, parent.assign.fun data checks are not undertaken. |
snp.error.underlying |
Not used if snp.error.assumed is not NULL (default = NULL). Must be either:
|
min.sd |
a numeric variable defining a lower bound to be applied to estimates of the standard deviation of allelic proportion for genotypes in snp.param.indiv and snp.param.pools (default = 0) |
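The min.intensity and min.sd rules described above can be sketched in a few lines of base R. This is only an illustration of the stated rules, not the package's internal code: the data values are invented, and the name sd.est is a hypothetical stand-in for the estimated standard deviations.

```r
# Hypothetical intensity records; the column names follow the argument
# description above, the values are invented for illustration.
snp.dat <- data.frame(
  SAMPLE_ID   = c("S1", "S2", "S3"),
  INTENSITY_A = c(0.05, 0.80, 0.30),
  INTENSITY_B = c(0.02, 0.10, 0.40),
  stringsAsFactors = FALSE
)
min.intensity <- 0.2

# Distance of each observation from the origin of the intensity scatter plot
radius <- sqrt(snp.dat$INTENSITY_A^2 + snp.dat$INTENSITY_B^2)

# Records with a radius less than min.intensity are excluded
snp.dat.kept <- snp.dat[radius >= min.intensity, ]

# min.sd as a lower bound on estimated standard deviations
# (sd.est is a hypothetical vector of estimates)
min.sd <- 0.01
sd.est <- c(0.002, 0.030, 0.010)
sd.bounded <- pmax(sd.est, min.sd)
```

With the default min.intensity = 0 and min.sd = 0, neither rule changes non-negative inputs.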
'summary' is a data frame containing a summary of simulated pedigree assignments:
'METHOD' is the method implemented.
'PARENTS_TO_ASSIGN' is a count of uncertain parents for which assignments were attempted.
'PROP_CORRECT_ASSIGN' is the proportion of PARENTS_TO_ASSIGN assigned correctly.
'CRIT_DELTA_0.950' is the delta LOD above which 95 percent of assignments were correct. Applicable to maximum likelihood methods only.
'CRIT_DELTA_0.990' is the delta LOD above which 99 percent of assignments were correct. Applicable to maximum likelihood methods only.
'CRIT_DELTA_0.995' is the delta LOD above which 99.5 percent of assignments were correct. Applicable to maximum likelihood methods only.
ggplot.log.quant:
is a ggplot object (histogram) of delta LOD values using the 'Quantitative' method (if applicable).
ggplot.log.discrete:
is a ggplot object (histogram) of delta LOD values using the 'Discrete' method (if applicable).
'quant.sim.out' is a detailed summary for the 'Quantitative' method (refer to Hamilton 2020):
'TRUE_ID' is the true parent identifier
'REP' is the simulation repetition number
'PARENT_NUMBER' is a unique parent identifier within REP
'QUANT_ID' is the assigned parent identifier using the 'Quantitative' method
'QUANT_DELTA_LOD' is the delta LOD (refer to Hamilton 2020)
'QUANT_LOD' is the LOD (refer to Hamilton 2020)
'CORRECT_ASSIGN' is TRUE if the parent was correctly assigned
'CUM_INCORRECT_ASSIGN' is a cumulative count of incorrectly assigned parents
'CUM_PROP_CORRECT_ASSIGN' is the cumulative proportion of correctly assigned parents
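To make the cumulative columns concrete, the sketch below shows one way they could be derived from per-parent results, and how a critical delta LOD might be read off. It assumes that rows are accumulated in order of decreasing delta LOD and that the critical value is the smallest delta LOD at which the cumulative proportion correct still meets the target. The function's exact rule is not stated here, and the data are invented.

```r
# Invented per-parent results for illustration only
sim.out <- data.frame(
  TRUE_ID         = c("P1", "P2", "P3", "P4", "P5"),
  QUANT_ID        = c("P1", "P2", "P3", "P9", "P5"),
  QUANT_DELTA_LOD = c(12.4, 8.1, 5.0, 1.2, 0.3),
  stringsAsFactors = FALSE
)
sim.out$CORRECT_ASSIGN <- sim.out$TRUE_ID == sim.out$QUANT_ID

# Assumption: accumulate from the most to the least confident assignment
sim.out <- sim.out[order(sim.out$QUANT_DELTA_LOD, decreasing = TRUE), ]
sim.out$CUM_INCORRECT_ASSIGN <- cumsum(!sim.out$CORRECT_ASSIGN)
sim.out$CUM_PROP_CORRECT_ASSIGN <-
  cumsum(sim.out$CORRECT_ASSIGN) / seq_len(nrow(sim.out))

# Assumption: the critical delta LOD is the smallest delta LOD for which
# the cumulative proportion of correct assignments is at least the target
target <- 0.95
ok <- sim.out$CUM_PROP_CORRECT_ASSIGN >= target
crit.delta <- if (any(ok)) min(sim.out$QUANT_DELTA_LOD[ok]) else NA_real_
```

With only five invented records the result is not meaningful. In practice the number of repetitions would need to be large enough for the 0.95, 0.99 and 0.995 targets to be estimated stably.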
'discrete.sim.out' is a detailed summary for the 'Discrete' method (refer to Hamilton 2020):
'TRUE_ID' is the true parent identifier.
'REP' is the simulation repetition number.
'PARENT_NUMBER' is a unique parent identifier within REP.
'DISCRETE_ID' is the assigned parent identifier using the 'Discrete' method.
'DISCRETE_DELTA_LOD' is the delta LOD.
'DISCRETE_LOD' is the LOD.
'CORRECT_ASSIGN' is TRUE if the parent was correctly assigned.
'CUM_INCORRECT_ASSIGN' is a cumulative count of incorrectly assigned parents.
'CUM_PROP_CORRECT_ASSIGN' is the cumulative proportion of correctly assigned parents.
'exclusion.sim.out' is a detailed summary for the 'Exclusion' method (refer to Hamilton 2020):
'TRUE_ID' is the true parent identifier.
'REP' is the simulation repetition number.
'PARENT_NUMBER' is a unique parent identifier within REP.
'EXCLUSION_ID' is the assigned parent identifier using the 'Exclusion' method.
'CORRECT_ASSIGN' is TRUE if the parent was correctly assigned.
'ls.sim.beta.constrain.out' is a detailed summary for the 'least squares' method where beta hat is constrained to equal 1/n.in.pools within each FAM_SET_ID (refer to Hamilton 2020):
'TRUE_ID' is the true parent identifier.
'REP' is the simulation repetition number.
'PARENT_NUMBER' is a unique parent identifier within REP.
'LS_ID' is the assigned parent identifier using the 'least squares' method with beta hat constrained.
'CORRECT_ASSIGN' is TRUE if the parent was correctly assigned.
'ls.sim.min.ss.out' is a detailed summary for the 'least squares' method where the family combination with the lowest sum of squares is identified (refer to Hamilton 2020):
'TRUE_ID' is the true parent identifier.
'REP' is the simulation repetition number.
'PARENT_NUMBER' is a unique parent identifier within REP.
'LS_ID' is the assigned parent identifier using the 'least squares' method where the family combination with the lowest sum of squares is identified.
'CORRECT_ASSIGN' is TRUE if the parent was correctly assigned.
Henshall JM, Dierens L, Sellars MJ (2014) Quantitative analysis of low-density SNP data for parentage assignment and estimation of family contributions to pooled samples. Genetics Selection Evolution 46, 51. https://doi.org/10.1186/s12711-014-0051-y
Hamilton MG (2020) Maximum likelihood parentage assignment using quantitative genotypes
#Retrieve data for 'pooling by phenotype' example from Hamilton 2020
data(shrimp.ped)
data(shrimp.map)
data(shrimp.true.snp.param.indiv)
data(shrimp.sim.fam.sets)
data(shrimp.fams)
#Run simulation for all methods with n.in.pools = 2. Note that 3 is not enough repetitions (1000 may be).
sim.parent.assign.fun(n_repetitions = 3,
ped = shrimp.ped,
map = shrimp.map,
true.snp.param.indiv = shrimp.true.snp.param.indiv,
sim.fam.sets = shrimp.sim.fam.sets, # equivalent to sim.fam.sets = NULL in this case
method = c("Quantitative", "Discrete", "Exclusion", "Least_squares"),
beta.min.ss = TRUE,
discrete.method = "geno.probs",
threshold.indiv = 0.98,
threshold.pools = 0.98,
n.in.pools = 2,
snp.error.assumed = 0.01,
fams = shrimp.fams
)
#Run simulation using "Least_squares" method (beta.min.ss = FALSE) with n.in.pools = 8.
#Do not attempt large pool sizes using any other method nor with beta.min.ss = TRUE, as your
#computer is likely to say no. Note that 3 is not enough repetitions but is okay as an example.
sim.parent.assign.fun(n_repetitions = 3,
ped = shrimp.ped,
map = shrimp.map,
true.snp.param.indiv = shrimp.true.snp.param.indiv,
sim.fam.sets = NULL, #shrimp.sim.fam.sets only appropriate for n.in.pools = 2
method = "Least_squares",
beta.min.ss = FALSE,
n.in.pools = 8,
snp.error.assumed = 0.01,
fams = shrimp.fams
)
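The second example relies on the default construction of sim.fam.sets described under Arguments (FAMILY_ID from 'fams' duplicated n.in.pools times, FAM_SET_ID numbering each duplication, PROBABILITY = NA). A minimal base R sketch of that construction follows. The toy fams data frame is hypothetical and holds only a FAMILY_ID column, and the column order of the result is an assumption; the function's internal code may differ.

```r
# Hypothetical stand-in for fams; only FAMILY_ID is needed for this sketch
fams <- data.frame(FAMILY_ID = c("F1", "F2", "F3"), stringsAsFactors = FALSE)
n.in.pools <- 8

# One copy of every FAMILY_ID per FAM_SET_ID (1, 2, ..., n.in.pools)
sim.fam.sets <- data.frame(
  FAM_SET_ID  = rep(seq_len(n.in.pools), each = nrow(fams)),
  FAMILY_ID   = rep(fams$FAMILY_ID, times = n.in.pools),
  PROBABILITY = NA,
  stringsAsFactors = FALSE
)

head(sim.fam.sets, 4)
```

The result has nrow(fams) * n.in.pools rows, so every family is a candidate contributor to each of the n.in.pools positions in a pool.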