sim.parent.assign.fun: sim.parent.assign.fun
In mghamilton/SNPpools: Maximum likelihood parentage assignment using quantitative genotypes


This function adopts a stochastic simulation approach to determine the proportion of correct assignments and, for maximum likelihood approaches, the critical delta LOD values. For each repetition, 'snp.dat.indiv', 'snp.dat.pools', 'snp.param.indiv', 'snp.param.pools', 'fam.set.combns' and 'fam.set.combns.by.pool' data frames for one pooled DNA sample are generated from the user-defined 'ped', 'map', 'true.snp.param.indiv' and 'sim.fam.sets' data frames. Parentage is then assigned for each simulated pool using parent.assign.fun.

sim.parent.assign.fun(
  n_repetitions,
  ped,
  map,
  missing.parents = NULL,
  true.snp.param.indiv,
  sim.fam.sets = NULL,
  method,
  beta.min.ss = FALSE,
  discrete.method = "geno.probs",
  threshold.indiv = NULL,
  threshold.pools = NULL,
  n.in.pools,
  min.intensity = 0,
  snp.error.assumed = NULL,
  snp.error.underlying = NULL,
  min.sd = 0,
  fams,
  skip.checks = FALSE
)

`n_repetitions`	is an integer variable defining the number of repetitions in the simulation
`ped`	is a data frame and a conventional pedigree file with one additional column. It must include all SAMPLE_IDs used to construct snp.param.indiv and all possible parents of pooled samples. It contains the following headings (class in parentheses): 'SAMPLE_ID' is the individual (i.e. not a pooled sample) sample identifier; individuals with no true SAMPLE_ID should be assigned a dummy SAMPLE_ID (integer). 'SIRE_ID' is the SAMPLE_ID of the sire (0 if unknown) (integer). 'DAM_ID' is the SAMPLE_ID of the dam (0 if unknown) (integer). 'SAMPLED' is TRUE if the individual was used to generate snp.param.indiv (logical).
`map`	is a data frame and genetic map identifying the positions of SNP, with the following headings (class in parentheses): 'CHROMOSOME' is the chromosome number. To assume that SNP are not linked, provide a unique CHROMOSOME number for each SNP_ID (integer). 'SNP_ID' is the SNP identifier (ordered by physical position within chromosome) (character). 'GENETIC_POSITION' is the SNP genetic position in Morgans. To assume that SNP are not linked, make all GENETIC_POSITION = 0 (numeric). 'B_ALLELE_FREQ' is the frequency of the B allele in the population. 'ERROR_RATE' is the SNP error rate (i.e. the proportion of individuals/pools with signal intensity data from a random genotype rather than the true genotype for the SNP_ID). Refer to Hamilton 2020 (numeric). 'PROP_MISS' is the proportion of missing data for the SNP_ID (numeric).
`missing.parents`	is a vector identifying parents with no SNP data (i.e. known missing parents). Samples/individuals in missing.parents must be present as a SIRE_ID or a DAM_ID in ped (default = NULL)
`true.snp.param.indiv`	is a data frame detailing the assumed SNP parameters of the population with the following headings (class in parentheses): 'SNP_ID' is the SNP identifier (character). 'MEAN_P_AA' is the mean of allelic proportion for homozygous A genotypes (numeric). 'SD_P_AA' is the standard deviation of allelic proportion for homozygous A genotypes (numeric). 'MEAN_P_AB' is the mean of allelic proportion for heterozygous (AB) genotypes (numeric). 'SD_P_AB' is the standard deviation of allelic proportion for heterozygous (AB) genotypes (numeric). 'MEAN_P_BB' is the mean of allelic proportion for homozygous B genotypes (numeric). 'SD_P_BB' is the standard deviation of allelic proportion for homozygous B genotypes (numeric). 'A_ALLELE' is the base represented by allele A (i.e. 'A', 'C', 'G' or 'T') (character). 'B_ALLELE' is the base represented by allele B (i.e. 'A', 'C', 'G' or 'T') (character).
`sim.fam.sets`	is a data frame with the following headings (class in parentheses). Note: if sim.fam.sets = NULL (see example below with n.in.pools = 8), FAMILY_ID is taken from 'fams' and duplicated n.in.pools times, FAM_SET_ID = 1 for the first duplication of FAMILY_IDs, 2 for the second etc., and PROBABILITY = NA (default = NULL): 'FAM_SET_ID' is the family set identifier (integer). A 'family set' is a group of families of which one is known to be the true family of one of the individuals in a pooled sample. Within each 'family set combination' there must be a 'family set' for each individual in a pooled sample (i.e. if n.in.pools = 2 there must be two family sets in each family set combination). 'FAMILY_ID' is the family identifier (integer). 'PROBABILITY' is the probability that an individual from this family is represented in the pooled sample. If all are NA it is assumed that the probability is equal for each family within the family set.
`method`	is a vector of methods to be implemented (e.g. c("Quantitative", "Discrete", "Exclusion", "Least_squares"))
`beta.min.ss`	is a logical variable applicable to the "Least_squares" method only (default = FALSE). If TRUE, the sum of squares of all parental combinations are computed and the combination with the minimum value is identified. Refer to Hamilton 2020.
`discrete.method`	is a character variable applicable to the "Discrete" or "Exclusion" methods only (default = "geno.probs"). It must equal either "geno.probs", in which case discrete genotypes for parents and pools are derived from genotype probabilities, or "assigned.genos", in which case discrete genotypes for parents and pools are obtained directly from the snp.dat.indiv and snp.dat.pools inputs.
`threshold.indiv`	is a numeric variable between 0 and 1 inclusive applicable to the "Discrete" or "Exclusion" methods only when discrete.method = "geno.probs" (default = NULL). The most likely genotype in the quantitative ordered genotype probability matrix Gij is assigned as the discrete genotype if its probability is greater than threshold.indiv (or threshold.indiv / 2 for the two heterozygous genotypes). Otherwise the genotype is deemed missing (refer to the left hand side of page 5 of Henshall et al. 2014)
`threshold.pools`	is a numeric variable between 0 and 1 inclusive applicable to the "Discrete" or "Exclusion" methods only when discrete.method = "geno.probs" (default = NULL). Equivalent to threshold.indiv for pooled DNA samples.
`n.in.pools`	is an integer variable representing the number of individuals that contributed DNA to each sample in snp.dat.pools
`min.intensity`	is a numeric variable (default = 0). If the square root of the sum of INTENSITY_A squared and INTENSITY_B squared in snp.dat.indiv or snp.dat.pools is less than min.intensity then this record is excluded. That is, observations that fall into an arc with a radius equal to min.intensity in the lower left of signal intensity scatter plots are excluded.
`snp.error.assumed`	Must be one of the following (default = NULL): NULL, in which case snp.error.underlying must not be NULL; a numeric variable between 0 and 1, in which case the 'assumed error rate' (see Henshall et al. 2014) is the same across all SNP; or a data frame with columns SNP_ID and SNP_ERROR_TILDE (see Henshall et al. 2014).
`fams`	is a data frame with the following headings (class in parentheses): 'FAMILY_ID' is the family identifier (integer). 'SIRE_ID' is the sire identifier (integer). 'DAM_ID' is the dam identifier (integer).
`skip.checks`	is a logical variable (default = FALSE). If TRUE, parent.assign.fun data checks are not undertaken.
`snp.error.underlying`	Not used if snp.error.assumed is not NULL (default = NULL). Must be either NULL or a numeric variable between 0 and 1 inclusive. Used to compute SNP_ERROR_TILDE from SNP_ERROR_HAT according to the approach outlined on the left of page 5 of Henshall et al. 2014 using individual (i.e. not pooled) data only. If snp.error.underlying = 0 then SNP_ERROR_TILDE = SNP_ERROR_HAT.
`min.sd`	is a numeric variable defining a lower bound to be applied to estimates of the standard deviation of allelic proportion for genotypes in snp.param.indiv and snp.param.pools (default = 0)
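
Two of the argument descriptions above are procedural enough to sketch in base R. The helpers below are hypothetical illustrations and are not functions exported by SNPpools: `filter_min_intensity()` mirrors the `min.intensity` rule (drop records whose distance from the origin in the intensity scatter plot is below the threshold), and `default_sim_fam_sets()` mirrors what the documentation says happens when `sim.fam.sets = NULL`. The toy data frames and the handling of edge cases (e.g. missing intensities) are assumptions; the package's internal implementation may differ.

```r
# Hypothetical helper (not part of SNPpools): keep only records whose
# intensity radius sqrt(A^2 + B^2) is at least min.intensity.
filter_min_intensity <- function(snp.dat, min.intensity = 0) {
  radius <- sqrt(snp.dat$INTENSITY_A^2 + snp.dat$INTENSITY_B^2)
  snp.dat[radius >= min.intensity, , drop = FALSE]
}

# Hypothetical helper (not part of SNPpools): build the family sets the
# documentation describes for sim.fam.sets = NULL, i.e. every FAMILY_ID in
# 'fams' repeated once per individual in the pool, with PROBABILITY = NA.
default_sim_fam_sets <- function(fams, n.in.pools) {
  family_ids <- unique(fams$FAMILY_ID)
  data.frame(
    FAM_SET_ID  = rep(seq_len(n.in.pools), each = length(family_ids)),
    FAMILY_ID   = rep(family_ids, times = n.in.pools),
    PROBABILITY = NA_real_
  )
}

# Toy inputs, made up for illustration only
toy_dat <- data.frame(
  SNP_ID      = c("snp1", "snp2", "snp3"),
  INTENSITY_A = c(0.1, 0.6, 0.3),
  INTENSITY_B = c(0.1, 0.8, 0.4)
)
toy_fams <- data.frame(
  FAMILY_ID = 1:3,
  SIRE_ID   = c(11L, 12L, 13L),
  DAM_ID    = c(21L, 22L, 23L)
)

kept     <- filter_min_intensity(toy_dat, min.intensity = 0.4)
toy_sets <- default_sim_fam_sets(toy_fams, n.in.pools = 2)
```

With `n.in.pools = 2` the second helper returns two family sets that each list all three toy families, which is the layout the "family set combination" description requires (one family set per individual in the pool).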

'summary' is a data frame containing a summary of simulated pedigree assignments:

'METHOD' is the method implemented.
'PARENTS_TO_ASSIGN' is a count of uncertain parents for which assignments were attempted.
'PROP_CORRECT_ASSIGN' is the proportion of PARENTS_TO_ASSIGN assigned correctly.
'CRIT_DELTA_0.950' is the delta LOD above which 95 percent of assignments were correct. Applicable to maximum likelihood methods only.
'CRIT_DELTA_0.990' is the delta LOD above which 99 percent of assignments were correct. Applicable to maximum likelihood methods only.
'CRIT_DELTA_0.995' is the delta LOD above which 99.5 percent of assignments were correct. Applicable to maximum likelihood methods only.

ggplot.log.quant:

is a ggplot object (histogram) of delta LOD values using the 'Quantitative' method (if applicable).

ggplot.log.discrete:

is a ggplot object (histogram) of delta LOD values using the 'Discrete' method (if applicable).

'quant.sim.out' is a detailed summary for the 'Quantitative' method (refer to Hamilton 2020):

'TRUE_ID' is the true parent identifier.
'REP' is the simulation repetition number.
'PARENT_NUMBER' is a unique parent identifier within REP.
'QUANT_ID' is the assigned parent identifier using the 'Quantitative' method.
'QUANT_DELTA_LOD' is the delta LOD (refer to Hamilton 2020).
'QUANT_LOD' is the LOD (refer to Hamilton 2020).
'CORRECT_ASSIGN' is TRUE if the parent was correctly assigned.
'CUM_INCORRECT_ASSIGN' is a cumulative count of incorrectly assigned parents.
'CUM_PROP_CORRECT_ASSIGN' is the cumulative proportion of correctly assigned parents.
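
The cumulative columns and the critical delta LOD values in 'summary' can be related with a short base R sketch. It assumes, and this is an assumption rather than something the documentation states, that the cumulative columns are accumulated over assignments sorted by decreasing delta LOD, and that a critical value is the lowest delta LOD at which the cumulative proportion correct still meets the target level. `crit_delta_sketch()` is a hypothetical helper, not a SNPpools function, and the toy data are made up; refer to Hamilton 2020 for the actual definition.

```r
# Hypothetical helper (not part of SNPpools): one plausible reading of how
# CUM_* columns and a critical delta LOD could be derived from a
# quant.sim.out-like data frame.
crit_delta_sketch <- function(sim.out, level = 0.95) {
  # Assumed ordering: most confident assignments (largest delta LOD) first
  ord <- sim.out[order(-sim.out$QUANT_DELTA_LOD), ]
  ord$CUM_INCORRECT_ASSIGN    <- cumsum(!ord$CORRECT_ASSIGN)
  ord$CUM_PROP_CORRECT_ASSIGN <- cumsum(ord$CORRECT_ASSIGN) / seq_len(nrow(ord))
  # Lowest delta LOD for which the cumulative proportion correct >= level
  ok   <- which(ord$CUM_PROP_CORRECT_ASSIGN >= level)
  crit <- if (length(ok) > 0) ord$QUANT_DELTA_LOD[max(ok)] else NA_real_
  list(table = ord, crit = crit)
}

# Toy simulation output, made up for illustration only
toy_out <- data.frame(
  QUANT_DELTA_LOD = c(2, 5, 1, 4, 3),
  CORRECT_ASSIGN  = c(FALSE, TRUE, TRUE, TRUE, TRUE)
)
res <- crit_delta_sketch(toy_out, level = 0.95)
```

In the toy data the three assignments with the largest delta LOD are all correct and the fourth is not, so under these assumptions the sketch places the critical value at the third-largest delta LOD. Ties in delta LOD are not treated specially here.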

'discrete.sim.out' is a detailed summary for the 'Discrete' method (refer to Hamilton 2020):

'TRUE_ID' is the true parent identifier.
'REP' is the simulation repetition number.
'PARENT_NUMBER' is a unique parent identifier within REP.
'DISCRETE_ID' is the assigned parent identifier using the 'Discrete' method.
'DISCRETE_DELTA_LOD' is the delta LOD.
'DISCRETE_LOD' is the LOD.
'CORRECT_ASSIGN' is TRUE if the parent was correctly assigned.
'CUM_INCORRECT_ASSIGN' is a cumulative count of incorrectly assigned parents.
'CUM_PROP_CORRECT_ASSIGN' is the cumulative proportion of correctly assigned parents.

'exclusion.sim.out' is a detailed summary for the 'Exclusion' method (refer to Hamilton 2020):

'TRUE_ID' is the true parent identifier.
'REP' is the simulation repetition number.
'PARENT_NUMBER' is a unique parent identifier within REP.
'EXCLUSION_ID' is the assigned parent identifier using the 'Exclusion' method.
'CORRECT_ASSIGN' is TRUE if the parent was correctly assigned.

'ls.sim.beta.constrain.out' is a detailed summary for the 'least squares' method where beta hat is constrained to equal 1/n.in.pools within each FAM_SET_ID (refer to Hamilton 2020):

'TRUE_ID' is the true parent identifier.
'REP' is the simulation repetition number.
'PARENT_NUMBER' is a unique parent identifier within REP.
'LS_ID' is the assigned parent identifier using the 'least squares' method with beta hat constrained.
'CORRECT_ASSIGN' is TRUE if the parent was correctly assigned.

'ls.sim.min.ss.out' is a detailed summary for the 'least squares' method where the family combination with the lowest sum of squares is identified (refer to Hamilton 2020):

'TRUE_ID' is the true parent identifier.
'REP' is the simulation repetition number.
'PARENT_NUMBER' is a unique parent identifier within REP.
'LS_ID' is the assigned parent identifier using the 'least squares' method where the family combination with the lowest sum of squares is identified.
'CORRECT_ASSIGN' is TRUE if the parent was correctly assigned.

Henshall JM, Dierens L, Sellars MJ (2014) Quantitative analysis of low-density SNP data for parentage assignment and estimation of family contributions to pooled samples. Genetics Selection Evolution 46, 51. https://doi.org/10.1186/s12711-014-0051-y

Hamilton MG (2020) Maximum likelihood parentage assignment using quantitative genotypes

#Retrieve data for 'pooling by phenotype' example from Hamilton 2020
data(shrimp.ped)
data(shrimp.map)
data(shrimp.true.snp.param.indiv)
data(shrimp.sim.fam.sets)
data(shrimp.fams)

#Run simulation for all methods with n.in.pools = 2.  Note that 3 is not enough repetitions (1000 may be).
sim.parent.assign.fun(n_repetitions = 3, 
                      ped = shrimp.ped,
                      map = shrimp.map,
                      true.snp.param.indiv = shrimp.true.snp.param.indiv,
                      sim.fam.sets = shrimp.sim.fam.sets, # equivalent to sim.fam.sets = NULL in this case
                      method = c("Quantitative", "Discrete", "Exclusion", "Least_squares"),     
                      beta.min.ss = TRUE, 
                      discrete.method = "geno.probs",   
                      threshold.indiv = 0.98,              
                      threshold.pools = 0.98, 
                      n.in.pools = 2,                
                      snp.error.assumed = 0.01,        
                      fams = shrimp.fams
)

#Run simulation using "Least_squares" method (beta.min.ss = FALSE) with n.in.pools = 8.  
#Do not attempt large pool sizes using any other method nor with beta.min.ss = TRUE, as your
#computer is likely to say no.  Note that 3 is not enough repetitions but is okay as an example.
sim.parent.assign.fun(n_repetitions = 3, 
                      ped = shrimp.ped,
                      map = shrimp.map,
                      true.snp.param.indiv = shrimp.true.snp.param.indiv,
                      sim.fam.sets = NULL, #shrimp.sim.fam.sets only appropriate for n.in.pools = 2
                      method = "Least_squares",     
                      beta.min.ss = FALSE,  
                      n.in.pools = 8,                
                      snp.error.assumed = 0.01,        
                      fams = shrimp.fams
)

mghamilton/SNPpools documentation built on Feb. 13, 2021, 12:52 a.m.
