updog_vanilla: Using Parental Data for Offspring Genotyping.
In dcgerard/updogAlpha: Using Parental Data for Offspring Genotyping

Description Usage Arguments Details Value Author(s) See Also

This function fits a hierarchical model to sequence counts from a collection of siblings — or a population of individuals in Hardy-Weinberg equilibrium — and returns genotyped information. The hierarchy comes from either the fact that they share the same parents or they come from a population in Hardy-Weinberg equilibrium. If you also have parental sequencing data, then you can include this to improve estimates.

updog_vanilla(ocounts, osize, ploidy, p1counts = NULL, p1size = NULL,
  p2counts = NULL, p2size = NULL, commit_num = 4, min_disp = 0,
  print_val = FALSE, update_outmean = FALSE, update_outdisp = FALSE,
  update_outprop = TRUE, update_bias_val = TRUE, update_seq_error = TRUE,
  update_od_param = TRUE, update_pgeno = TRUE, p1geno = 3, p2geno = 3,
  seq_error = 0.01, od_param = 0.01, bias_val = 1, out_prop = 0.001,
  out_mean = 0.5, out_disp = 1/3, non_mono_max = 2, bound_bias = FALSE,
  seq_error_mean = -4.7, seq_error_sd = 1, bias_val_mean = 0,
  bias_val_sd = 0.7, allele_freq = 0.5, model = c("f1", "s1", "hw",
  "uniform"), verbose = FALSE)

`ocounts`	A vector of non-negative integers. The ith element is the number of reads of the reference allele in the ith child.
`osize`	A vector of positive integers. The ith element is the total number of reads for the ith child.
`ploidy`	A positive integer. The number of copies of the genome in the species. This is the assumed to be the same for all individuals.
`p1counts`	A vector of non-negative integers. The ith element is the number of reads of the reference allele in the ith sample of parent 1. If `NULL` then the prior probabilities on parent 1's genotype will default to uniform.
`p1size`	A vector of positive integers. The ith element is the total number of reads in the ith sample of parent 1. If `NULL` then the prior probabilities on parent 1's genotype will default to uniform.
`p2counts`	A vector of non-negative integers. The ith element is the number of reads of the reference allele in the ith sample of parent 2. If `NULL` then the prior probabilities on parent 2's genotype will default to uniform.
`p2size`	A vector of positive integers. The ith element is the total number of reads in the ith sample of parent 2. If `NULL` then the prior probabilities on parent 2's genotype will default to uniform.
`commit_num`	The number of consecutive iterations where the parental genotypes do not change before we commit to those parental genotypes.
`min_disp`	The minimum value for the over-dispersion parameter of the outlier distribution. If this gets too small, then we can be overly confident in points being outliers.
`print_val`	A logical. Should we print the updates (`TRUE`) or not (`FALSE`)?
`update_outmean`	A logical. Should we update `out_mean` (`TRUE`) or not (`FALSE`)?
`update_outdisp`	A logical. Should we update `out_mean` (`TRUE`) or not (`FALSE`)?
`update_outprop`	A logical. Should we update `out_prop` (`TRUE`) or not (`FALSE`)?
`update_bias_val`	A logical. Should we update `bias_val` (`TRUE`) or not (`FALSE`)?
`update_seq_error`	A logical. Should we update `seq_error` (`TRUE`) or not (`FALSE`)?
`update_od_param`	A logical. Should we update `od_param` (`TRUE`) or not (`FALSE`)?
`update_pgeno`	A logical. Should we update `p1geno` and `p1geno` (`TRUE`) or not (`FALSE`)?
`p1geno`	The initial value of the first parental genotype.
`p2geno`	The initial value of the second parental genotype.
`seq_error`	A non-negative numeric. This is the known sequencing error rate. The default is to estimate this using data that are all approximately the reference allele.
`od_param`	The initial value of the overdispersion parameter.
`bias_val`	The initial value of the bias parameter.
`out_prop`	The initial value of the proportion of points that are outliers.
`out_mean`	The initial value of the mean of the outlier distribution.
`out_disp`	The initial value of the over-dispersion parameter of the outlier distribution.
`non_mono_max`	The maximum number of iterations to allow non-monotonicity of likelihood.
`bound_bias`	A logical. Should we bound `bias_val` by a somewhat arbitrary value (`TRUE`) or not (`FALSE`)?
`seq_error_mean`	The mean of the logit-normal prior on the sequencing error rate, which corresponds to `parvec[2]`. Set `seq_error_sd = Inf` to have no penalty on the sequencing error rate. The default is -4.7, which roughly corresponds to a mean sequencing error value of 0.009. If you want to constain `seq_error` to be zero, you need to set `update_seq_error = FALSE`, `seq_error = 0`, and`seq_error_mean = -Inf`.
`seq_error_sd`	The standard deviation of the logit-normal prior on the sequencing error rate, which corresponds to `parvec[2]`. The default is 1, which at three standard deviations is about a sequencing error rate of 0.15. Set `seq_error_sd = Inf` to have no penalty on the sequencing error rate.
`bias_val_mean`	The prior mean on the log of `bias_val` (corresponding to `parvec[1]`). Set `bias_val_sd = Inf` to have no penalty on the bias parameter.
`bias_val_sd`	The prior standard deviation on the log of `bias_val` (corresponding to `parvec[1]`). Set `bias_val_sd = Inf` to have no penalty on the bias parameter.
`allele_freq`	The allele frequency if `model = "hw"`.
`model`	The model for the genotype distribution. Do we assume an F1 population (`"f1"`), an S1 population (`"s1"`), Hardy-Weinberg equilibrium (`"hw"`), or a uniform distribution (`"uniform"`).
`verbose`	A logical. Should we output a lot more (`TRUE`) or not (`FALSE`)?

The key improvements in updog are its abilities to account for common features in GBS data: sequencing error rate, read-mapping bias, overdispersion, and outlying points.

A list of class updog with some or all of the following elements:

ogeno: A vector. Each element of which is the maximum a posteriori estimate of each individual's genotype.
maxpostprob: A vector. The maximum posterior probability of a genotype for each individual.
postmean: A vector. The posterior mean genotype for each individual.
bias_val: The estimated bias parameter. This is a value greater than 0 which is the ratio of the probability of correctly mapping a read containing the alternative allele to the probability of correctly mapping a read containing the reference allele. A value of 1 indicates no bias. A value less than one indicates bias towards the reference alelle. A value greater than 1 indiciates bias towards the alternative allele.
seq_error: The estimated sequencing error rate. This is between 0 and 1.
od_param: The estimated overdispersion parameter. Also known as the "intra-class correlation", this is the overdispersion parameter in the underlying beta of the beta-binomial distribution of the counts. Between 0 and 1, a value closer to 0 indicates less overdispersion and a value greater than 1 indicates greater overdispersion. In real data, we we typically see estimates between 0 and 0.01.
p1geno: The estimated genotype of one parent. The number of copies of the reference allele one of the parents has.
p1geno: The estimated genotype of the other parent. The number of copies of the reference allele the other parent has.
allele_freq: The estimated allele-frequency of the reference allele. This is the binomial proportion. Between 0 and 1, a value closer to 1 indicates a larger amount of reference alleles in the population.
out_prop: The estimated proportion of points that are outliers.
out_mean: The estimated mean of the outlier distribution. The outlier distribution is beta-binomial.
out_disp: The estimated overdispersion parameter of the outlier distribution. This is the "intra-class correlation" parameter of the beta-binomial outlier distribution.
prob_out: A vector. Each element of which is the posterior probability that a point is an outlier.
prob_ok: The posterior probability that a point is a non-outlier.
p1_prob_out: The posterior probability that parent 1 is an outlier.
p2_prob_out: The posterior probability that parent 2 is an outlier.
num_iter: The number of iterations the optimization program was run.
convergence: 1 if we reached maxiter and 0 otherwise.
llike: The final log-likelihood of the estimates.
hessian: The negative-Fisher information under the parameterization (s, ell, r), where s = log(bias_val) = log(d), ell = logit(seq_error) = logit(eps), and r = - logit(od_param) = - logit(tau). If you want standard errors for these parameters (in the described parameterization), simply take the negative inverse of the hessian.
input: A list with the input counts (ocounts), the input sizes (osize), input parental counts (p1counts and p2counts), input parental sizes (p2size and p1size), the ploidy (ploidy) and the model (model).
log_bias: The log of bias_val
logit_seq_error: The logit of seq_error. I.e. log(seq_error / (1 - seq_error))
neg_logit_od_param: The negative logit of od_param. I.e. log((1 - od_param) / od_param)
covmat: The observed Fisher information matrix of c(log_bias, logit_seq_error, neg_logit_od_param). This can be used as the covariance matrix of these estimates.
prop_mis: The (posterior) proportion of individuals mis-genotyped. Equal to 1 - maxpostprob.