Description Usage Arguments Details Value Author(s) See Also
This function fits a hierarchical model to sequence counts from a collection of siblings — or a population of individuals in Hardy-Weinberg equilibrium — and returns genotyped information. The hierarchy comes from either the fact that they share the same parents or they come from a population in Hardy-Weinberg equilibrium. If you also have parental sequencing data, then you can include this to improve estimates.
1 2 3 4 5 6 7 8 9 10 | updog_vanilla(ocounts, osize, ploidy, p1counts = NULL, p1size = NULL,
p2counts = NULL, p2size = NULL, commit_num = 4, min_disp = 0,
print_val = FALSE, update_outmean = FALSE, update_outdisp = FALSE,
update_outprop = TRUE, update_bias_val = TRUE, update_seq_error = TRUE,
update_od_param = TRUE, update_pgeno = TRUE, p1geno = 3, p2geno = 3,
seq_error = 0.01, od_param = 0.01, bias_val = 1, out_prop = 0.001,
out_mean = 0.5, out_disp = 1/3, non_mono_max = 2, bound_bias = FALSE,
seq_error_mean = -4.7, seq_error_sd = 1, bias_val_mean = 0,
bias_val_sd = 0.7, allele_freq = 0.5, model = c("f1", "s1", "hw",
"uniform"), verbose = FALSE)
|
ocounts |
A vector of non-negative integers. The ith element is the number of reads of the reference allele in the ith child. |
osize |
A vector of positive integers. The ith element is the total number of reads for the ith child. |
ploidy |
A positive integer. The number of copies of the genome in the species. This is the assumed to be the same for all individuals. |
p1counts |
A vector of non-negative integers. The ith element
is the number of reads of the reference allele in the ith
sample of parent 1. If |
p1size |
A vector of positive integers. The ith element is the
total number of reads in the ith sample of parent 1. If
|
p2counts |
A vector of non-negative integers. The ith element
is the number of reads of the reference allele in the ith
sample of parent 2. If |
p2size |
A vector of positive integers. The ith element is the
total number of reads in the ith sample of parent 2. If
|
commit_num |
The number of consecutive iterations where the parental genotypes do not change before we commit to those parental genotypes. |
min_disp |
The minimum value for the over-dispersion parameter of the outlier distribution. If this gets too small, then we can be overly confident in points being outliers. |
print_val |
A logical. Should we print the updates ( |
update_outmean |
A logical. Should we update |
update_outdisp |
A logical. Should we update |
update_outprop |
A logical. Should we update |
update_bias_val |
A logical. Should we update |
update_seq_error |
A logical. Should we update |
update_od_param |
A logical. Should we update |
update_pgeno |
A logical. Should we update |
p1geno |
The initial value of the first parental genotype. |
p2geno |
The initial value of the second parental genotype. |
seq_error |
A non-negative numeric. This is the known sequencing error rate. The default is to estimate this using data that are all approximately the reference allele. |
od_param |
The initial value of the overdispersion parameter. |
bias_val |
The initial value of the bias parameter. |
out_prop |
The initial value of the proportion of points that are outliers. |
out_mean |
The initial value of the mean of the outlier distribution. |
out_disp |
The initial value of the over-dispersion parameter of the outlier distribution. |
non_mono_max |
The maximum number of iterations to allow non-monotonicity of likelihood. |
bound_bias |
A logical. Should we bound |
seq_error_mean |
The mean of the logit-normal prior on the sequencing error rate, which corresponds
to |
seq_error_sd |
The standard deviation of the logit-normal prior on the sequencing error rate, which
corresponds to |
bias_val_mean |
The prior mean on the log of |
bias_val_sd |
The prior standard deviation on the log of |
allele_freq |
The allele frequency if |
model |
The model for the genotype distribution. Do we assume an
F1 population ( |
verbose |
A logical. Should we output a lot more ( |
The key improvements in updog
are its abilities to account for common features in
GBS data: sequencing error rate, read-mapping bias, overdispersion, and outlying points.
A list of class updog
with some or all of the following elements:
ogeno
A vector. Each element of which is the maximum a posteriori estimate of each individual's genotype.
maxpostprob
A vector. The maximum posterior probability of a genotype for each individual.
postmean
A vector. The posterior mean genotype for each individual.
bias_val
The estimated bias parameter. This is a value greater than 0 which is the ratio of the probability of correctly mapping a read containing the alternative allele to the probability of correctly mapping a read containing the reference allele. A value of 1 indicates no bias. A value less than one indicates bias towards the reference alelle. A value greater than 1 indiciates bias towards the alternative allele.
seq_error
The estimated sequencing error rate. This is between 0 and 1.
od_param
The estimated overdispersion parameter. Also known as the "intra-class correlation", this is the overdispersion parameter in the underlying beta of the beta-binomial distribution of the counts. Between 0 and 1, a value closer to 0 indicates less overdispersion and a value greater than 1 indicates greater overdispersion. In real data, we we typically see estimates between 0 and 0.01.
p1geno
The estimated genotype of one parent. The number of copies of the reference allele one of the parents has.
p1geno
The estimated genotype of the other parent. The number of copies of the reference allele the other parent has.
allele_freq
The estimated allele-frequency of the reference allele. This is the binomial proportion. Between 0 and 1, a value closer to 1 indicates a larger amount of reference alleles in the population.
out_prop
The estimated proportion of points that are outliers.
out_mean
The estimated mean of the outlier distribution. The outlier distribution is beta-binomial.
out_disp
The estimated overdispersion parameter of the outlier distribution. This is the "intra-class correlation" parameter of the beta-binomial outlier distribution.
prob_out
A vector. Each element of which is the posterior probability that a point is an outlier.
prob_ok
The posterior probability that a point is a non-outlier.
p1_prob_out
The posterior probability that parent 1 is an outlier.
p2_prob_out
The posterior probability that parent 2 is an outlier.
num_iter
The number of iterations the optimization program was run.
convergence
1
if we reached maxiter
and 0
otherwise.
llike
The final log-likelihood of the estimates.
hessian
The negative-Fisher information under the parameterization (s, ell, r), where s = log(bias_val) = log(d), ell = logit(seq_error) = logit(eps), and r = - logit(od_param) = - logit(tau). If you want standard errors for these parameters (in the described parameterization), simply take the negative inverse of the hessian.
input
A list with the input counts (ocounts
), the input sizes (osize
), input parental counts (p1counts
and p2counts
), input parental sizes (p2size
and p1size
), the ploidy (ploidy
) and the model (model
).
log_bias
The log of bias_val
logit_seq_error
The logit of seq_error
. I.e. log(seq_error / (1 - seq_error))
neg_logit_od_param
The negative logit of od_param
. I.e. log((1 - od_param) / od_param)
covmat
The observed Fisher information matrix of c(log_bias, logit_seq_error, neg_logit_od_param)
. This can be used as the covariance matrix of these estimates.
prop_mis
The (posterior) proportion of individuals mis-genotyped. Equal to 1 - maxpostprob
.
David Gerard
plot.updog
For plotting the output of updog_vanilla
.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.