updog: Using Parental Data for Offspring Genotyping.
In dcgerard/updogAlpha: Using Parental Data for Offspring Genotyping

Description Usage Arguments Details Value Author(s) See Also Examples

This function fits a hierarchical model to sequence counts from a collection of siblings — or a population of individuals in Hardy-Weinberg equilibrium — and returns genotyped information. The hierarchy comes from either the fact that they share the same parents or they come from a population in Hardy-Weinberg equilibrium. If you also have parental sequencing data, then you can include this to improve estimates. This is the simple version of updog with only a limited number of parameters. To see the full list of options, see updog_vanilla.

1	updog(ocounts, osize, ploidy, model = c("f1", "s1", "hw", "uniform"), ...)

`ocounts`	A vector of non-negative integers. The ith element is the number of reads of the reference allele in the ith child.
`osize`	A vector of positive integers. The ith element is the total number of reads for the ith child.
`ploidy`	A positive integer. The number of copies of the genome in the species. This is the assumed to be the same for all individuals.
`model`	The model for the genotype distribution. Do we assume an F1 population (`"f1"`), an S1 population (`"s1"`), Hardy-Weinberg equilibrium (`"hw"`), or a uniform distribution (`"uniform"`).
`...`	Additional arguments to pass to `updog_vanilla`.

The key improvements in updog are its abilities to account for common features in GBS data: sequencing error rate, read-mapping bias, overdispersion, and outlying points.

A list of class updog with some or all of the following elements:

ogeno: A vector. Each element of which is the maximum a posteriori estimate of each individual's genotype.
maxpostprob: A vector. The maximum posterior probability of a genotype for each individual.
postmean: A vector. The posterior mean genotype for each individual.
bias_val: The estimated bias parameter. This is a value greater than 0 which is the ratio of the probability of correctly mapping a read containing the alternative allele to the probability of correctly mapping a read containing the reference allele. A value of 1 indicates no bias. A value less than one indicates bias towards the reference alelle. A value greater than 1 indiciates bias towards the alternative allele.
seq_error: The estimated sequencing error rate. This is between 0 and 1.
od_param: The estimated overdispersion parameter. Also known as the "intra-class correlation", this is the overdispersion parameter in the underlying beta of the beta-binomial distribution of the counts. Between 0 and 1, a value closer to 0 indicates less overdispersion and a value greater than 1 indicates greater overdispersion. In real data, we we typically see estimates between 0 and 0.01.
p1geno: The estimated genotype of one parent. The number of copies of the reference allele one of the parents has.
p1geno: The estimated genotype of the other parent. The number of copies of the reference allele the other parent has.
allele_freq: The estimated allele-frequency of the reference allele. This is the binomial proportion. Between 0 and 1, a value closer to 1 indicates a larger amount of reference alleles in the population.
out_prop: The estimated proportion of points that are outliers.
out_mean: The estimated mean of the outlier distribution. The outlier distribution is beta-binomial.
out_disp: The estimated overdispersion parameter of the outlier distribution. This is the "intra-class correlation" parameter of the beta-binomial outlier distribution.
prob_out: A vector. Each element of which is the posterior probability that a point is an outlier.
prob_ok: The posterior probability that a point is a non-outlier.
p1_prob_out: The posterior probability that parent 1 is an outlier.
p2_prob_out: The posterior probability that parent 2 is an outlier.
num_iter: The number of iterations the optimization program was run.
convergence: 1 if we reached maxiter and 0 otherwise.
llike: The final log-likelihood of the estimates.
hessian: The negative-Fisher information under the parameterization (s, ell, r), where s = log(bias_val) = log(d), ell = logit(seq_error) = logit(eps), and r = - logit(od_param) = - logit(tau). If you want standard errors for these parameters (in the described parameterization), simply take the negative inverse of the hessian.
input: A list with the input counts (ocounts), the input sizes (osize), input parental counts (p1counts and p2counts), input parental sizes (p2size and p1size), the ploidy (ploidy) and the model (model).
log_bias: The log of bias_val
logit_seq_error: The logit of seq_error. I.e. log(seq_error / (1 - seq_error))
neg_logit_od_param: The negative logit of od_param. I.e. log((1 - od_param) / od_param)
covmat: The observed Fisher information matrix of c(log_bias, logit_seq_error, neg_logit_od_param). This can be used as the covariance matrix of these estimates.
prop_mis: The (posterior) proportion of individuals mis-genotyped. Equal to 1 - maxpostprob.

David Gerard

updog_vanilla for more parameter options. plot.updog For plotting the results of updog. summary.updog For some summary capabilities of updog.

## Read in data and format it for SNP1 -------------------------------------
data(snpdat)
ploidy <- 6
snp1 <- snpdat[snpdat$snp == "SNP1", ]
ocounts <- snp1$counts[-1]
osize   <- snp1$size[-1]
pcounts <- snp1$counts[1]
psize   <- snp1$size[1]

## Fit updog ---------------------------------------------------------------
## Here, I don't update the parental genotype, but you should use the
## default setting, `update_pgeno = TRUE`, in practice.
uout <- updog(ocounts = ocounts, osize = osize, p1counts = pcounts,
              p1size = psize, ploidy = 6, model = "s1", p1geno = 5,
              p2geno = 5, update_pgeno = FALSE)
plot(uout)
summary(uout)