updog: Using Parental Data for Offspring Genotyping.

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/updog.R

Description

This function fits a hierarchical model to sequence counts from a collection of siblings — or a population of individuals in Hardy-Weinberg equilibrium — and returns genotyped information. The hierarchy comes from either the fact that they share the same parents or they come from a population in Hardy-Weinberg equilibrium. If you also have parental sequencing data, then you can include this to improve estimates. This is the simple version of updog with only a limited number of parameters. To see the full list of options, see updog_vanilla.

Usage

1
updog(ocounts, osize, ploidy, model = c("f1", "s1", "hw", "uniform"), ...)

Arguments

ocounts

A vector of non-negative integers. The ith element is the number of reads of the reference allele in the ith child.

osize

A vector of positive integers. The ith element is the total number of reads for the ith child.

ploidy

A positive integer. The number of copies of the genome in the species. This is the assumed to be the same for all individuals.

model

The model for the genotype distribution. Do we assume an F1 population ("f1"), an S1 population ("s1"), Hardy-Weinberg equilibrium ("hw"), or a uniform distribution ("uniform").

...

Additional arguments to pass to updog_vanilla.

Details

The key improvements in updog are its abilities to account for common features in GBS data: sequencing error rate, read-mapping bias, overdispersion, and outlying points.

Value

A list of class updog with some or all of the following elements:

ogeno

A vector. Each element of which is the maximum a posteriori estimate of each individual's genotype.

maxpostprob

A vector. The maximum posterior probability of a genotype for each individual.

postmean

A vector. The posterior mean genotype for each individual.

bias_val

The estimated bias parameter. This is a value greater than 0 which is the ratio of the probability of correctly mapping a read containing the alternative allele to the probability of correctly mapping a read containing the reference allele. A value of 1 indicates no bias. A value less than one indicates bias towards the reference alelle. A value greater than 1 indiciates bias towards the alternative allele.

seq_error

The estimated sequencing error rate. This is between 0 and 1.

od_param

The estimated overdispersion parameter. Also known as the "intra-class correlation", this is the overdispersion parameter in the underlying beta of the beta-binomial distribution of the counts. Between 0 and 1, a value closer to 0 indicates less overdispersion and a value greater than 1 indicates greater overdispersion. In real data, we we typically see estimates between 0 and 0.01.

p1geno

The estimated genotype of one parent. The number of copies of the reference allele one of the parents has.

p1geno

The estimated genotype of the other parent. The number of copies of the reference allele the other parent has.

allele_freq

The estimated allele-frequency of the reference allele. This is the binomial proportion. Between 0 and 1, a value closer to 1 indicates a larger amount of reference alleles in the population.

out_prop

The estimated proportion of points that are outliers.

out_mean

The estimated mean of the outlier distribution. The outlier distribution is beta-binomial.

out_disp

The estimated overdispersion parameter of the outlier distribution. This is the "intra-class correlation" parameter of the beta-binomial outlier distribution.

prob_out

A vector. Each element of which is the posterior probability that a point is an outlier.

prob_ok

The posterior probability that a point is a non-outlier.

p1_prob_out

The posterior probability that parent 1 is an outlier.

p2_prob_out

The posterior probability that parent 2 is an outlier.

num_iter

The number of iterations the optimization program was run.

convergence

1 if we reached maxiter and 0 otherwise.

llike

The final log-likelihood of the estimates.

hessian

The negative-Fisher information under the parameterization (s, ell, r), where s = log(bias_val) = log(d), ell = logit(seq_error) = logit(eps), and r = - logit(od_param) = - logit(tau). If you want standard errors for these parameters (in the described parameterization), simply take the negative inverse of the hessian.

input

A list with the input counts (ocounts), the input sizes (osize), input parental counts (p1counts and p2counts), input parental sizes (p2size and p1size), the ploidy (ploidy) and the model (model).

log_bias

The log of bias_val

logit_seq_error

The logit of seq_error. I.e. log(seq_error / (1 - seq_error))

neg_logit_od_param

The negative logit of od_param. I.e. log((1 - od_param) / od_param)

covmat

The observed Fisher information matrix of c(log_bias, logit_seq_error, neg_logit_od_param). This can be used as the covariance matrix of these estimates.

prop_mis

The (posterior) proportion of individuals mis-genotyped. Equal to 1 - maxpostprob.

Author(s)

David Gerard

See Also

updog_vanilla for more parameter options. plot.updog For plotting the results of updog. summary.updog For some summary capabilities of updog.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
## Read in data and format it for SNP1 -------------------------------------
data(snpdat)
ploidy <- 6
snp1 <- snpdat[snpdat$snp == "SNP1", ]
ocounts <- snp1$counts[-1]
osize   <- snp1$size[-1]
pcounts <- snp1$counts[1]
psize   <- snp1$size[1]

## Fit updog ---------------------------------------------------------------
## Here, I don't update the parental genotype, but you should use the
## default setting, `update_pgeno = TRUE`, in practice.
uout <- updog(ocounts = ocounts, osize = osize, p1counts = pcounts,
              p1size = psize, ploidy = 6, model = "s1", p1geno = 5,
              p2geno = 5, update_pgeno = FALSE)
plot(uout)
summary(uout)

dcgerard/updogAlpha documentation built on May 14, 2019, 3:10 a.m.