updog_old: Using Parental Data for Offspring Genotyping.


View source: R/updog.R

Description

This function fits a hierarchical model to sequence counts from a collection of siblings and returns genotyping information. The hierarchy comes from the fact that the siblings share the same parents. If you also have parental sequencing data, you can include it to improve the estimates.
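The hierarchy can be made concrete by asking how the parents' genotypes constrain the distribution of their children's genotypes. The sketch below is a minimal illustration, not updog's internal code. It assumes an even ploidy and polysomic inheritance with random pairing and no double reduction, so each gamete carries ploidy/2 chromosomes drawn without replacement from the parent (a hypergeometric draw). The helpers gamete_probs and offspring_probs are hypothetical names.

```r
# Hypothetical helper (not part of updog): probabilities that one gamete from a
# parent with `geno` reference copies carries 0..ploidy/2 reference copies.
# Assumes even ploidy, polysomic inheritance, and no double reduction.
gamete_probs <- function(geno, ploidy) {
  half <- ploidy / 2
  dhyper(0:half, m = geno, n = ploidy - geno, k = half)
}

# Hypothetical helper: offspring genotype distribution (0..ploidy reference
# copies) obtained by convolving the two parents' gamete distributions.
offspring_probs <- function(p1geno, p2geno, ploidy) {
  g1 <- gamete_probs(p1geno, ploidy)
  g2 <- gamete_probs(p2geno, ploidy)
  out <- numeric(ploidy + 1)
  for (a in seq_along(g1)) {
    for (b in seq_along(g2)) {
      # a - 1 and b - 1 are the reference copies contributed by each gamete.
      idx <- (a - 1) + (b - 1) + 1
      out[idx] <- out[idx] + g1[a] * g2[b]
    }
  }
  out
}

probs <- offspring_probs(p1geno = 2, p2geno = 0, ploidy = 4)
```

Under these assumptions, a tetraploid cross between a parent with 2 reference copies and a parent with 0 yields offspring genotypes 0, 1, and 2 in a 1:4:1 ratio, which is the kind of shared structure that lets siblings borrow strength from one another.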

Usage

updog_old(ocounts, osize, ploidy, p1counts = NULL, p1size = NULL,
  p2counts = NULL, p2size = NULL, seq_error = NULL, integrate = FALSE,
  do_eb = TRUE, overdispersion = TRUE, update_geno = TRUE,
  update_pi = TRUE, update_outlier = TRUE, update_rho = TRUE)
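To show the shape of the inputs, the following sketch simulates toy offspring data. It is illustrative only: the 1:4:1 segregation ratio, the read depths, and the read model (reference reads binomial with mean k/ploidy adjusted for a known error rate) are assumptions, not output of the package. The final call is commented out because it requires the updogAlpha package.

```r
set.seed(1)
ploidy <- 4
seq_error <- 0.01  # assumed known error rate, for illustration only
n_off <- 20

# Assumed offspring genotypes, drawn with a 1:4:1 ratio over 0, 1, 2 copies.
ogeno_true <- sample(0:2, n_off, replace = TRUE, prob = c(1, 4, 1) / 6)

# Total read depth per child (always >= 1), i.e. the `osize` argument.
osize <- rpois(n_off, lambda = 50) + 1L

# Assumed read model: reference reads are binomial with an error-adjusted mean.
p <- ogeno_true / ploidy
mu <- p * (1 - seq_error) + (1 - p) * seq_error
ocounts <- rbinom(n_off, size = osize, prob = mu)  # the `ocounts` argument

# Basic sanity checks on the inputs.
stopifnot(length(ocounts) == length(osize), all(ocounts <= osize), all(osize > 0))

# fit <- updog_old(ocounts, osize, ploidy = ploidy)  # needs updogAlpha installed
```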

Arguments

ocounts

A vector of non-negative integers. The ith element is the number of reads of the reference allele in the ith child.

osize

A vector of positive integers. The ith element is the total number of reads for the ith child.

ploidy

A positive integer. The number of copies of the genome in the species. This is assumed to be the same for all individuals.

p1counts

A vector of non-negative integers. The ith element is the number of reads of the reference allele in the ith sample of parent 1. If NULL then the prior probabilities on parent 1's genotype will default to uniform.

p1size

A vector of positive integers. The ith element is the total number of reads in the ith sample of parent 1. If NULL then the prior probabilities on parent 1's genotype will default to uniform.

p2counts

A vector of non-negative integers. The ith element is the number of reads of the reference allele in the ith sample of parent 2. If NULL then the prior probabilities on parent 2's genotype will default to uniform.

p2size

A vector of positive integers. The ith element is the total number of reads in the ith sample of parent 2. If NULL then the prior probabilities on parent 2's genotype will default to uniform.

seq_error

A non-negative numeric. The known sequencing error rate. The default, NULL, estimates it from the data, using observations whose reads are almost all of the reference allele.

integrate

A logical. Should we integrate over our uncertainty in the parental genotypes (TRUE) or not (FALSE)? The default is FALSE because we usually know the parental genotypes with near certainty, so integrating over the uncertainty in them matters little. This is only implemented if do_eb = FALSE.

do_eb

A logical. Should we do empirical Bayes (TRUE) or not (FALSE)? Set this to FALSE only if you have a lot of parental data.

overdispersion

A logical. Should we fit a beta-binomial model to account for overdispersion (TRUE) or not (FALSE)? If overdispersion = TRUE, then the overdispersion parameter, rho, starts at 0.001, a very small value. If parental information is provided, then we use those data to obtain the starting values for rho. If update_rho is FALSE, then these values are fixed throughout the estimation procedure.

update_geno

A logical. Update the parental genotypes? If FALSE and if you have parental data, then we fix the parental genotypes to be the maximum a posteriori values. If you do not have parental data, then this should not be set to FALSE.

update_pi

A logical. Update the mixing proportion? If FALSE, then 1% of the observations are assumed to be outliers.

update_outlier

A logical. Update the outlier distribution? If FALSE, then the outlier distribution is assumed to be uniform from 0 to 1.

update_rho

A logical. Update the overdispersion parameter? If FALSE, rho is held at its starting value (see overdispersion).
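The overdispersion, seq_error, and update_outlier arguments all act on the distribution of a child's reference-read count given its genotype. The sketch below shows one common way to write such a model; it is an assumption, not the package's code. It takes the mean for genotype k as (k/ploidy)(1 - seq_error) + (1 - k/ploidy) seq_error and uses a beta-binomial parameterized by mean mu and overdispersion rho, with alpha = mu(1 - rho)/rho and beta = (1 - mu)(1 - rho)/rho. read_mean and dbetabinom_mr are hypothetical helpers.

```r
# Hypothetical helper: mean proportion of reference reads for genotype k,
# assuming each read is misread with probability seq_error (needs seq_error > 0
# so that mu stays strictly inside (0, 1)).
read_mean <- function(k, ploidy, seq_error) {
  p <- k / ploidy
  p * (1 - seq_error) + (1 - p) * seq_error
}

# Hypothetical helper: beta-binomial pmf with mean mu and overdispersion rho,
# assuming alpha = mu (1 - rho) / rho and beta = (1 - mu) (1 - rho) / rho.
# Computed on the log scale for numerical stability.
dbetabinom_mr <- function(x, size, mu, rho, log = FALSE) {
  alpha <- mu * (1 - rho) / rho
  beta <- (1 - mu) * (1 - rho) / rho
  lp <- lchoose(size, x) + lbeta(x + alpha, size - x + beta) - lbeta(alpha, beta)
  if (log) lp else exp(lp)
}

# Genotype likelihoods for one child with 30 reference reads out of 60,
# using the documented starting value rho = 0.001.
ploidy <- 4
lik <- dbetabinom_mr(x = 30, size = 60,
                     mu = read_mean(0:ploidy, ploidy, seq_error = 0.01),
                     rho = 0.001)
best <- which.max(lik) - 1  # most likely genotype: 2 reference copies
```

Under this parameterization, rho near 0 recovers the binomial, and mu = 0.5 with rho = 1/3 gives alpha = beta = 1, a discrete uniform over 0..size. That is what a beta-binomial with a Uniform(0, 1) proportion looks like, which is one way to read the update_outlier = FALSE default.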

Details

If you have a lot of parental sequencing data, then it could suffice to run updog with update_geno set to FALSE, which would save a lot of time. Otherwise, you will probably want to borrow strength between the offspring by setting update_geno to TRUE.
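With update_geno = FALSE and parental data, the parental genotypes are fixed at their maximum a posteriori values. The sketch below shows how such a value can be computed from one parent's reads alone under a uniform prior, the same kind of quantity reported in p1postprob and p2postprob. It assumes a binomial read model with sequencing error and ignores overdispersion; parent_postprob is a hypothetical helper, and the read counts are made up.

```r
# Hypothetical helper: posterior over a parent's genotype (0..ploidy reference
# copies) from that parent's reads only, with a uniform prior and a binomial
# read model adjusted for sequencing error (overdispersion ignored).
parent_postprob <- function(pcounts, psize, ploidy, seq_error) {
  p <- (0:ploidy) / ploidy
  mu <- p * (1 - seq_error) + (1 - p) * seq_error
  # Sum log-likelihoods over all samples of this parent, one value per genotype.
  loglik <- vapply(mu, function(m) sum(dbinom(pcounts, psize, m, log = TRUE)),
                   numeric(1))
  # Uniform prior: the posterior is proportional to the likelihood.
  post <- exp(loglik - max(loglik))
  post / sum(post)
}

# Two sequencing samples of the same parent (made-up counts).
post <- parent_postprob(pcounts = c(148, 152), psize = c(200, 200),
                        ploidy = 4, seq_error = 0.01)
map_geno <- which.max(post) - 1  # MAP genotype used when fixing the parent
```

With deep parental coverage the posterior concentrates on one genotype, which is why fixing it can be a reasonable shortcut; with shallow coverage, borrowing strength from the offspring matters more.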

Value

A list with some or all of the following elements:

opostprob: A matrix of proportions whose (i, j)th element is the posterior probability that child j has i - 1 copies of the reference allele. That is, the rows index the genotype and the columns index the offspring.

p1postprob: A vector of proportions whose ith element is the posterior probability that parent 1 has i - 1 copies of the reference allele. These are derived ONLY from parent 1's sequence data and not jointly with all of the data.

p2postprob: A vector of proportions whose ith element is the posterior probability that parent 2 has i - 1 copies of the reference allele. These are derived ONLY from parent 2's sequence data and not jointly with all of the data.

pival: The estimated proportion of observations that are not outliers.

rho: The overdispersion parameter.

out_mu: The outlier distribution mean.

out_rho: The outlier distribution overdispersion parameter.

p1geno: The estimated genotype of parent 1.

p2geno: The estimated genotype of parent 2.

prob_ok: A vector of proportions. The ith element is the posterior probability that the ith observation is not an outlier.

ogeno: A vector of integers. The ith element is the estimated genotype of the ith offspring.

alpha: The outlier distribution's shape 1 parameter.

beta: The outlier distribution's shape 2 parameter.

seq_error: The sequencing error rate used during the updog iterations.
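The relationships among opostprob, ogeno, pival, and prob_ok can be illustrated with a two-component mixture of a genotype model and an outlier model. The sketch below uses made-up likelihoods and is one plausible construction, not the package's exact computation; in particular, whether updog's opostprob conditions on an observation not being an outlier is an assumption here.

```r
# Made-up inputs for 3 offspring of a diploid: genotype likelihoods
# (rows = genotype 0..ploidy, columns = offspring), a genotype prior,
# outlier-model likelihoods, and the non-outlier proportion pival.
ploidy <- 2
geno_lik <- cbind(c(0.70, 0.20, 0.01),
                  c(0.05, 0.60, 0.10),
                  c(0.01, 0.02, 0.03))
geno_prior <- c(0.25, 0.50, 0.25)
out_lik <- c(0.05, 0.05, 0.05)
pival <- 0.99

# Joint prior x likelihood; the prior recycles down each column (by genotype).
joint <- geno_lik * geno_prior
# Marginal likelihood of each child under the non-outlier model.
ok_lik <- colSums(joint)
# Posterior probability that each observation is not an outlier (cf. prob_ok).
prob_ok <- pival * ok_lik / (pival * ok_lik + (1 - pival) * out_lik)
# Genotype posteriors given non-outlier (cf. opostprob); columns sum to 1.
opostprob <- sweep(joint, 2, ok_lik, "/")
# Posterior-mode genotype per offspring (cf. ogeno).
ogeno <- apply(opostprob, 2, which.max) - 1
```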

Author(s)

David Gerard


dcgerard/updogAlpha documentation built on May 14, 2019, 3:10 a.m.