EmissionParam-methods: A parameter class for computing Emission probabilities
In VanillaICE: A Hidden Markov Model for high throughput genotyping arrays


Parameters for computing emission probabilities for a 6-state HMM, including starting values for the means and standard deviations of log R ratios (assumed to be Gaussian) and B allele frequencies (assumed to be truncated Gaussian), and initial state probabilities.

This function is exported primarily for internal use by other BioC packages.

cn_means(object)

cn_sds(object)

baf_means(object)

baf_sds(object)

baf_means(object) <- value

baf_sds(object) <- value

cn_sds(object) <- value

cn_means(object) <- value

EmissionParam(
  cn_means = CN_MEANS(),
  cn_sds = CN_SDS(),
  baf_means = BAF_MEANS(),
  baf_sds = BAF_SDS(),
  initial = rep(1/6, 6),
  EMupdates = 5L,
  CN_range = c(-5, 3),
  temper = 1,
  p_outlier = 1/100,
  modelHomozygousRegions = FALSE
)

EMupdates(object)

## S4 method for signature 'EmissionParam'
show(object)

`object`	see `showMethods("EMupdates")`
`value`	numeric vector
`cn_means`	numeric vector of starting values for log R ratio means (order is by copy number state)
`cn_sds`	numeric vector of starting values for log R ratio standard deviations (order is by copy number state)
`baf_means`	numeric vector of starting values for BAF means. See example for details on how these are ordered.
`baf_sds`	numeric vector of starting values for BAF standard deviations. See example for details on how these are ordered.
`initial`	numeric vector of initial state probabilities
`EMupdates`	number of EM updates
`CN_range`	the allowable range of log R ratios. Log R ratios outside this range are thresholded.
`temper`	Emission probabilities can be tempered by emit^temper. This is highly experimental.
`p_outlier`	probability that an observation is an outlier (assumed to be the same for all markers)
`modelHomozygousRegions`	logical. If FALSE (default), the emission probabilities for BAFs are modeled from a mixture of truncated normals and a Unif(0,1) where the mixture probabilities are given by the probability that the SNP is heterozygous. See Details below for a discussion of the implications.
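
As a rough illustration of the `CN_range` and `temper` arguments, the base R sketch below clamps log R ratios to the allowable range and raises a vector of emission probabilities to the power `temper`. It does not call the package. Reading "thresholded" as clamping to the range limits is an assumption, and the input values and variable names are hypothetical.

```r
# Hypothetical log R ratios, two of which fall outside CN_range.
lrr <- c(-7.2, -1.5, 0.1, 0.6, 4.8)
CN_range <- c(-5, 3)

# Assumed meaning of "thresholded": clamp values to the range limits.
lrr_thresholded <- pmin(pmax(lrr, CN_range[1]), CN_range[2])
lrr_thresholded

# Tempering as described for `temper`: emit^temper.
# These emission values are illustrative only.
emit <- c(0.9, 0.05, 0.5)
temper <- 0.5
emit_tempered <- emit^temper
emit_tempered
```

With `temper = 1` (the default) the emission probabilities are unchanged; values below 1 flatten the differences between states.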

The log R ratios are assumed to be emitted from a normal distribution with a mean and standard deviation that depend on the latent copy number. Similarly, the BAFs are assumed to be emitted from a truncated normal distribution with a mean and standard deviation that depends on the latent number of B alleles relative to the total number of alleles (A+B).
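
The two emission densities described above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper `dtruncnorm01` is hypothetical, the truncation interval [0, 1] is assumed from the range of BAFs, and the means and standard deviations are made-up values rather than the package defaults.

```r
# Density of a normal distribution truncated to [lower, upper].
# Hypothetical helper, not part of VanillaICE.
dtruncnorm01 <- function(x, mean, sd, lower = 0, upper = 1) {
  inside <- x >= lower & x <= upper
  # Probability mass of the untruncated normal inside the interval.
  norm_const <- pnorm(upper, mean, sd) - pnorm(lower, mean, sd)
  ifelse(inside, dnorm(x, mean, sd) / norm_const, 0)
}

# Illustrative parameters for a single latent state (not package defaults).
lrr_mean <- 0;   lrr_sd <- 0.2
baf_mean <- 0.5; baf_sd <- 0.05

# Log R ratio: ordinary normal density.
lrr_density <- dnorm(0.05, mean = lrr_mean, sd = lrr_sd)

# BAF: truncated normal density on [0, 1].
baf_density <- dtruncnorm01(0.48, mean = baf_mean, sd = baf_sd)

# Sanity check: the truncated density integrates to 1 over [0, 1].
total <- integrate(dtruncnorm01, 0, 1, mean = 0.1, sd = 0.1)$value
```

The truncation matters most for genotype means near 0 or 1, where a substantial share of an untruncated normal would fall outside the interval.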

numeric vector

When modelHomozygousRegions is FALSE (the default in versions >= 1.28.0), emission probabilities for B allele frequencies are calculated from a mixture of truncated normal densities and a Unif(0,1) density, with the mixture probabilities given by the probability that a SNP is homozygous. In particular, let p denote a 6-dimensional vector of density estimates from a truncated normal distribution for the latent genotypes 'A', 'B', 'AB', 'AAB', 'ABB', 'AAAB', and 'ABBB'. The probability that a genotype is homozygous is estimated as

prHom=(p["A"] + p["B"])/sum(p)

and the probability that the genotype is heterozygous (any latent genotype that is not 'A' or 'B') is given by

prHet = 1-prHom

Since the density of a Unif(0,1) is 1, the 6-dimensional vector of emission probability at a SNP is given by

emit = prHet * p + (1-prHet)

The above has the effect of minimizing the influence of BAFs near 0 and 1 on the state path estimated by the Viterbi algorithm. In particular, the emission probability at homozygous SNPs will be virtually the same for states 3 and 4, but at heterozygous SNPs the emission probability will be an order of magnitude greater for state 3 (diploid) than for state 4 (diploid region of homozygosity). The advantage of this parameterization is fewer false positive hemizygous deletion calls. [Log R ratios tend to be more sensitive to technical sources of variation than the corresponding BAFs/genotypes. Regions in which the log R ratios are low due to technical sources of variation will be less likely to be interpreted as evidence of copy number loss if heterozygous genotypes have more 'weight' in the emission estimates than homozygous genotypes.] The trade-off is that the only states estimated by the HMM are those with copy number alterations. In particular, copy-neutral regions of homozygosity will not be called.

By setting modelHomozygousRegions = TRUE, the emission probabilities at a SNP are given simply by the p vector described above and copy-neutral regions of homozygosity will be called.
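
The mixture described in this section can be worked through numerically in base R. The sketch below is an illustration only and is not the package's internal code: the function `mixture_emission` is hypothetical, the density values in `p_hom` and `p_het` are made up, and the vector is named using the genotype labels listed above.

```r
# Genotype labels as listed in the text above.
genotypes <- c("A", "B", "AB", "AAB", "ABB", "AAAB", "ABBB")

# Hypothetical helper implementing the formulas in this section:
#   prHom = (p["A"] + p["B"]) / sum(p)
#   prHet = 1 - prHom
#   emit  = prHet * p + (1 - prHet)
mixture_emission <- function(p, modelHomozygousRegions = FALSE) {
  if (modelHomozygousRegions) return(p)  # emissions are simply p
  prHom <- unname((p["A"] + p["B"]) / sum(p))
  prHet <- 1 - prHom
  prHet * p + (1 - prHet)
}

# Made-up truncated normal densities for a BAF near 0 (looks homozygous).
p_hom <- setNames(c(20, 1e-6, 1e-6, 1e-6, 1e-6, 0.01, 1e-6), genotypes)

# Made-up densities for a BAF near 0.5 (looks heterozygous).
p_het <- setNames(c(1e-6, 1e-6, 8, 0.5, 0.5, 1e-6, 1e-6), genotypes)

emit_hom <- mixture_emission(p_hom)  # nearly flat, all entries close to 1
emit_het <- mixture_emission(p_het)  # nearly identical to p_het

# With modelHomozygousRegions = TRUE the densities are returned as-is.
emit_hom_raw <- mixture_emission(p_hom, modelHomozygousRegions = TRUE)
```

In this toy example the homozygous-looking SNP yields almost uniform emissions, so it carries little information about the state, whereas the heterozygous-looking SNP keeps essentially all of the contrast in `p`.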

ep <- EmissionParam()
show(ep)

## accessors
cn_means(ep)
cn_sds(ep)
baf_means(ep)
baf_sds(ep)

## replacement methods
cn_means(ep) <- cn_means(ep)
cn_sds(ep) <- cn_sds(ep)
baf_means(ep) <- baf_means(ep)
baf_sds(ep) <- baf_sds(ep)