semicontinuousWrapper: Semicontinuous LRs
In Ahhgust/MMDIT: Mitochondrial mixture project

View source: R/semicontinuousMixtureInterpretation.R

This is an omnibus wrapper for semicontinuous likelihood estimation. It implements the method of Ge, Jianye, Bruce Budowle, and Ranajit Chakraborty. "Comments on 'Interpreting Y chromosome STR haplotype mixture'." Legal Medicine 13.1 (2011): 52-53, as applied to variant graphs (citation coming).

semicontinuousWrapper(
  genomes,
  genCount,
  rcrs,
  pos0,
  pos1,
  alleles,
  knownHaps = c(),
  nInMix = 2,
  clopperQuantile = 0.95,
  tolerance = 0,
  giveExplainy = FALSE
)

`genomes`	the first data frame from MMDIT::preprocessMitoGenomes
`genCount`	the second data frame from MMDIT::preprocessMitoGenomes
`rcrs`	character string; the whole mitochondrial genome sequence
`pos0`	0-based coordinate of alleles
`pos1`	1-based coordinate of alleles
`alleles`	the alleles present in the interval specified
`knownHaps`	a vector of haplotypes hypothesized to be in the mixture
`nInMix`	integer; the number of distinct haploid sequences present in the mixture
`clopperQuantile`	the quantile used for the upper confidence bound, as per Clopper and Pearson
`tolerance`	should be 0. Permits fuzzy matching between the haplotypes and the mixture; 0 means no fuzz
`giveExplainy`	logical; optionally returns the explaining individuals

In short, this creates a variant graph (makeVariantGraph, using pos0, pos1 and alleles) and takes genomes from the database (genomes is stratified by population; genCount is every unique haplotype, regardless of population). It then appends a possibly empty set of known haplotypes (knownHaps) to the set of every unique database-derived haplotype.

Then every way of explaining the mixture is computed (at the level of every known haplotype). In the case of 2-person mixtures, the procedure is equivalent to taking every pair of haplotypes and computing the fraction of pairs that explain the mixture. To be conservative, the method of Clopper and Pearson (1934) is used to map the ratio (number that explain / number considered) to a conservative estimate of that ratio.
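The pair-counting idea for a 2-person mixture can be sketched in base R as follows. This is a toy illustration, not the package's implementation: the haplotypes, the mixture, and the helper names `explains` and `cpUpper` are all hypothetical, pairs are assumed to be unordered and made of distinct haplotypes, and the upper bound is assumed to be the one-sided Clopper-Pearson bound at the given quantile.

```r
# Toy data (invented): each haplotype is the allele carried at three sites.
haps <- list(
  h1 = c("A", "C", "G"),
  h2 = c("A", "T", "G"),
  h3 = c("G", "C", "G"),
  h4 = c("A", "C", "A")
)
# Alleles observed in the mixture at each of the three sites.
mixture <- list(c("A"), c("C", "T"), c("G"))

# A set of haplotypes explains the mixture if, at every site, the alleles
# they carry are exactly the alleles observed (assumption: no fuzz).
explains <- function(idx) {
  all(vapply(seq_along(mixture), function(s) {
    carried <- vapply(haps[idx], function(h) h[s], character(1))
    setequal(carried, mixture[[s]])
  }, logical(1)))
}

# Every unordered pair of distinct haplotypes (assumption).
pairs <- combn(length(haps), 2)
ok <- apply(pairs, 2, explains)

x <- sum(ok)       # number of pairs that explain the mixture
n <- ncol(pairs)   # number of pairs considered

# One-sided Clopper-Pearson upper bound on x / n (hypothetical helper).
cpUpper <- function(x, n, q = 0.95) {
  if (x >= n) 1 else qbeta(q, x + 1, n - x)
}
upper <- cpUpper(x, n, q = 0.95)
```

Here only the pair (h1, h2) explains the mixture, so the raw ratio is 1/6 and `upper` is a larger, more conservative value.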

The likelihood is estimated for every population and for every possible subset of knowns. For example, if 1 known haplotype is given, then the likelihood with the 1 known and with 0 knowns is considered. If 2 knowns are hypothesized, then the LR is computed for both knowns, for the first known alone, for the second known alone, and for 0 knowns.
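The enumeration of subsets of knowns described above can be sketched like this. The function name `subsetsOfKnowns` and the labels "K1" and "K2" are hypothetical; the package may enumerate the subsets differently.

```r
# Hypothetical labels standing in for two known haplotypes.
knownHaps <- c("K1", "K2")

# Enumerate every subset of the knowns, from 0 knowns up to all of them.
subsetsOfKnowns <- function(knowns) {
  out <- list(character(0))              # the "0 knowns" case
  for (k in seq_along(knowns)) {
    # all subsets of size k, each as a character vector
    out <- c(out, combn(knowns, k, simplify = FALSE))
  }
  out
}

subsets <- subsetsOfKnowns(knownHaps)
```

With two knowns this yields four subsets (none, the first, the second, both); with an empty `knownHaps` it yields only the empty subset.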

The RMNE (random man not excluded) is also computed; that is, the number of haplotypes that explain the mixture divided by the total, adjusted by Clopper and Pearson.
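A minimal sketch of an RMNE-style calculation on toy data is shown below. It assumes that a haplotype counts as "explaining" (not excluded from) the mixture when its allele at every site is among the alleles observed there, and that the adjustment is the one-sided Clopper-Pearson upper bound, obtained here from `binom.test`. The data and variable names are invented for illustration.

```r
# Toy data (invented): allele carried by each haplotype at three sites.
haps <- list(
  h1 = c("A", "C", "G"),
  h2 = c("A", "T", "G"),
  h3 = c("G", "C", "G"),
  h4 = c("A", "C", "A")
)
# Alleles observed in the mixture at each site.
mixture <- list(c("A"), c("C", "T"), c("G"))

# A haplotype is not excluded if each of its alleles was observed
# at the corresponding site (assumption about what "explain" means here).
included <- vapply(haps, function(h) {
  all(mapply(function(a, obs) a %in% obs, h, mixture))
}, logical(1))

x <- sum(included)   # haplotypes not excluded
n <- length(haps)    # haplotypes considered

# One-sided upper confidence bound on x / n; for alternative = "less",
# binom.test reports the Clopper-Pearson interval [0, upper].
rmne <- binom.test(x, n, alternative = "less", conf.level = 0.95)$conf.int[2]
```

In this toy database h1 and h2 are not excluded, so the raw ratio is 2/4 and `rmne` is the conservative upper bound on it.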

Ahhgust/MMDIT documentation built on Jan. 27, 2021, 11:48 a.m.