EM_blood: EM algorithm for estimating frequency of A, B, and O blood...

Description

Given the observed counts of the four blood types (A, B, AB, O), uses the EM algorithm to obtain maximum likelihood estimates of the frequencies of the three blood alleles (A, B, O) in a population.

Usage

EM_blood(A, B, AB, O, tol = 1e-06, verbose = FALSE)

Arguments

A

number of people with blood type A

B

number of people with blood type B

AB

number of people with blood type AB

O

number of people with blood type O

tol

tolerance level governing when to stop the iterations. The largest absolute error across the parameters is compared to tol, and iterations stop when that error is less than tol. Defaults to 1e-06.

verbose

Logical; if TRUE, the function prints iterative feedback to the console; if FALSE, nothing is printed. Defaults to FALSE.

Details

The genotypic frequencies are usually not known for an entire population, because people with blood type A can have genotype A/A or A/O, and people with blood type B can have genotype B/B or B/O. Blood types AB and O, however, map one-to-one to the genotypes A/B and O/O. Hence, in the EM setting, the six possible genotypes are viewed as latent variables that follow a multinomial distribution. Maximizing the likelihood of the observed blood-type counts directly does not yield closed-form solutions.

We assume that the population is in Hardy-Weinberg equilibrium. That is, for a homozygous genotype X/X, P(X/X) = P(X)^2, and for a heterozygous genotype X/Y, P(X/Y) = 2 * P(X) * P(Y).
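
To make the assumption concrete, the blood-type probabilities and the resulting observed-data likelihood can be written out as below. The symbols p_A, p_B, p_O (allele frequencies, summing to 1) and n_A, n_B, n_AB, n_O (the counts passed as A, B, AB, O) are notation introduced here for illustration and do not appear in the package documentation.

```latex
\begin{aligned}
P(\text{A}) &= p_A^2 + 2 p_A p_O, & P(\text{B}) &= p_B^2 + 2 p_B p_O,\\
P(\text{AB}) &= 2 p_A p_B, & P(\text{O}) &= p_O^2,\\
L(p_A, p_B, p_O) &\propto (p_A^2 + 2 p_A p_O)^{n_A}\,(p_B^2 + 2 p_B p_O)^{n_B}\,(2 p_A p_B)^{n_{AB}}\,(p_O^2)^{n_O}
\end{aligned}
```

The sums inside the first two factors are what prevent a closed-form maximizer; treating the genotype counts as latent removes those sums from the complete-data likelihood.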

An initial probability estimate of (1/3, 1/3, 1/3) is reasonable, as it corresponds to uniform allele frequencies. In the function definition, the estimates are updated at each iteration through pA_new, pB_new, and pO_new. These update equations are derived by working through the Expectation-Maximization algorithm, with Lagrange multipliers applied in the last step to enforce the constraint that the allele frequencies sum to 1.
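
The following is a minimal sketch of how such an iteration might look, written from the standard EM updates for this model rather than taken from the package source. The function name em_abo_sketch, the max_iter argument, and the use of the change between successive estimates as the stopping error are all assumptions made for illustration; the actual EM_blood implementation may differ.

```r
# Hypothetical sketch, not the package's EM_blood source.
em_abo_sketch <- function(A, B, AB, O, tol = 1e-6, max_iter = 1000) {
  n <- A + B + AB + O
  # Start from uniform allele frequencies
  p <- c(A = 1/3, B = 1/3, O = 1/3)

  for (iter in seq_len(max_iter)) {
    pA <- p[["A"]]; pB <- p[["B"]]; pO <- p[["O"]]

    # E-step: expected genotype counts within blood types A and B.
    # Under Hardy-Weinberg, P(A/A | type A) = pA^2 / (pA^2 + 2 pA pO),
    # which simplifies to pA / (pA + 2 pO).
    nAA <- A * pA / (pA + 2 * pO)
    nAO <- A - nAA
    nBB <- B * pB / (pB + 2 * pO)
    nBO <- B - nBB

    # M-step: count allele copies among the 2n alleles in the sample
    p_new <- c(
      A = (2 * nAA + nAO + AB) / (2 * n),
      B = (2 * nBB + nBO + AB) / (2 * n),
      O = (nAO + nBO + 2 * O) / (2 * n)
    )

    # Assumed stopping rule: largest absolute change in any parameter
    err <- max(abs(p_new - p))
    p <- p_new
    if (err < tol) break
  }

  data.frame(A = p[["A"]], B = p[["B"]], O = p[["O"]])
}
```

Each M-step divides allele counts by 2n, so the three estimates sum to 1 at every iteration without an explicit renormalization.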

Value

A data.frame with three entries in one row, giving the estimated blood allele frequencies of A, B, and O respectively.

Examples

# Population of 100, with equal counts of each blood type
A <- 25; B <- 25; AB <- 25; O <- 25
EM_blood(A, B, AB, O)

# Population of 1000, with unequal counts of each blood type
A <- 500; B <- 200; AB <- 50; O <- 250
EM_blood(A, B, AB, O, verbose = TRUE)

dchiu911/naim documentation built on May 15, 2019, 1:48 a.m.