Probability of seeing next alleles (Dirichlet sampling)

1 | ```
pr.next.alleles(ij, seen, fr, theta = 0)
``` |

`ij` |
integer matrix with allele numbers |

`seen` |
integer matrix with alleles already seen |

`fr` |
numeric vector with allelic proportions |

`theta` |
numeric background relatedness |

When a population is subdivided into subpopulations, consecutively sampled alleles are not independent draws. This function implements the Dirichlet formula which states that after sampling *n* alleles, of which *m* are of type *A_i*, the probability that the next allele is of type *A_i* equals:

*(m*θ+(1-θ)*p_i)/(1+(n-1)*θ)*

The alleles already sampled are passed as the rows of the matrix `seen`

, while the corresponding row of `ij`

specifies for which alleles the probability of sampling is computed. The numer of rows of `ij`

has to be equal to the number of rows of `seen`

.

numeric (vector) of probabilities

`pr.next.allele`

, `rmp`

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | ```
# compute the pr. of seeing 1,1 after 1,1,1,1
# when theta=0 this is simply p_1^2
pr.next.alleles(t(c(1,1)),seen=t(c(1,1,1,1)),fr=c(1/4,3/4),theta=0)
# but when theta>0, the pr. of seeing more 1's increases slightly
pr.next.alleles(t(c(1,1)),seen=t(c(1,1,1,1)),fr=c(1/4,3/4),theta=0.05)
# pr. distribution of (ordered!) genotypes after seeing 1,1,1
ij=matrix(c(1,1,1,2,2,2),ncol=2,byrow=TRUE)
seen=matrix(1,nrow=3,ncol=3,byrow=TRUE)
pr.next.alleles(ij,seen,fr=c(1/4,3/4),theta=0) # theta=0
pr.next.alleles(ij,seen,fr=c(1/4,3/4),theta=0.1) # theta=0.1
p0 <- pr.next.alleles(ij,seen,fr=c(1/4,3/4),theta=0)
stopifnot(all.equal(p0[1]+2*p0[2]+p0[3],1))
p1 <- pr.next.alleles(ij,seen,fr=c(1/4,3/4),theta=0.05)
stopifnot(all.equal(p1[1]+2*p1[2]+p1[3],1))
``` |

Questions? Problems? Suggestions? Tweet to @rdrrHQ or email at ian@mutexlabs.com.

All documentation is copyright its authors; we didn't write any of that.