| ldest_hap | R Documentation |
Given genotype (allele dosage) or genotype likelihood data for each individual at a pair of loci, this function will calculate the maximum likelihood estimates and their corresponding asymptotic standard errors of some measures of linkage disequilibrium (LD): D, D', the Pearson correlation, the squared Pearson correlation, and the Fisher-z transformation of the Pearson correlation. This function can be used for both diploids and polyploids.
ldest_hap(
ga,
gb,
K,
reltol = 10^-8,
nboot = 100,
useboot = FALSE,
pen = 2,
grid_init = FALSE,
se = TRUE
)
ga |
One of two possible inputs:
|
gb |
One of two possible inputs:
|
K |
The ploidy of the species. Assumed to be the same for all individuals. |
reltol |
The relative tolerance for the stopping criterion. |
nboot |
Sometimes, the MLE standard errors don't exist. So we use
the bootstrap as a backup. |
useboot |
A logical. Optionally, you may always use the bootstrap
to estimate the standard errors ( |
pen |
The penalty to be applied to the likelihood. You can think about
this as the prior sample size. Should be greater than 1. Does not
apply if |
grid_init |
A logical. Should we initialize the gradient ascent
at a grid of initial values ( |
se |
A logical. Should we calculate standard errors ( |
Let A and a be the reference and alternative alleles, respectively, at
locus 1. Let B and b be the reference and alternative alleles,
respectively, at locus 2. Let paa, pAb, paB, and pAB be the
frequencies of haplotypes ab, Ab, aB, and AB, respectively.
Let pA = pAb + pAB and let pB = paB + pAB
The ldest returns estimates of the following measures
of LD.
D: pAB - pA pB
D': D / Dmax, where Dmax = min(pA pB, (1 - pA) (1 - pB)) if D < 0 and Dmax = min(pA (1 - pB), pA (1 - pB)) if D > 0
r-squared: The squared Pearson correlation, r^2 = D^2 / (pA (1 - pA) pB (1 - pB))
r: The Pearson correlation, r = D / sqrt(pA (1 - pA) pB (1 - pB))
Estimates are obtained via maximum likelihood under the assumption of Hardy-Weinberg equilibrium. The likelihood is calculated by integrating over the possible haplotypes for each pair of genotypes.
The resulting standard errors are based on the square roots of the inverse of the negative Fisher-information. This is from standard maximum likelihood theory. The Fisher-information is known to be biased low, so the actual standard errors are probably a little bigger for small n (n < 20). In some cases the Fisher-information matrix is singular, and so we in these cases we return a bootstrap estimate of the standard error.
The standard error estimate of the squared Pearson correlation is not valid when r^2 = 0.
In cases where either SNP is estimated to be monoallelic
(pA %in% c(0, 1) or pB %in% c(0, 1)), this function
will return LD estimates of NA.
A vector with some or all of the following elements:
DThe estimate of the LD coefficient.
D_seThe standard error of the estimate of the LD coefficient.
r2The estimate of the squared Pearson correlation.
r2_seThe standard error of the estimate of the squared Pearson correlation.
rThe estimate of the Pearson correlation.
r_seThe standard error of the estimate of the Pearson correlation.
DprimeThe estimate of the standardized LD
coefficient. When type = "comp", this corresponds
to the standardization where we fix allele frequencies.
Dprime_seThe standard error of Dprime.
DprimegThe estimate of the standardized LD coefficient. This corresponds to the standardization where we fix genotype frequencies.
Dprimeg_seThe standard error of Dprimeg.
zThe Fisher-z transformation of r.
z_seThe standard error of the Fisher-z
transformation of r.
p_abThe estimated haplotype frequency of ab. Only returned if estimating the haplotypic LD.
p_AbThe estimated haplotype frequency of Ab. Only returned if estimating the haplotypic LD.
p_aBThe estimated haplotype frequency of aB. Only returned if estimating the haplotypic LD.
p_ABThe estimated haplotype frequency of AB. Only returned if estimating the haplotypic LD.
q_ijThe estimated frequency of genotype i at locus 1 and genotype j at locus 2. Only returned if estimating the composite LD.
nThe number of individuals used to estimate pairwise LD.
David Gerard
set.seed(1)
n <- 100 # sample size
K <- 6 # ploidy
## generate some fake genotypes when LD = 0.
ga <- stats::rbinom(n = n, size = K, prob = 0.5)
gb <- stats::rbinom(n = n, size = K, prob = 0.5)
head(ga)
head(gb)
## generate some fake genotype likelihoods when LD = 0.
gamat <- t(sapply(ga, stats::dnorm, x = 0:K, sd = 1, log = TRUE))
gbmat <- t(sapply(gb, stats::dnorm, x = 0:K, sd = 1, log = TRUE))
head(gamat)
head(gbmat)
## Haplotypic LD with genotypes
ldout1 <- ldest_hap(ga = ga,
gb = gb,
K = K)
head(ldout1)
## Haplotypic LD with genotype likelihoods
ldout2 <- ldest_hap(ga = gamat,
gb = gbmat,
K = K)
head(ldout2)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.