hom: function to compute average homozygosity within a person


Description

This function computes average homozygosity (inbreeding) for a set of people, across multiple markers. It can be used for quality control (e.g. contamination checks).

Usage

  hom(data, snpsubset, idsubset, snpfreq, n.snpfreq = 1000)

Arguments

data

Object of gwaa.data-class or snp.data-class

snpsubset

Subset of SNPs to be used

idsubset

People for whom average homozygosity is to be computed

snpfreq

When the option weight="freq" is used, fixed allele frequencies can be provided here

n.snpfreq

When the option weight="freq" is used, the number of people used to estimate the allele frequency at each marker can be provided here, either as a vector (one value per marker) or as a single fixed number

Details

Homozygosity is measured as the proportion of homozygous genotypes observed in a person.

Inbreeding for person i is estimated with

f_i = \frac{O_i - E_i}{L_i - E_i}

where O_i is the observed number of homozygous genotypes in individual i, L_i is the number of SNPs measured in individual i, and

E_i = \sum_{j=1}^{L_i} \left( 1 - 2 p_j (1 - p_j) \frac{T_{Aj}}{T_{Aj} - 1} \right)

where T_{Aj} is the number of measured genotypes at locus j and p_j is the allele frequency at locus j. T_{Aj} is either estimated from the data or provided by the "n.snpfreq" parameter (vector). Allele frequencies are either estimated from the data or provided by the "snpfreq" vector.

This measure is the same as used by PLINK (see reference).
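
The estimator above can be traced by hand on a small example. The following base R sketch is only an illustration under stated assumptions and is not the code used inside hom(): the toy matrix geno and all variable names are hypothetical, genotypes are assumed to be coded as allele counts (0, 1, 2) with NA for missing calls, and T_{Aj} is taken to be the number of non-missing genotypes at locus j, as defined in the text.

```r
# Hypothetical toy data: rows = people, columns = SNPs,
# coded as allele counts 0/1/2; NA marks a missing genotype.
geno <- matrix(c(0, 1, 2, 0,
                 1, 1, 2, NA,
                 2, 0, 2, 2,
                 0, 2, 1, 1,
                 1, 1, 0, 2),
               nrow = 5, byrow = TRUE)

measured <- !is.na(geno)

# Allele frequency p_j and number of measured genotypes T_Aj per SNP
# (assumption: both are estimated from the toy data themselves)
p   <- colMeans(geno, na.rm = TRUE) / 2
T_A <- colSums(measured)

# Expected homozygosity at each SNP, with the T_Aj / (T_Aj - 1) correction
exp_hom <- 1 - 2 * p * (1 - p) * T_A / (T_A - 1)

# Per person: O = homozygous genotypes observed, L = SNPs measured,
# E = expected homozygosity summed over that person's measured SNPs
O <- rowSums(measured & geno != 1)
L <- rowSums(measured)
E <- as.vector((measured * 1) %*% exp_hom)

# Inbreeding estimate f_i = (O_i - E_i) / (L_i - E_i)
f <- (O - E) / (L - E)
f
```

In this toy matrix the third person is homozygous at every measured SNP, so O equals L and the estimate is 1; people with more heterozygous genotypes than expected get negative values.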

The variance (Var) is estimated as

V_i = \frac{1}{N} \sum_{k=1}^{N} \frac{(x_{i,k} - p_k)^2}{p_k (1 - p_k)}

where k runs from 1 to N (the number of SNPs), x_{i,k} is the genotype of the i-th person at the k-th SNP, coded as 0, 1/2 or 1, and p_k is the frequency of the "+" allele.

Only polymorphic loci with number of measured genotypes >1 are used with this option.

This variance is used as the diagonal of the genomic kinship matrix when the EIGENSTRAT method is used.
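
The variance formula can likewise be checked on toy data. This base R sketch is an illustration only, not the hom() implementation: the matrix geno and the variable names are hypothetical, allele frequencies are estimated from the toy data, and for a person with missing genotypes the average is assumed to be taken over the SNPs measured in that person (the text does not specify this).

```r
# Hypothetical toy data: rows = people, columns = SNPs,
# coded as allele counts 0/1/2; NA marks a missing genotype.
# The last SNP is monomorphic and should be dropped.
geno <- matrix(c(0, 1, 2, 0,  2,
                 1, 1, 2, NA, 2,
                 2, 0, 2, 2,  2,
                 0, 2, 1, 1,  2,
                 1, 1, 0, 2,  2),
               nrow = 5, byrow = TRUE)

# Recode genotypes to 0, 1/2, 1 as in the formula
x <- geno / 2

# Frequency of the counted allele and number of measured genotypes per SNP
p      <- colMeans(x, na.rm = TRUE)
n_meas <- colSums(!is.na(x))

# Keep only polymorphic SNPs with more than one measured genotype
keep <- n_meas > 1 & p > 0 & p < 1
xk <- x[, keep, drop = FALSE]
pk <- p[keep]

# (x_ik - p_k)^2 / (p_k * (1 - p_k)) for every person and SNP
dev2   <- sweep(xk, 2, pk, "-")^2
scaled <- sweep(dev2, 2, pk * (1 - pk), "/")

# Average over the SNPs measured in each person (assumption, see above)
V <- rowMeans(scaled, na.rm = TRUE)
V
```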

You should use as many people and markers as possible when estimating inbreeding/variance from marker data.

Value

A matrix with rows corresponding to the ID names and columns showing the number of SNPs measured in this person (NoMeasured), the number of measured polymorphic SNPs (NoPoly), homozygosity (Hom), expected homozygosity (E(Hom)), variance, and the estimate of inbreeding, F.

Author(s)

Yurii Aulchenko, partly based on code by John Barnard

References

Purcell S. et al. (2007) PLINK: a toolset for whole genome association and population-based linkage analyses. Am. J. Hum. Genet.

See Also

ibs, gwaa.data-class, snp.data-class

Examples

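# GenABEL.data provides the example data set ge03d2
require(GenABEL.data)
data(ge03d2)
# homozygosity for all people, using the first 100 SNPs
h <- hom(ge03d2[,c(1:100)])
h[1:5,]
# binomial sampling variance of the homozygosity proportion
homsem <- h[,"Hom"]*(1-h[,"Hom"])/h[,"NoMeasured"]
plot(h[,"Hom"],homsem)
# wrong analysis: one should use all available people (for correct
# frequencies) and all available markers (for a correct F)!
h <- hom(ge03d2[,c(1:10)])
h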

GenABEL documentation built on May 30, 2017, 3:36 a.m.