ibs: Computes (average) Identity-by-State for a set of people and...


Description

Given a set of SNPs, computes a matrix of average IBS for a group of individuals. This function facilitates quality control of genomic data. For example, pairs of people with extremely high IBS (close to 1) may indicate duplicated samples (or identical twins), while merely high IBS values may indicate relatives.
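
As an illustration of what "average IBS" measures, here is a minimal base-R sketch on a toy genotype matrix. It assumes genotypes coded 0, 1/2, 1 (as in Details) and the common per-SNP definition IBS = 1 - |x_i - x_j|, so sharing 2, 1 or 0 alleles scores 1, 1/2 or 0. The helper avg_ibs is hypothetical and is not the GenABEL implementation; in particular its diagonal is trivially 1, whereas ibs() reports homozygosity there.

```r
# Hypothetical helper: average IBS between all pairs of rows of a genotype matrix.
# Assumptions: people in rows, SNPs in columns, genotypes coded 0, 1/2, 1,
# and per-SNP IBS = 1 - |x_i - x_j|.
avg_ibs <- function(x) {
  n <- nrow(x)
  out <- matrix(NA_real_, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # Compare only SNPs genotyped in both people.
      ok <- !is.na(x[i, ]) & !is.na(x[j, ])
      out[i, j] <- mean(1 - abs(x[i, ok] - x[j, ok]))
    }
  }
  out
}

# Toy data: "dup" is an exact copy of "a", mimicking a duplicated sample.
x <- rbind(a   = c(0, 0.5, 1, 1, 0.5),
           b   = c(1, 0.5, 0, 0, 0.5),
           dup = c(0, 0.5, 1, 1, 0.5))
round(avg_ibs(x), 2)
```

The duplicated pair scores exactly 1, which is the pattern the quality-control use above looks for.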

Usage

  ibs(data, snpsubset, idsubset = NULL,
    cross.idsubset = NULL, weight = "no", snpfreq = NULL)

Arguments

data

object of snp.data-class

snpsubset

Index, character or logical vector giving the subset of SNPs to analyse. If missing, all SNPs in data are used.

idsubset

IDs of people to be analysed. If missing, all people from data are used for analysis.

cross.idsubset

Parameter allowing parallel implementation; not to be used normally. If supplied together with idsubset, the IBS/kinship is computed for all pairs between idsubset and cross.idsubset.

weight

"no" for direct IBS computation, "freq" to weight by allele frequency assuming Hardy-Weinberg equilibrium (HWE), or "eVar" to use the empirical variance

snpfreq

When weight = "freq" is used, fixed allele frequencies can be supplied here

Details

When weight "freq" is used, IBS for a pair of people i and j is computed as

f_{i,j} = \frac{1}{N} \sum_{k=1}^{N} \frac{(x_{i,k} - p_k)(x_{j,k} - p_k)}{p_k (1 - p_k)}

where k runs from 1 to N, the number of SNPs used genome-wide; x_{i,k} is the genotype of the ith person at the kth SNP, coded as 0, 1/2 or 1; and p_k is the frequency of the "+" allele. This apparently provides an unbiased estimate of the kinship coefficient.
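
The formula above can be sketched in base R as follows. This is a minimal illustration, not the GenABEL code: kinship_freq is a hypothetical helper, and it assumes a people-by-SNPs matrix coded 0, 1/2, 1 with no missing genotypes. The simulated data are unrelated, non-inbred people under HWE, for which the diagonal should be close to 0.5 and the off-diagonal close to 0.

```r
# Hypothetical helper: the "freq"-weighted estimator from Details,
#   f_ij = (1/N) * sum_k (x_ik - p_k) * (x_jk - p_k) / (p_k * (1 - p_k)),
# for a people-by-SNPs matrix x coded 0, 1/2, 1 with no missing genotypes.
kinship_freq <- function(x, p = colMeans(x)) {
  # Monomorphic SNPs (p = 0 or 1) would divide by zero; drop them as non-informative.
  keep <- p > 0 & p < 1
  x <- x[, keep, drop = FALSE]
  p <- p[keep]
  # Centre each SNP at p_k and scale by sqrt(p_k (1 - p_k)); the cross-product
  # of the scaled rows then sums the per-SNP terms for every pair at once.
  z <- sweep(x, 2, p, "-")
  z <- sweep(z, 2, sqrt(p * (1 - p)), "/")
  tcrossprod(z) / ncol(z)
}

# Simulated unrelated people under HWE with known allele frequencies.
set.seed(1)
n_people <- 50
n_snps <- 2000
p_true <- runif(n_snps, 0.1, 0.9)
x <- sapply(p_true, function(p) rbinom(n_people, 2, p) / 2)
f <- kinship_freq(x, p_true)
mean(diag(f))           # close to 0.5, i.e. 0.5 * (1 + inbreeding) with no inbreeding
mean(f[lower.tri(f)])   # close to 0 for unrelated pairs
```

Computing all pairs with one tcrossprod() is a design choice for the sketch; it is equivalent to looping over pairs and summing the per-SNP terms.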

With the "eVar" option, the denominator of the formula above is replaced by twice the empirical variance of the genotype, 2 Var(g_k). The empirical variance is computed as

\mathrm{Var}(g_k) = \frac{1}{M} \sum_{i=1}^{M} g_{ik}^2 - \left(E[g_k]\right)^2

where M is the number of people and E[g_k] is the mean genotype at SNP k over those people.
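
A minimal base-R sketch of the "eVar" weighting, under stated assumptions: g is taken to use the same 0, 1/2, 1 coding as x, genotypes are complete, and SNPs with zero empirical variance are simply dropped to avoid dividing by zero (how ibs() itself treats them is not specified here). emp_var and kinship_evar are hypothetical helpers, not GenABEL functions.

```r
# Hypothetical helper: empirical variance per SNP with the 1/M formula,
# Var(g_k) = mean(g^2) - mean(g)^2, for a people-by-SNPs matrix.
emp_var <- function(x) colMeans(x^2) - colMeans(x)^2

# Same form as the "freq" estimator, but the denominator p_k (1 - p_k) is
# replaced by 2 * Var(g_k). Assumptions: coding 0, 1/2, 1, no missing
# genotypes, zero-variance SNPs dropped.
kinship_evar <- function(x, p = colMeans(x)) {
  v <- emp_var(x)
  keep <- v > .Machine$double.eps
  z <- sweep(x[, keep, drop = FALSE], 2, p[keep], "-")
  z <- sweep(z, 2, sqrt(2 * v[keep]), "/")
  tcrossprod(z) / ncol(z)
}

set.seed(2)
x <- sapply(runif(500, 0.05, 0.95), function(p) rbinom(40, 2, p) / 2)
fe <- kinship_evar(x)
# With p equal to the sample mean, each SNP contributes Var / (2 Var) = 1/2
# to the average diagonal, so this is 0.5 up to rounding.
mean(diag(fe))
```

Under HWE the variance of a genotype coded 0, 1/2, 1 is p_k (1 - p_k) / 2, so 2 Var(g_k) and p_k (1 - p_k) agree in expectation; the two weightings differ mainly when genotype frequencies depart from HWE.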

Monomorphic SNPs are treated as non-informative only with the "freq" option.

Running ibs() may take a long time for a large number of people, because the number of pairs grows quadratically with the number of people.
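
The scaling can be checked with a line of arithmetic. This sketch assumes the run time is dominated by the all-pairs loop; n_pairs is a hypothetical helper.

```r
# Number of distinct pairs among n people: n (n - 1) / 2.
n_pairs <- function(n) n * (n - 1) / 2
n_pairs(c(1000, 10000))   # 499500 and 49995000 pairs
```

A tenfold increase in the number of people therefore means roughly a hundredfold increase in the number of pairs to compare.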

Value

A (Npeople x Npeople) matrix. Below the diagonal are the average IBS (kinship) values for each pair; above the diagonal is the number of SNPs genotyped in both members of the pair.

On the diagonal: with option 'no', the homozygosity; with option 'freq', the homozygosity expressed as 0.5*(1+inbreeding); with option 'eVar', the constant 0.5.

attr(computedobject, "Var") returns the variance, which replaces the diagonal when the object is used by egscore.
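
Because one matrix packs two kinds of values, it is often convenient to split it. The sketch below rebuilds the first three people of the weight = "no" result printed under Example output and separates the IBS values (lower triangle and diagonal) from the SNP counts (upper triangle). split_ibs is a hypothetical helper, not part of GenABEL.

```r
# 3 x 3 excerpt of the weight = "no" result shown under Example output:
# lower triangle = average IBS, upper triangle = SNPs typed in both,
# diagonal = homozygosity. matrix() fills column by column.
ids <- c("id199", "id287", "id300")
a <- matrix(c(0.7530488, 0.7233607, 0.8037910,
              976,       0.5227964, 0.7431052,
              976,       979,       0.7537994),
            nrow = 3, dimnames = list(ids, ids))

# Hypothetical helper: return a symmetric IBS matrix and a symmetric count matrix.
split_ibs <- function(a) {
  # IBS: mirror the lower triangle into the upper one (diagonal kept as is).
  ibs_vals <- a
  ibs_vals[upper.tri(ibs_vals)] <- t(a)[upper.tri(a)]
  # Counts: mirror the upper triangle into the lower one; no count on the diagonal.
  n_snps <- matrix(NA_real_, nrow(a), ncol(a), dimnames = dimnames(a))
  n_snps[upper.tri(n_snps)] <- a[upper.tri(a)]
  n_snps[lower.tri(n_snps)] <- t(n_snps)[lower.tri(n_snps)]
  list(ibs = ibs_vals, n_snps = n_snps)
}

split_ibs(a)
```

stats::as.dist() likewise reads only the lower triangle of a matrix, which is why the Examples can pass the full ibs() result to cmdscale() without first removing the counts.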

Author(s)

Yurii Aulchenko

See Also

check.marker, summary.snp.data, snp.data-class

Examples

library(GenABEL)
require(GenABEL.data)
data(ge03d2c)
set.seed(7)
# compute IBS based on a random sample of 1000 autosomal markers
selectedSnps <- sample(autosomal(ge03d2c), 1000, replace = FALSE)
a <- ibs(ge03d2c, snpsubset = selectedSnps)
a[1:5, 1:5]
mds <- cmdscale(as.dist(1 - a))
plot(mds)
# identify the smaller cluster of outliers
km <- kmeans(mds, centers = 2, nstart = 1000)
cl1 <- names(which(km$cluster == 1))
cl2 <- names(which(km$cluster == 2))
if (length(cl1) > length(cl2)) cl1 <- cl2
cl1
# paint the outliers in red
points(mds[cl1, ], pch = 19, col = "red")
# compute the genomic kinship matrix to be used with e.g. polygenic, mmscore, etc.
a <- ibs(ge03d2c, snpsubset = selectedSnps, weight = "freq")
a[1:5, 1:5]
# now replace the diagonal with an EIGENSTRAT-type diagonal, to be used with egscore
diag(a) <- hom(ge03d2c[, autosomal(ge03d2c)])$Var
a[1:5, 1:5]
##############################
# compare 'freq' with 'eVar'
##############################
ibsFreq <- ibs(ge03d2c, snpsubset = selectedSnps, weight = "freq")
ibsEvar <- ibs(ge03d2c, snpsubset = selectedSnps, weight = "eVar")
mdsEvar <- cmdscale(as.dist(0.5 - ibsEvar))
plot(mdsEvar)
outliers <- (mdsEvar[, 1] > 0.1)
ibsFreq[upper.tri(ibsFreq, diag = TRUE)] <- NA
ibsEvar[upper.tri(ibsEvar, diag = TRUE)] <- NA
plot(ibsEvar, ibsFreq)
points(ibsEvar[outliers, outliers], ibsFreq[outliers, outliers], col = "red")

Example output

Loading required package: MASS
Loading required package: GenABEL.data
          id199       id287       id300       id403     id415
id199 0.7530488 976.0000000 976.0000000 971.0000000 972.00000
id287 0.7233607   0.5227964 979.0000000 973.0000000 974.00000
id300 0.8037910   0.7431052   0.7537994 974.0000000 974.00000
id403 0.7919670   0.7420349   0.8054415   0.7749491 969.00000
id415 0.8014403   0.7422998   0.8013347   0.7915377   0.74059
[1] "id2097" "id6954" "id2136" "id858" 
             id199        id287         id300        id403       id415
id199  0.512195437 975.00000000 975.000000000 970.00000000 971.0000000
id287 -0.066869588   0.05626759 978.000000000 972.00000000 973.0000000
id300  0.018013814  -0.01702514   0.513933879 973.00000000 973.0000000
id403 -0.007902524  -0.05464352  -0.010335318   0.55642267 968.0000000
id415  0.020729572  -0.00485724  -0.001735076  -0.01592227   0.4865665
             id199        id287         id300        id403      id415
id199  0.472184349 975.00000000 975.000000000 970.00000000 971.000000
id287 -0.066869588   1.43801996 978.000000000 972.00000000 973.000000
id300  0.018013814  -0.01702514   0.445693258 973.00000000 973.000000
id403 -0.007902524  -0.05464352  -0.010335318   0.43343768 968.000000
id415  0.020729572  -0.00485724  -0.001735076  -0.01592227   0.447595

GenABEL documentation built on May 30, 2017, 3:36 a.m.
