snpgdsPairIBD: Calculate Identity-By-Descent (IBD) Coefficients

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/IBD.R

Description

Calculate the three IBD coefficients (k0, k1, k2) for non-inbred individual pairs by Maximum Likelihood Estimation (MLE) or PLINK Method of Moment (MoM).

Usage

1
2
3
4
snpgdsPairIBD(geno1, geno2, allele.freq,
    method=c("EM", "downhill.simplex", "MoM", "Jacquard"),
    kinship.constraint=FALSE, max.niter=1000L, reltol=sqrt(.Machine$double.eps),
    coeff.correct=TRUE, out.num.iter=TRUE, verbose=TRUE)

Arguments

geno1

the SNP genotypes for the first individual, 0 – BB, 1 – AB, 2 – AA, other values – missing

geno2

the SNP genotypes for the second individual, 0 – BB, 1 – AB, 2 – AA, other values – missing

allele.freq

the allele frequencies

method

"EM", "downhill.simplex", "MoM" or "Jacquard", see details

kinship.constraint

if TRUE, constrict IBD coefficients ($k_0,k_1,k_2$) in the genealogical region ($2 k_0 k_1 >= k_2^2$)

max.niter

the maximum number of iterations

reltol

relative convergence tolerance; the algorithm stops if it is unable to reduce the value of log likelihood by a factor of $reltol * (abs(log likelihood with the initial parameters) + reltol)$ at a step.

coeff.correct

TRUE by default, see details

out.num.iter

if TRUE, output the numbers of iterations

verbose

if TRUE, show information

Details

If method = "MoM", then PLINK Method of Moment without a allele-count-based correction factor is conducted. Otherwise, two numeric approaches for maximum likelihood estimation can be used: one is Expectation-Maximization (EM) algorithm, and the other is Nelder-Mead method or downhill simplex method. Generally, EM algorithm is more robust than downhill simplex method. "Jacquard" refers to the estimation of nine Jacquard's coefficients.

If coeff.correct is TRUE, the final point that is found by searching algorithm (EM or downhill simplex) is used to compare the six points (fullsib, offspring, halfsib, cousin, unrelated), since any numeric approach might not reach the maximum position after a finit number of steps. If any of these six points has a higher value of log likelihood, the final point will be replaced by the best one.

Value

Return a data.frame:

k0

IBD coefficient, the probability of sharing ZERO IBD

k1

IBD coefficient, the probability of sharing ONE IBD

loglik

the value of log likelihood

niter

the number of iterations

Author(s)

Xiuwen Zheng

References

Milligan BG. 2003. Maximum-likelihood estimation of relatedness. Genetics 163:1153-1167.

Weir BS, Anderson AD, Hepler AB. 2006. Genetic relatedness analysis: modern data and new challenges. Nat Rev Genet. 7(10):771-80.

Choi Y, Wijsman EM, Weir BS. 2009. Case-control association testing in the presence of unknown relationships. Genet Epidemiol 33(8):668-78.

Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ & Sham PC. 2007. PLINK: a toolset for whole-genome association and population-based linkage analysis. American Journal of Human Genetics, 81.

See Also

snpgdsPairIBDMLELogLik, snpgdsIBDMLE, snpgdsIBDMLELogLik, snpgdsIBDMoM

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
# open an example dataset (HapMap)
genofile <- snpgdsOpen(snpgdsExampleFileName())

YRI.id <- read.gdsn(index.gdsn(genofile, "sample.id"))[
    read.gdsn(index.gdsn(genofile, "sample.annot/pop.group"))=="YRI"]

# SNP pruning
set.seed(10)
snpset <- snpgdsLDpruning(genofile, sample.id=YRI.id, maf=0.05,
    missing.rate=0.05)
snpset <- unname(sample(unlist(snpset), 250))

# the number of samples
n <- 25

# specify allele frequencies
RF <- snpgdsSNPRateFreq(genofile, sample.id=YRI.id, snp.id=snpset,
    with.id=TRUE)
summary(RF$AlleleFreq)

subMLE <- snpgdsIBDMLE(genofile, sample.id=YRI.id[1:n], snp.id=RF$snp.id,
    allele.freq=RF$AlleleFreq)
subMoM <- snpgdsIBDMoM(genofile, sample.id=YRI.id[1:n], snp.id=RF$snp.id,
    allele.freq=RF$AlleleFreq)
subJac <- snpgdsIBDMLE(genofile, sample.id=YRI.id[1:n], snp.id=RF$snp.id,
    allele.freq=RF$AlleleFreq, method="Jacquard")



########################

# genotype matrix
mat <- snpgdsGetGeno(genofile, sample.id=YRI.id[1:n], snp.id=snpset,
    snpfirstdim=TRUE)

rv <- NULL
for (i in 2:n)
{
    rv <- rbind(rv, snpgdsPairIBD(mat[,1], mat[,i], RF$AlleleFreq, "EM"))
    print(snpgdsPairIBDMLELogLik(mat[,1], mat[,i], RF$AlleleFreq,
        relatedness="unrelated", verbose=TRUE))
}
rv
summary(rv$k0 - subMLE$k0[1, 2:n])
summary(rv$k1 - subMLE$k1[1, 2:n])
# ZERO

rv <- NULL
for (i in 2:n)
    rv <- rbind(rv, snpgdsPairIBD(mat[,1], mat[,i], RF$AlleleFreq, "MoM"))
rv
summary(rv$k0 - subMoM$k0[1, 2:n])
summary(rv$k1 - subMoM$k1[1, 2:n])
# ZERO

rv <- NULL
for (i in 2:n)
    rv <- rbind(rv, snpgdsPairIBD(mat[,1], mat[,i], RF$AlleleFreq, "Jacquard"))
rv
summary(rv$D1 - subJac$D1[1, 2:n])
summary(rv$D2 - subJac$D2[1, 2:n])
# ZERO

# close the genotype file
snpgdsClose(genofile)

Example output

Loading required package: gdsfmt
SNPRelate -- supported by Streaming SIMD Extensions 2 (SSE2)
SNP pruning based on LD:
Excluding 365 SNPs on non-autosomes
Excluding 1,646 SNPs (monomorphic: TRUE, MAF: 0.05, missing rate: 0.05)
    # of samples: 93
    # of SNPs: 7,077
    using 1 thread
    sliding window: 500,000 basepairs, Inf SNPs
    |LD| threshold: 0.2
    method: composite
Chromosome 1: 62.29%, 446/716
Chromosome 2: 62.67%, 465/742
Chromosome 3: 59.93%, 365/609
Chromosome 4: 64.23%, 361/562
Chromosome 5: 62.37%, 353/566
Chromosome 6: 59.82%, 338/565
Chromosome 7: 63.14%, 298/472
Chromosome 8: 57.58%, 281/488
Chromosome 9: 62.98%, 262/416
Chromosome 10: 60.46%, 292/483
Chromosome 11: 63.09%, 282/447
Chromosome 12: 62.76%, 268/427
Chromosome 13: 63.08%, 217/344
Chromosome 14: 63.83%, 180/282
Chromosome 15: 63.74%, 167/262
Chromosome 16: 62.23%, 173/278
Chromosome 17: 65.70%, 136/207
Chromosome 18: 59.40%, 158/266
Chromosome 19: 68.33%, 82/120
Chromosome 20: 66.38%, 152/229
Chromosome 21: 61.11%, 77/126
Chromosome 22: 57.76%, 67/116
5,420 markers are selected in total.
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.05376 0.23790 0.46774 0.48641 0.74597 0.94624 
Identity-By-Descent analysis (MLE) on genotypes:
Excluding 8,838 SNPs (non-autosomes or non-selection)
Excluding 0 SNP (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
    # of samples: 25
    # of SNPs: 250
    using 1 thread
Specifying allele frequencies, mean: 0.486, sd: 0.284
MLE IBD:    the sum of all selected genotypes (0,1,2) = 6112
MLE IBD:	Wed Apr 14 08:21:08 2021	0%
MLE IBD:	Wed Apr 14 08:21:09 2021	100%
IBD analysis (PLINK method of moment) on genotypes:
Excluding 8,838 SNPs (non-autosomes or non-selection)
Excluding 0 SNP (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
    # of samples: 25
    # of SNPs: 250
    using 1 thread
Specifying allele frequencies, mean: 0.486, sd: 0.284
*** A correction factor based on allele count is not used, since the allele frequencies are specified.
PLINK IBD:    the sum of all selected genotypes (0,1,2) = 6112
Wed Apr 14 08:21:09 2021    (internal increment: 65536)

[..................................................]  0%, ETC: ---        
[==================================================] 100%, completed, 0s
Wed Apr 14 08:21:09 2021    Done.
Identity-By-Descent analysis (MLE) on genotypes:
Excluding 8,838 SNPs (non-autosomes or non-selection)
Excluding 0 SNP (monomorphic: TRUE, MAF: NaN, missing rate: NaN)
    # of samples: 25
    # of SNPs: 250
    using 1 thread
Specifying allele frequencies, mean: 0.486, sd: 0.284
MLE IBD:    the sum of all selected genotypes (0,1,2) = 6112
MLE IBD:	Wed Apr 14 08:21:09 2021	0%
MLE IBD:	Wed Apr 14 08:21:09 2021	100%
Genotype matrix: 250 SNPs X 25 samples
[1] -370.7482
[1] -402.2141
[1] -383.7897
[1] -377.9084
[1] -381.3139
[1] -397.5581
[1] -378.3344
[1] -370.703
[1] -376.103
[1] -377.7911
[1] -375.5425
[1] -373.13
[1] -383.6992
[1] -393.5194
[1] -371.9843
[1] -369.6468
[1] -374.5139
[1] -377.841
[1] -387.5622
[1] -377.1646
[1] -377.4659
[1] -375.2204
[1] -372.0639
[1] -379.816
          k0           k1    loglik niter
1  1.0000000 0.000000e+00 -370.7482   340
2  0.9278497 4.490042e-02 -401.8323   674
3  0.9320171 2.171748e-04 -382.7851   637
4  0.9757904 1.433990e-04 -377.8073   587
5  1.0000000 0.000000e+00 -381.3139   550
6  1.0000000 0.000000e+00 -397.5581   272
7  1.0000000 0.000000e+00 -378.3344   344
8  0.9430855 3.402422e-05 -369.9385   238
9  1.0000000 0.000000e+00 -376.1030   611
10 0.9179167 5.186286e-02 -377.3652   714
11 0.9630939 2.182349e-03 -375.3065  1000
12 0.0000000 1.000000e+00 -348.6070   147
13 1.0000000 0.000000e+00 -383.6992   743
14 0.9061298 6.031602e-02 -393.0712   437
15 0.8641232 1.358768e-01 -371.6073   210
16 0.7981697 2.018303e-01 -368.8638   177
17 1.0000000 0.000000e+00 -374.5139   164
18 1.0000000 0.000000e+00 -377.8410   118
19 0.9500700 4.992932e-02 -387.5143   313
20 0.9949179 2.725808e-04 -377.1602   629
21 0.9710880 3.137421e-05 -377.3199   343
22 0.9097788 2.643432e-04 -373.9321   305
23 0.7913741 2.086257e-01 -371.1589   157
24 1.0000000 0.000000e+00 -379.8160   156
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0       0       0       0       0       0 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0       0       0       0       0       0 
          k0         k1 loglik niter
1  0.7532082 0.24679185    NaN     0
2  0.7855774 0.21442258    NaN     0
3  0.7103556 0.28964437    NaN     0
4  0.6865533 0.31344674    NaN     0
5  0.5769952 0.42300478    NaN     0
6  1.0000000 0.00000000    NaN     0
7  0.6434333 0.35656665    NaN     0
8  0.9096684 0.01573758    NaN     0
9  0.6943081 0.30569185    NaN     0
10 0.5581343 0.44186567    NaN     0
11 0.5404675 0.45953254    NaN     0
12 0.0000000 1.00000000    NaN     0
13 0.5435959 0.45640406    NaN     0
14 0.7883793 0.19911987    NaN     0
15 0.3378116 0.66218843    NaN     0
16 0.3005318 0.69946817    NaN     0
17 0.9637291 0.03627094    NaN     0
18 1.0000000 0.00000000    NaN     0
19 0.6715518 0.32844817    NaN     0
20 0.8490238 0.13702670    NaN     0
21 0.7706416 0.22935836    NaN     0
22 0.8687864 0.00000000    NaN     0
23 0.4312604 0.56873956    NaN     0
24 0.7080179 0.29198213    NaN     0
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0       0       0       0       0       0 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0       0       0       0       0       0 
              D1            D2           D3           D4           D5
1   5.624046e-37  1.180280e-20 4.904718e-21 1.677608e-11 5.609379e-11
2   1.599776e-55  1.503765e-27 7.079950e-35 6.262253e-19 1.625399e-18
3   1.072335e-40  4.267261e-24 5.654489e-26 3.201097e-16 2.342972e-13
4   1.329931e-31  6.653634e-22 4.747074e-20 9.030593e-14 5.531960e-15
5   5.885350e-50  2.609011e-32 2.161310e-25 6.454104e-19 1.012404e-02
6   1.538934e-61  3.872675e-18 2.647038e-31 3.315840e-16 2.139932e-24
7   1.343589e-34  3.488172e-23 1.680912e-18 6.893508e-12 5.160654e-18
8   2.349188e-43  5.205905e-20 3.894035e-30 1.812634e-18 9.555617e-19
9   5.986137e-31  2.918346e-21 7.377681e-18 4.993721e-13 1.567748e-15
10  3.538317e-34  3.287673e-27 2.209391e-19 1.586568e-17 1.173310e-12
11  3.581253e-37  2.001565e-34 6.198208e-23 1.017490e-17 9.002649e-23
12  2.197678e-92 3.064865e-136 7.025180e-47 2.326243e-62 1.855975e-05
13 3.276195e-112  8.125742e-63 3.412096e-46 5.285428e-28 1.114704e-04
14  4.852213e-79  3.798155e-37 5.596519e-37 5.027177e-29 7.222682e-17
15  2.827713e-66  4.657999e-53 2.856851e-25 4.390683e-21 8.227261e-06
16  2.574573e-53  1.343953e-47 6.051528e-22 9.472207e-23 3.519872e-17
17  5.840299e-68  5.261274e-26 9.255782e-40 2.157719e-21 6.128667e-25
18  4.758757e-49  4.322460e-21 1.081958e-40 5.913304e-20 1.755982e-19
19  4.309697e-45  1.176392e-28 1.438804e-17 1.130813e-16 1.620374e-21
20  8.802271e-43  2.137563e-20 2.466227e-21 2.128369e-14 5.266146e-16
21  2.912346e-35  2.903154e-19 3.196077e-22 1.909580e-12 4.410476e-18
22  4.784547e-62  7.600208e-27 9.232315e-35 1.717683e-21 1.201433e-25
23  3.815788e-61  2.498728e-47 5.921616e-33 3.367568e-23 1.155295e-11
24  8.430738e-35  2.120598e-18 2.968306e-17 7.510117e-09 2.776526e-12
             D6           D7           D8    loglik niter
1  1.159093e-08 2.164057e-11 2.917311e-06 -331.5457    90
2  1.359681e-07 2.030707e-10 1.310150e-05 -354.4143   151
3  8.297192e-08 2.262973e-10 5.180403e-06 -331.0238   115
4  2.604857e-09 2.548209e-09 5.092313e-06 -336.9514   111
5  1.703514e-05 1.583926e-22 4.689546e-08 -340.3995   165
6  1.058149e-05 4.430545e-14 6.757829e-09 -355.8108   135
7  2.058527e-11 5.171751e-10 4.469748e-06 -349.2330   105
8  9.060776e-06 1.604413e-09 5.581030e-07 -328.1605   145
9  1.057860e-08 3.945948e-10 4.859482e-06 -336.6887   108
10 7.127443e-07 1.262448e-12 6.518920e-06 -328.9219   133
11 2.652194e-16 4.287636e-08 1.299948e-05 -335.7984   153
12 1.096117e-13 9.705827e-32 8.786320e-01 -300.4332   370
13 4.482075e-02 6.237102e-49 1.878718e-09 -350.1596   296
14 7.369247e-05 2.501202e-18 3.839098e-08 -342.5152   234
15 4.648563e-08 7.165893e-28 9.071040e-06 -329.6142   178
16 3.911998e-10 3.056952e-19 2.581633e-05 -325.9074   195
17 2.377998e-05 4.652730e-27 2.109743e-13 -330.5680   179
18 2.192231e-05 4.916291e-23 3.503103e-13 -338.2142   175
19 7.350363e-09 5.310814e-12 1.194814e-05 -349.8404   146
20 2.879301e-06 5.136261e-12 1.328897e-06 -331.9516   109
21 4.795682e-10 5.542997e-08 3.508234e-06 -338.1645   100
22 4.913266e-10 9.012403e-06 1.046127e-05 -332.1700   190
23 4.893518e-11 1.624286e-17 2.286308e-05 -322.7322   187
24 1.638351e-07 2.251153e-14 1.143161e-06 -350.8069    71
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0       0       0       0       0       0 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0       0       0       0       0       0 

SNPRelate documentation built on Nov. 8, 2020, 5:31 p.m.