Kernel functions useful for genetic associations

Description

These functions compute pairwise inter-individual similarities from single nucleotide polymorphism (SNP) genotypes.

Usage

am(x)
AM(x)
ibs(x)
IBS(x)
lin0(x)
Lin0(x)
quad1(x)
Quad1(x)
Minkowski(x, p = 1)
minkowski(x, p = 1)
polyk(x, c = 0, d = 1)
Polyk(x, c = 0, d = 1)

Arguments

x

A numeric matrix encoding genotypes. Each row corresponds to an individual and each column corresponds to a genetic marker. Usually, allele-counting coding is used, but others are allowed.

p

The exponent defining the Minkowski distance, with the same meaning as the p argument of stats::dist.

c

The constant added to cross-products before raising to the power of d.

d

The exponent defining the polynomial kernel. When c=0 and d=1, this is equivalent to lin0. When c=1 and d=2, this is equivalent to quad1.

Details

These functions compute the pairwise similarities among rows of x. Lower-case versions are more useful in the formula interface to specify random genetic effects. Upper-case versions can be used to directly compute the genetic similarity matrix.

am and AM calculate the allele-matching kernel, and AM is based on SPA3G:::KERNEL.

ibs and IBS compute the identity-by-state (IBS) kernel. IBS is computed as
1 - as.matrix(dist(x, method='manhattan') * .5 /max(1, ncol(x)) ).
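As a minimal sketch, the IBS formula above can be evaluated with base R alone. The toy genotype matrix g is made up for illustration and is not part of the package.

```r
# Toy genotypes (illustrative only): 4 individuals (rows) by 3 markers
# (columns), coded as allele counts 0/1/2.
g <- matrix(c(0, 1, 2,
              0, 1, 2,
              2, 1, 0,
              1, 1, 1), nrow = 4, byrow = TRUE)

# Formula quoted above: 1 - (Manhattan distance) / (2 * number of markers).
K_ibs <- 1 - as.matrix(dist(g, method = "manhattan") * 0.5 / max(1, ncol(g)))

# Identical genotype rows get similarity 1; rows that differ by two alleles
# at every marker would get similarity 0.
K_ibs
```

With allele-count coding, the per-marker distance is at most 2, so dividing by twice the number of markers keeps the similarities between 0 and 1.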

lin0 and Lin0 compute the linear kernel with zero intercept. Lin0 is computed as
normalizeTrace(tcrossprod(x)/max(1,ncol(x))).

quad1 and Quad1 compute the quadratic kernel with offset 1. Quad1 is computed as
normalizeTrace((base::tcrossprod(x)+1)^2).
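The linear and quadratic formulas above can be sketched in base R as follows. The helper scale_trace is a hypothetical stand-in for normalizeTrace: it assumes the normalization rescales the matrix so that its diagonal averages to 1, which should be checked against the normalizeTrace help page. The function poly_kernel and the toy matrix g are likewise illustrative names, not package code.

```r
# Hypothetical stand-in for normalizeTrace (an assumption, not the package
# source): rescale so that the mean of the diagonal equals 1.
scale_trace <- function(K) K / mean(diag(K))

# Toy genotypes (illustrative only): 4 individuals by 3 markers.
g <- matrix(c(0, 1, 2,
              0, 1, 2,
              2, 1, 0,
              1, 1, 1), nrow = 4, byrow = TRUE)

# Linear kernel with zero intercept, following the Lin0 formula.
K_lin <- scale_trace(tcrossprod(g) / max(1, ncol(g)))

# Quadratic kernel with offset 1, following the Quad1 formula.
K_quad <- scale_trace((tcrossprod(g) + 1)^2)

# Generic polynomial form: add c to the cross-products, raise to the power d.
# Under the normalization assumed here, (c = 0, d = 1) matches K_lin and
# (c = 1, d = 2) matches K_quad.
poly_kernel <- function(x, c = 0, d = 1) scale_trace((tcrossprod(x) + c)^d)

all.equal(K_lin,  poly_kernel(g, c = 0, d = 1))
all.equal(K_quad, poly_kernel(g, c = 1, d = 2))
```

Under this assumed normalization, the division by the number of markers in the linear kernel cancels out, which is why the plain cross-product with c = 0 and d = 1 gives the same matrix.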

minkowski and Minkowski compute the similarity based on the Minkowski distance. Minkowski is computed as 1-as.matrix(dist(x, method='minkowski', p=p)) * .5 / max(1, ncol(x))^(1/p) .
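A short base-R check, using a made-up genotype matrix g, illustrates that the Minkowski formula with p = 1 reduces to the IBS formula, because the Minkowski distance with exponent 1 is the Manhattan distance. This is only a sketch of the stated formulas, not a call into the package.

```r
# Toy genotypes (illustrative only): 4 individuals by 3 markers.
g <- matrix(c(0, 1, 2,
              0, 1, 2,
              2, 1, 0,
              1, 1, 1), nrow = 4, byrow = TRUE)

# Minkowski-based similarity, following the formula quoted above.
mink_sim <- function(x, p = 1) {
  1 - as.matrix(dist(x, method = "minkowski", p = p)) * 0.5 /
    max(1, ncol(x))^(1 / p)
}

# IBS similarity, following the IBS formula in this section.
ibs_sim <- function(x) {
  1 - as.matrix(dist(x, method = "manhattan") * 0.5 / max(1, ncol(x)))
}

# With p = 1 the two formulas agree up to floating-point error.
all.equal(mink_sim(g, p = 1), ibs_sim(g))

# Other exponents give a different similarity scale, e.g. p = 2 (Euclidean).
mink_sim(g, p = 2)
```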

Value

The functions starting with an upper-case letter return an n-by-n symmetric similarity matrix, where n equals nrow(x). The corresponding functions starting with a lower-case letter return a matrix L such that tcrossprod(L) equals the value returned by their upper-case counterparts. L has n rows, and its number of columns equals the rank of the similarity matrix.
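The relationship between a similarity matrix K and a factor L with tcrossprod(L) equal to K can be sketched with an eigendecomposition. This is one possible construction shown for illustration, assuming K is positive semi-definite; it is not necessarily how the lower-case functions or cholRoot compute L, and the toy matrix g and the tolerance are arbitrary choices.

```r
# Toy genotypes (illustrative only): 4 individuals by 3 markers.
g <- matrix(c(0, 1, 2,
              0, 1, 2,
              2, 1, 0,
              1, 1, 1), nrow = 4, byrow = TRUE)

# A positive semi-definite similarity matrix (unnormalized linear kernel).
K <- tcrossprod(g) / ncol(g)

# Keep only eigenvalues that are clearly positive (tolerance is arbitrary).
e    <- eigen(K, symmetric = TRUE)
keep <- e$values > 1e-8 * max(e$values)

# Scale the retained eigenvectors by the square roots of their eigenvalues.
L <- e$vectors[, keep, drop = FALSE] %*%
  diag(sqrt(e$values[keep]), nrow = sum(keep))

dim(L)                    # n rows, one column per retained eigenvalue
range(tcrossprod(L) - K)  # differences should be at rounding-error level
```

Dropping the near-zero eigenvalues is what makes the column count of L equal to the rank of K rather than n.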

Author(s)

Long Qu

See Also

cholRoot, normalizeTrace, stats::dist, SPA3G:::KERNEL, varComp

Examples

set.seed(2345432L)
x=matrix(sample(2, 50L, replace=TRUE), 10L)  
IBS(x)
range(tcrossprod(ibs(x)) - IBS(x)  )

AM(x)
range(tcrossprod(am(x)) - AM(x)  )

Lin0(x)
range(tcrossprod(lin0(x)) - Lin0(x)  )
range(Lin0(x) - Polyk(x, 0, 1))

Quad1(x)
range(tcrossprod(quad1(x)) - Quad1(x)  )
range(Quad1(x) - Polyk(x, 1, 2))

Minkowski(x)
range(tcrossprod(minkowski(x)) - Minkowski(x)  )
range(tcrossprod(minkowski(x)) - IBS(x)  )

## Use in formulas
model.matrix(~0+ibs(x))
range(tcrossprod(model.matrix(~0+ibs(x))) - IBS(x))