relMatrix: Fast calculation of the Genomic Relationship Matrix and its...

Description Usage Arguments Details Value Benchmarks Examples

Description

relationshipMatrix calculates the relationship matrix A=(M-P)^T (M-P) /σ^2 from the SNP matrix M where P = p (1,…,1) with p = M \%*\% (1,…,1)^T / n. Furthermore, sigma^2 equals σ^2 = p^T (1 - p/2)\in[0,∞).

crossprodx calculates the cross-product of SNPxIndiv, i.e. it is identical to call relationshipMatrix with optional argument, centered=FALSE, cf. RFoptions

allele_freq calculates p/2.

SNPeffect calculates M (A + τ I)^{-1} v

solveRelMat calculates

(A + τ I)^{-1} v

and

A(A + τ I)^{-1} v + β

where A is the relationship matrix.

Usage

1
2
3
4
5
6
relationshipMatrix(SNPxIndiv, ...)
crossprodx(SNPxIndiv) 

solveRelMat(A, tau, vec, betahat=NULL, destroy_A=FALSE)
SNPeffect(SNPxIndiv, vec, centered=TRUE, tau=0)
allele_freq(SNPxIndiv)

Arguments

SNPxIndiv

0,1,2-valued (snps \times indiv) matrix or the result of genomicmatrix.

...

see RFoptions – better use RFoptions. The main two options are:

centered: see below

normalized:logical. if FALSE then the division by sigma^2 is not performed

centered

if FALSE then P is not substracted.

A

a symmetric, positive definite matrix, which is a relationship matrix

tau

non-negative scalar

vec

the vector v

betahat

scalar or NULL. See also section value.

destroy_A

logical. If TRUE the values of the matrix A will be overwritten during the calculations (leading to a faster execution with less memory needs).

Details

Let p = M \%*\% (1,…,1)^T / n where n is the number of individuals. Then, the matrix P equals P = p (1,…,1).

The constant sigma^2 equals σ^2 = p^T (1 - p/2).

solveRelMat has a speed and memory advantage in comparison to the direct implementation of the above formulae.

Value

relationsshipMatrix returns a (Indiv \times Indiv) numerical matrix.

The return value of solveRelMat depends on betahat. If the latter is NULL, only the vector (A + τ I)^{-1} v is returned. Else, a list of 2 elements is returned. First element equals the vector

(A + τ I)^{-1} v,

the second element equals

A(A + τ I)^{-1} v + β.

Benchmarks

Computing times for the relationship matrix in comparison to 'crossprod' in standard implementation on Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, R version 3.6.0 (Linux) with indiv = 1000 and snps = 5e5 are:

Shuffle256 : 48 x faster (AVX2; 16x compressed)
Packed256 : 36 x faster (AVX2; 16x compressed)
Shuffle : 35 x faster (SSSE3; 16x compressed)
Multiply256 : 29 x faster (AVX2; 16x compressed)
Packed : 28 x faster (SSE2; 16x compressed)
Hamming2 : 24 x faster (SSE2; 4x compressed)
Hamming3 : 21 x faster (SSSE3; 4x compressed)
Multiply : 17 x faster (SSE2; 16x compressed)
ThreeBit : 17 x faster (uint64_t; 10x compressed)
TwoBit : 15 x faster (uint64_t; 16x compressed)
NoSNPcoding : 4 x faster (int, AVX2; not compressed)
NoSNPcodingAVX: 2 x faster (double, AVX; not compressed)
NoSNPcodingR : calls crossprod

In parantheses, first the instruction set or s the main data type is given, then the data compression with respect to 32 bit integer.

The following code was used:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
RFoptions(cores = 1)
indiv <- 1000
snps <- 5e5 ## may cause memory allocation problems in R; better use 5e4 !!
methods <- c(NoSNPcodingR, NoSNPcodingAVX,
             FirstGenuineMethod:LastGenuineMethod)
M <- matrix(ncol=indiv, sample(0:2, indiv * snps, replace=TRUE))
for (storageMode in c("integer", "double")){
    storage.mode(M) <- storageMode
  cat("\n\n")
  print(S <- system.time(C <- crossprod(M)))
  p <- rowMeans(M)
  P <- p %*% t(rep(1, indiv))
  sigma2 <- sum(p * (1- p/2))
  A <- crossprod(M-P) / sigma2
  print(S <- system.time(C <- crossprod(M)))  
  for (method in methods) {
    RFoptions(snpcoding = method)
    cat("\nstorage=", storageMode, "  method=", SNPCODING_NAMES[method + 1],
    "\n")
    S0 <- system.time(G <- genomicmatrix(M))
    print(S1 <- system.time(C1 <- crossprodx(M)))
    print(S2 <- system.time(C2 <- crossprodx(G)))
    stopifnot(all(C == C1))
    stopifnot(all(C == C2))
    R1 <- S / S1 
    R2 <- S / S2 
    print(0.5 * (R1 + R2))
    print(S3 <- system.time(C3 <- relationshipMatrix(M)))
    print(S4 <- system.time(C4 <- relationshipMatrix(G)))
    R3 <- S / S3
    R4 <- S / S4 
    print(0.5 * (R3 + R4))
    stopifnot(all.equal(as.double(A), as.double(C3)))
    stopifnot(all.equal(as.double(A), as.double(C4)))
    gc()
  }
}

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
 
require(RandomFieldsUtils)
set.seed(0)
snpcodes <- c(TwoBit, ThreeBit)
if (has.instruction.set("SSE2")) snpcodes <- c(snpcodes, Hamming2)
if (has.instruction.set("SSSE3")) snpcodes <- c(snpcodes, Hamming3, Shuffle)
if (has.instruction.set("AVX2")) snpcodes <- c(snpcodes, Shuffle256)
   
Print(snpcodes)

indiv <- 1 + sample(100:500, 1)
snps <- indiv * 2^sample(1:if (interactive()) 7 else 5, 1)
RFoptions(snpcoding=sample(snpcodes, 1))
M <- matrix(ncol=indiv, sample(0:2, indiv * snps, replace=TRUE))
print(system.time(G <- genomicmatrix(M)))
print(G)  

## crossprodx vs crossprod: about 10x faster
Print(system.time(C <- crossprodx(M)))   
print(system.time(C2 <- crossprod(M)))
stopifnot(all(C == C2))

## allele_freq vs rowMeans: about equally fast
Print(system.time(af <- allele_freq(M)))
print(system.time(alleleFreq <- 0.5 * rowMeans(M)))
stopifnot(all.equal(as.double(alleleFreq), as.double(af)))

## relationshipMatrix vs crossprod: about 10x faster
Print(system.time(R <- relationshipMatrix(M)))
print(system.time(R <- relationshipMatrix(G)))
print(system.time({
  sigma2 <- 2 * sum(alleleFreq * (1 - alleleFreq))
  R2 <- crossprod(M - 2 * alleleFreq) / sigma2
}))
stopifnot(all.equal(as.double(R), as.double(R2)))


### solveRelMat vs. solve: about equally fast
tau <- 0.0001
vec <- runif(indiv)
beta <- runif(1)
Print(system.time(S <- solveRelMat(R, tau=tau, vec=vec, betahat=beta)))
print(system.time({r <- solve(R + diag(indiv) * tau, vec);
                   y <- as.vector(R %*% r + beta)}))
stopifnot(all.equal(S$rest, r))
stopifnot(all.equal(S$yhat, y))

miraculix documentation built on Sept. 22, 2021, 5:07 p.m.