Description Usage Arguments Details Value Benchmarks Examples
relationshipMatrix calculates the relationship matrix
A=(M-P)^T (M-P) /σ^2 from the SNP matrix M
where P = p (1,…,1) with  p = M \%*\% (1,…,1)^T / n.
Furthermore, sigma^2 equals
σ^2 = p^T  (1 - p/2)\in[0,∞).
crossprodx calculates the cross-product of SNPxIndiv,
i.e. it is identical to call relationshipMatrix with
optional argument,
centered=FALSE, cf. RFoptions
allele_freq calculates p/2.
SNPeffect calculates M (A + τ I)^{-1} v
solveRelMat calculates
(A + τ I)^{-1} v
and
A(A + τ I)^{-1} v + β
where A is the relationship matrix.
| 1 2 3 4 5 6 | relationshipMatrix(SNPxIndiv, ...)
crossprodx(SNPxIndiv) 
solveRelMat(A, tau, vec, betahat=NULL, destroy_A=FALSE)
SNPeffect(SNPxIndiv, vec, centered=TRUE, tau=0)
allele_freq(SNPxIndiv)
 | 
| SNPxIndiv | 0,1,2-valued (snps \times indiv)
matrix or the result of  | 
| ... | see  
 
 | 
| centered | if  | 
| A | a symmetric, positive definite matrix, which is a relationship matrix | 
| tau | non-negative scalar | 
| vec | the vector v | 
| betahat | scalar or  | 
| destroy_A | logical. If  | 
Let p = M \%*\% (1,…,1)^T / n where n is the number of individuals. Then, the matrix P equals P = p (1,…,1).
The constant sigma^2 equals σ^2 = p^T (1 - p/2).
solveRelMat has a speed and memory advantage in comparison to the
direct implementation of the above formulae.
relationsshipMatrix returns a (Indiv \times Indiv)
numerical matrix.
The return value of solveRelMat depends on betahat.
If the latter is NULL, only the vector (A + τ I)^{-1} v
is returned. Else,  a list of 2 elements is returned. First element equals
the vector 
(A + τ I)^{-1} v,
the second element equals
A(A + τ I)^{-1} v + β.
Computing times for the relationship matrix in comparison to 'crossprod'
in standard implementation on Intel(R) Core(TM) i7-8550U CPU @
1.80GHz, R version 3.6.0 (Linux) with indiv = 1000 and
snps = 5e5 are:
Shuffle256    : 48 x faster (AVX2; 16x compressed)
Packed256     : 36 x faster (AVX2; 16x compressed)
Shuffle       : 35 x faster (SSSE3; 16x compressed)
Multiply256   : 29 x faster (AVX2; 16x compressed)
Packed        : 28 x faster (SSE2; 16x compressed)
Hamming2      : 24 x faster (SSE2; 4x compressed)
Hamming3      : 21 x faster (SSSE3; 4x compressed)
Multiply      : 17 x faster (SSE2; 16x compressed)
ThreeBit      : 17 x faster (uint64_t; 10x compressed)
TwoBit        : 15 x faster (uint64_t; 16x compressed)
NoSNPcoding   :  4 x faster (int, AVX2; not compressed)
NoSNPcodingAVX:  2 x faster (double, AVX; not compressed)
NoSNPcodingR  : calls crossprod 
In parantheses, first the instruction set or s the main data type is given, then the data compression with respect to 32 bit integer.
The following code was used:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 | RFoptions(cores = 1)
indiv <- 1000
snps <- 5e5 ## may cause memory allocation problems in R; better use 5e4 !!
methods <- c(NoSNPcodingR, NoSNPcodingAVX,
             FirstGenuineMethod:LastGenuineMethod)
M <- matrix(ncol=indiv, sample(0:2, indiv * snps, replace=TRUE))
for (storageMode in c("integer", "double")){
    storage.mode(M) <- storageMode
  cat("\n\n")
  print(S <- system.time(C <- crossprod(M)))
  p <- rowMeans(M)
  P <- p %*% t(rep(1, indiv))
  sigma2 <- sum(p * (1- p/2))
  A <- crossprod(M-P) / sigma2
  print(S <- system.time(C <- crossprod(M)))  
  for (method in methods) {
    RFoptions(snpcoding = method)
    cat("\nstorage=", storageMode, "  method=", SNPCODING_NAMES[method + 1],
    "\n")
    S0 <- system.time(G <- genomicmatrix(M))
    print(S1 <- system.time(C1 <- crossprodx(M)))
    print(S2 <- system.time(C2 <- crossprodx(G)))
    stopifnot(all(C == C1))
    stopifnot(all(C == C2))
    R1 <- S / S1 
    R2 <- S / S2 
    print(0.5 * (R1 + R2))
    print(S3 <- system.time(C3 <- relationshipMatrix(M)))
    print(S4 <- system.time(C4 <- relationshipMatrix(G)))
    R3 <- S / S3
    R4 <- S / S4 
    print(0.5 * (R3 + R4))
    stopifnot(all.equal(as.double(A), as.double(C3)))
    stopifnot(all.equal(as.double(A), as.double(C4)))
    gc()
  }
}
 | 
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 |  
require(RandomFieldsUtils)
set.seed(0)
snpcodes <- c(TwoBit, ThreeBit)
if (has.instruction.set("SSE2")) snpcodes <- c(snpcodes, Hamming2)
if (has.instruction.set("SSSE3")) snpcodes <- c(snpcodes, Hamming3, Shuffle)
if (has.instruction.set("AVX2")) snpcodes <- c(snpcodes, Shuffle256)
   
Print(snpcodes)
indiv <- 1 + sample(100:500, 1)
snps <- indiv * 2^sample(1:if (interactive()) 7 else 5, 1)
RFoptions(snpcoding=sample(snpcodes, 1))
M <- matrix(ncol=indiv, sample(0:2, indiv * snps, replace=TRUE))
print(system.time(G <- genomicmatrix(M)))
print(G)  
## crossprodx vs crossprod: about 10x faster
Print(system.time(C <- crossprodx(M)))   
print(system.time(C2 <- crossprod(M)))
stopifnot(all(C == C2))
## allele_freq vs rowMeans: about equally fast
Print(system.time(af <- allele_freq(M)))
print(system.time(alleleFreq <- 0.5 * rowMeans(M)))
stopifnot(all.equal(as.double(alleleFreq), as.double(af)))
## relationshipMatrix vs crossprod: about 10x faster
Print(system.time(R <- relationshipMatrix(M)))
print(system.time(R <- relationshipMatrix(G)))
print(system.time({
  sigma2 <- 2 * sum(alleleFreq * (1 - alleleFreq))
  R2 <- crossprod(M - 2 * alleleFreq) / sigma2
}))
stopifnot(all.equal(as.double(R), as.double(R2)))
### solveRelMat vs. solve: about equally fast
tau <- 0.0001
vec <- runif(indiv)
beta <- runif(1)
Print(system.time(S <- solveRelMat(R, tau=tau, vec=vec, betahat=beta)))
print(system.time({r <- solve(R + diag(indiv) * tau, vec);
                   y <- as.vector(R %*% r + beta)}))
stopifnot(all.equal(S$rest, r))
stopifnot(all.equal(S$yhat, y))
 | 
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.