Description Usage Arguments Details Value Benchmarks Examples
relationshipMatrix
calculates the relationship matrix
A=(M-P)^T (M-P) /σ^2 from the SNP matrix M
where P = p (1,…,1) with p = M \%*\% (1,…,1)^T / n.
Furthermore, sigma^2 equals
σ^2 = p^T (1 - p/2)\in[0,∞).
crossprodx
calculates the cross-product of SNPxIndiv
,
i.e. it is identical to call relationshipMatrix
with
optional argument,
centered=FALSE
, cf. RFoptions
allele_freq
calculates p/2.
SNPeffect
calculates M (A + τ I)^{-1} v
solveRelMat
calculates
(A + τ I)^{-1} v
and
A(A + τ I)^{-1} v + β
where A is the relationship matrix.
1 2 3 4 5 6 | relationshipMatrix(SNPxIndiv, ...)
crossprodx(SNPxIndiv)
solveRelMat(A, tau, vec, betahat=NULL, destroy_A=FALSE)
SNPeffect(SNPxIndiv, vec, centered=TRUE, tau=0)
allele_freq(SNPxIndiv)
|
SNPxIndiv |
0,1,2-valued (snps \times indiv)
matrix or the result of |
... |
see
|
centered |
if |
A |
a symmetric, positive definite matrix, which is a relationship matrix |
tau |
non-negative scalar |
vec |
the vector v |
betahat |
scalar or |
destroy_A |
logical. If |
Let p = M \%*\% (1,…,1)^T / n where n is the number of individuals. Then, the matrix P equals P = p (1,…,1).
The constant sigma^2 equals σ^2 = p^T (1 - p/2).
solveRelMat
has a speed and memory advantage in comparison to the
direct implementation of the above formulae.
relationsshipMatrix
returns a (Indiv \times Indiv)
numerical matrix.
The return value of solveRelMat
depends on betahat
.
If the latter is NULL
, only the vector (A + τ I)^{-1} v
is returned. Else, a list of 2 elements is returned. First element equals
the vector
(A + τ I)^{-1} v,
the second element equals
A(A + τ I)^{-1} v + β.
Computing times for the relationship matrix in comparison to 'crossprod'
in standard implementation on Intel(R) Core(TM) i7-8550U CPU @
1.80GHz, R version 3.6.0 (Linux) with indiv = 1000
and
snps = 5e5
are:
Shuffle256 : 48 x faster (AVX2; 16x compressed)
Packed256 : 36 x faster (AVX2; 16x compressed)
Shuffle : 35 x faster (SSSE3; 16x compressed)
Multiply256 : 29 x faster (AVX2; 16x compressed)
Packed : 28 x faster (SSE2; 16x compressed)
Hamming2 : 24 x faster (SSE2; 4x compressed)
Hamming3 : 21 x faster (SSSE3; 4x compressed)
Multiply : 17 x faster (SSE2; 16x compressed)
ThreeBit : 17 x faster (uint64_t; 10x compressed)
TwoBit : 15 x faster (uint64_t; 16x compressed)
NoSNPcoding : 4 x faster (int, AVX2; not compressed)
NoSNPcodingAVX: 2 x faster (double, AVX; not compressed)
NoSNPcodingR : calls crossprod
In parantheses, first the instruction set or s the main data type is given, then the data compression with respect to 32 bit integer.
The following code was used:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 | RFoptions(cores = 1)
indiv <- 1000
snps <- 5e5 ## may cause memory allocation problems in R; better use 5e4 !!
methods <- c(NoSNPcodingR, NoSNPcodingAVX,
FirstGenuineMethod:LastGenuineMethod)
M <- matrix(ncol=indiv, sample(0:2, indiv * snps, replace=TRUE))
for (storageMode in c("integer", "double")){
storage.mode(M) <- storageMode
cat("\n\n")
print(S <- system.time(C <- crossprod(M)))
p <- rowMeans(M)
P <- p %*% t(rep(1, indiv))
sigma2 <- sum(p * (1- p/2))
A <- crossprod(M-P) / sigma2
print(S <- system.time(C <- crossprod(M)))
for (method in methods) {
RFoptions(snpcoding = method)
cat("\nstorage=", storageMode, " method=", SNPCODING_NAMES[method + 1],
"\n")
S0 <- system.time(G <- genomicmatrix(M))
print(S1 <- system.time(C1 <- crossprodx(M)))
print(S2 <- system.time(C2 <- crossprodx(G)))
stopifnot(all(C == C1))
stopifnot(all(C == C2))
R1 <- S / S1
R2 <- S / S2
print(0.5 * (R1 + R2))
print(S3 <- system.time(C3 <- relationshipMatrix(M)))
print(S4 <- system.time(C4 <- relationshipMatrix(G)))
R3 <- S / S3
R4 <- S / S4
print(0.5 * (R3 + R4))
stopifnot(all.equal(as.double(A), as.double(C3)))
stopifnot(all.equal(as.double(A), as.double(C4)))
gc()
}
}
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 |
require(RandomFieldsUtils)
set.seed(0)
snpcodes <- c(TwoBit, ThreeBit)
if (has.instruction.set("SSE2")) snpcodes <- c(snpcodes, Hamming2)
if (has.instruction.set("SSSE3")) snpcodes <- c(snpcodes, Hamming3, Shuffle)
if (has.instruction.set("AVX2")) snpcodes <- c(snpcodes, Shuffle256)
Print(snpcodes)
indiv <- 1 + sample(100:500, 1)
snps <- indiv * 2^sample(1:if (interactive()) 7 else 5, 1)
RFoptions(snpcoding=sample(snpcodes, 1))
M <- matrix(ncol=indiv, sample(0:2, indiv * snps, replace=TRUE))
print(system.time(G <- genomicmatrix(M)))
print(G)
## crossprodx vs crossprod: about 10x faster
Print(system.time(C <- crossprodx(M)))
print(system.time(C2 <- crossprod(M)))
stopifnot(all(C == C2))
## allele_freq vs rowMeans: about equally fast
Print(system.time(af <- allele_freq(M)))
print(system.time(alleleFreq <- 0.5 * rowMeans(M)))
stopifnot(all.equal(as.double(alleleFreq), as.double(af)))
## relationshipMatrix vs crossprod: about 10x faster
Print(system.time(R <- relationshipMatrix(M)))
print(system.time(R <- relationshipMatrix(G)))
print(system.time({
sigma2 <- 2 * sum(alleleFreq * (1 - alleleFreq))
R2 <- crossprod(M - 2 * alleleFreq) / sigma2
}))
stopifnot(all.equal(as.double(R), as.double(R2)))
### solveRelMat vs. solve: about equally fast
tau <- 0.0001
vec <- runif(indiv)
beta <- runif(1)
Print(system.time(S <- solveRelMat(R, tau=tau, vec=vec, betahat=beta)))
print(system.time({r <- solve(R + diag(indiv) * tau, vec);
y <- as.vector(R %*% r + beta)}))
stopifnot(all.equal(S$rest, r))
stopifnot(all.equal(S$yhat, y))
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.