knitr::opts_chunk$set(collapse = TRUE,comment = "#",fig.width = 5,
                      fig.height = 3,fig.align = "center",
                      fig.cap = " ",dpi = 120)

This vignette demonstrates mvsusieR in the context of prediction. We first simulate three phenotypes ($y_1$, $y_2$, $y_3$) and genotypes at $P = 1000$ loci ($X$) for $N = 500$ individuals. Then, the goal is to predict phenotypes from genotypes for $N_{test} = 100$ individuals, training our model on the remaining $N_{training} = 400$ individuals.

library(mashr)
library(mvsusieR)
set.seed(1234)
options(stringsAsFactors = FALSE)

The data-set

The first step involves simulating phenotypic and genotypic data.

N = 500
P = 1000
true_eff <- 2
X = matrix(sample(c(0, 1, 2), size=N*P, replace=T), nrow=N, ncol=P)
beta1 = beta2 = beta3 = rep(0, P)
beta1[1:true_eff] = runif(true_eff)
beta2[1:true_eff] = runif(true_eff)
beta3[1:true_eff] = runif(true_eff)
y1 = X %*% beta1 + rnorm(N)
y2 = X %*% beta2 + rnorm(N)
y3 = X %*% beta3 + rnorm(N)

The phenotypic data was simulated assuming the same 2 causal variables affect each of the traits with different effects.

which(beta1 != 0)
which(beta2 != 0)
which(beta3 != 0)

head(beta1)
head(beta2)
head(beta3)

Prediction with mvsusieR

For starters, we assume there are at most 10 causal variables (i.e., set L = 10), select the first 100 individuals as the training set, and derive the prior from data:

L <- 10
test_num <- 1:100
y <- cbind(y1[-test_num], y2[-test_num], y3[-test_num])
prior_covar <- create_mash_prior(sample_data = list(X=X[-test_num, ],Y=y,residual_variance=cov(y)), max_mixture_len=-1)

Then, we fit the mvsusie model to the training set.

m <- mvsusie(X[-test_num, ], y, L=L, prior_variance=prior_covar)

Finally, we use the parameter estimates from the training set to predict phenotypes in the test set. The accuracy of prediction is calculated as the correlation between true and predicted phenotypes.

pred <- predict.mvsusie(m, X[test_num, ])

results <- matrix(c(cor(pred[, 1], y1[test_num, ]), cor(pred[, 2], y2[test_num, ]), cor(pred[, 3], y3[test_num, ])), nrow=3, ncol=1)
results

Session information

Here are some details about the computing environment, including the versions of R, and the R packages, used to generate these results.

sessionInfo()


gaow/mmbr documentation built on April 24, 2024, 7:12 p.m.