knitr::opts_chunk$set(collapse = TRUE,comment = "#",fig.width = 5, fig.height = 3,fig.align = "center", fig.cap = " ",dpi = 120)
This vignette demonstrates mvsusieR in the context of prediction. We first simulate three phenotypes ($y_1$, $y_2$, $y_3$) and genotypes at $P = 1000$ loci ($X$) for $N = 500$ individuals. Then, the goal is to predict phenotypes from genotypes for $N_{test} = 100$ individuals, training our model on the remaining $N_{training} = 400$ individuals.
library(mashr) library(mvsusieR) set.seed(1234) options(stringsAsFactors = FALSE)
The first step involves simulating phenotypic and genotypic data.
N = 500 P = 1000 true_eff <- 2 X = matrix(sample(c(0, 1, 2), size=N*P, replace=T), nrow=N, ncol=P) beta1 = beta2 = beta3 = rep(0, P) beta1[1:true_eff] = runif(true_eff) beta2[1:true_eff] = runif(true_eff) beta3[1:true_eff] = runif(true_eff) y1 = X %*% beta1 + rnorm(N) y2 = X %*% beta2 + rnorm(N) y3 = X %*% beta3 + rnorm(N)
The phenotypic data was simulated assuming the same 2 causal variables affect each of the traits with different effects.
which(beta1 != 0) which(beta2 != 0) which(beta3 != 0) head(beta1) head(beta2) head(beta3)
For starters, we assume there are at most 10 causal variables (i.e., set L = 10
), select the first 100 individuals as the training set,
and derive the prior from data:
L <- 10 test_num <- 1:100 y <- cbind(y1[-test_num], y2[-test_num], y3[-test_num]) prior_covar <- create_mash_prior(sample_data = list(X=X[-test_num, ],Y=y,residual_variance=cov(y)), max_mixture_len=-1)
Then, we fit the mvsusie model to the training set.
m <- mvsusie(X[-test_num, ], y, L=L, prior_variance=prior_covar)
Finally, we use the parameter estimates from the training set to predict phenotypes in the test set. The accuracy of prediction is calculated as the correlation between true and predicted phenotypes.
pred <- predict.mvsusie(m, X[test_num, ]) results <- matrix(c(cor(pred[, 1], y1[test_num, ]), cor(pred[, 2], y2[test_num, ]), cor(pred[, 3], y3[test_num, ])), nrow=3, ncol=1) results
Here are some details about the computing environment, including the versions of R, and the R packages, used to generate these results.
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.