This function performs the penalized Selection Operator for JOintly analyzing multiple variants (SOJO) within a mapped locus, based on LASSO regression derived from GWAS summary statistics.
sojo(sum.stat.raw, LD_ref, snp_ref, v.y = 1, lambda.vec = NA,
     standardize = T, nvar = 50)
sum.stat.raw
A data frame of GWAS summary statistics for the genetic variants within a mapped locus. The data frame should include the following columns: SNP, SNP ID; A1, effect allele; A2, reference allele; Freq1, the allele frequency of A1; b, estimate of the marginal effect in GWAS; se, standard error of the marginal effect estimate in GWAS; N, sample size.
LD_ref
The reference LD correlation matrix for the SNPs at the locus. The row names and column names of the matrix should be the SNP names in the reference sample.
snp_ref
The reference alleles of the SNPs in the reference LD correlation matrix. The names of the vector should be the SNP names in the reference sample.
v.y
The phenotypic variance of the trait. Default is 1.
lambda.vec
The tuning parameter sequence given by the user. If not specified, the function computes its own tuning parameter sequence, which is recommended.
standardize
Logical value indicating whether the genotypic data are standardized before the algorithm starts. The coefficients in the output are always transformed back to the original scale. Default is TRUE.
nvar
The number of variants to be selected in the model.
For example, if
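The shapes of the three main inputs described above can be sketched with toy data. All SNP IDs and numbers below are invented for illustration, the object names with the .toy suffix are hypothetical, and the choice of A2 as the values of the reference allele vector is an assumption rather than something this page states.

```r
# Toy summary statistics with the columns listed for sum.stat.raw (values invented)
sum.stat.toy <- data.frame(
  SNP   = c("rs1", "rs2", "rs3"),
  A1    = c("A", "C", "G"),
  A2    = c("G", "T", "A"),
  Freq1 = c(0.20, 0.35, 0.10),
  b     = c(0.05, -0.02, 0.01),
  se    = c(0.010, 0.012, 0.015),
  N     = c(10000, 10000, 9800),
  stringsAsFactors = FALSE
)

# Toy LD correlation matrix with SNP names as row and column names
LD.toy <- matrix(c(1.0, 0.3, 0.1,
                   0.3, 1.0, 0.2,
                   0.1, 0.2, 1.0),
                 nrow = 3,
                 dimnames = list(sum.stat.toy$SNP, sum.stat.toy$SNP))

# Toy named vector of reference alleles (assumed here to match A2)
snp_ref.toy <- c(rs1 = "G", rs2 = "T", rs3 = "A")

# Simple sanity checks on the input shapes
required <- c("SNP", "A1", "A2", "Freq1", "b", "se", "N")
stopifnot(all(required %in% names(sum.stat.toy)))
stopifnot(identical(rownames(LD.toy), colnames(LD.toy)))
stopifnot(all(names(snp_ref.toy) %in% rownames(LD.toy)))
```

Checks of this kind are cheap to run before calling sojo on real data and catch missing columns or unnamed LD matrices early.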
A list is returned with:
lambda.v
The tuning parameter sequence actually used.
beta.mat
The LASSO estimates at the tuning parameters in lambda.v, stored in sparse matrix format.
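As a minimal sketch of how the returned list might be inspected, the code below builds a mock object with the two documented elements and extracts the variants selected at one tuning parameter. The mock values are invented, res.mock is a hypothetical name, and the layout (rows are variants, columns are tuning parameters) is an assumption inferred from the plotting call in the examples. The Matrix package is used here only to imitate the sparse storage.

```r
library(Matrix)  # provides sparse matrix classes

# Mock of the documented return value; all numbers are invented
res.mock <- list(
  lambda.v = c(0.010, 0.006, 0.003),
  beta.mat = Matrix(c(0.04, 0,    0,
                      0.05, 0,   -0.01,
                      0.06, 0.02, -0.02),
                    nrow = 3, sparse = TRUE,
                    dimnames = list(c("rs1", "rs2", "rs3"), NULL))
)

# Number of variants with non-zero estimates at each tuning parameter
n.selected <- colSums(res.mock$beta.mat != 0)

# First tuning parameter at which at least two variants are in the model
k <- which(n.selected >= 2)[1]
beta.k <- res.mock$beta.mat[, k]

# Non-zero estimates at that tuning parameter
selected <- beta.k[beta.k != 0]
```

The same indexing should carry over to a real result as long as the assumed row and column layout holds.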
Users can download reference LD correlation matrices from https://www.dropbox.com/home/sojo%20reference%20ld%20matrix. These LD matrices are based on 612,513 chip markers in the Swedish Twin Registry. The function uses only the SNPs that overlap between the summary statistics and the reference LD matrix.
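Before running the analysis, it can be useful to see how many SNPs actually overlap. The following is a rough sketch in base R with invented SNP IDs and hypothetical object names; it mirrors the idea of taking the overlap and is not the package's internal code.

```r
# Hypothetical SNP IDs; in practice these would come from
# sum.stat.raw$SNP and rownames(LD_ref)
snps.sumstat <- c("rs1", "rs2", "rs3", "rs4")
LD.toy <- diag(3)
dimnames(LD.toy) <- list(c("rs2", "rs3", "rs5"), c("rs2", "rs3", "rs5"))

# SNPs present in both the summary statistics and the LD matrix
common <- intersect(snps.sumstat, rownames(LD.toy))

# LD matrix restricted to the overlapping SNPs
LD.common <- LD.toy[common, common, drop = FALSE]

# SNPs in the summary statistics without LD information
missing.in.ld <- setdiff(snps.sumstat, rownames(LD.toy))
```

A small overlap suggests that the summary statistics and the reference panel use different marker sets or SNP naming schemes.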
When a tiny lambda.vec is specified, the LASSO solution is close to that of standard multiple regression, which may cause an error due to complete LD between variants.
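Why complete LD is a problem for an unpenalized fit can be illustrated in base R. The sketch below assumes a simplified setting in which the joint effects are obtained by solving a linear system in the LD matrix; it is an illustration of the singularity, not the computation performed inside sojo, and the numbers are invented.

```r
# Two variants in complete LD (r = 1): the LD correlation matrix is singular
R <- matrix(c(1, 1,
              1, 1), nrow = 2)
b.marginal <- c(0.05, 0.05)

# The determinant is zero, so the joint effects are not uniquely defined
det.R <- det(R)

# Solving the unpenalized system fails; capture the error message instead of stopping
fit <- tryCatch(solve(R, b.marginal),
                error = function(e) conditionMessage(e))
```

Letting the function choose its own tuning parameter sequence, as recommended above, avoids supplying a penalty so small that the fit approaches this degenerate case.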
Note that lambda.v in the result may be longer than nvar, because a lambda is recorded each time a variant is added to or removed from the model.
Zheng Ning
Ning Z, Lee Y, Joshi PK, Wilson JF, Pawitan Y, Shen X (2017). A selection operator for summary association statistics reveals locus-specific allelic heterogeneity of complex traits. Submitted.
sojo tutorial: https://github.com/zhenin/sojo
## Not run:
library(sojo)

## The GWAS summary statistics of SNPs in a 1 Mb window centred at rs11090631
data(sum.stat.raw)
head(sum.stat.raw)

## The reference matrix and corresponding reference alleles
## file.path() inserts the path separator; mode = "wb" keeps the binary file intact
ld.file <- file.path(find.package("sojo"), "example.rda")
download.file("https://www.dropbox.com/s/ty1udfhx5ohauh8/LD_chr22.rda?raw=1",
              destfile = ld.file, mode = "wb")
load(file = ld.file)

res <- sojo(sum.stat.raw, LD_ref = LD_mat, snp_ref = snp_ref, nvar = 20)

## LASSO path plot
matplot(log(res$lambda.v), t(as.matrix(res$beta.mat)), lty = 1, type = "l",
        xlab = expression(paste(log, " ", lambda)),
        ylab = "Coefficients", main = "Summary-level LASSO")

## LASSO solution for user-supplied tuning parameters
res2 <- sojo(sum.stat.raw = sum.stat.raw, LD_ref = LD_mat, snp_ref = snp_ref,
             lambda.vec = c(0.004, 0.002))
## End(Not run)