This function computes the penalized Selection Operator for JOintly analyzing multiple variants (SOJO) within a mapped locus, based on LASSO regression derived from GWAS summary statistics.
sojo(sum.stat.discovery, sum.stat.validation = NULL, LD_ref, snp_ref,
     v.y = 1, lambda.vec = NA, standardize = T, nvar = 50)
sum.stat.discovery |
A data frame of GWAS summary statistics for genetic variants within a mapped locus. The data frame should include the following columns: SNP, SNP ID; A1, effect allele; A2, reference allele; b, estimate of marginal effect in GWAS; se, standard error of the estimates of marginal effects in GWAS; N, sample size. |
sum.stat.validation |
A data frame of GWAS summary statistics from a validation dataset. It should include the following columns: SNP, SNP ID; A1, effect allele; A2, reference allele; Freq1, the allele frequency of Allele1; b, estimate of marginal effect in GWAS; se, standard error of the estimates of marginal effects in GWAS; N, sample size. |
LD_ref |
The reference LD correlation matrix including SNPs at the locus. The row names and column names of the matrix should be the SNP names in the reference sample. |
snp_ref |
The reference alleles of the SNPs in the reference LD correlation matrix. The names of the vector should be the SNP names in the reference sample. |
v.y |
The phenotypic variance of the trait. Default is 1. |
lambda.vec |
The tuning parameter sequence given by the user. If not specified, the function computes its own tuning parameter sequence, which is recommended. |
standardize |
Logical value indicating whether genotypic data are standardized before the algorithm starts. The coefficients in the output are always transformed back to the original scale. Default is TRUE. |
nvar |
The number of variants to be selected in the model. If |
A list is returned with:
beta.opt The optimal variants and their effect sizes in terms of out-of-sample R^2. Only available when sum.stat.validation is provided.
lambda.opt The optimal tuning parameter in terms of out-of-sample R^2. Only available when sum.stat.validation is provided.
R2 The out-of-sample R^2 for each tuning parameter in lambda.v. Only available when sum.stat.validation is provided.
lambda.v The tuning parameter sequence actually used.
beta.mat The LASSO estimates at the tuning parameters in lambda.v, stored in sparse matrix format. The reference alleles in the results are the same as those in the discovery GWAS results.
selected.markers The vector of selected variants. Variants listed earlier were selected earlier along the LASSO path.
Users can download reference LD correlation matrices from https://www.dropbox.com/home/sojo%20reference%20ld%20matrix. These LD matrices are based on 612,513 chip markers in the Swedish Twin Registry. If chip markers make up only a small subset of the variants in the analysis, an LD matrix from the 1000 Genomes Project can be used (see the GitHub tutorial). The function then uses the SNPs that overlap between the summary statistics and the reference LD matrix.
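The input layout and the SNP overlap described above can be checked before calling sojo. The following is a minimal base-R sketch on made-up toy data; the object names (sum_stat, LD_toy, snp_ref_toy) are hypothetical, and the overlap step only illustrates the idea, it is not the package's internal code.

```r
# Toy summary statistics (made-up values) with the documented columns
sum_stat <- data.frame(
  SNP = c("rs1", "rs2", "rs4"),
  A1 = c("A", "C", "G"),
  A2 = c("G", "T", "A"),
  b = c(0.10, -0.05, 0.02),
  se = c(0.02, 0.02, 0.02),
  N = 5000,
  stringsAsFactors = FALSE
)

# Toy reference LD matrix: row and column names are SNP names
ld_snps <- c("rs1", "rs2", "rs3")
LD_toy <- matrix(c(1.0, 0.5, 0.1,
                   0.5, 1.0, 0.2,
                   0.1, 0.2, 1.0),
                 nrow = 3, dimnames = list(ld_snps, ld_snps))

# Toy reference alleles: a vector named by SNP
snp_ref_toy <- c(rs1 = "G", rs2 = "T", rs3 = "C")

# Check the documented column names and the naming of the LD inputs
required <- c("SNP", "A1", "A2", "b", "se", "N")
missing_cols <- setdiff(required, names(sum_stat))
stopifnot(length(missing_cols) == 0)
stopifnot(identical(rownames(LD_toy), colnames(LD_toy)),
          identical(names(snp_ref_toy), rownames(LD_toy)))

# SNPs present in both the summary statistics and the LD reference
shared <- intersect(sum_stat$SNP, rownames(LD_toy))
sum_stat_shared <- sum_stat[match(shared, sum_stat$SNP), ]
LD_shared <- LD_toy[shared, shared, drop = FALSE]
```

Here rs4 has no LD reference and rs3 has no summary statistics, so only rs1 and rs2 remain in the overlap.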
The function returns results along the whole LASSO path as the tuning parameter changes. Users can specify several tuning parameters or how many variants should be selected.
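To make the idea of a LASSO fitted from summary-level inputs concrete, the sketch below solves the penalized problem at a single tuning parameter using only an LD correlation matrix and marginal effects on a standardized scale. It uses plain coordinate descent, which is an assumption made for illustration and not necessarily the path algorithm that sojo implements; summary_lasso and soft_threshold are hypothetical helper names, and the data are made up.

```r
# Soft-thresholding operator used in the LASSO coordinate update
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# Hypothetical helper: coordinate-descent LASSO on summary-level inputs.
# R: LD correlation matrix; r: marginal effects on the standardized scale.
# Minimizes 0.5 * t(b) %*% R %*% b - sum(r * b) + lambda * sum(abs(b)).
summary_lasso <- function(R, r, lambda, n_iter = 200) {
  b <- setNames(numeric(length(r)), names(r))
  for (iter in seq_len(n_iter)) {
    for (j in seq_along(r)) {
      # Marginal signal of variant j after removing what the other
      # variants in the model already explain through LD
      rho <- r[j] - sum(R[j, -j] * b[-j])
      b[j] <- soft_threshold(rho, lambda) / R[j, j]
    }
  }
  b
}

# Toy data (made up): three variants, the first two in strong LD
snps <- c("rs1", "rs2", "rs3")
R <- matrix(c(1.0, 0.8, 0.0,
              0.8, 1.0, 0.0,
              0.0, 0.0, 1.0),
            nrow = 3, dimnames = list(snps, snps))
r <- c(rs1 = 0.30, rs2 = 0.24, rs3 = 0.05)

b_large <- summary_lasso(R, r, lambda = 0.4)  # heavy penalty
b_mid <- summary_lasso(R, r, lambda = 0.1)    # moderate penalty
```

With the heavy penalty no variant enters the model. With the moderate penalty only rs1 is retained: the marginal signal at rs2 is largely explained by its LD with rs1, so rs2 is shrunk to zero.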
The optimal tuning parameter can be suggested by validation. If the GWAS summary statistics from a validation dataset are provided in sum.stat.validation, then the out-of-sample R^2 for each tuning parameter in lambda.v will be computed. The tuning parameter that gives the largest out-of-sample R^2 is considered optimal. The optimal tuning parameter, together with the variants and their effect sizes at this tuning parameter, is reported in lambda.opt and beta.opt.
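The selection rule described above amounts to picking the column of the coefficient matrix with the largest out-of-sample R^2. The sketch below mimics this on made-up vectors; lambda_v, R2 and beta_mat are hypothetical stand-ins for the lambda.v, R2 and beta.mat elements of a result (a dense base-R matrix is used here instead of a sparse one, with one column per tuning parameter, the orientation implied by the plotting example in this page).

```r
# Toy values (made up) standing in for elements of a sojo result
lambda_v <- c(0.010, 0.006, 0.004, 0.002)
R2 <- c(0.011, 0.018, 0.021, 0.016)

# Rows are variants, columns correspond to the entries of lambda_v
beta_mat <- matrix(c(0.02,  0.00, 0.00,
                     0.05,  0.00, 0.01,
                     0.06,  0.00, 0.02,
                     0.07, -0.01, 0.03),
                   nrow = 3,
                   dimnames = list(c("rs1", "rs2", "rs3"), NULL))

# Tuning parameter with the largest out-of-sample R^2
best <- which.max(R2)
lambda_opt <- lambda_v[best]

# Variants with nonzero effects at that tuning parameter
beta_best <- beta_mat[, best]
beta_opt <- beta_best[beta_best != 0]
```

In this toy case the third tuning parameter is chosen, and rs1 and rs3 are the variants with nonzero effects.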
When a tiny lambda.vec is specified, the LASSO solution is similar to standard multiple regression, which may cause an error due to complete LD between variants.
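The reason complete LD is a problem without a meaningful penalty can be seen in a small base-R sketch. It assumes the usual summary-level form of multiple regression on a standardized scale, where joint effects are obtained by solving the LD correlation matrix against the marginal effects; the data are made up, and this is an illustration rather than the package's internal computation.

```r
snps <- c("rs1", "rs2")

# Two variants in complete LD: the correlation matrix is singular
R_complete <- matrix(1, nrow = 2, ncol = 2, dimnames = list(snps, snps))
r_complete <- c(rs1 = 0.30, rs2 = 0.30)

# Unpenalized joint effects require solving R %*% b = r, which fails here
joint_complete <- tryCatch(solve(R_complete, r_complete),
                           error = function(e) NULL)
is.null(joint_complete)  # TRUE: the system could not be solved

# Near-complete LD is solvable, but the joint estimates become unstable
R_near <- matrix(c(1, 0.999, 0.999, 1), nrow = 2,
                 dimnames = list(snps, snps))
r_near <- c(rs1 = 0.30, rs2 = 0.29)
joint_near <- solve(R_near, r_near)
```

In the near-complete case the two joint estimates are large in magnitude and opposite in sign even though both marginal effects are modest, which is the instability that a non-negligible penalty avoids.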
Note that the length of lambda.v in the result may be longer than nvar, because a lambda is recorded whenever a variant is added to or removed from the model.
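Counting the nonzero coefficients at each recorded tuning parameter shows how this can happen. The sketch below uses a made-up coefficient matrix (hypothetical stand-in for beta.mat, stored as a dense base-R matrix with one column per tuning parameter) in which one variant leaves the model and later re-enters.

```r
# Toy path (made up): five recorded tuning parameters
lambda_v <- c(0.010, 0.008, 0.006, 0.005, 0.004)

# Rows are variants, columns correspond to the entries of lambda_v
beta_mat <- matrix(c(0.01, 0.00, 0.00,   # rs1 enters
                     0.03, 0.01, 0.00,   # rs2 enters
                     0.04, 0.00, 0.00,   # rs2 is removed
                     0.05, 0.00, 0.01,   # rs3 enters
                     0.06, 0.01, 0.02),  # rs2 re-enters
                   nrow = 3,
                   dimnames = list(c("rs1", "rs2", "rs3"), NULL))

# Number of variants in the model at each recorded tuning parameter
n_active <- colSums(beta_mat != 0)

# More tuning parameters are recorded than variants ever selected
length(lambda_v) > max(n_active)
```

Reaching three selected variants takes five recorded tuning parameters in this toy path, because the removal and re-entry of rs2 each add an entry.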
Zheng Ning
Ning Z, Lee Y, Joshi PK, Wilson JF, Pawitan Y, Shen X (2017). A selection operator for summary association statistics reveals locus-specific allelic heterogeneity of complex traits. Submitted.
sojo tutorial: https://github.com/zhenin/sojo
## Not run:
## The GWAS summary statistics of SNPs in 1 MB window centred at rs11090631
data(sum.stat.discovery)
head(sum.stat.discovery)
## The reference matrix and corresponding reference alleles
## file.path() inserts the separator between the package directory and the file name
ld_file <- file.path(find.package("sojo"), "example.rda")
download.file("https://www.dropbox.com/s/ty1udfhx5ohauh8/LD_chr22.rda?raw=1", destfile = ld_file)
load(file = ld_file)
res <- sojo(sum.stat.discovery, LD_ref = LD_mat, snp_ref = snp_ref, nvar = 20)
## LASSO path plot
matplot(log(res$lambda.v), t(as.matrix(res$beta.mat)), lty = 1, type = "l", xlab = expression(paste(log, " ",lambda)),
ylab = "Coefficients", main = "Summary-level LASSO")
## LASSO solution for user supplied tuning parameters
res2 <- sojo(sum.stat.discovery = sum.stat.discovery, LD_ref = LD_mat, snp_ref = snp_ref, lambda.vec = c(0.004,0.002))
## LASSO solution and the optimal tuning parameter when validation dataset is available
data(sum.stat.validation)
head(sum.stat.validation)
res.valid <- sojo(sum.stat.discovery, sum.stat.validation = sum.stat.validation, LD_ref = LD_mat, snp_ref = snp_ref, nvar = 20)
res.valid$beta.opt # the optimal variants and their effect sizes
res.valid$lambda.opt # the optimal tuning parameter
res.valid$R2 # out of sample R^2
## End(Not run)