sojo: Selection Operator for Jointly analyzing multiple variants (SOJO)



This function applies the Selection Operator for JOintly analyzing multiple variants (SOJO) within a mapped locus. SOJO is a penalized selection operator based on LASSO regression derived from GWAS summary statistics.

sojo(sum.stat.raw, LD_ref, snp_ref, v.y = 1, lambda.vec = NA, standardize = T, nvar = 50)

`sum.stat.raw`	A data frame of GWAS summary statistics for the genetic variants within a mapped locus. The data frame should include the following columns: SNP, SNP ID; A1, effect allele; A2, reference allele; Freq1, the allele frequency of A1; b, estimate of the marginal effect in GWAS; se, standard error of the marginal effect estimate in GWAS; N, sample size.
`LD_ref`	The reference LD correlation matrix including SNPs at the locus. The row names and column names of the matrix should be SNP names in reference sample.
`snp_ref`	The reference alleles of SNPs in the reference LD correlation matrix. The names of the vector should be SNP names in reference sample.
`v.y`	The phenotypic variance of the trait. Default is 1.
`lambda.vec`	A user-supplied tuning parameter sequence. If not specified, the function computes its own tuning parameter sequence, which is recommended.
`standardize`	Logical value for genotypic data standardization, prior to starting the algorithm. The coefficients in output are always transformed back to the original scale. Default is `standardize = TRUE`.
`nvar`	The target number of variants to select in the model. For example, if `nvar = 5`, the algorithm stops before the sixth variant is selected. Default is 50.
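As a rough illustration of the input shapes described above, the following sketch builds toy inputs by hand. All SNP IDs, alleles, and numbers are invented and carry no biological meaning; only the column names and the naming conventions for `LD_ref` and `snp_ref` come from the argument descriptions.

```r
# Toy inputs in the layout described by the arguments above.
# All values are made up for illustration only.
snps <- c("rs1", "rs2", "rs3")

sum.stat.raw <- data.frame(
  SNP   = snps,
  A1    = c("A", "C", "G"),        # effect allele
  A2    = c("G", "T", "A"),        # reference allele
  Freq1 = c(0.30, 0.12, 0.45),     # allele frequency of A1
  b     = c(0.05, -0.02, 0.01),    # marginal effect estimates
  se    = c(0.010, 0.015, 0.009),  # standard errors of b
  N     = c(10000, 10000, 9800),   # sample sizes
  stringsAsFactors = FALSE
)

# Reference LD correlation matrix with SNP names as row and column names
LD_ref <- matrix(
  c(1.0, 0.2, 0.1,
    0.2, 1.0, 0.3,
    0.1, 0.3, 1.0),
  nrow = 3, dimnames = list(snps, snps)
)

# Reference alleles of the SNPs in LD_ref, named by SNP (arbitrary values)
snp_ref <- c(rs1 = "G", rs2 = "T", rs3 = "A")

# Basic sanity checks before passing these objects to sojo()
required <- c("SNP", "A1", "A2", "Freq1", "b", "se", "N")
stopifnot(all(required %in% names(sum.stat.raw)))
stopifnot(identical(rownames(LD_ref), colnames(LD_ref)))
stopifnot(isSymmetric(LD_ref), all(diag(LD_ref) == 1))
stopifnot(all(names(snp_ref) %in% rownames(LD_ref)))
```

Checks like these are cheap and catch a missing column or mismatched SNP names before any model fitting starts.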

A list is returned with:

`lambda.v`	The tuning parameter sequence actually used.
`beta.mat`	The LASSO estimates at the tuning parameters in `lambda.v`, stored in sparse matrix format.
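
A minimal sketch of how the returned list might be inspected. The object `res` below is a hand-built stand-in with invented values, not real sojo output. It assumes that `beta.mat` has one row per variant and one column per value in `lambda.v`, which is consistent with the transpose used in the path plot in the examples, and that its row names are SNP IDs, which is an assumption.

```r
library(Matrix)  # sparse matrix classes

# Hand-built stand-in for a sojo() result (all values are invented)
res <- list(
  lambda.v = c(0.010, 0.006, 0.003, 0.002),
  beta.mat = Matrix(
    c(0, 0.02, 0.03, 0.035,   # rs1 enters at the second lambda
      0, 0,    0.01, 0.012,   # rs2 enters at the third lambda
      0, 0,    0,    0),      # rs3 is never selected
    nrow = 3, byrow = TRUE, sparse = TRUE,
    dimnames = list(c("rs1", "rs2", "rs3"), NULL)
  )
)

# Coefficients at the last (smallest) lambda as a named dense vector;
# as.matrix() is fine for a toy matrix of this size
last <- ncol(res$beta.mat)
beta_last <- as.matrix(res$beta.mat)[, last]

# Variants with non-zero coefficients at that lambda
selected <- beta_last[beta_last != 0]
print(selected)
```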

Users can download reference LD correlation matrices from https://www.dropbox.com/home/sojo%20reference%20ld%20matrix. These LD matrices are based on 612,513 chip markers in the Swedish Twin Registry. The function uses only the SNPs that are present in both the summary statistics and the reference LD matrix.
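
To see in advance which SNPs would overlap, one could run a check along the following lines. This is only a sketch with invented SNP IDs and an identity matrix standing in for a real LD reference; it is not the matching code used inside `sojo()`.

```r
# Toy SNP sets (IDs are invented for illustration)
sumstat_snps <- c("rs1", "rs2", "rs3", "rs4")
ld_snps      <- c("rs2", "rs3", "rs5")

# Identity matrix as a placeholder for a reference LD matrix
LD_ref <- diag(length(ld_snps))
dimnames(LD_ref) <- list(ld_snps, ld_snps)

# SNPs present in both the summary statistics and the reference LD matrix
shared <- intersect(sumstat_snps, rownames(LD_ref))

# SNPs in the summary statistics that have no match in the reference
unmatched <- setdiff(sumstat_snps, rownames(LD_ref))

# Reference LD restricted to the shared SNPs, in summary-statistics order
LD_shared <- LD_ref[shared, shared, drop = FALSE]

cat("shared:", shared, "\n")
cat("not in reference:", unmatched, "\n")
```

A large number of unmatched SNPs may indicate a mismatch in SNP naming between the GWAS and the reference sample.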

When very small values are specified in lambda.vec, the LASSO solution approaches that of standard multiple regression, which may cause an error when variants are in complete LD.
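
One way to anticipate this problem is to look for pairs of variants in complete LD before choosing small tuning parameters. The sketch below uses an invented 3 x 3 LD matrix, and the tolerance `tol` is a hypothetical choice, not a value taken from the package.

```r
# Invented LD matrix in which rs1 and rs2 are in complete LD
snps <- c("rs1", "rs2", "rs3")
LD_ref <- matrix(
  c(1.0, 1.0, 0.2,
    1.0, 1.0, 0.2,
    0.2, 0.2, 1.0),
  nrow = 3, dimnames = list(snps, snps)
)

# Pairs above the diagonal whose absolute correlation is numerically 1
tol <- 1e-8  # hypothetical tolerance
idx <- which(abs(LD_ref) > 1 - tol & upper.tri(LD_ref), arr.ind = TRUE)

complete_ld <- data.frame(
  snp1 = rownames(LD_ref)[idx[, "row"]],
  snp2 = colnames(LD_ref)[idx[, "col"]]
)
print(complete_ld)

# A rank-deficient LD matrix means that unpenalized multiple regression
# has no unique solution
rank_deficient <- qr(LD_ref)$rank < ncol(LD_ref)
print(rank_deficient)
```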

Note that the length of lambda.v in the result may be greater than nvar, because a lambda is recorded each time a variant is added to or removed from the model.
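
The following sketch illustrates why the path can be longer than the number of selected variants. The coefficient matrix is invented, with one row per variant and one column per recorded lambda; it is not real sojo output.

```r
library(Matrix)  # sparse matrix classes and their colSums() method

# Invented coefficient path: rs2 enters at the second lambda, is removed
# at the third, and re-enters at the fifth
beta.mat <- Matrix(
  c(0.02, 0.03, 0.03, 0.04, 0.04,    # rs1
    0,    0.01, 0,    0,    0.005,   # rs2
    0,    0,    0,    0.01, 0.01),   # rs3
  nrow = 3, byrow = TRUE, sparse = TRUE,
  dimnames = list(c("rs1", "rs2", "rs3"), NULL)
)

# Number of variants with non-zero coefficients at each recorded lambda
n_selected <- colSums(beta.mat != 0)
print(n_selected)

# Five lambdas are recorded although the model never holds more than 3 variants
cat("lambdas recorded:", ncol(beta.mat),
    " largest model size:", max(n_selected), "\n")
```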

Zheng Ning

Ning Z, Lee Y, Joshi PK, Wilson JF, Pawitan Y, Shen X (2017). A selection operator for summary association statistics reveals locus-specific allelic heterogeneity of complex traits. Submitted.

sojo tutorial: https://github.com/zhenin/sojo

## Not run: 
## The GWAS summary statistics of SNPs in a 1 Mb window centred at rs11090631
data(sum.stat.raw)
head(sum.stat.raw)

## The reference matrix and corresponding reference alleles 
## The .rda file is binary, so download with mode = "wb" (needed on Windows)
ld_file <- file.path(find.package("sojo"), "example.rda")
download.file("https://www.dropbox.com/s/ty1udfhx5ohauh8/LD_chr22.rda?raw=1", destfile = ld_file, mode = "wb")
load(file = ld_file)

res <- sojo(sum.stat.raw, LD_ref = LD_mat, snp_ref = snp_ref, nvar = 20)

## LASSO path plot
matplot(log(res$lambda.v), t(as.matrix(res$beta.mat)), lty = 1, type = "l", xlab = expression(paste(log, " ",lambda)), 
ylab = "Coefficients", main = "Summary-level LASSO")

## LASSO solution for user supplied tuning parameters
res2 <- sojo(sum.stat.raw = sum.stat.raw, LD_ref = LD_mat, snp_ref = snp_ref, lambda.vec = c(0.004,0.002))

## End(Not run)