sojo: Selection Operator for Jointly analyzing multiple variants...


Description

This function applies the penalized Selection Operator for JOintly analyzing multiple variants (SOJO) to the variants within a mapped locus, using LASSO regression derived from GWAS summary statistics.
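
As a rough illustration of how a LASSO fit can be driven by summary-level quantities alone, the sketch below runs plain coordinate descent on the objective (1/2) b'Rb - r'b + lambda * sum(|b|), where R is an LD correlation matrix and r holds marginal SNP-trait correlations. This is not the algorithm implemented in sojo; `soft`, `lasso.summary`, and the toy values of `R.toy` and `r.toy` are hypothetical and chosen only for the example.

```r
# Soft-thresholding operator used by the LASSO coordinate update.
soft <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# Hypothetical helper (not part of sojo): coordinate descent on
# (1/2) b'Rb - r'b + lambda * sum(abs(b)).
lasso.summary <- function(R, r, lambda, n.iter = 200) {
  beta <- numeric(length(r))
  for (it in seq_len(n.iter)) {
    for (j in seq_along(r)) {
      # Marginal signal at SNP j minus what the other SNPs already explain via LD.
      partial <- r[j] - sum(R[j, -j] * beta[-j])
      beta[j] <- soft(partial, lambda) / R[j, j]
    }
  }
  beta
}

# Toy locus: two SNPs with LD correlation 0.5 (made-up numbers).
R.toy <- matrix(c(1.0, 0.5,
                  0.5, 1.0), nrow = 2)
r.toy <- c(0.30, 0.15)

beta.toy <- lasso.summary(R.toy, r.toy, lambda = 0.05)
print(beta.toy)
```

In this toy locus the marginal signal at the second SNP is fully explained by its LD with the first, so only the first SNP keeps a non-zero coefficient.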

Usage

sojo(sum.stat.discovery, sum.stat.validation = NULL, LD_ref, snp_ref,
  v.y = 1, lambda.vec = NA, standardize = T, nvar = 50)

Arguments

sum.stat.discovery

A data frame of GWAS summary statistics for the genetic variants within a mapped locus. The data frame should include the following columns: SNP, SNP ID; A1, effect allele; A2, reference allele; b, estimate of the marginal effect in GWAS; se, standard error of the marginal effect estimate; N, sample size.
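
A minimal sketch of an input with the columns listed above, followed by a column check before calling sojo. All values are made up, and `sum.stat.toy`, `required.cols`, and `missing.cols` are hypothetical names used only for illustration.

```r
# Toy discovery summary statistics; every value here is invented.
sum.stat.toy <- data.frame(
  SNP = c("rs1", "rs2", "rs3"),
  A1  = c("A", "C", "G"),        # effect allele
  A2  = c("G", "T", "A"),        # reference allele
  b   = c(0.12, -0.05, 0.03),    # marginal effect estimates
  se  = c(0.02, 0.02, 0.03),     # standard errors
  N   = c(5000, 5000, 4800),     # sample sizes
  stringsAsFactors = FALSE
)

# Report any required column that is absent.
required.cols <- c("SNP", "A1", "A2", "b", "se", "N")
missing.cols <- setdiff(required.cols, colnames(sum.stat.toy))
if (length(missing.cols) > 0) {
  stop("Missing columns: ", paste(missing.cols, collapse = ", "))
}
```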

sum.stat.validation

A data frame of GWAS summary statistics from a validation dataset. It should include the following columns: SNP, SNP ID; A1, effect allele; A2, reference allele; Freq1, the allele frequency of Allele1; b, estimate of the marginal effect in GWAS; se, standard error of the marginal effect estimate; N, sample size.

LD_ref

The reference LD correlation matrix for the SNPs at the locus. The row names and column names of the matrix should be the SNP names in the reference sample.

snp_ref

A vector of the reference alleles of the SNPs in the reference LD correlation matrix. The names of the vector should be the SNP names in the reference sample.
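
To make the expected shapes of LD_ref and snp_ref concrete, the following sketch builds both from simulated genotypes. The genotype matrix, the SNP names, and the alleles are invented; `geno.toy`, `LD.toy`, and `snp_ref.toy` are hypothetical names, and real analyses would use a proper reference panel.

```r
set.seed(1)

# Simulated allele counts (0/1/2) for 200 individuals at 3 SNPs.
geno.toy <- matrix(rbinom(200 * 3, size = 2, prob = 0.3), nrow = 200,
                   dimnames = list(NULL, c("rs1", "rs2", "rs3")))

# LD correlation matrix; cor() carries the SNP names to rows and columns.
LD.toy <- cor(geno.toy)

# Named vector of reference alleles, with names matching the LD matrix.
snp_ref.toy <- c(rs1 = "G", rs2 = "T", rs3 = "A")

# Consistency checks on the naming convention described above.
stopifnot(identical(rownames(LD.toy), colnames(LD.toy)),
          identical(names(snp_ref.toy), rownames(LD.toy)))
```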

v.y

The phenotypic variance of the trait. Default is 1.

lambda.vec

The tuning parameter sequence supplied by the user. If not specified, the function computes its own tuning parameter sequence, which is recommended.

standardize

Logical value for genotypic data standardization, prior to starting the algorithm. The coefficients in output are always transformed back to the original scale. Default is standardize = TRUE.

nvar

The target number of variants to select in the model. If sum.stat.validation is provided, nvar is the maximum number of variants in the model. For example, if nvar = 5, the algorithm stops before a sixth variant is selected. Default is 50.

Value

A list is returned with:

Note

Users can download reference LD correlation matrices from https://www.dropbox.com/home/sojo%20reference%20ld%20matrix. These LD matrices are based on 612,513 chip markers in the Swedish Twin Registry. If chip markers are only a small subset of the analysis, an LD matrix from the 1000 Genomes Project can be used (see the GitHub tutorial). The function then uses the SNPs that overlap between the summary statistics and the reference LD matrix.
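
The overlap step can be pictured as follows. This is only a sketch of the idea with invented SNP names, not the code sojo runs internally; `sum.snps`, `LD.toy`, `shared`, and `LD.shared` are hypothetical names.

```r
# SNP IDs present in the summary statistics (made up).
sum.snps <- c("rs1", "rs2", "rs4")

# Toy reference LD matrix covering a different, partly overlapping SNP set.
ld.snps <- c("rs2", "rs3", "rs4")
LD.toy <- diag(3)
dimnames(LD.toy) <- list(ld.snps, ld.snps)

# Keep only the SNPs found in both sources, then subset the LD matrix by name.
shared <- intersect(sum.snps, rownames(LD.toy))
LD.shared <- LD.toy[shared, shared, drop = FALSE]
print(shared)
```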

The function returns results along the whole LASSO path as the tuning parameter changes. Users can specify several tuning parameters or how many variants should be selected.

The optimal tuning parameter can be suggested by validation. If GWAS summary statistics from a validation dataset are provided in sum.stat.validation, the out-of-sample R^2 is computed for each tuning parameter in lambda.v. The tuning parameter that gives the largest out-of-sample R^2 is considered optimal. The optimal tuning parameter is reported in lambda.opt, and the variants selected at this tuning parameter, with their effect sizes, are reported in beta.opt.
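
The selection rule described above amounts to picking the position of the largest R^2. The sketch below mimics it on invented numbers; the toy objects are hypothetical, and the layout of `beta.mat.toy` (variants in rows, tuning parameters in columns) is an assumption inferred from the plotting example on this page.

```r
# Made-up path: four tuning parameters and their out-of-sample R^2.
lambda.v.toy <- c(0.010, 0.006, 0.004, 0.002)
R2.toy       <- c(0.011, 0.019, 0.023, 0.017)

# Made-up coefficients: one row per variant, one column per tuning parameter.
beta.mat.toy <- matrix(c(0.05, 0.08, 0.09, 0.10,
                         0.00, 0.02, 0.03, 0.04,
                         0.00, 0.00, 0.00, 0.01),
                       nrow = 3, byrow = TRUE,
                       dimnames = list(c("rs1", "rs2", "rs3"), NULL))

# The tuning parameter with the largest out-of-sample R^2 is taken as optimal.
opt <- which.max(R2.toy)
lambda.opt.toy <- lambda.v.toy[opt]

# Variants with non-zero effects at that tuning parameter.
beta.at.opt <- beta.mat.toy[, opt]
beta.opt.toy <- beta.at.opt[beta.at.opt != 0]
print(lambda.opt.toy)
print(beta.opt.toy)
```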

When very small values are specified in lambda.vec, the LASSO solution approaches that of standard multiple regression, which may cause an error when variants are in complete LD.
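
The reason complete LD is a problem without a penalty can be seen in a two-SNP toy case: the LD matrix is exactly singular, so the unpenalized joint effects are not uniquely defined. The values below are invented, `R.toy`, `r.toy`, and `fit` are hypothetical names, and the error raised by sojo itself may look different.

```r
# Two SNPs in complete LD: every entry of the correlation matrix is 1.
R.toy <- matrix(c(1, 1,
                  1, 1), nrow = 2)
r.toy <- c(0.2, 0.2)   # made-up marginal correlations

# Unpenalized joint effects would require solving R %*% beta = r,
# which fails for a singular R; the error message is captured instead.
fit <- tryCatch(solve(R.toy, r.toy), error = function(e) conditionMessage(e))
print(fit)
```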

Note that lambda.v in the result may be longer than nvar, because a lambda is recorded whenever a variant is added to or removed from the model.

Author(s)

Zheng Ning

References

Ning Z, Lee Y, Joshi PK, Wilson JF, Pawitan Y, Shen X (2017). A selection operator for summary association statistics reveals locus-specific allelic heterogeneity of complex traits. Submitted.

See Also

sojo tutorial: https://github.com/zhenin/sojo

Examples

## Not run: 
## The GWAS summary statistics of SNPs in 1 MB window centred at rs11090631 
data(sum.stat.discovery)
head(sum.stat.discovery)

## The reference matrix and corresponding reference alleles 
ld.file <- file.path(find.package('sojo'), "example.rda")  # build the path with a proper separator
download.file("https://www.dropbox.com/s/ty1udfhx5ohauh8/LD_chr22.rda?raw=1", destfile = ld.file, mode = "wb")  # binary mode for .rda
load(file = ld.file)

res <- sojo(sum.stat.discovery, LD_ref = LD_mat, snp_ref = snp_ref, nvar = 20)

## LASSO path plot
matplot(log(res$lambda.v), t(as.matrix(res$beta.mat)), lty = 1, type = "l", xlab = expression(paste(log, " ",lambda)), 
ylab = "Coefficients", main = "Summary-level LASSO")

## LASSO solution for user supplied tuning parameters
res2 <- sojo(sum.stat.discovery = sum.stat.discovery, LD_ref = LD_mat, snp_ref = snp_ref, lambda.vec = c(0.004,0.002))


## LASSO solution and the optimal tuning parameter when validation dataset is available
data(sum.stat.validation)
head(sum.stat.validation)

res.valid <- sojo(sum.stat.discovery, sum.stat.validation = sum.stat.validation, LD_ref = LD_mat, snp_ref = snp_ref, nvar = 20)
res.valid$beta.opt  # the optimal variants and their effect sizes
res.valid$lambda.opt  # the optimal tuning parameter
res.valid$R2  # out of sample R^2

## End(Not run)

sojo documentation built on May 2, 2019, 5:52 p.m.
