gsPEN: SummaryLasso incorporating multiple traits

Description Usage Arguments Details Value Author(s) References Examples

Description

SummaryLasso to model pleiotropy by introducing a group-Lasso type penalty, which is sensitive to select SNPs modestly associated with multiple traits.

Usage

1
2
3
4
5
6
gsPEN(summaryZ, Nvec, plinkLD, NumIter = 100, breaking = 1, numChrs = 22, 
ChrIndexBeta = 0, Init_summaryBetas = 0, Zscale = 1, RupperVal = NULL, 
tuningMatrix = NULL, penalty = c("mixLOG"), taufactor = c(1/25, 1, 10), 
llim_length = 10, subtuning = 50, Lambda_limit = c(0.5, 0.9), 
Lenlam_singleTrait = 200, dfMax = NULL, IniBeta = 0, inverseTuning = 0, 
outputAll = 0, warmStart = 1)

Arguments

summaryZ

The Z statistics of p SNPs from q GWA studies. A matrix with dimension p x q for p SNPs and q traits. The first column corresponds to the primary trait and the rest columns correspond to the secondary traits.

Nvec

A vector of length q for the sample sizes of q GWA studies.

plinkLD

.ld file of the LD calculation from plink.

NumIter

The number of maximum iteraions for the estimation procedure.

breaking

A binary (0,1) variable to check if there are some certain estimates of coefficients to diverge during the iterations. This may happen when the signs of the correlation coefficinets were estimated incorrectly. The default value is 1.

numChrs

The number of chromosomes used in the analysis. Current version of pacakge does not use this argument.

ChrIndexBeta

The chromosome index for each SNP. Current version of pacakge does not use this argument.

Init_summaryBetas

Can be used to set the initial values of the coefficients for the iterative estimation.

Zscale

A binary (0,1) variable to make the coefficients from different GWA studies with unequal sample sizes comparable. The default value is 1.

RupperVal

The maximum tolerable magnitude of the estimates of coefficients during the iterations. This is to avoid a certain estimates of coefficients to diverge during the iterations. This may happen when the signs of the correlation coefficinets were estimated incorrectly. The default value is 50 times the maximum of coeffcients from the input in absolute values.

tuningMatrix

Inputs for the tuning values of the tuning parameters. Default is null and it will be generated automatically.

penalty

Current version of pacakge does not use this argument.

taufactor

The weights to generate the tuning values for the tuning paramter "tau" and the default is c(1/25, 1, 10) times the median of the p summation of the coefficients for each SNP across q traits.

llim_length

The argument to set up the number of tuning values for lambdas between the lower and upper bound. The default value is 10.

subtuning

The argument to set up the number of tuning values for lambdas between the lower and upper bound. The default value is 50.

Lambda_limit

The quantiles to set up the tuning values of lambda. The default value is c(0.5, 0.9).

Lenlam_singleTrait

The quantiles to set up the tuning values of lambda for single trait analysis.

dfMax

The upper bound of the number of non-zero estimates of coefficients for the primary trait.

IniBeta

A binary (0,1) variable to indicate if the regression coefficients need to be initialized or not. 1 is for yes.

inverseTuning

For internal checking usage. The default value is 0.

outputAll

For internal checking usage. The default value is 0.

warmStart

For analysis with single trait or multiple traits without functional annotations, it is recommended to use warmStart = 1 to enhance computations.

Details

Note that the tuning values for the tuning parameters may need to be modified manually when the selected optimal tuning parameters are at the boundary of the inputs.

Value

BetaMatrix

The output of the coefficients matrix with dimensions (total number of combinations of the tuning values times (pq)). Each column represents the vectorization of the p x q coefficients matrix given a particular combination of the tuning values (stacking its columns into a column vector).

Numitervec

This vector shows the number of iterations to converge for each combination of the tuning values.

AllTuningMatrix

This matrix shows all combination of tuning values used in the estimation process. Its dimension is that total number of combinations of the tuning values times total number of tuning parameters.

Author(s)

Ting-Huei Chen

References

This R packages is based on the method introduced in the manuscript "A comprehensive statistical framework for building polygenic risk prediction models based on summary statistics of genome-wide association studies."

Examples

1
2
3
4
data("summaryZ")
data("Nvec")
data("plinkLD")
output = gsPEN(summaryZ=summaryZ, Nvec=Nvec, plinkLD=plinkLD)

SummaryLasso documentation built on Nov. 15, 2019, 5:10 p.m.