grs.summary: Genetic risk score calculation from summary statistics.
In gtx: Genetics ToolboX


Implements the summary statistic method of Johnson et al. for approximating the regression of a response variable onto an additive multi-SNP genetic risk score in a given testing dataset, using only single SNP association summary statistics.

grs.summary(w, b, s, n)

`w`	coefficients for the risk score.
`b`	aligned beta coefficients in the testing dataset, of same length as `w`.
`s`	standard errors for `b`, of same length as `w` and `b`.
`n`	sample size of testing dataset.

The risk score coefficients w are the “weights” used to construct the risk score, for a set of SNPs, in chosen units per dose of the coded allele. Typically these are single SNP regression coefficients estimated in a “discovery” dataset.

The aligned beta coefficients b are regression coefficients for the response variable of interest, for the same set of SNPs and per dose of the same coded allele as used to define w. Typically these are single SNP regression coefficients estimated in the “testing” dataset. The standard errors s are standard errors on b.
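Alignment can be sketched as follows, assuming each dataset reports a coded allele per SNP. The data frames, column names, and values are hypothetical and are not part of the package. The sketch flips the sign of the testing-dataset coefficient when its coded allele is the other allele in the discovery dataset, and does not handle strand flips or strand-ambiguous SNPs.

```r
# Hypothetical per-SNP summary statistics; all names and values are illustrative.
discovery <- data.frame(
  snp   = c("rs1", "rs2", "rs3"),
  coded = c("A", "C", "G"),        # coded allele used to define w
  w     = c(0.10, 0.20, -0.05),
  stringsAsFactors = FALSE
)
testing <- data.frame(
  snp   = c("rs3", "rs1", "rs2"),  # deliberately in a different order
  coded = c("G", "A", "T"),        # rs2 is reported for the other allele
  other = c("A", "G", "C"),
  beta  = c(-0.01, 0.02, -0.03),
  se    = c(0.009, 0.010, 0.012),
  stringsAsFactors = FALSE
)

# Reorder the testing rows so that they match the discovery SNP order.
tst <- testing[match(discovery$snp, testing$snp), ]

# Same coded allele: keep the sign. Alleles swapped: flip the sign.
# Anything else (for example a strand mismatch) is set to NA.
same    <- tst$coded == discovery$coded
swapped <- tst$other == discovery$coded
w <- discovery$w
b <- ifelse(same, tst$beta, ifelse(swapped, -tst$beta, NA_real_))
s <- ifelse(same | swapped, tst$se, NA_real_)
```

The vectors `w`, `b`, and `s` are then of equal length and in the same SNP order, as the arguments require.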

In applications to causal inference, a common objective is to estimate the causal effect of an intermediate trait or biomarker, on a response variable or outcome. In such applications, the w are the estimated effects on the intermediate trait or biomarker, and the b are estimated effects on the response variable or outcome, with standard errors s.

The sample size argument n is required only to compute the (pseudo) variance explained in the testing dataset from the likelihood ratio test statistic.
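As an illustration, one common likelihood-ratio-based pseudo R-squared is 1 - exp(-X2/n), where X2 is the likelihood ratio test statistic. That the package uses exactly this definition is an assumption here, and `pseudo_r2` is a hypothetical helper, not a package function.

```r
# Assumed definition: likelihood-ratio-based pseudo R-squared.
# X2 is a chi squared (likelihood ratio) test statistic, n the sample size.
pseudo_r2 <- function(X2, n) {
  1 - exp(-X2 / n)
}

# Example: a test statistic of 30 in a testing dataset of 10000 individuals.
r2 <- pseudo_r2(X2 = 30, n = 10000)
```

For small X2/n the result is close to X2/n, so here it is roughly 0.3% of variance.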

The method for calculating the regression of the response variable onto the risk score was first used for the work of the International Consortium for Blood Pressure Genome-Wide Association Studies (2011), and described in more detail in Dastani et al. (2012). The method is exact for uncorrelated SNPs and a quadratic log-likelihood, the latter being obtained under a normal linear model, or under any regression model with a large sample size.
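Under these two conditions (uncorrelated SNPs, quadratic log-likelihood), constraining each SNP effect to be a multiple `a` of its weight and maximising the likelihood gives an inverse-variance weighted estimate. The following is a minimal sketch of that calculation, not the package source. `grs_estimate_sketch` is a hypothetical name, and dropping SNPs with missing values is an assumption.

```r
# Sketch of the 1 d.f. risk score regression from summary statistics.
# w: risk score weights; b: aligned betas; s: standard errors of b.
grs_estimate_sketch <- function(w, b, s) {
  stopifnot(length(w) == length(b), length(b) == length(s))
  # Assumption: SNPs with any missing input are dropped.
  keep <- !is.na(w) & !is.na(b) & !is.na(s)
  w <- w[keep]; b <- b[keep]; s <- s[keep]

  info <- sum(w^2 / s^2)             # information about a
  ahat <- sum(w * b / s^2) / info    # inverse-variance weighted estimate
  aSE  <- sqrt(1 / info)             # standard error of ahat
  X2rs <- (ahat / aSE)^2             # 1 d.f. chi squared statistic
  list(ahat = ahat, aSE = aSE, X2rs = X2rs,
       pval = pchisq(X2rs, df = 1, lower.tail = FALSE))
}

# Toy input in which b is exactly twice w, so ahat should be 2.
w <- c(0.10, 0.20, -0.05)
s <- c(0.010, 0.012, 0.009)
fit <- grs_estimate_sketch(w, b = 2 * w, s = s)
```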

The heterogeneity test is a test of whether the regression coefficients for the response variable are proportional to the risk score coefficients. It is described in detail in the “ashg2012” package vignette. In applications to causal inference, two caveats apply. First, the heterogeneity test often lacks power, so a non-significant heterogeneity test is not evidence of clean instruments. Second, poor fit may be detected either when there are pleiotropic effects, or when one or more of the coefficients used to parameterise the risk score have been estimated imprecisely or with bias, so a significant heterogeneity test is not necessarily evidence of unclean instruments. Nonetheless, a significant heterogeneity test may indicate that underlying assumptions should be subjected to extra scrutiny before any inference is made about causality.
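One way to sketch such a test, assuming uncorrelated SNPs, is as a weighted sum of squared residuals around the proportional fit b = ahat * w, referred to a chi squared distribution with m - 1 d.f. This quantity equals the m d.f. statistic minus the 1 d.f. risk score statistic. `grs_heterogeneity_sketch` is a hypothetical name and this is not necessarily how the package computes it.

```r
# Sketch of a heterogeneity test for proportionality of b to w.
grs_heterogeneity_sketch <- function(w, b, s) {
  stopifnot(length(w) == length(b), length(b) == length(s))
  m <- length(w)
  ahat <- sum(w * b / s^2) / sum(w^2 / s^2)
  # Standardised residuals from the proportional model b = ahat * w.
  Qrs <- sum(((b - ahat * w) / s)^2)
  list(Qrs = Qrs, df = m - 1,
       phet = pchisq(Qrs, df = m - 1, lower.tail = FALSE))
}

# Toy inputs (illustrative values only).
w <- c(0.10, 0.20, -0.05)
b <- c(0.02, 0.03, -0.01)
s <- c(0.010, 0.012, 0.009)
het <- grs_heterogeneity_sketch(w, b, s)
```

A large `Qrs` (small `phet`) indicates that at least one SNP departs from the proportional pattern, subject to the caveats on interpretation given in the text.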

A named list with the following elements:

`m`	the number of SNPs used in the risk score.
`n`	the input sample size.
`X2m`	the chi squared test statistic for an m d.f. test in the testing dataset (all SNPs have independent effects).
`R2m`	the (pseudo) variance explained by the m d.f. model in the testing dataset.
`ahat`	the estimated coefficient for regressing the response onto the m SNP risk score.
`aSE`	the standard error of `ahat`.
`X2rs`	the chi squared test statistic for a 1 d.f. test for the risk score in the testing dataset.
`R2rs`	the (pseudo) variance explained by the risk score model in the testing dataset.
`pval`	the P-value for the 1 d.f. test.
`Qrs`	the (m-1) d.f. heterogeneity test statistic.
`phet`	the heterogeneity test P-value.