Implements the summary statistic method of Johnson et al. for approximating the regression of a response variable onto an additive multi-SNP genetic risk score in a given testing dataset, using only single SNP association summary statistics.
grs.summary(w, b, s, n)
w | coefficients for the risk score.
b | aligned beta coefficients in the testing dataset, of same length as w.
s | standard errors for b.
n | sample size of testing dataset.
The risk score coefficients w are the “weights” used to construct the risk score, for a set of SNPs, in chosen units per dose of the coded allele. Typically these are single SNP regression coefficients estimated in a “discovery” dataset. The aligned beta coefficients b are regression coefficients for the response variable of interest, for the same set of SNPs and per dose of the same coded allele as used to define w. Typically these are single SNP regression coefficients estimated in the “testing” dataset. The standard errors s are standard errors on b.
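As an illustration of where b and s might come from, the following sketch fits one single SNP linear regression per SNP on simulated data and collects the slopes and their standard errors. The genotype matrix G, the response y, and the effect sizes are all simulated assumptions for this example and are not part of the package.

```r
# Simulated toy data (assumption): n individuals, m SNPs, additive coding
set.seed(1)
n <- 500
m <- 3
# Dose of the coded allele (0, 1 or 2) at each SNP
G <- matrix(rbinom(n * m, size = 2, prob = 0.3), nrow = n, ncol = m)
# Response with arbitrary illustrative per-allele effects plus noise
y <- as.vector(G %*% c(0.2, 0.1, 0.3)) + rnorm(n)

# One single SNP regression per column of G; keep the slope and its SE
fits <- apply(G, 2, function(g) {
  summary(lm(y ~ g))$coefficients["g", c("Estimate", "Std. Error")]
})
b <- fits["Estimate", ]    # aligned beta coefficients
s <- fits["Std. Error", ]  # standard errors on b
```

In a real analysis w would be obtained in the same way from a separate discovery dataset, with every coefficient expressed per dose of the same coded allele.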
In applications to causal inference, a common objective is to estimate the causal effect of an intermediate trait or biomarker, on a response variable or outcome. In such applications, the w are the estimated effects on the intermediate trait or biomarker, and the b are estimated effects on the response variable or outcome, with standard errors s.
The sample size argument n is required only to compute the (pseudo) variance explained in the testing dataset from the likelihood ratio test statistic.
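One common way to turn a likelihood ratio (chi squared) statistic into a pseudo variance explained is 1 - exp(-X2/n). The sketch below assumes that definition; the helper name pseudo_r2 and the input values are hypothetical, and the package's own computation may differ in detail.

```r
# Pseudo variance explained from a likelihood ratio test statistic.
# Assumption: the definition 1 - exp(-X2 / n), chosen for illustration.
pseudo_r2 <- function(X2, n) {
  1 - exp(-X2 / n)
}

# Made-up illustrative values
X2 <- 30    # chi squared test statistic
n <- 5000   # sample size of the testing dataset
pseudo_r2(X2, n)
```

When X2 is small relative to n, this is close to X2/n, which is why the sample size is needed only for this quantity and not for the test statistics themselves.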
The method for calculating the regression of the response variable onto the risk score was first used for the work of the International Consortium for Blood Pressure Genome-Wide Association Studies (2011), and described in more detail in Dastani et al. (2012). The method is exact for uncorrelated SNPs and a quadratic log-likelihood, the latter being obtained under a normal linear model, or under any regression model with a large sample size.
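For uncorrelated SNPs and a quadratic log-likelihood, regressing the response onto the risk score reduces to an inverse-variance weighted regression of b on w through the origin. The sketch below assumes that form; grs_sketch is a hypothetical helper written for illustration, not the package's source, and the input values are made up.

```r
# Minimal sketch (assumption): inverse-variance weighted estimate of the
# coefficient for regressing the response onto the risk score.
grs_sketch <- function(w, b, s) {
  stopifnot(length(b) == length(w), length(s) == length(w))
  prec <- 1 / s^2                            # precision of each b
  ahat <- sum(w * b * prec) / sum(w^2 * prec)
  aSE <- sqrt(1 / sum(w^2 * prec))
  X2rs <- (ahat / aSE)^2                     # 1 d.f. chi squared statistic
  list(ahat = ahat, aSE = aSE, X2rs = X2rs,
       pval = pchisq(X2rs, df = 1, lower.tail = FALSE))
}

# Made-up illustrative inputs for three SNPs
w <- c(0.10, 0.20, 0.15)
b <- c(0.05, 0.12, 0.06)
s <- c(0.02, 0.03, 0.025)
res <- grs_sketch(w, b, s)
```

The point estimate matches a weighted least squares fit without intercept, for example lm(b ~ 0 + w, weights = 1 / s^2), although the standard error reported by lm is scaled by the residual variance and so differs from aSE.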
The heterogeneity test is a test of whether the regression coefficients for the response variable are proportional to the risk score coefficients. It is described in detail in the “ashg2012” package vignette. In applications to causal inference, firstly note that the heterogeneity test often lacks power, and hence a non-significant heterogeneity test is not evidence of clean instruments. Secondly note that poor fit may be detected either when there are pleiotropic effects, or alternatively when one or more of the coefficients used to parameterise the risk score have been estimated imprecisely or with bias, and therefore a significant heterogeneity test is not necessarily evidence of unclean instruments. Nonetheless, a significant heterogeneity test may indicate that underlying assumptions should be subjected to extra scrutiny before any inference is made about causality.
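Under the same assumptions (uncorrelated SNPs, inverse-variance weighting), a heterogeneity statistic can be sketched as the weighted sum of squared deviations of b from the fitted line ahat * w, which is algebraically equal to the m d.f. statistic minus the 1 d.f. risk score statistic. The helper het_sketch and the input values are hypothetical and for illustration only.

```r
# Minimal sketch (assumption): (m - 1) d.f. heterogeneity test of whether
# b is proportional to w, computed from summary statistics only.
het_sketch <- function(w, b, s) {
  stopifnot(length(b) == length(w), length(s) == length(w))
  m <- length(w)
  prec <- 1 / s^2
  ahat <- sum(w * b * prec) / sum(w^2 * prec)
  # Weighted residual sum of squares around the proportional fit
  Qrs <- sum(((b - ahat * w) / s)^2)
  list(Qrs = Qrs, df = m - 1,
       phet = pchisq(Qrs, df = m - 1, lower.tail = FALSE))
}

# Made-up illustrative inputs for three SNPs
w <- c(0.10, 0.20, 0.15)
b <- c(0.05, 0.12, 0.06)
s <- c(0.02, 0.03, 0.025)
het <- het_sketch(w, b, s)
```

When b is exactly proportional to w the statistic is zero, so a large value reflects SNPs whose effects on the response depart from the proportionality implied by the risk score.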
A named list with the following elements:
m | the number of SNPs used in the risk score.
n | the input sample size.
X2m | the chi squared test statistic for an m d.f. test in the testing dataset (all SNPs have independent effects).
R2m | the (pseudo) variance explained by the m d.f. model in the testing dataset.
ahat | the estimated coefficient for regressing the response onto the m SNP risk score.
aSE | the standard error of ahat.
X2rs | the chi squared test statistic for a 1 d.f. test for the risk score in the testing dataset.
R2rs | the (pseudo) variance explained by the risk score model in the testing dataset.
pval | the P-value for the 1 d.f. test.
Qrs | the (m-1) d.f. heterogeneity test statistic.
phet | the heterogeneity test P-value.
Toby Johnson Toby.x.Johnson@gsk.com
International Consortium for Blood Pressure Genome-Wide Association Studies (2011 Nature) http://dx.doi.org/10.1038/nature10405
Dastani et al. (2012 PLoS Genetics) http://dx.doi.org/10.1371/journal.pgen.1002607
Johnson (2012 ASHG poster) see “ashg2012” package vignette.