polygenescore: Calculate power and predictive accuracy of a polygenic score
In DudbridgeLab/AVENGEME: Analysis of polygenic scoring methods

View source: R/polygenescore.R

Calculates measures of association for a polygenic score derived from a training sample to predict traits in a target sample.

polygenescore(nsnp, n, vg1 = 0, cov12 = vg1, pi0 = 0, pupper = c(0, 1),
  nested = TRUE, weighted = TRUE, binary = c(FALSE, FALSE),
  prevalence = c(0.1, 0.1), sampling = prevalence, lambdaS = NA,
  shrinkage = FALSE, logrisk = FALSE, alpha = 0.05, r2gx = 0,
  corgx = 0, r2xy = 0, adjustedEffects = FALSE, riskthresh = 0.1)

`nsnp`	Number of independent markers in the polygenic score.
`n`	Vector with two elements, giving the total sizes of the training and target samples. In case/control studies, n is the sum of the number of cases and the number of controls. If only one element of n is given, the training and target samples are assumed to be the same size. There is no default; a value must be given.
`vg1`	Proportion of variance explained by genetic effects in the training sample.
`cov12`	Covariance between genetic effect sizes in the two samples. If the effects are fully correlated then cov12<=sqrt(vg1). If the effects are identical then cov12=vg1 (default).
`pi0`	Proportion of markers with no effect on the training trait.
`pupper`	Vector of p-value thresholds for selecting markers from the training sample. The first element is the lower bound of the first interval, the second element is the upper bound of the first interval, the third element is the upper bound of the second interval, and so on.
`nested`	TRUE if the p-value intervals are nested, that is, they all have the same lower bound, which is the first element of pupper. If FALSE, the lower bound of the second interval is the upper bound of the first, and so on.
`weighted`	TRUE if estimated effect sizes are used as weights in forming the polygenic score. If FALSE, an unweighted score is used, which is the sum of risk alleles carried.
`binary`	TRUE if the training trait is binary. By default, the target trait is binary if the training trait is; otherwise binary should be a vector with two elements for the training and target samples respectively.
`prevalence`	For a binary trait, prevalence in the training sample. By default, prevalence is the same in the target sample. Otherwise, prevalence should be a vector with two elements for the training and target samples respectively.
`sampling`	For a binary trait, case/control sampling fraction in the training sample. By default, sampling equals the prevalence, as in a cohort study. If the sampling fraction is different in the target sample, sampling should be a vector with two elements for the training and target samples respectively.
`lambdaS`	Sibling relative recurrence risk in the training sample. Can be specified instead of vg1.
`shrinkage`	TRUE if effect sizes are to be shrunk to BLUPs.
`logrisk`	TRUE if binary trait arises from log-risk model rather than liability threshold.
`alpha`	Significance level for testing association of the polygenic score in the target sample.
`r2gx`	Proportion of variance in environmental risk score explained by genetic effects in training sample.
`corgx`	Genetic correlation between environmental risk score and training trait.
`r2xy`	Proportion of variance in training trait explained by environmental risk score.
`adjustedEffects`	TRUE if the polygenic and environmental scores are combined as a weighted sum. If FALSE, the scores are combined as an unweighted sum even if they are correlated.
`riskthresh`	Absolute risk threshold for calculating net reclassification index.
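
As an illustration of how the `pupper` and `nested` arguments described above translate into marker-selection intervals, here is a minimal base R sketch. The helper `pupper_intervals` is hypothetical and is not part of AVENGEME; it only restates the argument descriptions in code.

```r
# Hypothetical helper (not part of AVENGEME): list the p-value intervals
# implied by pupper and nested, following the argument descriptions.
pupper_intervals <- function(pupper, nested = TRUE) {
  stopifnot(length(pupper) >= 2)
  upper <- pupper[-1]                      # every element after the first is an upper bound
  lower <- if (nested) {
    rep(pupper[1], length(upper))          # nested: all intervals share the first element
  } else {
    pupper[-length(pupper)]                # non-nested: previous upper bound is the lower bound
  }
  data.frame(lower = lower, upper = upper)
}

pupper_intervals(c(0, 0.1, 0.2, 0.5), nested = TRUE)
pupper_intervals(c(0, 0.1, 0.2, 0.5), nested = FALSE)
```

With nested = TRUE the intervals are 0 to 0.1, 0 to 0.2 and 0 to 0.5; with nested = FALSE they are 0 to 0.1, 0.1 to 0.2 and 0.2 to 0.5. A pupper vector of length k therefore defines k - 1 intervals.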

The following setup is assumed. Two independent samples of genotypes are available; this could be one sample of data split into two subsets. One sample is termed the training sample, the other the target sample. Traits are measured in each sample; different traits could be measured in training and target samples. Subjects are assumed to be unrelated, and genotypes are assumed to be independent. In practice we recommend LD-clumping methods, such as the --clump option in PLINK, to ensure weak dependence between markers; in this case the methods are almost unbiased if an r2 threshold of 0.1 is used. Markers with P-values within a fixed range are selected from the training sample, and then used to construct a polygenic score for each subject in the target sample. The score can be tested for association with the target trait, or used to predict individual trait values in the target sample.
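
The setup above can be sketched as a small simulation in base R. This is only an illustration of the training/target workflow for a quantitative trait, not the analytic calculation that polygenescore performs. All simulation settings (sample sizes, allele frequencies, effect sizes, the 0.05 threshold) are arbitrary assumptions, and whether interval bounds are inclusive is also an assumption.

```r
set.seed(1)
nsnp <- 200; n_train <- 500; n_target <- 500
maf  <- runif(nsnp, 0.2, 0.5)                       # assumed allele frequencies
beta <- c(rnorm(20, sd = 0.2), rep(0, nsnp - 20))   # 90% of markers have no effect

# Simulate independent genotypes and a quantitative trait for n unrelated subjects
sim_sample <- function(n) {
  G <- sapply(maf, function(f) rbinom(n, 2, f))     # n x nsnp matrix of allele counts
  y <- drop(G %*% beta) + rnorm(n)
  list(G = G, y = y)
}
train  <- sim_sample(n_train)
target <- sim_sample(n_target)

# Marginal (single-marker) association in the training sample
r     <- drop(cor(train$G, train$y))
tstat <- r * sqrt((n_train - 2) / (1 - r^2))
pval  <- 2 * pt(-abs(tstat), df = n_train - 2)
bhat  <- r * sd(train$y) / apply(train$G, 2, sd)    # per-marker regression slopes

# Select markers with P-values in a fixed range, then build a weighted score
# for each target subject from the training-sample effect estimates
keep  <- pval >= 0 & pval <= 0.05
score <- drop(target$G[, keep, drop = FALSE] %*% bhat[keep])

# Squared correlation between the score and the target trait, and its test
cor(score, target$y)^2
cor.test(score, target$y)$p.value
```

Here the same trait is simulated in both samples, which corresponds to identical effects in training and target (cov12 = vg1).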

A list with elements containing quantities describing the association of the polygenic score with the target trait:

`R2`	Squared correlation between polygenic score and target trait.
`NCP`	Non-centrality parameter of the chisq test of association between polygenic score and target trait.
`p`	Expected P-value of the chisq test of association between polygenic score and target trait.
`power`	Power of the chisq test of association between polygenic score and target trait.
`FDR`	Expected proportion of false positives among selected markers.
`AUC`	For binary traits, area under ROC curve.
`MSE`	For quantitative traits, mean square error between target trait and polygenic score.
`NRI`	Net reclassification improvement in cases, controls, and combined.
`IDI`	Integrated discrimination improvement.
`error`	Error message, if any.
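
To show how a non-centrality parameter and a significance level relate to power, the following base R sketch computes the power of a chi-square test from its NCP. It assumes a test with 1 degree of freedom; the helper `power_from_ncp` is hypothetical, and whether polygenescore computes power in exactly this way is not confirmed here.

```r
# Hypothetical helper (not part of AVENGEME): power of a chi-square test
# with non-centrality parameter ncp at significance level alpha.
# df = 1 is an assumption for a single polygenic score.
power_from_ncp <- function(ncp, alpha = 0.05, df = 1) {
  crit <- qchisq(1 - alpha, df = df)                      # critical value under the null
  pchisq(crit, df = df, ncp = ncp, lower.tail = FALSE)    # P(statistic > critical value)
}

power_from_ncp(c(0, 5, 10, 20))
```

With ncp = 0 the power equals alpha, and it increases towards 1 as the NCP grows.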

Frank Dudbridge

Dudbridge F (2013) Power and predictive accuracy of polygenic risk scores. PLoS Genet 9:e1003348

Dudbridge F, Pashayan N, Yang J. Predictive accuracy of combined genetic and environmental risk scores. Submitted.

# P-value for ISC schizophrenia score associated with schizophrenia in MGS-EA
# See page 3, column 2, paragraph 3 of Dudbridge (2013)
library(AVENGEME)  # assumes the AVENGEME package is installed
polygenescore(74062, n = c(3322 + 3587, 2687 + 2656), vg1 = 0.269, pi0 = 0.99,
  binary = TRUE, sampling = c(3322/6909, 2687/5343), pupper = c(0, 0.5),
  prevalence = 0.01)$p
# [1] 1.029771e-28

# Power for ISC schizophrenia score associated with bipolar disorder in WTCCC
# See page 4, column 2, paragraph 2 of Dudbridge (2013)
polygenescore(74062, c(3322 + 3587, 1829 + 2935), vg1 = 0.287,
  cov12 = 0.28 * 0.287, binary = TRUE, sampling = c(3322/6909, 1829/4764),
  pupper = c(0, 0.5), prevalence = 0.01)$power
# [1] 0.8042843

# Power for cross validation study of Framingham risk score
# See page 6, column 1, paragraph 1 of Dudbridge (2013)
polygenescore(100000, c(1575, 175), vg1 = 1,
  pupper = c(0, 0.1, 0.2, 0.3, 0.4, 0.5), nested = FALSE)$power
# [1] 0.19723400 0.11733175 0.09195134 0.07733049 0.06771049

# Net reclassification index for cardiovascular disease with QRISK-2 and 53 SNPs
# See table 3, row 1, columns 5-6 of Dudbridge et al (submitted)
# Results vary due to stochastic evaluation of multivariate normal probabilities
polygenescore(nsnp = 1e5, n = 63746 + 130681, vg1 = 0.3, pi0 = 0.8,
  binary = TRUE, prevalence = 0.15, sampling = 63746/194427,
  pupper = c(0, 5e-8), r2gx = 0.3, r2xy = 0.052, corgx = 0.1,
  riskthresh = 0.1, adjustedEffects = TRUE)$NRI
# [1] -0.006042718  0.015266759  0.009224041