polygenescore: Calculate power and predictive accuracy of a polygenic score

Description

Calculates measures of association for a polygenic score derived from a training sample to predict traits in a target sample.

Usage

polygenescore(nsnp, n, vg1 = 0, cov12 = vg1, pi0 = 0, pupper = c(0, 1),
  nested = TRUE, weighted = TRUE, binary = c(FALSE, FALSE),
  prevalence = c(0.1, 0.1), sampling = prevalence, lambdaS = NA,
  shrinkage = FALSE, logrisk = FALSE, alpha = 0.05, r2gx = 0,
  corgx = 0, r2xy = 0, adjustedEffects = FALSE, riskthresh = 0.1)

Arguments

nsnp

Number of independent markers in the polygenic score.

n

Vector with two elements, giving the total sizes of the training and target samples. In case/control studies, n is the sum of the number of cases and the number of controls. If only one element of n is given, the training and target samples are assumed to be the same size. There is no default; a value must be given.

vg1

Proportion of variance explained by genetic effects in the training sample.

cov12

Covariance between genetic effect sizes in the two samples. If the effects are fully correlated then cov12<=sqrt(vg1). If the effects are identical then cov12=vg1 (default).

pi0

Proportion of markers with no effect on the training trait.

pupper

Vector of p-value thresholds for selecting markers from training sample. First element is the lower bound of the first interval, second element is the upper bound of the first interval, third element is the upper bound of the second interval, etc.

nested

TRUE if the p-value intervals are nested, that is, they all have the same lower bound, which is the first element of pupper. If FALSE, the lower bound of the second interval is the upper bound of the first, and so on.

weighted

TRUE if estimated effect sizes are used as weights in forming the polygenic score. If FALSE, an unweighted score is used, which is the sum of risk alleles carried.

binary

TRUE if the training trait is binary. By default, the target trait is binary if the training trait is; otherwise binary should be a vector with two elements for the training and target samples respectively.

prevalence

For a binary trait, prevalence in the training sample. By default, prevalence is the same in the target sample. Otherwise, prevalence should be a vector with two elements for the training and target samples respectively.

sampling

For a binary trait, case/control sampling fraction in the training sample. By default, sampling equals the prevalence, as in a cohort study. If the sampling fraction is different in the target sample, sampling should be a vector with two elements for the training and target samples respectively.

lambdaS

Sibling relative recurrence risk in training sample, can be specified instead of vg1.

shrinkage

TRUE if effect sizes are to be shrunk to BLUPs.

logrisk

TRUE if binary trait arises from log-risk model rather than liability threshold.

alpha

Significance level for testing association of the polygenic score in the target sample.

r2gx

Proportion of variance in environmental risk score explained by genetic effects in training sample.

corgx

Genetic correlation between environmental risk score and training trait.

r2xy

Proportion of variance in training trait explained by environmental risk score.

adjustedEffects

TRUE if polygenic and environmental scores are combined as a weighted sum. If FALSE, the scores are combined as an unweighted sum even if they are correlated.

riskthresh

Absolute risk threshold for calculating net reclassification index.
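
The conventions for pupper, nested, n and sampling described above can be illustrated in base R. This is a minimal sketch and not part of the package: the helper name pvalue_intervals is hypothetical and only mirrors the argument descriptions, and the case/control counts are those used in the first example on this page.

```r
# Hypothetical helper (not part of AVENGEME): list the p-value intervals
# implied by pupper and nested, following the argument descriptions.
pvalue_intervals <- function(pupper, nested = TRUE) {
  stopifnot(length(pupper) >= 2)
  upper <- pupper[-1]
  if (nested) {
    # All intervals share the first element of pupper as their lower bound.
    lower <- rep(pupper[1], length(upper))
  } else {
    # Each interval starts where the previous one ended.
    lower <- pupper[-length(pupper)]
  }
  data.frame(lower = lower, upper = upper)
}

pvalue_intervals(c(0, 0.1, 0.2, 0.5), nested = TRUE)
pvalue_intervals(c(0, 0.1, 0.2, 0.5), nested = FALSE)

# Case/control counts for the training and target samples.
cases    <- c(training = 3322, target = 2687)
controls <- c(training = 3587, target = 2656)

# n is the total sample size: cases plus controls.
n <- cases + controls
# sampling is the fraction of each sample made up of cases.
sampling <- cases / n

n
sampling
```

The resulting n and sampling vectors can be passed directly as the n and sampling arguments of polygenescore.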

Details

The following setup is assumed. Two independent samples of genotypes are available; this could be one sample of data split into two subsets. One sample is termed the training sample, the other the target sample. Traits are measured in each sample; different traits could be measured in the training and target samples. Subjects are assumed to be unrelated, and genotypes are assumed to be independent. In practice we recommend LD-clumping methods, such as the --clump option in PLINK, to ensure weak dependence between markers; in this case the methods are almost unbiased if an r2 threshold of 0.1 is used. Markers with p-values within a fixed range are selected from the training sample, and then used to construct a polygenic score for each subject in the target sample. The score can be tested for association with the target trait, or used to predict individual trait values in the target sample.
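
The setup described above can be mimicked with a small simulation in base R. This is only a sketch under stated assumptions and is not the analytic calculation performed by polygenescore: the trait is quantitative, genotypes are simulated as independent and standardized, and all parameter values and object names (p_range, simulate_sample, and so on) are chosen for illustration.

```r
set.seed(1)

nsnp    <- 200          # independent markers (illustrative value)
n_train <- 2000         # training sample size
n_targ  <- 500          # target sample size
vg1     <- 0.3          # variance explained by the markers
pi0     <- 0.9          # proportion of markers with no effect
p_range <- c(0, 0.05)   # p-value range used to select markers

# True effects: a fraction pi0 are zero, the rest are normal and scaled so
# that the expected genetic variance is vg1 for standardized genotypes.
n_causal <- round(nsnp * (1 - pi0))
beta <- numeric(nsnp)
beta[seq_len(n_causal)] <- rnorm(n_causal, sd = sqrt(vg1 / n_causal))

# Simulate independent standardized genotypes and a quantitative trait.
simulate_sample <- function(n) {
  g <- scale(matrix(rbinom(n * nsnp, size = 2, prob = 0.3), nrow = n))
  y <- as.vector(g %*% beta) + rnorm(n, sd = sqrt(1 - vg1))
  list(g = g, y = y)
}
train  <- simulate_sample(n_train)
target <- simulate_sample(n_targ)

# Marginal regression of the training trait on each marker.
marginal <- t(apply(train$g, 2, function(g_j) {
  coef(summary(lm(train$y ~ g_j)))[2, c("Estimate", "Pr(>|t|)")]
}))
beta_hat <- marginal[, 1]
p_value  <- marginal[, 2]

# Select markers whose training p-value falls within the fixed range.
selected <- p_value >= p_range[1] & p_value <= p_range[2]

# Weighted polygenic score for each subject in the target sample.
score <- as.vector(target$g[, selected, drop = FALSE] %*% beta_hat[selected])

# Test the score for association with the target trait.
fit <- summary(lm(target$y ~ score))
result <- c(n_selected = sum(selected),
            r2 = fit$r.squared,
            p  = coef(fit)[2, "Pr(>|t|)"])
result
```

Repeating the simulation over many seeds and recording how often p falls below alpha gives an empirical power estimate, which is the kind of quantity polygenescore is described as calculating without simulation.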

Value

A list with elements containing quantities describing the association of the polygenic score with the target trait:

Author(s)

Frank Dudbridge

References

Dudbridge F (2013) Power and predictive accuracy of polygenic risk scores. PLoS Genet 9:e1003348

Dudbridge F, Pashayan N, Yang J. Predictive accuracy of combined genetic and environmental risk scores. Submitted.

Examples

# P-value for ISC schizophrenia score associated with schizophrenia in MGS-EA
# See page 3, column 2, paragraph 3 of Dudbridge (2013)
polygenescore(74062, n = c(3322 + 3587, 2687 + 2656), vg1 = 0.269, pi0 = 0.99,
  binary = TRUE, sampling = c(3322 / 6909, 2687 / 5343), pupper = c(0, 0.5),
  prevalence = 0.01)$p
# [1] 1.029771e-28

# Power for ISC schizophrenia score associated with bipolar disorder in WTCCC
# See page 4, column 2, paragraph 2 of Dudbridge (2013)
polygenescore(74062, c(3322 + 3587, 1829 + 2935), vg1 = 0.287,
  cov12 = 0.28 * 0.287, binary = TRUE,
  sampling = c(3322 / 6909, 1829 / 4764), pupper = c(0, 0.5),
  prevalence = 0.01)$power
# [1] 0.8042843

# Power for cross validation study of Framingham risk score
# See page 6, column 1, paragraph 1 of Dudbridge (2013)
polygenescore(100000, c(1575, 175), vg1 = 1,
  pupper = c(0, 0.1, 0.2, 0.3, 0.4, 0.5), nested = FALSE)$power
# [1] 0.19723400 0.11733175 0.09195134 0.07733049 0.06771049

# Net reclassification index for cardiovascular disease with QRISK-2 and 53 SNPs
# See table 3, row 1, columns 5-6 of Dudbridge et al (submitted)
# results vary due to stochastic evaluation of multivariate normal probabilities
polygenescore(nsnp = 1e5, n = 63746 + 130681, vg1 = 0.3, pi0 = 0.8,
  binary = TRUE, prevalence = 0.15, sampling = 63746 / 194427,
  pupper = c(0, 5e-8), r2gx = 0.3, r2xy = 0.052, corgx = 0.1,
  riskthresh = 0.1, adjustedEffects = TRUE)$NRI
# [1] -0.006042718  0.015266759  0.009224041

DudbridgeLab/AVENGEME documentation built on Oct. 17, 2019, 6:57 a.m.