sampleSizeForGeneScore: Sample size calculations for polygenic scores


View source: R/sampleSizeForGeneScore.R

Description

Calculates the training sample size required to achieve a given AUC, R2 or power in the target sample.

Usage

sampleSizeForGeneScore(targetQuantity, targetValue, nsnp, n2 = NA, vg1 = 0,
  cov12 = vg1, pi0 = 0, weighted = TRUE, binary = FALSE,
  prevalence = 0.1, sampling = prevalence, lambdaS = NA,
  shrinkage = FALSE, logrisk = FALSE, alpha = 0.05, r2gx = 0,
  corgx = 0, r2xy = 0, adjustedEffects = FALSE)

Arguments

targetQuantity

Either "AUC", "R2" or "power" (case insensitive).

targetValue

The value of the targetQuantity for which to calculate sample size.

nsnp

Number of independent markers in the polygenic score.

n2

Target sample size. Only relevant when targetQuantity is "power". By default set equal to the training sample size.

vg1

Proportion of variance explained by genetic effects in the training sample.

cov12

Covariance between genetic effect sizes in the two samples. If the effects are fully correlated then cov12<=sqrt(vg1). If the effects are identical then cov12=vg1 (default).

pi0

Proportion of markers with no effect on the training trait.

weighted

TRUE if estimated effect sizes are used as weights in forming the polygenic score. If FALSE, an unweighted score is used, which is the sum of risk alleles carried.

binary

TRUE if the training trait is binary. By default, the target trait is binary if the training trait is; otherwise binary should be a vector with two elements for the training and target samples respectively.

prevalence

For a binary trait, prevalence in the training sample. By default, prevalence is the same in the target sample. Otherwise, prevalence should be a vector with two elements for the training and target samples respectively.

sampling

For a binary trait, case/control sampling fraction in the training sample. By default, sampling equals the prevalence, as in a cohort study. If the sampling fraction is different in the target sample, sampling should be a vector with two elements for the training and target samples respectively.

lambdaS

Sibling relative recurrence risk in the training sample. Can be specified instead of vg1.

shrinkage

TRUE if effect sizes are to be shrunk to BLUPs.

logrisk

TRUE if binary trait arises from log-risk model rather than liability threshold.

alpha

Significance level for testing association of the polygenic score in the target sample.

r2gx

Proportion of variance in the environmental risk score explained by genetic effects in the training sample.

corgx

Genetic correlation between environmental risk score and training trait.

r2xy

Proportion of variance in training trait explained by environmental risk score.

adjustedEffects

TRUE if the polygenic and environmental scores are combined as a weighted sum. If FALSE, the scores are combined as an unweighted sum even if they are correlated.
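The arguments n2 and alpha only come into play when targetQuantity is "power". The following is a minimal sketch of such a call, using argument names from the Usage section; the numeric values are illustrative assumptions rather than recommendations, and the call is only attempted if the AVENGEME package is installed.

```r
# Arguments for a power calculation. Names follow the Usage section;
# the values are illustrative assumptions only.
power_args <- list(
  targetQuantity = "power",
  targetValue    = 0.8,     # desired power in the target sample
  nsnp           = 100000,  # independent markers in the score
  n2             = 5000,    # target sample size (only relevant for "power")
  vg1            = 0.22,    # variance explained in the training sample
  pi0            = 0.90,    # proportion of markers with no effect
  alpha          = 0.05     # significance level of the association test
)

# Attempt the call only when the package is available.
if (requireNamespace("AVENGEME", quietly = TRUE)) {
  result <- do.call(AVENGEME::sampleSizeForGeneScore, power_args)
  print(result)
}
```

Leaving n2 at its default of NA would instead set the target sample size equal to the training sample size, as described under n2.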

Details

The sample size is estimated by numerical optimisation. For each possible sample size, the P-value threshold is identified for selecting markers into the polygenic score, such that targetQuantity is maximised.
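The nested search described above can be sketched in base R. This is not the package's implementation: toy_quantity is a made-up stand-in for the real AUC, R2 or power formula, chosen only so that it increases with sample size and has a single best P-value threshold. The inner step uses optimize to maximise over the threshold, and the outer step uses uniroot to find the sample size at which that maximum reaches the target.

```r
# Toy stand-in for the target quantity (NOT the AVENGEME formula):
# it increases with training sample size n and peaks at a
# P-value threshold of 0.01 (log10 threshold of -2).
toy_quantity <- function(n, log10_p) {
  (n / (n + 1000)) * exp(-(log10_p + 2)^2)
}

# Inner step: for a fixed n, find the threshold maximising the quantity.
best_threshold <- function(n) {
  optimize(function(lp) toy_quantity(n, lp),
           interval = c(-8, 0), maximum = TRUE)
}

# Outer step: find n at which the maximised quantity equals the target.
target <- 0.5
fit <- uniroot(function(n) best_threshold(n)$objective - target,
               interval = c(1, 1e7), tol = 1e-6)

n_required <- fit$root
p_selected <- 10^best_threshold(n_required)$maximum
```

For this toy function the answer is known analytically (n = 1000 at a threshold of 0.01), so n_required and p_selected should land close to those values.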

Value

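A list with elements n, p and max (see Examples). n is the estimated training sample size and p is the P-value threshold for selecting markers into the polygenic score.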

Author(s)

Frank Dudbridge

References

Dudbridge F (2013) Power and predictive accuracy of polygenic risk scores. PLoS Genet 9:e1003348

Examples

# AUC = 0.75 in breast cancer. See Table 4, row 4, column 3 in Dudbridge (2013).
sampleSizeForGeneScore("AUC", 0.75, nsnp = 100000, vg1 = 0.44 / 2, pi0 = 0.90,
  binary = TRUE, prevalence = 0.036, sampling = 0.5)
# $n
# [1] 313981.4
#
# $p
# [1] 0.007500909
#
# $max
# [1] 0.788842
#
# Number of cases: half of the training sample, because sampling = 0.5
sampleSizeForGeneScore("AUC", 0.75, nsnp = 100000, vg1 = 0.44 / 2, pi0 = 0.90,
  binary = TRUE, prevalence = 0.036, sampling = 0.5)$n / 2
# [1] 156990.7
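
The division by 2 above follows from sampling = 0.5, i.e. half of the training sample are cases. A small sketch of the general conversion is given below; split_cases_controls is a hypothetical helper, not part of the package, and it assumes that sampling is the proportion of cases in the sample.

```r
# Hypothetical helper: split a total sample size into cases and controls,
# given the proportion of cases in the sample (the `sampling` argument).
split_cases_controls <- function(n, sampling) {
  c(cases = n * sampling, controls = n * (1 - sampling))
}

# Training sample size from the example above, with sampling = 0.5.
breast_cancer_split <- split_cases_controls(313981.4, 0.5)
print(breast_cancer_split)

# Illustrative only: a hypothetical sample of 10000 with 25% cases.
print(split_cases_controls(10000, 0.25))
```

The required n itself depends on sampling, so the helper should be applied to the n returned for the same sampling value that was passed to sampleSizeForGeneScore.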

DudbridgeLab/AVENGEME documentation built on Oct. 17, 2019, 6:57 a.m.