sampleSizeForGeneScore: Sample size calculations for polygenic scores
In DudbridgeLab/AVENGEME: Analysis of polygenic scoring methods


View source: R/sampleSizeForGeneScore.R

Calculates the size of the training sample required to achieve a given AUC, R2 or power in the target sample.

sampleSizeForGeneScore(targetQuantity, targetValue, nsnp, n2 = NA, vg1 = 0,
  cov12 = vg1, pi0 = 0, weighted = TRUE, binary = FALSE,
  prevalence = 0.1, sampling = prevalence, lambdaS = NA,
  shrinkage = FALSE, logrisk = FALSE, alpha = 0.05, r2gx = 0,
  corgx = 0, r2xy = 0, adjustedEffects = FALSE)

`targetQuantity`	Either "AUC", "R2" or "power" (case insensitive).
`targetValue`	The value of the targetQuantity for which to calculate sample size.
`nsnp`	Number of independent markers in the polygenic score.
`n2`	Target sample size. Only relevant when targetQuantity is "power". By default set equal to the training sample size.
`vg1`	Proportion of variance explained by genetic effects in the training sample.
`cov12`	Covariance between genetic effect sizes in the two samples. If the effects are fully correlated then cov12<=sqrt(vg1). If the effects are identical then cov12=vg1 (default).
`pi0`	Proportion of markers with no effect on the training trait.
`weighted`	TRUE if estimated effect sizes are used as weights in forming the polygenic score. If FALSE, an unweighted score is used, which is the sum of risk alleles carried.
`binary`	TRUE if the training trait is binary. By default, the target trait is binary if the training trait is; if the two traits differ in type, binary should be a vector with two elements for the training and target samples respectively.
`prevalence`	For a binary trait, prevalence in the training sample. By default, prevalence is the same in the target sample. Otherwise, prevalence should be a vector with two elements for the training and target samples respectively.
`sampling`	For a binary trait, case/control sampling fraction in the training sample. By default, sampling equals the prevalence, as in a cohort study. If the sampling fraction is different in the target sample, sampling should be a vector with two elements for the training and target samples respectively.
`lambdaS`	Sibling relative recurrence risk in the training sample; can be specified instead of vg1.
`shrinkage`	TRUE if effect sizes are to be shrunk to BLUPs.
`logrisk`	TRUE if binary trait arises from log-risk model rather than liability threshold.
`alpha`	Significance level for testing association of the polygenic score in the target sample.
`r2gx`	Proportion of variance in the environmental risk score explained by genetic effects in the training sample.
`corgx`	Genetic correlation between environmental risk score and training trait.
`r2xy`	Proportion of variance in training trait explained by environmental risk score.
`adjustedEffects`	TRUE if the polygenic and environmental scores are combined as a weighted sum. If FALSE, the scores are combined as an unweighted sum even if they are correlated.
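When targetQuantity is "power", the calculation also depends on the target sample size n2 and the significance level alpha. As a rough illustration of how these two arguments enter a power calculation, the sketch below uses a standard normal approximation to the power of a two-sided test of association, given the proportion of target-trait variance explained by the score. The function approxScorePower is hypothetical and is not part of AVENGEME; the package derives the target R2 from the training parameters and may use a different power formula (see Dudbridge 2013).

```r
# HYPOTHETICAL helper, not part of AVENGEME.
# Normal approximation to the power of a two-sided test of association between
# the polygenic score and the target trait.
# r2:    proportion of target-trait variance explained by the score
# n2:    target sample size
# alpha: significance level of the test
approxScorePower <- function(r2, n2, alpha = 0.05) {
  ncp <- sqrt(n2 * r2 / (1 - r2))    # approximate mean of the test statistic
  z <- qnorm(1 - alpha / 2)          # two-sided critical value
  pnorm(ncp - z) + pnorm(-ncp - z)   # probability of rejecting in either tail
}

approxScorePower(r2 = 0.01, n2 = 1000)  # about 0.888
```

Power increases with n2 and decreases as alpha is made more stringent, which is why n2 and alpha matter only for the "power" target.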

The sample size is estimated by numerical optimisation. For each candidate training sample size, the P-value threshold for selecting markers into the polygenic score is chosen to maximise targetQuantity; the required sample size is the smallest one at which this maximum reaches targetValue.
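The nested search can be sketched in base R, with optimize() for the inner step (choosing the threshold for a given sample size) and uniroot() for the outer step (finding the sample size). This is only an illustration of the search structure under stated assumptions, not AVENGEME's implementation: toyQuantity is a hypothetical stand-in for the AUC, R2 or power formulas of Dudbridge (2013), chosen so that it increases with n and peaks at a threshold of 0.01, and the outer search assumes the maximised quantity increases monotonically with n.

```r
# HYPOTHETICAL toy stand-in for the achieved AUC/R2/power; not AVENGEME's formula.
# It increases with training size n and peaks at a P-value threshold of 0.01.
toyQuantity <- function(n, log10p) {
  0.8 - 0.3 * exp(-n / 1e5) - 0.01 * (log10p + 2)^2
}

# Inner step: for a given n, find the threshold (on the log10 scale) that
# maximises the quantity.
bestThreshold <- function(n, quantity = toyQuantity) {
  opt <- optimize(function(lp) quantity(n, lp), interval = c(-8, 0),
                  maximum = TRUE)
  list(log10p = opt$maximum, value = opt$objective)
}

# Outer step: find the n at which the maximised quantity equals the target.
# Assuming the maximised quantity increases with n, this root is the smallest
# sufficient n. nMax plays the role of "infinity" when computing max.
sampleSizeSketch <- function(targetValue, quantity = toyQuantity, nMax = 1e10) {
  maxValue <- bestThreshold(nMax, quantity)$value
  if (maxValue < targetValue) {
    return(list(n = NA, p = NA, max = maxValue))  # target is unreachable
  }
  root <- uniroot(function(n) bestThreshold(n, quantity)$value - targetValue,
                  interval = c(1, nMax))
  list(n = root$root,
       p = 10^bestThreshold(root$root, quantity)$log10p,
       max = maxValue)
}

res <- sampleSizeSketch(0.75)
```

With this toy function, res$n is about 179176 (that is, 1e5 * log(6)), res$p is close to 0.01 and res$max is close to 0.8, mirroring the n, p and max elements returned by sampleSizeForGeneScore.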

A list with the following elements:

`n`	Required size of the training sample. This is the total sample size: to obtain the number of cases, multiply by the sampling fraction.
`p`	P-value threshold for selecting markers into the polygenic score, such that the target value is achieved with the minimum sample size.
`max`	Maximum value of targetQuantity achievable as the training sample size tends to infinity (in practice, evaluated at a training sample size of 1e10).
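Splitting the total size n into cases and controls is simple arithmetic with the sampling fraction. A minimal sketch, using a hypothetical helper splitTrainingSample that is not part of AVENGEME, and the total size from the breast cancer example (n = 313981.4 with sampling = 0.5):

```r
# HYPOTHETICAL helper, not part of AVENGEME.
# n:        total training sample size, as returned in $n
# sampling: case fraction in the training sample
splitTrainingSample <- function(n, sampling) {
  stopifnot(sampling > 0, sampling < 1)
  c(cases = n * sampling, controls = n * (1 - sampling))
}

counts <- splitTrainingSample(313981.4, 0.5)
ceiling(counts)  # round up to whole individuals: 156991 cases, 156991 controls
```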

Frank Dudbridge

Dudbridge F (2013) Power and predictive accuracy of polygenic risk scores. PLoS Genet 9:e1003348

# AUC = 0.75 in breast cancer. See Table 4, row 4, column 3 in Dudbridge (2013).
sampleSizeForGeneScore("AUC", 0.75, nsnp = 100000, vg1 = 0.44/2, pi0 = 0.90,
  binary = TRUE, prevalence = 0.036, sampling = 0.5)
# $n
# [1] 313981.4
#
# $p
# [1] 0.007500909
#
# $max
# [1] 0.788842
#
# Number of cases: total sample size multiplied by the sampling fraction (0.5)
sampleSizeForGeneScore("AUC", 0.75, nsnp = 100000, vg1 = 0.44/2, pi0 = 0.90,
  binary = TRUE, prevalence = 0.036, sampling = 0.5)$n * 0.5
# [1] 156990.7