CEP_Theoretical_Power: Theoretical Power Calculation for CEP-SKAT

Description Usage Arguments Details Value Author(s) Examples

View source: R/CEP_Theoretical_Power.R

Description

Given haplotype data, this function will provide theoretical power calculations over a variety of user-specified settings.

Usage

1
CEP_Theoretical_Power(Haplotypes = NULL, SNP.location = NULL, Region.length = 3000, nruns = 200, weights.beta = c(1, 25), n = 500, tails = 0.2, MAF.cutoff = 0.03, prop.caus = 0.2, effect.size = 0.2, prop.pos = 1, alpha.level = 10^(-6), optimal = FALSE)

Arguments

Haplotypes

N.avail by P matrix, where N.avail=# of haplotypes and P=# of mutations.

SNP.location

Vector of base pair locations for Haplotypes (assumed to be on the same chromsome). Vector is of length P.

Region.length

Length (in base pairs) of region of interest.

nruns

Number of random regions selected. Estimated power is averaged over all nruns regions, so increased nruns improves the accuracy of the power estimate (but leads to longer computation)

weights.beta

The parameters of the beta distribution which will be used as weights in the K matrix. Weights will be generated using the Beta distribution function with these parameters (default is alpha=1, beta=25) evaluated at the MAF of the variant.

n

Sample size.

tails

This should be a number in the range (0,0.5) that specifies the proportion of the study population that will be selected from each tail of the phenotype distribution to be in the sample of size n.

MAF.cutoff

Variants with MAF > MAF.cutoff are not ever considered to be causal.

prop.caus

The proportion of variants that will be considered causal.

effect.size

This tuning parameter controls the effect size of all causal variants. The effect sizes are calculated as -effect.size*log10(MAF).

prop.pos

The proportion of causal variants that have a positive effect on phenotype. The other causal variants are assumed to have a negative effect.

alpha.level

The significance level of the test. Defaults to 10^-6 to mirror the scale of genome-wide association studies.

optimal

A TRUE value will use optimal SKAT. The default FALSE value will not use regular SKAT without the optimality adjustment.

Details

This function requires haplotype data as well as the SNP base pair location for each SNP.

Value

This function returns a single estimate for power under the conditions given by the parameters.

Author(s)

Ian Barnett

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
N.avail = 10000
P = 1000
SNP.loc = 1:P
Haplotypes = matrix(rbinom(N.avail*P,1,.003),nrow=N.avail)
CEP_Theoretical_Power(Haplotypes,SNP.loc,Region.length=50,nruns=100,prop.caus=0.1,prop.pos=0.8,n=500,weights.beta=c(1,25),tails=0.25,effect.size=0.3,alpha.level = 10^(-6),optimal=TRUE)

## The function is currently defined as
function (Haplotypes = NULL, SNP.location = NULL, Region.length = 3000, 
    nruns = 200, weights.beta = c(1, 25), n = 500, tails = 0.2, 
    MAF.cutoff = 0.03, prop.caus = 0.2, effect.size = 0.2, prop.pos = 1, 
    alpha.level = 10^(-6), optimal = FALSE) 
{
    P = ncol(Haplotypes)
    N.avail = nrow(Haplotypes)
    N = as.integer(n/(2 * tails))
    MAF.full = colMeans(Haplotypes)
    if (length(which(MAF.full > 0.5)) != 0) {
        Haplotypes = Make_All_Minor_Alleles(Haplotypes, genotypes = FALSE, 
            MAF.full)
        MAF.full = colMeans(Haplotypes)
    }
    if (optimal) {
        rho = prop.caus^2 * (2 * prop.pos - 1)^2
    }
    power = rep(0, nruns)
    for (i in 1:nruns) {
        Hap.ind = Sample_Haplotypes(SNP.location, Region.length, 
            n)
        p = length(Hap.ind)
        H1 = sample(1:N.avail, N, replace = TRUE)
        H2 = sample(1:N.avail, N, replace = TRUE)
        G = as.matrix(Haplotypes[H1, Hap.ind] + Haplotypes[H2, 
            Hap.ind])
        MAF = colMeans(G)/2
        if (length(which(MAF > 0.5)) != 0) {
            G = Make_All_Minor_Alleles(G, genotype = TRUE, MAF)
            MAF = colMeans(G)/2
        }
        Beta = Get_Betas(MAF, MAF.cutoff, prop.caus, effect.size, 
            prop.pos)
        Mu = G %*% Beta
        Y = Mu + rnorm(nrow(G))
        cuts = quantile(Y, c(tails, 1 - tails))
        probSelect = pnorm(cuts[1], mean = Mu) + pnorm(cuts[2], 
            mean = Mu, lower.tail = FALSE)
        probSelect = probSelect * n/sum(probSelect)
        MAF = colSums(as.numeric(probSelect) * G)/sum(probSelect)
        W = Get_W_Beta_Dist(MAF, weights.beta)
        if (optimal) {
            W = W %*% (diag(1 - rho, nrow(W)) + matrix(rho, nrow = nrow(W), 
                ncol = nrow(W)))
        }
        G = as.numeric(sqrt(probSelect)) * G
        Mu = G %*% Beta
        tG.Mu = t(G) %*% Mu
        A = t(G) %*% G %*% W
        B = tG.Mu %*% t(tG.Mu) %*% W
        A.2 = A %*% A
        A.3 = A.2 %*% A
        A.4 = A.3 %*% A
        c.N = rep(0, 4)
        c.A = rep(0, 4)
        c.N[1] = sum(diag(A))
        c.N[2] = sum(diag(A.2))
        c.N[3] = sum(diag(A.3))
        c.N[4] = sum(diag(A.4))
        c.A[1] = sum(diag(B)) + c.N[1]
        c.A[2] = sum(diag(A %*% B)) + c.N[2]
        c.A[3] = sum(diag(A.2 %*% B)) + c.N[3]
        c.A[4] = sum(diag(A.3 %*% B)) + c.N[4]
        mu.Q = c.N[1]
        sigma.Q = sqrt(2 * c.N[2])
        if (is.na(sigma.Q)) {
            if (i == 1) {
                power[i] = 0
            }
            else {
                power[i] = power[i - 1]
            }
            next
        }
        s1 = c.N[3]/(c.N[2]^(3/2))
        s2 = c.N[4]/(c.N[2]^2)
        if (s1^2 > s2) {
            a = 1/(s1 - sqrt(s1^2 - s2))
            delta = s1 * (a^3) - a^2
        }
        else {
            a = 1/sqrt(s2)
            delta = 0
        }
        l = a^2 - 2 * delta
        mu.X = l + delta
        sigma.X = sqrt(2) * sqrt(l + 2 * delta)
        q.c = (qchisq(1 - alpha.level, df = l, ncp = delta) - 
            mu.X) * sigma.Q/sigma.X + mu.Q
        mu.Q = c.A[1]
        sigma.Q = sqrt(2 * c.A[2])
        if (is.na(sigma.Q)) {
            if (i == 1) {
                power[i] = 0
            }
            else {
                power[i] = power[i - 1]
            }
            next
        }
        s1 = c.A[3]/(c.A[2]^(3/2))
        s2 = c.A[4]/(c.A[2]^2)
        if (s1^2 > s2) {
            a = 1/(s1 - sqrt(s1^2 - s2))
            delta = s1 * (a^3) - a^2
        }
        else {
            a = 1/sqrt(s2)
            delta = 0
        }
        l = a^2 - 2 * delta
        mu.X = l + delta
        sigma.X = sqrt(2) * sqrt(l + 2 * delta)
        power[i] = pchisq(sigma.X * (q.c - mu.Q)/sigma.Q + mu.X, 
            df = max(l, 0), ncp = max(delta, 0), lower.tail = FALSE)
        if (floor(i/10) * 10 == i) {
            msg <- sprintf("%d/%d", i, nruns)
            print(msg)
        }
    }
    return(mean(power))
  }

lin-lab/CEPSKAT documentation built on May 29, 2019, 3:41 a.m.