Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/chisq.sim.factory.R

Produce a function that returns a simulator of chi-square statistics and odds ratios, determine if a set of parameters is legal, return the non-centrality parameter, lambda, of the chi-square distribution under a set of parameters, take a derivative of lambda with respect to number of silver-standard samples under a set of parameters.

1 2 3 4 5 6 7 8 | ```
chisq.sim.factory(R, gam_ca, gam_co, ppv, npv,
homRR, N_co, maf, prev, model)
is.legal(R, gam_ca, gam_co, ppv, npv,
homRR, N_co, maf, prev, model)
chisq.lytic.func(R, gam_ca, gam_co, ppv, npv,
homRR, N_co, maf, prev, model)
dlambda(R, gam_ca, gam_co, ppv, npv,
homRR, N_co, maf, prev, model, diff="gam_ca")
``` |

`R` |
Ratio of gold standard controls to gold standard cases |

`gam_ca` |
Ratio of silver standard cases to gold standard cases |

`gam_co` |
Ratio of silver standard controls to gold standard controls |

`ppv` |
The positive predictive value of the criteria used to identify silver standard cases, ie, P(Affected | Case) |

`npv` |
The negative predictive value of the criteria used to identify silver standard controls, ie, P(Unaffected | Control) |

`homRR` |
The relative risk of a homozygous genotype, ie, P(Affected | geno=AA) / P(Affected | geno=aa) |

`N_co` |
The number of gold-standard controls |

`maf` |
The minor allele frequency of the risk locus |

`prev` |
Disease prevalence, ie, P(Affected) |

`model` |
Disease risk model, either one of ‘dominant’, ‘recessive’, ‘multiplicative’ or number giving the heterozygous relative risk. |

`diff` |
Take the derivative of lambda with respect to ‘gam_ca’ (default) or ‘gam_co’ |

This function simulates the behavior of chi-square tests for independence in a genome-wide association study consisting of two cohorts. The first cohort, the gold standard cohort, is assumed to have been classified into affected and unaffected without error. The second cohort, the silver standard cohort, is assumed to have errors in disease classification subject to the `npv`

and `ppv`

arguments. A distinction is drawn between affected, unaffected and case and control. Affected and unaffected are the true disease status, which is observed in the gold standard cohort. In the silver standard cohort the case/control criteria are observed instead, while the affected/unaffected status is latent.

The numbers of gold and silver standard cases and controls are set by `N_co`

and the ratios `R`

, `gam_co`

and `gam_ca`

. The genotype frequencies in the gold standard cohort are governed by a genotypic disease risk model and the parameters `homRR`

, `maf`

, `prev`

and `model`

. The `model`

argument may be a character vector of length one, either ‘dominant’, ‘recessive’, ‘multiplicative’ (or an unambigous abbreviation thereof), which links the heterozygous relative risk P(Affected | Aa) / P(Affected | aa) to `homRR`

, or it may be a floating point value directly specifying the heterozygous relative risk.

These functions offer a way to simulate power as well as calculate it asymptotically. The distribution of chi-square statistics under the disease risk model, ppv, npv and case/control ratios is asymptotically distributed non-central chi-square. `chisq.lytic.func`

returns the non-centrality parameter associated with the model. `dlambda`

returns the first derivative of lambda. Since power is increasing in lambda, if `dlambda`

is positive, then an additional silver standard case (if `diff="gam_ca"`

) increases power.

`chisq.sim.factory`

returns a argument-less function that may be repeatedly called to sample from the simulated distribution. This function returns a vector of length 2, consisting of the chi-square statistic and the point estimate of the allelic odds ratio. Parameters are not checked for legality.

`is.legal`

returns a boolean value indicating if the disease risk model parameters are legal, ie, induce genotype probabilities on [0,1], conditional on affected/unaffected status.

`chisq.lytic.func`

returns lambda, the non-centrality parameter of the asymptotic sampling distribution of the chi-square test under the model.

`dlambda`

returns the first derivative of lambda with respect to `gam_ca`

or `gam_co`

Andrew McDavid amcdavidatfhcrcdotorg

McDavid A, Crane PK, Newton KM, Crosslin DR, et al. (2011) The Potential to Enhance Power of Genetic Association Studies through the Use of Silver Standard Cases Derived from Electronic Medical Records: Single Nucleotide Polymorphism Associations with Dementia in the Electronic Medical Records and Genomics (eMERGE) Network

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 | ```
##Make a chisq simulator under a study scenario
sim = chisq.sim.factory(R = 4, gam_ca = 3, gam_co = 0,
ppv = .8, npv = 1, homRR = 2.2, N_co = 1000,
maf = .1, prev = .01, model = "mult")
##Run one realization of the simulation
sim()
##Run 100 realizations of the simulation
times = 100
chisq_or = as.data.frame(t(replicate(times, sim())))
##Find the number of times chisq_or$stat.X2[i] exceeded .01 significance
sig = .01
critval = qchisq(1-sig, 2)
nsucess = sum(chisq_or$stat > critval)
##Find the power
nsucess/times
##Compare to asymptotic
lambda = chisq.lytic.func(R = 4, gam_ca = 3, gam_co = 0,
ppv = .8, npv = 1, homRR = 2.2, N_co = 1000,
maf = .1, prev = .01, model = "mult")
1-pchisq(qchisq(1-sig, 2), 2, ncp=lambda)
## generate a multifactorial design
paramset = list(R = c(.5, 1, 2, 4), gam_ca = c(0, 1), gam_co = 0,
ppv = c(.4, .6, .8), npv = .8, homRR = c(1.4, 3, 9), N_co = 1000,
maf = .1, prev = .01, model = "mult")
paramgrid = do.call(expand.grid, c(paramset, stringsAsFactors=FALSE))
##call chisq.lytic and dlambda for each experiment
lambda = vector()
dl = vector()
for( i in 1:nrow(paramgrid)){
lambda[i] = do.call(chisq.lytic.func, paramgrid[i,])
dl[i] = do.call(dlambda, paramgrid[i,])
}
##bind it to the parameter data.frame
##calculate the difference in lambda for gam_ca=0 vs gam_ca=1
paramgrid = cbind(paramgrid, lambda, dl)
param0 = subset(paramgrid, gam_ca==0)
param1 = subset(paramgrid, gam_ca==1)
all(paste(param0[,c(1, 3:10)])==paste(param1[,c(1, 3:10)]))
param0 = cbind(param0, finite_diff_dl = param1$lambda - param0$lambda)
##do they agree?
with(param0, cor(dl, finite_diff_dl))
``` |

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.