Permutation-based estimation of p-values via the min P test, a gene region-level summary for each candidate gene. The gene region-level summary assesses the smallest trend p-value (p-trend) within each gene region when comparing cases and controls. The min P test is a permutation-based method that can be built on different univariate tests per SNP. Inference is based on the permutation distribution of the ordered p-values from the marginal tests of each SNP. Computation can be accelerated by parallelization if a compute cluster or a multicore computer is available.
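The min P procedure described above can be sketched in base R for a single gene region. This is a minimal illustration under stated assumptions, not minPtest's implementation: genotypes are assumed to be coded 0/1/2, stats::prop.trend.test (a chi-squared test for trend in proportions of the Cochran-Armitage type) stands in for the per-SNP test, and the helper names catt_p, gene_minp and minp_test are hypothetical.

```r
# Hypothetical helpers illustrating the min P test for ONE gene region.
# Assumes genotypes coded 0/1/2 and y coded 0 (control) / 1 (case).

# Per-SNP trend p-value, using only subjects with a non-missing genotype.
catt_p <- function(snp, y) {
  ok <- !is.na(snp)
  tab <- table(factor(snp[ok], levels = 0:2), factor(y[ok], levels = 0:1))
  keep <- rowSums(tab) > 0                 # skip empty genotype groups
  if (sum(keep) < 2) return(NA_real_)      # no trend without two groups
  prop.trend.test(x = tab[keep, "1"],      # cases per genotype group
                  n = rowSums(tab)[keep],  # subjects per genotype group
                  score = (0:2)[keep])$p.value
}

# Gene region-level summary: the smallest trend p-value over the gene's SNPs.
gene_minp <- function(xg, y) {
  min(apply(xg, 2, catt_p, y = y), na.rm = TRUE)
}

# Permutation p-value: share of permuted minima at or below the observed one.
minp_test <- function(xg, y, permutation = 200) {
  z_obs <- gene_minp(xg, y)
  z_perm <- replicate(permutation, gene_minp(xg, sample(y)))
  list(zgen = z_obs, minp = mean(z_perm <= z_obs))
}

# Usage with simulated data: SNP 1 differs between cases and controls.
set.seed(1)
n <- 200
y <- rep(0:1, each = n / 2)
xg <- cbind(snp1 = rbinom(n, 2, ifelse(y == 1, 0.5, 0.2)),
            snp2 = rbinom(n, 2, 0.3),
            snp3 = rbinom(n, 2, 0.3),
            snp4 = rbinom(n, 2, 0.3))
res <- minp_test(xg, y, permutation = 200)
res
```

Because only y is permuted while whole rows of genotypes stay together, the correlation between SNPs in the region is preserved, and comparing the observed minimum against the permuted minima accounts for the number of SNPs in the gene without a separate per-SNP correction.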
y: a numeric response vector coded with 0 (controls) and 1 (cases) of length
x: a numeric
SNPtoGene: a mapping matrix of dimension
formula: (optional) for unconditional or conditional logistic regression, respectively, with or without covariates other than SNPs. A symbolic description of the model to be fitted, including covariates only; see
cov: (optional) a
matchset: (optional) a numeric vector of length
permutation: number of permutations used to obtain the null distribution.
seed: (optional) vector of length
subset: an optional vector specifying a subset of observations to be used in the fitting process.
parallel: indicates whether computation on the permuted data sets should be performed in parallel using package parallel. If TRUE, parallelization requires at least two cores. A value larger than 1 is taken as the number of cores.
ccparallel: logical value indicating whether computation should be performed in parallel on a compute cluster, using package snowfall. If TRUE the initialization function of this package,
trace: logical value indicating whether progress in estimation should be shown by printing the number of the permutation currently being processed (ignored when running in parallel via snowfall).
aggregation.fun: function used to combine the trend p-values over multiple loci within a gene region. By default the minimum (
adj.method: correction method for multiple hypothesis testing. By default the Bonferroni method (
...: further arguments for
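To illustrate aggregation.fun and adj.method, the sketch below combines made-up SNP trend p-values into one value per gene with min, then corrects made-up gene-level p-values with stats::p.adjust. It assumes that adj.method accepts the method names of p.adjust (such as "bonferroni" or "holm"); this page does not confirm that, and all data are invented for illustration.

```r
# Made-up marginal trend p-values and SNP-to-gene assignment (illustration only).
psnp <- c(rs1 = 0.003, rs2 = 0.20, rs3 = 0.04, rs4 = 0.50, rs5 = 0.01)
gene <- c("geneA", "geneA", "geneB", "geneB", "geneC")

# aggregation.fun = min: one summary value per gene region.
zgen <- tapply(psnp, gene, min)
zgen

# Made-up gene-level min P p-values, corrected across the three genes.
minp <- c(geneA = 0.004, geneB = 0.09, geneC = 0.02)
p_bonf <- p.adjust(minp, method = "bonferroni")  # multiplies by 3, capped at 1
p_holm <- p.adjust(minp, method = "holm")        # a less conservative alternative
rbind(bonferroni = p_bonf, holm = p_holm)
```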
The idea of the gene region-level summary, using the min P test procedure (Westfall and Young, 1993; Westfall et al., 2002; Chen et al., 2006), is to identify candidate genes by assessing, with permutation-based resampling methods, the statistical significance of the smallest p-trend from a set of SNPs (single nucleotide polymorphisms) within each gene region when comparing cases and controls. A SNP occurs when a single nucleotide, (A), (T), (C) or (G), in the genome differs between individuals and, in addition, this variation (the substitution of one nucleotide for another) occurs in more than 1% of a population. A SNP can take three possible values (genotypes): either there is no SNP variant in comparison to some reference coding (homozygous reference (0)), or the SNP variant occurs at one of the two base pair positions (heterozygous (1)), or both base pairs carry the variant relative to the reference coding (homozygous variant (2)). Instead of the genotypes 0, 1 and 2, minPtest also permits combined carrier SNPs, e.g. coding 0 and 1 (1 + 2).
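The carrier coding mentioned above can be derived from 0/1/2 genotypes with a single comparison. This is a minimal sketch; the helper name to_carrier is hypothetical and not part of minPtest.

```r
# Hypothetical helper: 0/1/2 genotypes -> carrier coding (0 = no variant allele,
# 1 = at least one variant allele, i.e. genotype 1 or 2). NA stays NA.
to_carrier <- function(x) {
  stopifnot(all(x %in% c(0, 1, 2, NA)))
  x[] <- as.integer(x >= 1)   # x[] keeps the matrix dimensions and names
  x
}

geno <- matrix(c(0, 1, 2, NA, 2, 0), nrow = 3,
               dimnames = list(NULL, c("snp1", "snp2")))
carrier <- to_carrier(geno)
carrier
```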
Computation of the min P test is based on the marginal trend p-values for a set of univariate SNP-disease association tests and on the trend p-values for each SNP in the permutation samples. The minPtest package brings together three kinds of tests for computing such p-values, otherwise scattered over several R packages, and automatically selects the one most appropriate for the design at hand. In any case, a response vector y, a SNP matrix x and a mapping matrix SNPtoGene are required. By default, a Cochran-Armitage trend test (Cochran, 1954; Armitage, 1955) is then fitted to compute the p-values; it does not depend on covariates or on the matching scenario. If a formula (see also glm from package stats) is added, together with a covariate matrix cov when adjusting for covariates, an unconditional logistic regression is fitted. Unconditional logistic regression can be used with or without covariates for adjustment, with either formula=y~1 (no adjustment) or formula=y~cov1+cov2+... (adjustment for covariates). The former needs no information on covariates or on the matching scenario. The latter is general for frequency matching, with the matching variables included for adjustment in the covariate matrix cov. If a matchset (as in 1:1, 1:2, etc. matching) and a formula are provided (see also clogistic from package Epi), a conditional logistic regression is fitted. Conditional logistic regression can likewise be used with or without covariates for adjustment, with either formula=y~1 or formula=y~cov1+cov2+.... In the latter case, covariates other than the matching variables can be used and have to be specified in the covariate matrix cov.

In general, there are two ways to specify the formula. First, if no covariates are used for adjustment, the formula has to be written as y~1, without specifying the covariate matrix cov. Second, if covariates other than SNPs are used for adjustment, the formula has to be written with the response vector y on the left of the ~ operator and the clinical covariates on the right, and a covariate matrix cov has to be specified.
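The selection rule above and the unconditional logistic model can be sketched as follows. choose_method is a hypothetical helper that only mirrors the rule stated in the text; the model uses stats::glm with made-up data, treats the 0/1/2 SNP as a continuous term, and reports the Wald p-value, which is an assumption since this page does not state which test statistic minPtest uses. Conditional logistic regression (Epi::clogistic) is omitted to keep the example dependency-free.

```r
# Hypothetical helper mirroring the rule described in the text.
choose_method <- function(formula = NULL, matchset = NULL) {
  if (is.null(formula)) {
    "Cochran-Armitage trend test"
  } else if (is.null(matchset)) {
    "unconditional logistic regression"
  } else {
    "conditional logistic regression"
  }
}

# Made-up data: one SNP coded 0/1/2 and two covariates.
set.seed(2)
n <- 150
dat <- data.frame(snp = rbinom(n, 2, 0.3),
                  cov1 = rnorm(n),
                  cov2 = rbinom(n, 1, 0.5))
dat$y <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$snp + 0.3 * dat$cov1))

# formula = y ~ 1: only the SNP enters the model.
fit0 <- glm(y ~ snp, family = binomial, data = dat)
# formula = y ~ cov1 + cov2: the SNP is added to the adjustment covariates.
fit1 <- glm(y ~ snp + cov1 + cov2, family = binomial, data = dat)

p_unadj <- coef(summary(fit0))["snp", "Pr(>|z|)"]
p_adj   <- coef(summary(fit1))["snp", "Pr(>|z|)"]
c(unadjusted = p_unadj, adjusted = p_adj)
```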
If SNP genotypes are coded as 0, 1 and 2, they are included as continuous variables in the logistic regression models. If SNPs are coded as carrier SNPs 0 and 1, they are included as binary variables. If covariates are used for adjustment, the column names of the covariate matrix cov have to match the names used in the formula, so that the formula can be linked to the covariate matrix cov.
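Because covariates are linked to cov by name, a quick check that every variable on the right-hand side of the formula is a column of cov catches typos before fitting. check_cov_names is a hypothetical helper written for this illustration; minPtest's own validation may differ.

```r
# Hypothetical helper: stop if a covariate named in the formula is not a
# column of cov. The response y is supplied separately, so it is skipped.
check_cov_names <- function(formula, cov) {
  needed <- setdiff(all.vars(formula), "y")
  missing <- setdiff(needed, colnames(cov))
  if (length(missing) > 0)
    stop("covariates not found in cov: ", paste(missing, collapse = ", "))
  invisible(TRUE)
}

cov <- cbind(age = c(50, 61, 45), sex = c(0, 1, 1))
check_cov_names(y ~ age + sex, cov)          # passes silently
res <- try(check_cov_names(y ~ age + smoking, cov), silent = TRUE)
inherits(res, "try-error")                   # TRUE: smoking is not a column of cov
```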
Missing SNP genotypes in x or, if used, missing values in cov are handled by letting each marginal test use only the available data for that SNP in x and for the covariates in cov. When fitting the Cochran-Armitage trend test or unconditional logistic regression, minPtest uses all subjects with available data for each SNP (and covariates). In conditional logistic regression with 1:1 matching, the matched subjects are removed together. In the 1:2 matching scenario, the whole matched set is removed when the missing value occurs in the case; when it occurs in one control, only that control is removed.
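The two missing-data rules above can be sketched in base R: per-SNP available-case counts for unmatched designs, and the matched-set rule for 1:k matching. Both helpers (n_used, keep_matched) are hypothetical illustrations of the rules as stated, not minPtest code; keep_matched also drops any set left without a control, which reproduces the 1:1 behaviour described.

```r
# Unmatched designs: number of subjects each SNP's test can use, i.e. those
# with a non-missing genotype and (if cov is given) complete covariates.
n_used <- function(x, cov = NULL) {
  ok_cov <- if (is.null(cov)) rep(TRUE, nrow(x)) else complete.cases(cov)
  colSums(!is.na(x) & ok_cov)   # ok_cov is recycled down each column
}

# Matched designs: which subjects enter the conditional model for one SNP.
keep_matched <- function(snp, y, matchset) {
  # a missing genotype in the case removes the whole matched set
  case_missing <- tapply(is.na(snp) & y == 1, matchset, any)
  keep <- !is.na(snp) & !(matchset %in% names(case_missing)[case_missing])
  # a missing control removes only that control; a set with no control left
  # (e.g. 1:1 matching) cannot contribute and is removed as well
  n_ctrl <- tapply(keep & y == 0, matchset, sum)
  keep & matchset %in% names(n_ctrl)[n_ctrl > 0]
}

# Usage
x   <- cbind(a = c(0, 1, NA, 2), b = c(1, 1, 0, 0))
cov <- data.frame(age = c(40, NA, 50, 60))
n_used(x, cov)   # a: 2 subjects, b: 3 subjects

y        <- c(1, 0, 0, 1, 0, 0, 1, 0, 0)   # 1:2 matching, three sets
matchset <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
snp      <- c(NA, 1, 0, 2, NA, 1, 0, 1, 2)
keep_matched(snp, y, matchset)   # set 1 dropped; in set 2 only the control
```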
Concerning parallelization on a compute cluster, i.e. with argument ccparallel=TRUE, there are two ways to run minPtest:
1. Start R on the command line with sfCluster (Knaus et al., 2009) and the preferred options, e.g. the number of CPUs. The initialization function of package snowfall, sfInit(), should be called before calling minPtest.
2. Use any other solution supported by snowfall. Argument ccparallel has to be set to TRUE, and the number of CPUs can be chosen in the sfInit() function.
sfCluster is a Unix tool for convenient management of parallel R processes. It is available, with detailed information, at www.imbi.uni-freiburg.de/parallel.
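The sketch below shows why per-permutation seeds are useful when permutations run in parallel: each permutation sets its own seed, so the permuted statistics are the same whether one core or several are used. It uses parallel::mclapply (which forks, so more than one core is not available on Windows) and a glm Wald p-value as the per-SNP test. perm_minp is a hypothetical helper, and reading minPtest's seed argument as serving this purpose is an assumption; on a compute cluster the text above describes calling snowfall's sfInit() before minPtest with ccparallel=TRUE instead.

```r
# Hypothetical helper: gene-level minimum p-value for each permutation,
# computed with parallel::mclapply and one seed per permutation.
perm_minp <- function(xg, y, seeds, cores = 1L) {
  one_perm <- function(s) {
    set.seed(s)   # the permutation depends only on its seed
                  # (with cores = 1 this also resets the caller's RNG state)
    yp <- sample(y)
    p <- apply(xg, 2, function(snp)
      coef(summary(glm(yp ~ snp, family = binomial)))["snp", "Pr(>|z|)"])
    min(p)
  }
  unlist(parallel::mclapply(seeds, one_perm, mc.cores = cores))
}

# Usage: same seeds, different numbers of cores, identical results.
set.seed(3)
y  <- rep(0:1, each = 40)
xg <- matrix(rbinom(240, 2, 0.3), ncol = 3)
seeds <- c(101, 202, 303, 404)
cores <- if (.Platform$OS.type == "windows") 1L else 2L  # no forking on Windows
z1 <- perm_minp(xg, y, seeds, cores = 1L)
z2 <- perm_minp(xg, y, seeds, cores = cores)
identical(z1, z2)
```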
The print function gives a short overview of the results: the number of subjects included in the analysis, the method used by the package, the number of genes, the number of SNPs, the number of missing values in the SNP matrix x and the number of permutations used for the fit. The functions summary.minPtest and plot.minPtest are also available.
An object of class 'minPtest', which is a list containing the following components:
minp
p.adj.minp
psnp
p.adj.psnp
psnpperm
zgen
zgenperm
n: number of subjects in the original data set.
nrsnp: number of SNPs in the original data set.
nrgene: number of genes in the original data set.
snp.miss: number of missing values in the SNP matrix x.
n.permute: number of permutations.
method: the method used.
call: the call.
SNPtoGene: the mapping matrix of dimension
Stefanie Hieke hieke@imbi.uni-freiburg.de
Armitage, P. (1955). Tests for linear trends in proportions and frequencies. Biometrics, 11(3), 375-386.
Chen, B.E. et al. (2006). Resampling-based multiple hypothesis testing procedures for genetic case-control association studies. Genetic Epidemiology, 30, 495-507.
Cochran, W.G. (1954). Some methods for strengthening the common chi-squared tests. Biometrics, 10(4), 417-451.
Knaus, J. et al. (2009). Easier parallel computing in R with snowfall and sfCluster. The R Journal, 1, 54-59.
Westfall, P.H. et al. (2002). Multiple tests for genetic effects in association studies. Methods in Molecular Biology, 184, 143-168.
Westfall, P.H. and Young, S.S. (1993). Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment. Wiley, New York.
summary.minPtest, plot.minPtest
library(minPtest)
# Generate a simulated data set as in the example of the function generateSNPs,
# consisting of 100 subjects and 200 SNPs on 5 genes.
SNP <- c(6,26,54,135,156,186)
BETA <- c(0.9,0.7,1.5,0.5,0.6,0.8)
SNPtoBETA <- matrix(c(SNP,BETA),ncol=2,nrow=6)
colnames(SNPtoBETA) <- c("SNP.item","SNP.beta")
set.seed(191)
sim1 <- generateSNPs(n=100,gene.no=5,block.no=4,block.size=10,p.same=0.9,
  p.different=0.75,p.minor=c(0.1,0.4,0.1,0.4),n.sample=80,SNPtoBETA=SNPtoBETA)
# Cochran-Armitage trend test without covariates and with the default number
# of permutations.
# Example: run R sequentially.
### Seed
set.seed(10)
seed1 <- sample(1:1e7,size=1000)
###
minPtest.object <- minPtest(y=sim1$y, x=sim1$x, SNPtoGene=sim1$SNPtoGene,
  seed=seed1)