Permutation-based p-value estimation via the min P test, a gene region-level summary for each candidate gene. The gene region-level summary assesses the smallest trend p-value (p_trend) within each gene region, comparing cases and controls. The min P test is a permutation-based method that can be based on different univariate tests per SNP. Inference is based on the permutation distribution of the ordered p-values from the marginal tests of each SNP. Computation can be accelerated by parallelization if a compute cluster or a multicore computer is available.
y 
a numeric response vector coded with 0 (coding for controls) and 1 (coding for cases) of length 
x 
a numeric 
SNPtoGene 
a mapping matrix of dimension 
formula 
(optional) for unconditional or conditional logistic regression, respectively, with or without covariates other than SNPs. A symbolic description of the model to be fitted including covariates only, see 
cov 
(optional) a 
matchset 
(optional) a numeric vector of length 
permutation 
number of permutations employed to obtain a null distribution. 
seed 
(optional) vector of length 
subset 
an optional vector specifying a subset of observations to be used in the fitting process. 
parallel 
indicates whether computation in the permuted data sets should be performed in parallel using package parallel. If TRUE, the parallelization requires at least two cores. A value larger than 1 is taken to be the number of cores. 
ccparallel 
logical value indicating whether computation should be performed in parallel on a compute cluster, using package snowfall. If TRUE, the initialization function of this package, sfInit(), has to be called before calling minPtest. 
trace 
logical value indicating whether progress should be reported by printing the number of the permutation currently being processed (ignored if running in parallel via snowfall). 
aggregation.fun 
function that is used to combine the trend p-values over multiple loci within a gene region. By default, the minimum is used. 
adj.method 
correction method for multiple hypothesis testing. By default, the Bonferroni method is used. 
... 
Further arguments for 
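A minimal base-R sketch of what aggregation.fun does, assuming (as the argument description states) that it is applied to the trend p-values of all SNPs mapped to one gene region. The p-values, SNP names and gene labels are toy values, and aggregate_by_gene is a hypothetical helper, not part of minPtest.

```r
# Hypothetical per-SNP trend p-values and SNP-to-gene labels (toy values)
psnp <- c(rs1 = 0.04, rs2 = 0.30, rs3 = 0.002, rs4 = 0.50, rs5 = 0.08)
gene <- c(rs1 = "g1", rs2 = "g1", rs3 = "g2", rs4 = "g2", rs5 = "g3")

# Combine the p-values of the SNPs within each gene region with a summary function
aggregate_by_gene <- function(p, gene, fun = min) {
  sapply(split(p, gene[names(p)]), fun)
}

aggregate_by_gene(psnp, gene)           # default summary, the minimum per gene
aggregate_by_gene(psnp, gene, median)   # an alternative aggregation
```

With the default, each gene is summarized by its smallest SNP p-value, which is the statistic the min P test then calibrates by permutation.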
The idea of the gene region-level summary, using the min P test procedure (Westfall and Young, 1993; Westfall et al., 2002; Chen et al., 2006), is to identify candidate genes by assessing, with permutation-based resampling, the statistical significance of the smallest trend p-value (p_trend) from a set of SNPs (single nucleotide polymorphisms) within each gene region, comparing cases and controls. A SNP occurs when a single nucleotide, (A), (T), (C) or (G), in the genome differs between individuals and, in addition, this variation (the substitution of one nucleotide for another) occurs in more than 1% of a population. A SNP can take three possible values (genotypes): either there is no SNP variant compared with some reference coding (homozygous reference, 0), the SNP variant occurs on one of the two alleles (heterozygous, 1), or both alleles carry the variant (homozygous variant, 2). Instead of the genotypes 0, 1 and 2, minPtest also permits combined carrier coding of SNPs, e.g. coding 0 and 1 (1 + 2).
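The min P procedure described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: per-SNP Cochran-Armitage trend p-values are computed with stats::prop.trend.test (genotype scores 0, 1, 2), each gene is summarized by its smallest p-value, case/control labels are permuted to obtain the null distribution, and the gene-level p-value is taken as the proportion of permutations whose minimum is at most the observed one. minPtest's exact convention (for instance whether 1 is added to numerator and denominator) is not stated in the text. The helpers ca_trend_p and gene_minp and the simulated data are hypothetical.

```r
# Cochran-Armitage trend test p-value for one SNP (genotypes scored 0/1/2),
# using only subjects with an observed genotype
ca_trend_p <- function(g, y) {
  ok <- !is.na(g)
  g <- g[ok]
  y <- y[ok]
  lev <- sort(unique(g))
  if (length(lev) < 2) return(NA_real_)   # monomorphic SNP: no test possible
  f <- factor(g, levels = lev)
  cases  <- tapply(y, f, sum)              # cases per genotype
  totals <- tapply(y, f, length)           # subjects per genotype
  prop.trend.test(cases, totals, score = lev)$p.value
}

# Gene-level statistic: smallest SNP p-value within each gene region
gene_minp <- function(y, x, snp_gene) {
  p <- apply(x, 2, ca_trend_p, y = y)
  sapply(split(p, snp_gene[colnames(x)]), min, na.rm = TRUE)
}

# Simulated toy data: 100 subjects, 6 SNPs on 2 genes
set.seed(1)
n <- 100
x <- matrix(rbinom(n * 6, size = 2, prob = 0.3), nrow = n,
            dimnames = list(NULL, paste0("snp", 1:6)))
y <- rep(0:1, each = n / 2)                 # 50 controls, 50 cases
snp_gene <- setNames(rep(c("geneA", "geneB"), each = 3), colnames(x))

obs <- gene_minp(y, x, snp_gene)            # observed min p per gene

# Null distribution: permute case/control labels, keep the SNP matrix fixed
B <- 200
perm <- replicate(B, gene_minp(sample(y), x, snp_gene))   # genes x B matrix

# Permutation p-value per gene, then Bonferroni correction across genes
p_gene <- rowMeans(perm <= obs)
p_adj  <- p.adjust(p_gene, method = "bonferroni")
```

Permuting whole label vectors keeps the correlation between SNPs of a gene (linkage disequilibrium) intact, which is why the minimum's null distribution is estimated by resampling rather than derived analytically.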
Computation of the min P test is based on the marginal trend p-values for a set of univariate SNP-disease associations and on the trend p-values of the permutation samples for each SNP. The minPtest package brings together three kinds of tests for computing such p-values, which are otherwise scattered over several R packages, and selects among them automatically according to the arguments supplied. In any case, a response vector y, a SNP matrix x and a mapping matrix SNPtoGene are required. With only these arguments, the default Cochran-Armitage trend test (Cochran, 1954; Armitage, 1955) is used to compute the p-values; it does not depend on covariates or on the matching scenario.
If a formula (see also glm from package stats) and, when covariates are used, a covariate matrix cov are added, an unconditional logistic regression is fitted. Unconditional logistic regression can be used without or with covariates for adjustment, i.e. either formula=y~1 or formula=y~cov1+cov2+... . The former needs no information on covariates or the matching scenario. The latter is suited to frequency matching, with the matching variables used for adjustment included in the covariate matrix cov. If a matchset (as in 1:1, 1:2, etc. matching) and a formula (see also clogistic from package Epi) are provided, a conditional logistic regression is fitted. Conditional logistic regression can also be used without or with covariates for adjustment, i.e. either formula=y~1 or formula=y~cov1+cov2+... . In the latter case, covariates other than the matching variables can be used and have to be specified in the covariate matrix cov.
In general, there are two ways to specify the formula. First, if no covariates are used for adjustment, the formula has to be written as y~1, without specifying the covariate matrix cov. Second, if covariates other than SNPs are used for adjustment, the formula has to be written with the response vector y on the left of a ~ operator and the clinical covariates on the right, and the covariate matrix cov has to be specified.
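A base-R sketch of the first two test types for a single SNP: the Cochran-Armitage trend test via stats::prop.trend.test and unconditional logistic regression via glm, without and with a covariate. The simulated data and the covariate name age are hypothetical, and it is an assumption that minPtest adds the SNP term to the user formula internally (the text implies but does not spell this out). Conditional logistic regression (clogistic from Epi) is left out so the block depends on base R only. Because the Cochran-Armitage trend test is the score test for the SNP slope in a logistic model without covariates, p_ca and the Wald p-value p_glm0 are typically close but not identical.

```r
set.seed(2)
n <- 200
snp <- rbinom(n, 2, 0.3)                        # genotypes coded 0/1/2
age <- rnorm(n, 50, 10)                         # hypothetical covariate
y <- rbinom(n, 1, plogis(-0.5 + 0.4 * snp))     # simulated case/control status

# (1) Default: Cochran-Armitage trend test, no covariates, no matching
cases  <- tapply(y, snp, sum)
totals <- tapply(y, snp, length)
p_ca <- prop.trend.test(cases, totals, score = as.numeric(names(totals)))$p.value

# (2) Unconditional logistic regression, analogous to formula = y ~ 1
fit0 <- glm(y ~ snp, family = binomial)
p_glm0 <- summary(fit0)$coefficients["snp", "Pr(>|z|)"]

# (3) Unconditional logistic regression adjusted for a covariate, analogous to y ~ age
fit1 <- glm(y ~ age + snp, family = binomial)
p_glm1 <- summary(fit1)$coefficients["snp", "Pr(>|z|)"]

c(trend = p_ca, logistic = p_glm0, logistic_adjusted = p_glm1)
```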
If SNP genotypes are coded as 0, 1 and 2, they are included as continuous variables in the logistic regression models. If SNPs are coded as carrier SNPs 0 and 1, they are included as binary variables. If covariates are used for adjustment, the column names of the covariate matrix cov have to match the names used in the formula, so that the formula can be linked to cov.
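Two small base-R sketches of the coding and naming rules above: collapsing 0/1/2 genotypes into carrier coding (1 + 2 combined into 1), and checking that every covariate on the right-hand side of the formula is a column of cov. check_cov_names is a hypothetical helper for illustration; minPtest performs its own linking of formula and covariate matrix, which is not shown here.

```r
# Carrier coding: heterozygous (1) and homozygous variant (2) combined into 1
geno <- c(0, 1, 2, 1, 0, NA)
carrier <- as.integer(geno >= 1)   # 0 1 1 1 0 NA; missing stays missing

# Hypothetical helper: every covariate named on the right-hand side of the
# formula must be a column of the covariate matrix cov
check_cov_names <- function(formula, cov) {
  rhs <- setdiff(all.vars(formula), all.vars(formula[[2]]))
  absent <- setdiff(rhs, colnames(cov))
  if (length(absent) > 0)
    stop("formula terms not in colnames(cov): ", paste(absent, collapse = ", "))
  invisible(TRUE)
}

cov <- cbind(age = c(51, 63, 47), sex = c(0, 1, 1))
check_cov_names(y ~ age + sex, cov)   # names match: returns TRUE invisibly
check_cov_names(y ~ 1, cov)           # no covariates: nothing to check
```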
Missing SNP genotypes in x or, if used, missing values in cov are handled by letting each marginal test use only the data available for that SNP in x and for the covariates in cov. minPtest uses all subjects with available data for each SNP (and covariates) when fitting the Cochran-Armitage trend test or unconditional logistic regression. Note that in conditional logistic regression with 1:1 matching, matched subjects are removed together. In the 1:2 matching scenario, the whole matched set is removed when the missing value occurs in a case; when it occurs in one control, only that control is removed.
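The missing-data rules above can be sketched as per-SNP subject filters in base R. Both helpers are hypothetical illustrations, not minPtest internals. matched_keep drops a matched set whose case is missing, drops individual missing controls otherwise, and drops sets left without any control, which reproduces the described 1:1 behaviour; applying that last rule to larger sets (e.g. a 1:2 set with both controls missing) is an assumption extrapolated from the text.

```r
# Unmatched designs: each marginal test keeps subjects with the SNP observed
# (and, if used, all covariates observed)
available <- function(snp, cov = NULL) {
  ok <- !is.na(snp)
  if (!is.null(cov)) ok <- ok & complete.cases(cov)
  ok
}

# Matched designs: drop the whole set if its case is missing; otherwise drop
# only missing controls, then drop sets left without any control
matched_keep <- function(snp, y, matchset) {
  miss <- is.na(snp)
  bad_sets <- unique(matchset[miss & y == 1])
  keep <- !miss & !(matchset %in% bad_sets)
  n_ctrl <- tapply(keep & y == 0, matchset, sum)   # remaining controls per set
  no_ctrl <- names(n_ctrl)[n_ctrl == 0]
  keep & !(as.character(matchset) %in% no_ctrl)
}

# Sets 1-2 are 1:1, sets 3-4 are 1:2 (toy data)
y        <- c(1, 0, 1, 0, 1, 0, 0, 1, 0, 0)
matchset <- c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4)
snp      <- c(0, NA, 1, 1, NA, 0, 2, 1, NA, 0)
matched_keep(snp, y, matchset)
# FALSE FALSE TRUE TRUE FALSE FALSE FALSE TRUE FALSE TRUE
```

Set 1 loses both subjects (1:1, missing control), set 3 is removed entirely (missing case), and set 4 keeps its case and the observed control.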
Concerning parallelization on a compute cluster, i.e. with argument ccparallel=TRUE, there are two ways to run minPtest:
1. Start R on a command line with sfCluster (Knaus et al., 2009) and the preferred options, e.g. the number of cpus. The initialization function of package snowfall, sfInit(), should be called before calling minPtest.
2. Use any other solution supported by snowfall. Argument ccparallel has to be set to TRUE, and the number of cpus can be chosen in the sfInit() function.
sfCluster is a Unix tool for convenient management of parallel R processes. It is available, with detailed information, at www.imbi.uni-freiburg.de/parallel.
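Permutations are independent of each other, so they parallelize naturally. The following is a sketch of that idea with mclapply from the base package parallel, which the parallel argument is documented to use; how minPtest itself splits the work and consumes the seed vector is not described in the text. Assigning one seed per permutation, as below, is an assumption that makes the result reproducible regardless of which core handles a permutation. The helper min_trend_p and the simulated data are hypothetical. On a compute cluster, the documented route is instead to call snowfall's sfInit() before minPtest with ccparallel=TRUE.

```r
library(parallel)

set.seed(3)
n <- 100
y <- rep(0:1, each = n / 2)                      # 50 controls, 50 cases
x <- matrix(rbinom(n * 4, 2, 0.3), nrow = n)     # 4 SNPs of one gene

# Smallest Cochran-Armitage trend p-value over the SNPs of one gene
min_trend_p <- function(y, x) {
  min(apply(x, 2, function(g) {
    cases  <- tapply(y, g, sum)
    totals <- tapply(y, g, length)
    prop.trend.test(cases, totals, score = as.numeric(names(totals)))$p.value
  }))
}

B <- 100
seeds <- sample.int(1e7, B)                      # one seed per permutation
cores <- if (.Platform$OS.type == "windows") 1L else 2L  # mclapply forks only on Unix-alikes

perm_minp <- unlist(mclapply(seq_len(B), function(b) {
  set.seed(seeds[b])                             # same labels whichever core runs b
  min_trend_p(sample(y), x)
}, mc.cores = cores))

p_gene <- mean(perm_minp <= min_trend_p(y, x))   # permutation p-value for the gene
```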
A print function returns a short overview of the results: the number of subjects included in the analysis, the method used by the package, and briefly the number of genes, the number of SNPs, the number of missing values in the SNP matrix x and the number of permutations used for the fit. A summary.minPtest and a plot.minPtest function are also available.
An object of class 'minPtest', which is a list containing the following components:
minp 

p.adj.minp 

psnp 

p.adj.psnp 

psnpperm 

zgen 

zgenperm 

n 
number of subjects in the original data set. 
nrsnp 
number of SNPs in the original data set. 
nrgene 
number of genes in the original data set. 
snp.miss 
number of missing values in the SNP matrix x. 
n.permute 
number of permutations. 
method 
the method used. 
call 
the function call. 
SNPtoGene 
the mapping matrix of dimension 
Stefanie Hieke hieke@imbi.uni-freiburg.de
Armitage, P. (1955). Tests for linear trends in proportions and frequencies. Biometrics, 11(3), 375-386.
Chen, B.E. et al. (2006). Resampling-based multiple hypothesis testing procedures for genetic case-control association studies. Genetic Epidemiology, 30, 495-507.
Cochran, W.G. (1954). Some methods for strengthening the common chi-squared tests. Biometrics, 10(4), 417-451.
Knaus, J. et al. (2009). Easier parallel computing in R with snowfall and sfCluster. The R Journal, 1, 54-59.
Westfall, P.H. et al. (2002). Multiple tests for genetic effects in association studies. Methods Mol Biol, 184, 143-168.
Westfall, P.H. and Young, S.S. (1993). Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment. Wiley, New York.
summary.minPtest, plot.minPtest
library(minPtest)
# generate a simulated data set as in the example of the function generateSNPs,
# consisting of 100 subjects and 200 SNPs on 5 genes.
SNP <- c(6,26,54,135,156,186)
BETA <- c(0.9,0.7,1.5,0.5,0.6,0.8)
SNPtoBETA <- matrix(c(SNP,BETA),ncol=2,nrow=6)
colnames(SNPtoBETA) <- c("SNP.item","SNP.beta")
set.seed(191)
sim1 <- generateSNPs(n=100,gene.no=5,block.no=4,block.size=10,p.same=0.9,
  p.different=0.75,p.minor=c(0.1,0.4,0.1,0.4),n.sample=80,SNPtoBETA=SNPtoBETA)
# Cochran-Armitage trend test without covariates and default permutations.
# Example: run R sequentially (no parallelization)
### Seeds for the permutations
set.seed(10)
seed1 <- sample(1:1e7,size=1000)
###
minPtest.object <- minPtest(y=sim1$y, x=sim1$x, SNPtoGene=sim1$SNPtoGene,
  seed=seed1)
