nullModel | R Documentation |
Method for creating a null model that can be used for
association testing using assocTest
## S4 method for signature 'formula,data.frame'
nullModel(X, y, data,
type=c("automatic", "logistic", "linear", "bernoulli"),
n.resampling=0,
type.resampling=c("bootstrap", "permutation"),
adj=c("automatic", "none", "force"), adjExact=FALSE,
n.resampling.adj=10000, checkData=TRUE)
## S4 method for signature 'formula,missing'
nullModel(X, y, data,
type=c("automatic", "logistic", "linear", "bernoulli"),
n.resampling=0,
type.resampling=c("bootstrap", "permutation"),
adj=c("automatic", "none", "force"), adjExact=FALSE,
n.resampling.adj=10000, checkData=TRUE)
## S4 method for signature 'matrix,numeric'
nullModel(X, y,
type=c("automatic", "logistic", "linear"), ...)
## S4 method for signature 'matrix,factor'
nullModel(X, y,
type=c("automatic", "logistic", "linear"), ...)
## S4 method for signature 'missing,numeric'
nullModel(X, y,
type=c("automatic", "logistic", "linear", "bernoulli"),
...)
## S4 method for signature 'missing,factor'
nullModel(X, y,
type=c("automatic", "logistic", "linear", "bernoulli"),
...)
X |
a formula or matrix |
y |
if the formula interface is used, |
data |
for consistency with standard R methods from the
stats package, the data frame can also be passed to
|
type |
type of model to train (see details below) |
n.resampling |
number of null model residuals to sample; set to zero (default) to turn resampling off; resampling is not supported for plain trait vectors without covariates |
type.resampling |
method how to sample null model residuals; the choice “permutation” refers to simple random permutations of the model's residuals. If “bootstrap” is chosen (default), the following strategy is applied for linear models (continuous trait): residuals are sampled as normally distributed values with mean 0 and the same standard deviation as the model's residuals. For logistic models (binary trait), the choice “bootstrap” selects the same bootstrapping method that is implemented in the SKAT package. |
adj |
whether or not to use small sample correction for logistic models (binary trait with covariates). The choice “none” turns off small sample correction. If “force” is chosen, small sample correction is turned on unconditionally. If “automatic” is chosen (default), small sample correction is turned on if the number of samples does not exceed 2,000. This argument is ignored for any type of model except “logistic” and small sample correction is switched off. |
adjExact |
in case small sample correction is switched on (see
above), this argument indicates whether or not the exact
square root of the matrix |
n.resampling.adj |
number of null model residuals to sample for the adjustment of higher moments; ignored if small sample correction is switched off. |
checkData |
if |
... |
all other parameters are passed on to the |
The podkat package assumes a mixed model in which the
trait under investigation depends both on covariates (if
any) and the genotype. The nullModel
method models the relationship
between the trait and the covariates (if any) without taking the genotype
into account, which corresponds to the null assumption that the
trait and the genotype are independent. Therefore, we speak of
null models.
The following types of models are presently
available:
a linear
model is trained for a continuous trait and a
given set of covariates (if any); this is done by standard
linear regression using the lm
function.
a generalized linear
model is trained for a binary trait and a
given set of covariates (if any); this is done by
logistic regression using the glm
function.
a binary
trait without covariates is interpreted as instances of
a simple Bernoulli process with p
being the relative
frequencies 1's/cases.
The type
argument can be used to select the type of model,
where the following restrictions apply:
For linear models, the trait vector must be numerical. Factors/factor columns are not accepted.
For logistic models and Bernoulli-distributed traits, both numerical
vectors and factors are acceptable. In any case, only 0's (controls)
and 1's (cases) are accepted. Furthermore, nullModel
quits
with an error if the trait shows no variation. In other words,
trait vectors that only contain 0's or only contain 1's are not
accepted (as association testings makes little sense for such
traits anyway).
The following interfaces are available to specify the traits and the covariates (if any):
the first argument X
can be a formula that specifies the trait vector/column, the
covariate matrix/columns (if any), and the intercept (if any).
If neither the y
argument nor the data
argument
is specified, nullModel
searches the environment from
which the function has been called. This interface is largely
analogous to the functions lm
and
glm
.
if the X
argument
is omitted and y
is a numeric vector or factor, y
is
interpreted as trait vector, and a null model is created from
y
without covariates. Linear and logistic models are
trained with an intercept. For type “bernoulli”, the
trait vector is written to the output object as is.
if the X
argument
is a matrix and y
is a numeric vector or factor, y
is
interpreted as trait vector and X
is interpreted as
covariate matrix. In this case, linear and logistic models are
trained as (generalized) linear regressors that predict the
trait from the covariates plus an intercept. The type
“bernoulli” is not available for this variant, since
this type of model cannot consider covariates.
All nullModel
methods also support the choice
type="automatic"
. In this case, nullModel
guesses the
most reasonable type of model in the following way: If the trait
vector/column is a factor or a numeric vector containing only 0's
and 1's (where both values must be present, as
noted above already), the trait is supposed to be binary and
the type “logistic” is assumed, unless the following conditions
are satisfied:
The number of samples does not exceed 100.
No intercept and no covariates
have been specified. This condition can be met by supplying an empty model
to the formula interface (e.g. y ~ 0
) or by supplying the
trait vector as argument y
while omitting X
.
If these two conditions are fulfilled for a binary trait,
nullModel
chooses the type “bernoulli”. If the trait is
not binary and the trait vector/column is numeric,
nullModel
assumes type “linear”.
For consistency with the SKAT package, the podkat package
also offers resampling, i.e. a certain number of vectors of
residuals are sampled according to the null model. This can be done
when training the null model by setting the n.resampling
parameter (number of residual vectors that are sampled) to a value
larger than 0. Then, when association testing is performed, p-values
are computed also for all these sampled residuals, and an additional
estimated p-value is computed as the relative frequency of
p-values of sampled residuals that are at least as significant as
the test's p-value. The procedure to sample residuals is controlled
with the type.resampling
argument (see above).
For logistic models (type “logistic”), assocTest
offers the small sample correction as introduced by
Lee et al. (2012). If the adjustment of higher moments should
be applied, some preparations need to be
made already when training the null model. Which preparations are
carried out, can be controlled by the arguments adj
,
adjExact
, n.resampling.adj
, and type.resampling
(see descriptions of arguments above and Subsection 9.5 of the package
vignette).
If any missing values are found in the trait vector/column or the
covariate matrix/columns, the respective samples are omitted from the
resulting model (which is the standard behavior of
lm
and glm
anyway).
The indices of the omitted samples are stored in the na.omit
slot of the returned NullModel
object.
returns a NullModel
object
Ulrich Bodenhofer
https://github.com/UBod/podkat
Lee, S., Emond, M. J., Bamshad, M. J., Barnes, K. C., Rieder, M. J., Nickerson, D. A., NHLBI Exome Sequencing Project - ESP Lung Project Team, Christiani, D. C., Wurfel, M. M., and Lin, X. (2012) Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am. J. Hum. Genet. 91, 224-237. DOI: \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1016/j.ajhg.2012.06.007")}.
NullModel
, lm
,
glm
## read phenotype data from CSV file (continuous trait + covariates)
phenoFile <- system.file("examples/example1lin.csv", package="podkat")
pheno <-read.table(phenoFile, header=TRUE, sep=",")
## train null model with all covariates in data frame 'pheno'
model <- nullModel(y ~ ., pheno)
model
length(model)
residuals(model)
## read phenotype data from CSV file (binary trait + covariates)
phenoFile <- system.file("examples/example1log.csv", package="podkat")
pheno <-read.table(phenoFile, header=TRUE, sep=",")
## train null model with all covariates in data frame 'pheno'
model <- nullModel(y ~ ., pheno)
model
length(model)
residuals(model)
## "train" simple Bernoulli model on a subset of 100 samples
model <- nullModel(y ~ 0, pheno[1:100, ])
model
length(model)
residuals(model)
## alternatively, use the interface that only supplies the
## trait vector
model <- nullModel(y=pheno[1:100, ]$y)
model
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.