mbmdr: Model Based Multifactor Dimensionality Reduction
In mbmdr: Model Based Multifactor Dimensionality Reduction

Description Usage Arguments Details Value Author(s) References Examples

View source: R/mbmdr.R

mbmdr implements the Model Based Multifactor Dimensionality Reduction (MB-MDR) method proposed by Calle et al.(2008) as a dimension reduction method for exploring gene-gene interactions.

  mbmdr(y, data, order, covar=NULL, exclude=NA, risk.threshold=0.1, 
        output=NULL, adjust=c("none","covariates","main effects","both"), 
        first.model=NULL, list.models=NULL, use.logistf=TRUE, 
        printStep1=FALSE, ...)

`y`	Vector containing the dependent variable.
`data`	A data.frame (or object coercible by as.data.frame to a data frame) containing the SNP information with values 0,1,2. For example: 0 = common homozygous genotype, 1 = heterozygous genotype, 2 = variant homozygous genotype.
`order`	Single integer that specifies the order of interactions to be analyzed. If `list.models = NULL` (value by default) all possible interactions of the specified order are analyzed.
`covar`	(Optional) A data.frame or object coercible by as.data.frame to a data frame containing the covariates for adjusting regression models. Only used if `adjust="covariates"` or `adjust="both"`.
`exclude`	(Optional) Value/s of missing data. If missings in data are coded differently than `NA` it should be specified. For example `exclude=c(NA,-1)` specifies that both, `NA` and `-1` indicate a missing value.
`risk.threshold`	Threshold used at the first MB-MDR stage for defining the risk category of a multilocus genotype. It should be a conservative value. The default value is `risk.threshold=0.1`.
`output`	(Optional) Output file name for storing `mbmdr` results on file, or `NULL` (default) for output as R object. If the number of models to be analyzed is too large, it is preferable to store the output in a file. This allows exploring partial results while mbmdr is still running and prevents from loosing all the information if R or `mbmdr` crashes during the process.
`adjust`	Type of regressions adjustment. Options are `"none", "covariates", "main effects"` or `"both"`. By default no adjustment is performed; `adjust="none"`.
`first.model`	(Optional) Numerical vector of length equal to `order` for specifying the initial interaction model for `mbmdr`; previous models will not be evaluated. This is useful to continue `mbmdr` computation after a stop. Note that, by default, `mbmdr` explores all possible interactions of a specified order. If there are for example, 50 snps in data and `order=3`, `mbmdr` will start analyzing the model (50,49,48), that means interaction between snps 50, 49 and 48 (column position in data data.frame). The second model that `mbmdr` will analyze is (50,49,47). After model (50,49,1), the next model will be (50,48,47), and the final model will be (3,2,1). For example, if `mbmdr` stopped after the analysis of model (30,21,14), you can continue the process by specifying `first.model=c(30,21,13)`. Ids of snps must be in descended order. If `first.model=NULL` (by default) all models will be analyzed.
`list.models`	(Optional) Exhaustive list of models to be analyzed. Only models in list will be analyzed. It can be: a vector of lenght `order` specifying a unique model, a matrix (`n x order`) containing models by rows, or a string for specifying a file with models by rows (all models must be of the same interaction order) A `NULL` value (by default) indicates that all possible interactions will be analyzed.
`use.logistf`	Boolean value indicating whether or not to use the `logistf` package for logistic regressions when separation of data points is observed. It only has effect if logistic regression (`family=binomial(link = "logit")`) is specified. (See `logistf` help for details). The default value is `TRUE`.
`printStep1`	Boolean value. If true, the details of `mbmdr` step 1 are printed for every model. This slows the process, so it is only advisable when the number of models to analyze is small. By default `printStep1=FALSE`.
`...`	For regression arguments: arguments to be passed to `glm` calls. Mainly to specify the error distribution and link function to be used in the regression models. For example, use `family=binomial(link=logit)` for specifying logistic regression or `gaussian(link = "identity")` for normal regression. (See `family` for details of family functions and `glm` for more options of `glm` function).

MB-MDR is a method for identifying multi-locus genotypes that are associated with a phenotype of interest, and allows to adjust for marginal and confounding effects.

The exploration of interacions is performed in three steps:

Step1
Each genotype is tested for association with the response and is classified as high risk, low risk or not significant, and all genotypes of the same class are merged. The threshold for considering significant evidence is the value specified in risk.threshold (by default risk.threshold=0.1).
If printStep1=TRUE, the MBMDR function prints this classification.

Step2
For each risk categories, high and low, a new association test is performed. The result provides a Wald statistic for the high and for the low categories.

Step3
The significance is explored through a permutation test on the maximum Wald statistics.

mbmdr returns an object of class mbmdr with the following attributes:

call

The matched call.

y

The outcome used.

data

The SNPs data used.

covar

The covariate data used.

result

Dataframe with those interactions that have at least a significant genotype. For each interaction (rows), the following information is returned:

`SNP1...SNPx`		Names of snps in interaction.
`NH`		Number of significant High risk genotypes in the interaction.
`betaH`		Regresion coeficient in step2 for High risk exposition.
`WH`		Wald statistic for High risk category.
`PH`		P-value of the Wald test for the High risk category.
`NL`		Number of significant Low risk genotypes in the interaction.
`betaL`		Regresion coeficient in step2 for Low risk exposition.
`WL`		Wald statistic for Low risk category.
`PL`		P-value of the Wald test for the Low risk category.
`MIN.P`		Minimun p-value (`min(PH,PL)`) for the interaction model.

If printStep1 argument is set to TRUE, the result of the first step in mbmdr is printed for each genotype with the following information:

`...`		Genotype.
`cases`		(only for case/control outcome) Number of cases with the specific genotype.
`controls`		(only for case/control outcome) Number of controls with the specific genotype.
`beta`		Regression coefficient for this genotype.
`p.value`		Wald test p-value for this genotype.
`category`		Predicted risk category for this genotype.

Victor Urrea, Malu Calle, Kristel Van Steen, Nuria Malats

Calle M.L., Urrea V., Vellalta G., Malats N., Steen K.V. (2008) Improving strategies for detecting genetic patterns of disease susceptibility in association studies. Statistics in Medicine 27, 6532-6546.

#---  Dicotomous outcome  -------------

# load example data
data(simSNP)

# MB-MDR analysis of all possible 2nd order interactions (it may take some time)
# The order of the interactions to be explored is specified by order=2
# fit <- mbmdr(y=simSNP$Y,data=simSNP[,3:12],order=2,family=binomial(link=logit))
# print(fit)

# MB-MDR analysis of the epistatic effect of SNP1 and SNP2 (Model 2 1)
# The specific model to be analyzed is specified by list.models=c(2,1)
fit <- mbmdr(y=simSNP$Y,data=simSNP[,3:12],order=2,list.models=c(2,1),
             family=binomial(link=logit),printStep1=TRUE)
print(fit)


# MB-MDR analysis of the epistatic effect of SNP1 and SNP2, adjusted for variable X
# The specific model to be analyzed is specified by list.models=c(2,1)
# The adjustment statement is specified by adjust="covariates"
fit <- mbmdr(y=simSNP$Y,data=simSNP[,3:12],order=2,list.models=c(2,1),
             covar=simSNP$X,adjust="covariates",family=binomial(link=logit))
print(fit)


# MB-MDR analysis of all 2nd order interactions restricted to a subset of snps 
# (all possible 2nd order interactions among SNP1, SNP2 and SNP3 are explored,
# it may take some time)
# SNP1, SNP2 and SNP3 are placed in columns 3 to 5 of simSNP. This is specified
# by data=simSNP[,3:5]
# fit <- mbmdr(y=simSNP$Y,data=simSNP[,3:5],order=2,family=binomial(link=logit),
#             printStep1=TRUE)
# print(fit)


# MB-MDR analysis of all possible 3rd order interactions (it may take some time)
# The order of the interactions to be explored is specified by order=3
# fit <- mbmdr(y=simSNP$Y,data=simSNP[,3:12],order=3,family=binomial(link=logit))
# print(fit)


# MB-MDR analysis of the 3rd order epistatic effect of SNP1, SNP2 and SNP3
# The specific model to be analyzed is specified by list.models=c(3,2,1)
# fit <- mbmdr(y=simSNP$Y,data=simSNP[,3:12],order=3,list.models=c(3,2,1),
#              family=binomial(link=logit),printStep1=TRUE)
# print(fit)


#---  Continuous outcome  --------------
# load example data
data(simSNPcont)

# MB-MDR analysis of all possible 2nd order interactions (it may take some time)
# The order of the interactions to be explored is specified by order=2
# fit <- mbmdr(y=simSNPcont$Y,data=simSNPcont[,2:11],order=2)
# print(fit)

# MB-MDR analysis of the epistatic effect of SNP1 and SNP2 (Model 2 1)
# The specific model to be analyzed is specified by list.models=c(2,1)
fit <- mbmdr(y=simSNPcont$Y,data=simSNPcont[,2:11],order=2,
	           list.models=c(2,1),printStep1=TRUE)
print(fit)