mdr: Function to perform MDR on a dataset for a given set of loci
In MDR: Detect gene-gene interactions using multifactor dimensionality reduction


Determines the top x MDR models, over a specified set of combinations of loci, that maximize balanced accuracy (the mean of sensitivity and specificity). Ideally, it should be used in conjunction with an internal validation method, such as cross-validation (mdr.cv) or a three-way split (mdr.3WS).

mdr(split, comb, x, ratio, equal = "HR", genotype = c(0, 1, 2))

`split`	the dataset; an n by (p+1) matrix where the first column is the binary response vector (coded 0 or 1) and the remaining columns are the p SNP genotypes (coded numerically)
`comb`	a matrix of SNP combinations to consider; the rows represent a given combination and the columns represent the SNP number; to consider k-way interactions, `comb` should have k columns.
`x`	the number of "best" combinations to retain
`ratio`	the case/control ratio threshold used to assign high-risk or low-risk status to a genotype combination
`equal`	how to treat genotype combinations with case/control ratio equal to the threshold; default is "HR" for high-risk, but can also consider "LR" for low-risk
`genotype`	a numeric vector of possible genotypes arising in `split`; default is c(0,1,2), but this vector can be longer or shorter depending on whether more or fewer than three genotypes are possible
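The way `ratio`, `equal`, and `genotype` interact can be illustrated for a single pair of loci. This is a minimal sketch in base R, not the package's internal code: the toy matrix, the variable names, and the rule that empty genotype cells count as low-risk are all assumptions made for illustration.

```r
# Toy data in the layout described for `split`: response first, then SNPs.
# All values below are made up for illustration.
toy <- cbind(
  y    = c(1, 1, 1, 0, 0, 1, 0, 0),
  snp1 = c(0, 0, 1, 1, 1, 2, 2, 0),
  snp2 = c(0, 0, 0, 0, 0, 1, 1, 1)
)
genotype <- c(0, 1, 2)
ratio <- 1
equal <- "HR"

# Count cases and controls in each genotype cell. as.vector() on the
# 3 x 3 table runs column-major, so the first locus varies fastest.
f1 <- factor(toy[, "snp1"], levels = genotype)
f2 <- factor(toy[, "snp2"], levels = genotype)
is_case <- toy[, "y"] == 1
cases    <- as.vector(table(f1[is_case], f2[is_case]))
controls <- as.vector(table(f1[!is_case], f2[!is_case]))

# A cell is high-risk when cases/controls exceeds `ratio`; ties go to
# high-risk only when equal == "HR". Comparing cases with ratio * controls
# avoids dividing by zero.
high_risk <- if (equal == "HR") {
  cases >= ratio * controls
} else {
  cases > ratio * controls
}
# Assumption: cells with no observations are treated as low-risk.
high_risk[cases + controls == 0] <- FALSE

print(as.integer(high_risk))
```

With `equal = "LR"` the tied cell (one case, one control) would be labeled low-risk instead.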

MDR is a non-parametric data-mining approach to variable selection designed to detect gene-gene or gene-environment interactions in case-control studies. This function uses balanced accuracy as the evaluation measure to rank potential models.
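Balanced accuracy, the mean of sensitivity and specificity, can be computed directly from observed and predicted class labels. The helper below is a hypothetical illustration and is not a function exported by the package; the example vectors are made up.

```r
# Hypothetical helper: balanced accuracy for 0/1 labels.
balanced_accuracy <- function(actual, predicted) {
  sensitivity <- sum(predicted == 1 & actual == 1) / sum(actual == 1)
  specificity <- sum(predicted == 0 & actual == 0) / sum(actual == 0)
  (sensitivity + specificity) / 2
}

# Imbalanced toy example: 4 cases and 8 controls.
actual    <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0)
predicted <- c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1)

ba <- balanced_accuracy(actual, predicted)
print(ba)
```

Because sensitivity and specificity are weighted equally, the measure is not dominated by the larger class when cases and controls are unequal in number.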

a list with the MDR model fit containing:

`models`	a matrix of the "best" `x` combinations of loci from `comb`; each row represents a 'model'
`balanced accuracy`	a vector of balanced accuracies for each of the ‘best models’
`high-risk/low-risk`	a matrix of the high-risk/low-risk parameterizations of the genotype combinations for each of the ‘best models’; each row represents a 'model' and the associated vector is an indicator of high-risk status for each genotype combination.

...

MDR is a combinatorial search approach, so considering high-order interactions can be computationally expensive.
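To get a rough sense of how the search grows, the number of candidate models and the number of genotype cells per model can be tabulated with base R. The figures assume 15 SNPs with three genotypes each, matching the example data; the variable names are illustrative only.

```r
p <- 15   # number of SNPs (assumed, as in the example data)
k <- 1:5  # interaction order

n_models <- choose(p, k)  # combinations of loci to evaluate at each order
n_cells  <- 3^k           # genotype cells per model with 3 genotypes

print(data.frame(order = k, models = n_models, cells_per_model = n_cells))
```

Both quantities grow quickly with the interaction order, which is why `comb` is usually restricted to low-order combinations.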

When determining the high-risk/low-risk status of a genotype combination, the combinations are ordered so that the genotypes of the first locus vary fastest, following the convention of the function expand.grid. For instance, with 3 genotypes (0,1,2), a two-way interaction results in the following 9 combinations: (0,0), (1,0), (2,0), (0,1), (1,1), (2,1), (0,2), (1,2), (2,2).
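The ordering can be reproduced with expand.grid itself, which is useful for matching positions in a high-risk indicator vector back to genotype combinations. The column names `locus1` and `locus2` are chosen here for illustration.

```r
genotype <- c(0, 1, 2)

# expand.grid varies its first argument fastest
cells <- expand.grid(locus1 = genotype, locus2 = genotype)

# Format each row as "(g1,g2)" in the order used for the indicator vector
labels <- paste0("(", cells$locus1, ",", cells$locus2, ")")
print(labels)
```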

Stacey Winham

Ritchie et al (2001). Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am J Hum Genet 69, 138-147.

Velez et al (2007). A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction. Genet Epidemiol 31, 306-315.

mdr.cv, mdr.3WS

# load the package and the test data
library(MDR)
data(mdr1)

# define the matrix of all two-way combinations of 15 SNPs; this 105 by 2
# matrix lists the 105 two-way interactions to consider
loci <- t(combn(15, 2))

# run mdr on the sample data over the two-way combinations in 'loci',
# keeping the top 5 models, with the threshold set to 1 since the data
# are balanced
fit <- mdr(mdr1, loci, x = 5, ratio = 1)

print(fit) #view the fitted mdr object