mdr: Function to perform MDR on a dataset for a given set of loci

Description

Determines the top x MDR models, over a specified set of combinations of loci, that maximize balanced accuracy (the mean of sensitivity and specificity). Ideally, it should be used in conjunction with an internal validation method, such as cross-validation (mdr.cv) or a three-way split (mdr.3WS).

Usage

mdr(split, comb, x, ratio, equal = "HR", genotype = c(0, 1, 2))

Arguments

split

the dataset; an n by (p+1) matrix where the first column is the binary response vector (coded 0 or 1) and the remaining columns are the p SNP genotypes (coded numerically)

comb

a matrix of SNP combinations to consider; the rows represent a given combination and the columns represent the SNP number; to consider k-way interactions, comb should have k columns.

x

the number of "best" combinations to retain

ratio

the case/control ratio threshold to ascribe high-risk/low-risk status of a genotype combination

equal

how to treat genotype combinations with case/control ratio equal to the threshold; default is "HR" for high-risk, but can also consider "LR" for low-risk

genotype

a numeric vector of the possible genotypes arising in split; default is c(0,1,2), but this vector can be longer or shorter depending on whether more or fewer than three genotypes are possible
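As a rough illustration of the argument shapes described above, the following base R sketch builds a simulated dataset and a combination matrix. The names dat, snps, n and p are hypothetical, the genotypes are random noise rather than the package's mdr1 data, and setting ratio to the overall case/control ratio is an assumed convention rather than something the function requires.

```r
# Simulated inputs for illustration only (not the package's mdr1 data)
set.seed(1)
n <- 200  # number of subjects
p <- 6    # number of SNPs

# Binary response: 0 = control, 1 = case (balanced by construction)
y <- rep(c(0, 1), each = n / 2)

# Genotypes coded 0/1/2, drawn at random
snps <- matrix(sample(0:2, n * p, replace = TRUE), nrow = n, ncol = p)

# n by (p+1) matrix: response in the first column, then the p SNPs
dat <- cbind(y, snps)

# All two-way combinations of the p SNPs: one row per combination,
# k = 2 columns holding the SNP numbers
comb <- t(combn(p, 2))

# Assumed convention: use the overall case/control ratio as the threshold
ratio <- sum(y == 1) / sum(y == 0)
```

Objects shaped like dat, comb and ratio would then be passed as the split, comb and ratio arguments.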

Details

MDR is a non-parametric data-mining approach to variable selection designed to detect gene-gene or gene-environment interactions in case-control studies. This function uses balanced accuracy as the evaluation measure to rank potential models.
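To make the evaluation measure concrete, here is a minimal base R sketch of how balanced accuracy could be computed for a single combination of loci. It is not the package's internal code: the function balanced_accuracy and its arguments y and g are hypothetical names, and the sketch only considers genotype cells that are actually observed in the data.

```r
# Illustrative sketch only; not the MDR package's implementation.
# y: 0/1 response vector; g: matrix of genotypes for one combination of loci
balanced_accuracy <- function(y, g, ratio = 1, equal = "HR") {
  # Label each subject by its multi-locus genotype cell
  cell <- interaction(as.data.frame(g), drop = TRUE)
  cases    <- tapply(y == 1, cell, sum)
  controls <- tapply(y == 0, cell, sum)

  # A cell is high-risk when cases/controls exceeds the threshold;
  # comparing cases with ratio * controls avoids dividing by zero.
  # 'equal' decides how ties with the threshold are labelled.
  high <- if (equal == "HR") cases >= ratio * controls else cases > ratio * controls

  # Predict "case" for every subject that falls in a high-risk cell
  pred <- high[as.character(cell)]

  sensitivity <- mean(pred[y == 1])   # proportion of cases predicted as cases
  specificity <- mean(!pred[y == 0])  # proportion of controls predicted as controls
  (sensitivity + specificity) / 2
}
```

Ranking every row of a combination matrix by this quantity and keeping the x largest values mirrors the search described above, under the stated assumptions.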

Value

a list with the MDR model fit containing:

models

a matrix of the "best" x combinations of loci from comb; each row represents a 'model'

balanced accuracy

a vector of balanced accuracies for each of the ‘best models’

high-risk/low-risk

a matrix of the high-risk/low-risk parameterizations of the genotype combinations for each of the ‘best models’; each row represents a 'model' and the associated vector is an indicator of high-risk status for each genotype combination.

...

Warning

MDR is a combinatorial search approach, so considering high-order interactions can be computationally expensive.
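A quick way to gauge the cost before running a search is to count the combinations. The sketch below uses base R's choose() for 15 SNPs (the size of the two-way example on this page); the variable names are illustrative.

```r
p <- 15   # number of SNPs
k <- 1:5  # interaction orders to consider

# Number of k-way combinations of p SNPs, and the number of genotype
# cells per combination when each SNP has 3 possible genotypes
search_size <- data.frame(
  order        = k,
  combinations = choose(p, k),
  cells        = 3^k
)
print(search_size)
```

Both the number of combinations and the number of genotype cells per combination grow quickly with the interaction order, which is why high-order searches become expensive.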

Note

When determining the high-risk/low-risk status of a genotype combination, the combinations are ordered so that the genotypes of the first locus vary fastest, following the convention of the function expand.grid. For instance, with 3 genotypes (0,1,2), a two-way interaction results in the following 9 combinations, in this order: (0,0), (1,0), (2,0), (0,1), (1,1), (2,1), (0,2), (1,2), (2,2).
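The ordering can be reproduced directly with expand.grid from base R. In this small sketch the column names locus1 and locus2 are illustrative labels only.

```r
genotype <- c(0, 1, 2)

# Each row is one genotype combination; the first locus varies fastest
cells <- expand.grid(locus1 = genotype, locus2 = genotype)
print(cells)
```

Row i of this grid corresponds to position i of a high-risk indicator vector that follows the same convention.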

Author(s)

Stacey Winham

References

Ritchie et al (2001). Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am J Hum Genet 69, 138-147.

Velez et al (2007). A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction. Genet Epidemiol 31, 306-315.

See Also

mdr.cv, mdr.3WS

Examples

# load the package (assumes the MDR package is installed) and the test data
library(MDR)
data(mdr1)

# define the matrix of all two-way combinations of 15 SNPs;
# this 105 by 2 matrix lists the 105 two-way interactions to consider
loci <- t(combn(15, 2))

# run mdr on the sample data, considering the two-way combinations in 'loci',
# saving the top 5 models, and setting the threshold to 1 since the data are balanced
fit <- mdr(mdr1, loci, x = 5, ratio = 1)

# view the fitted mdr object
print(fit)

MDR documentation built on May 29, 2017, 7:05 p.m.