epigen: Generate Epistatic Effect Matrix
In aml: Adaptive Mixed LASSO



This function selects a specified number of markers using amltest and then forms a matrix that includes both main marker effects and two-way epistatic effects.


     epigen(response, marker, kin, numkeep=floor(length(response)*.5), selectvar, 
     corbnd=0.5, mafb=0.04, method="complete")

`response`	A numerical vector of the trait (phenotype) to be analyzed. It is passed to `amltest`.
`marker`	A matrix or data frame of the markers from which the main effects will be selected. The number of rows should equal the number of lines and the number of columns should equal the number of markers. The value of each element should be between 0 and 1, with the minor allele encoded as 1 and the majority allele as 0. If the minor allele is encoded as 0 instead for some markers, `cleanclust` can be used to re-encode it. The function `cleanclust` should also be used to preprocess the marker data: to remove markers with a high proportion of missing values or a very low minor allele frequency, and to impute missing values with the sample mean. It is also recommended that `cleanclust` be used to filter the markers so that no markers are highly correlated. It is passed to `amltest`.
`kin`	The kinship matrix representing relationships between lines. It should be symmetric and positive definite, and have the number of rows and columns equal to the number of rows of `marker`. It is passed to `amltest`.
`numkeep`	The number of main marker effects that should be retained after the preliminary screening in `amltest`. It should be less than the number of lines. The default value is half the number of lines.
`selectvar`	The number of main marker effects to be included in the model. Strictly speaking, it is the number of iterations for the fitting procedure of `amltest`. The number of main marker effects that are retained could be slightly less than `selectvar`. See the documentation for `amltest`.
`mafb`	The minimum mean value of an effect. Effects with lower mean values (too many zeros) are removed. For a main marker effect, this is just the minimum value for minor allele frequency. The default is 0.04 and is passed to `cleanclust`.
`corbnd`	The bound used for cutting the dendrogram after the hierarchical clustering, the default is 0.5. See the documentation for `cleanclust`.
`method`	The method of clustering passed to `hclust`. The values could be one of "complete", "average" or "single". The default is "complete". See the documentation for `cleanclust`.
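
The roles of `corbnd` and `method` can be illustrated with a generic correlation-based filter built on `hclust`. This is only a sketch of the general technique, not the code of `cleanclust`: the distance `1 - |correlation|`, the cut height `1 - corbnd`, and the rule of keeping the first column of each cluster are all assumptions made for illustration, and the toy columns `a`, `b`, and `c` are invented.

```r
# Toy effect matrix with 20 lines: b is nearly a copy of a, c is unrelated.
a <- rep(c(0, 1), 10)
b <- a
b[1] <- 1
c3 <- rep(c(0, 0, 1, 1), 5)
x <- cbind(a = a, b = b, c = c3)

# Assumed bound, playing the role of corbnd.
corbnd <- 0.5

# Assumed dissimilarity: 1 - |correlation| between columns.
d <- as.dist(1 - abs(cor(x)))

# Hierarchical clustering with the linkage named by `method`.
hc <- hclust(d, method = "complete")

# Cut the dendrogram; columns in the same group are correlated above corbnd.
groups <- cutree(hc, h = 1 - corbnd)

# Keep one representative column per group (here simply the first one).
kept <- x[, !duplicated(groups), drop = FALSE]
colnames(kept)
```

In this toy case `a` and `b` fall into one cluster and only one of them is kept, while `c` is retained on its own.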

Since considering all two-way epistatic effects is not computationally feasible in most cases, amltest is called first to select a subset of markers with the most significant main effects. Two-way epistatic effects are then formed from these selected markers by taking the product of the two columns corresponding to each pair of markers. Subsequently, the cleanclust function is called to remove effects with very low mean values and to filter the effects so that no two effects are highly correlated. The resulting genetic effect matrix includes both main effects and epistatic effects. It can then be used as input for amltest in the same manner as a marker matrix.
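
The product step described above can be sketched in base R. This is a minimal illustration, not the package's internal code: the toy matrix `sel`, the marker names `m1` to `m3`, and the `:` naming of interaction columns are all assumptions.

```r
# Toy 0/1 marker matrix: 6 lines (rows) by 3 selected markers (columns).
# The values are made up for illustration only.
sel <- matrix(c(1, 0, 1, 1, 0, 0,
                0, 1, 1, 0, 1, 0,
                1, 1, 0, 1, 0, 1),
              nrow = 6,
              dimnames = list(NULL, c("m1", "m2", "m3")))

# All unordered pairs of the selected markers, one pair per column.
marker_pairs <- combn(colnames(sel), 2)

# Each two-way epistatic effect is the element-wise product of two marker columns.
epi <- apply(marker_pairs, 2, function(p) sel[, p[1]] * sel[, p[2]])
colnames(epi) <- paste(marker_pairs[1, ], marker_pairs[2, ], sep = ":")

# Combined matrix: main effects followed by epistatic effects.
effects <- cbind(sel, epi)

# Column means, the quantity that a threshold such as mafb is compared against.
colMeans(effects)
```

With k selected markers this produces k(k-1)/2 interaction columns, which is why a preliminary screening step is needed before forming the products.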

A list containing the following:

`effects`	A matrix of both selected main marker effects and two-way epistatic effects.
`marker1`	A vector of names corresponding to the first marker in two-way epistatic effects given in `effects`, or the marker name for a main effect.
`marker2`	A vector of names corresponding to the second marker in two-way epistatic effects given in `effects`, or the marker name for a main effect.

Wang, D., Eskridge, K.M. and Crossa, J. (2011) Identifying QTLs and Epistasis in Structured Plant Populations Using Adaptive Mixed LASSO. Journal of Agricultural, Biological, and Environmental Statistics, 16:170-184.

Wang, D., et al. (2012) Prediction of genetic values of quantitative traits with epistatic effects in plant breeding populations. Heredity, 109:313-319.

amltest, cleanclust.

     ## process the markers in the wheat data set.
     data("wheat")
     clmarker <- cleanclust(wheat$marker, nafrac=0.2, mafb=0.1, corbnd=0.5, method="complete")
     intermat <- epigen(wheat$y, clmarker$newmarker, wheat$A, numkeep=100, selectvar=30, 
                        corbnd=0.5, mafb=0.04)