amltest: Adaptive Mixed LASSO Analysis
Package: aml (Adaptive Mixed LASSO)



Description

Perform adaptive mixed LASSO analysis. The function is designed for association mapping or genomic prediction in structured populations, though other applications are possible.

Usage

     amltest(response, marker, kin, numkeep=floor(length(response)*.5), selectvar)

Arguments

`response`	A numeric vector of the trait (phenotype) values to be analyzed.
`marker`	A matrix or data frame of marker (or, more generally, genetic effect) information, with one row per line and one column per marker. Each element should lie between 0 and 1, with the minor allele encoded as 1 and the major allele as 0. If the major allele is encoded as 1 for some markers, `cleanclust` can be used to re-encode them. `cleanclust` should also be used to preprocess the marker data: it removes markers with a high proportion of missing values or a very low minor allele frequency, and imputes missing values with the sample mean. It is also recommended that `cleanclust` be used to filter the markers so that no two retained markers are highly correlated.
`kin`	The kinship matrix representing relationships between lines. It should be symmetric and positive definite, with as many rows and columns as `marker` has rows.
`numkeep`	The number of markers to retain after the preliminary screening. It should be less than the number of lines. The default is half the number of lines, rounded down. See Details.
`selectvar`	The number of markers to be included in the model. Strictly speaking, it is the number of iterations of the fitting procedure, so the number of markers in the output may be slightly fewer than `selectvar`. See Details.
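
The requirements on `marker` and `kin` above can be checked before calling amltest. The sketch below is illustrative only: `flip_to_minor` and `check_kin` are hypothetical helpers, not functions of the aml package, and `cleanclust` remains the documented way to re-encode and filter markers. It assumes markers are coded 0/1 (or imputed values in [0, 1]) and treats a kinship matrix as positive definite when its smallest eigenvalue exceeds a small tolerance.

```r
## Hypothetical helpers (not part of aml): checks on amltest inputs.

# Flip markers so that 1 always denotes the minor allele.
# Assumes 0/1 coding (or imputed values in [0, 1]).
flip_to_minor <- function(marker) {
  marker <- as.matrix(marker)
  freq1 <- colMeans(marker, na.rm = TRUE)  # frequency of the allele coded 1
  flip <- freq1 > 0.5                      # here 1 is the major allele
  marker[, flip] <- 1 - marker[, flip]
  marker
}

# Check that a kinship matrix is n x n (n = number of lines),
# symmetric and positive definite.
check_kin <- function(kin, marker, tol = 1e-8) {
  kin <- as.matrix(kin)
  n <- nrow(as.matrix(marker))
  if (!all(dim(kin) == c(n, n)))
    stop("kin must be n x n, where n = nrow(marker)")
  if (!isSymmetric(kin))
    stop("kin must be symmetric")
  ev <- eigen(kin, symmetric = TRUE, only.values = TRUE)$values
  if (min(ev) <= tol)
    stop("kin must be positive definite")
  invisible(TRUE)
}
```

For instance, `check_kin(wheat$A, clmarker$newmarker)` could be run just before the amltest call in the example below; it stops with an error message if `kin` does not meet the stated requirements.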

Details

In adaptive mixed LASSO fitting, amltest first performs a preliminary screening that retains at most numkeep markers (predictors); numkeep should be less than the number of lines. This step relies on LASSO fitting with lars, and numkeep is the maximum number of steps in the LASSO fit. Because of the nature of the lars algorithm (in LASSO mode it can drop variables as well as add them), the number of markers retained after screening may be slightly fewer than numkeep. amltest then performs the adaptive mixed LASSO fit, iteratively estimating the fixed effects and random effects for up to selectvar iterations. Again, depending on the behavior of the lars algorithm, the number of markers in the output may be slightly fewer than selectvar. So if an exact number of markers is required in the model, some trial and error may be needed.
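
The trial and error described above can be automated by refitting with increasing selectvar until the model contains the desired number of markers. This is a minimal sketch under stated assumptions: `find_selectvar` is a hypothetical helper, not part of aml, and `fit` is any function of selectvar that returns a list with a two-column `estimate` matrix (for example, a wrapper around amltest). Each trial is a full refit, so this can be slow.

```r
## Hypothetical helper (not part of aml): search selectvar values from
## `target` upward and return the first fit with exactly `target` markers.
## `fit(sv)` must return a list whose `estimate` is a two-column matrix.
find_selectvar <- function(fit, target, max_extra = 10) {
  for (sv in seq(target, target + max_extra)) {
    res <- fit(sv)
    nsel <- nrow(res$estimate)  # markers actually in the model
    if (nsel == target)
      return(list(selectvar = sv, fit = res))
  }
  warning("no selectvar value gave exactly ", target, " markers")
  NULL
}

## Possible use (requires the aml package and prepared data):
## fit <- function(sv) amltest(wheat$y, clmarker$newmarker, wheat$A,
##                             numkeep = 80, selectvar = sv)
## best <- find_selectvar(fit, target = 40)
```

The search only moves upward from `target`, since the number of markers in the output is at most selectvar.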

Value

A list with the following components:

`estimate`	A two-column matrix. The first column gives the indices of the columns of `marker` included in the fitted model, and the second column gives the estimated effect of each of those markers.
`AIC`	A vector of AIC values for models with different numbers of markers. The first entry is for the model with zero markers (random line effects only), and the last entry corresponds to the model with the markers listed in `estimate`.
`BIC`	A vector of BIC values for models with different numbers of markers. The first entry is for the model with zero markers (random line effects only), and the last entry corresponds to the model with the markers listed in `estimate`.
`EBIC`	A vector of EBIC values for models with different numbers of markers. The first entry is for the model with zero markers (random line effects only), and the last entry corresponds to the model with the markers listed in `estimate`.
`vars`	A vector of variance-component estimates for the random effects. The first entry is the genetic variance σ_g^2 and the second is the ratio of the error variance to the genetic variance, σ_e^2/σ_g^2. Thus the product of the two entries is the error variance σ_e^2.
`mcount`	A vector giving the number of markers in the model at each step. It is mainly used together with `AIC`, `BIC`, or `EBIC`.
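
One way to use these components is to pick the step that minimizes an information criterion and read off the corresponding model size and variance components. The sketch below assumes that `AIC`, `BIC`, `EBIC` and `mcount` have the same length, with `mcount[i]` the number of markers in the model scored by the i-th criterion value; `summarise_aml` is a hypothetical helper, not part of aml. The error variance is computed as the product of the two entries of `vars`, as described above.

```r
## Hypothetical helper (not part of aml): summarise an amltest() result.
## Assumes AIC, BIC, EBIC and mcount are aligned step by step.
summarise_aml <- function(res, criterion = c("BIC", "AIC", "EBIC")) {
  criterion <- match.arg(criterion)
  ic <- res[[criterion]]
  best <- which.min(ic)                    # step with the smallest criterion
  list(
    criterion = criterion,
    best_step = best,
    n_markers = res$mcount[best],          # markers in the selected model
    sigma2_g  = res$vars[1],               # genetic variance
    sigma2_e  = res$vars[1] * res$vars[2]  # error variance = sigma2_g * ratio
  )
}

## Possible use: summarise_aml(resmain, "EBIC")
```

`which.min` returns the first minimum, so ties are resolved in favor of the smaller model.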

References

Wang, D., Eskridge, K.M. and Crossa, J. (2011) Identifying QTLs and Epistasis in Structured Plant Populations Using Adaptive Mixed LASSO. Journal of Agricultural, Biological, and Environmental Statistics, 16: 170-184.

Wang, D., et al. (2012) Prediction of genetic values of quantitative traits with epistatic effects in plant breeding populations. Heredity, 109: 313-319.

See Also

cleanclust

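Examples

     ## Analyze the wheat data with main marker effects.
     library(aml)
     data("wheat")
     ## Preprocess markers: drop markers with many missing values or low
     ## minor allele frequency, impute, and filter highly correlated markers.
     clmarker <- cleanclust(wheat$marker, nafrac=0.2, mafb=0.1, corbnd=0.5, method="complete")
     resmain <- amltest(wheat$y, clmarker$newmarker, wheat$A, numkeep=80, selectvar=40)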