snpRFcv: Random Forest Cross-Validation for feature selection
In snpRF: Random Forest for SNPs to Prevent X-chromosome SNP Importance Bias

This function reports the cross-validated prediction performance of models fitted with a sequentially reduced number of predictors (ranked by variable importance), using a nested cross-validation procedure.

snpRFcv(trainx.autosome=NULL,trainx.xchrom=NULL,trainx.covar=NULL, trainy, 
        cv.fold=5, scale="log", step=0.5, 
        mtry=function(p) max(1, floor(sqrt(p))), recursive=FALSE, ...)
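
To make the defaults in the signature above concrete, the following sketch lists the predictor counts that would be evaluated and the `mtry` value used at each one. It assumes that snpRFcv builds its reduction schedule the same way as `rfcv` in the randomForest package, from which it appears to be adapted; this has not been checked against the snpRF source. `reduction_schedule` and `default_mtry` are hypothetical helper names, not part of snpRF, and the `-abs(step)` in the linear case is a choice made here for illustration. Under this assumption `step` acts as the factor by which the count shrinks, which coincides with the fraction removed at the default of 0.5.

```r
# Hypothetical helper (not part of snpRF): predictor counts tried at each
# stage, ASSUMING the same scheme as randomForest::rfcv.
reduction_schedule <- function(p, scale = "log", step = 0.5) {
  if (scale == "log") {
    # Shrink the number of variables by the factor `step` at each stage.
    k <- floor(log(p, base = 1 / step))
    n.var <- round(p * step^(0:(k - 1)))
    n.var <- n.var[!duplicated(n.var)]      # drop counts repeated by rounding
    if (!1 %in% n.var) n.var <- c(n.var, 1) # always finish with one variable
  } else {
    # Remove a fixed number of variables at each stage.
    n.var <- seq(from = p, to = 1, by = -abs(step))
  }
  n.var
}

# Default `mtry` function, copied from the usage above.
default_mtry <- function(p) max(1, floor(sqrt(p)))

n.var <- reduction_schedule(64)
data.frame(n.var = n.var, mtry = sapply(n.var, default_mtry))
```

With a custom `mtry` function, only the second column of this table would change; the schedule itself depends on `scale` and `step` alone.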

`trainx.autosome`	A matrix of autosomal markers with each column corresponding to a SNP coded as count of a particular allele (i.e. 0,1 or 2), and each row corresponding to a sample/individual.
`trainx.xchrom`	A matrix of X chromosome markers, each marker coded as two adjacent columns, alleles of a marker are coded as 0 or 1 for carrying a particular allele. Although males only have one X-chromosome, their markers are coded as 2 columns as well, the second column being a duplicate of the first. Each row of this matrix corresponds to a sample/individual. This data must be phased in chromosomal order.
`trainx.covar`	A matrix of covariates, each column being a different covariate, and each row, a sample/individual.
`trainy`	vector of response, must be a factor and have length equal to the number of rows in `trainx.*`
`cv.fold`	number of folds in the cross-validation
`scale`	if `"log"`, remove a fixed proportion (`step`) of the variables at each step; otherwise remove `step` variables at a time
`step`	if `scale="log"`, the fraction of variables to remove at each step; otherwise the number of variables to remove at each step
`mtry`	a function of number of remaining predictor variables to use as the `mtry` parameter in the `snpRF` call
`recursive`	whether variable importance is (re-)assessed at each step of variable reduction
`...`	other arguments passed on to `snpRF`
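
The coding of the three input matrices described above can be illustrated with a small toy data set. Everything in this sketch is invented for illustration: the values, the column names, and the `sex` vector, which is not an argument of `snpRFcv` and is used here only to check that male X-chromosome columns are duplicated.

```r
# Toy data for 4 individuals; all values are made up.
sex <- c("F", "M", "F", "M")

# Autosomal SNPs: one column per SNP, coded as an allele count (0, 1 or 2).
autosome <- matrix(c(0, 1, 2, 1,
                     2, 2, 0, 1),
                   nrow = 4,
                   dimnames = list(NULL, c("snpA", "snpB")))

# X-chromosome SNPs: two adjacent 0/1 columns per marker. For males (rows 2
# and 4) the second column duplicates the first.
xchrom <- matrix(c(0, 1, 1, 0,   # snpX1, first column
                   1, 1, 0, 0,   # snpX1, second column
                   1, 0, 0, 1,   # snpX2, first column
                   0, 0, 1, 1),  # snpX2, second column
                 nrow = 4,
                 dimnames = list(NULL, c("snpX1.a", "snpX1.b",
                                         "snpX2.a", "snpX2.b")))

# Covariates: one column per covariate. Response: a factor, one entry per row.
covar <- matrix(c(34, 51, 47, 29), ncol = 1, dimnames = list(NULL, "age"))
y <- factor(c("case", "control", "case", "control"))

# Consistency checks implied by the argument descriptions.
first  <- xchrom[, seq(1, ncol(xchrom), by = 2), drop = FALSE]
second <- xchrom[, seq(2, ncol(xchrom), by = 2), drop = FALSE]
stopifnot(
  all(autosome %in% 0:2),
  all(xchrom %in% 0:1),
  ncol(xchrom) %% 2 == 0,
  all(first[sex == "M", ] == second[sex == "M", ]),
  is.factor(y),
  length(y) == nrow(autosome),
  nrow(autosome) == nrow(xchrom),
  nrow(xchrom) == nrow(covar)
)
```

Running checks of this kind before calling `snpRFcv` is a cheap way to catch X-chromosome matrices that are missing the duplicated male column.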

A list with the following components:

list(n.var=n.var, error.cv=error.cv, predicted=cv.pred)

`n.var`	vector of number of variables used at each step
`error.cv`	corresponding vector of error rates or MSEs at each step
`predicted`	list of `n.var` components, each containing the predicted values from the cross-validation
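
As a sketch of how the returned components might be used, the code below picks the number of variables with the lowest cross-validated error and tabulates the corresponding predictions against the observed classes. The `result` object here is a mock with the same shape as the value described above; its contents and the `truth` vector are invented for illustration and are not output of `snpRFcv`.

```r
# Observed classes for 6 hypothetical individuals.
truth <- factor(c("case", "control", "case", "control", "case", "control"))
lv <- levels(truth)

# Mock cross-validated predictions, one vector per entry of n.var.
predicted <- list(
  factor(c("control", "control", "case", "control", "case", "case"), levels = lv),
  factor(c("case", "control", "case", "control", "case", "case"), levels = lv),
  factor(c("case", "case", "control", "control", "case", "control"), levels = lv),
  factor(c("control", "case", "case", "control", "control", "control"), levels = lv)
)

# Mock result with the three documented components.
result <- list(
  n.var     = c(8, 4, 2, 1),
  error.cv  = sapply(predicted, function(p) mean(p != truth)),
  predicted = predicted
)

# Step with the lowest cross-validated error rate.
best.step <- which.min(result$error.cv)
best <- result$n.var[best.step]

# Confusion table for that step.
table(predicted = result$predicted[[best.step]], observed = truth)
```

Note that `which.min` returns the first minimum, so ties are resolved in favour of the earlier step; a different tie-breaking rule may be preferable when a smaller model is the goal.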

Andy Liaw, with slight modifications by Greg Jenkins

Svetnik, V., Liaw, A., Tong, C. and Wang, T., “Application of Breiman's Random Forest to Modeling Structure-Activity Relationships of Pharmaceutical Molecules”, MCS 2004, Roli, F. and Windeatt, T. (Eds.) pp. 334-343.

snpRF, importance

set.seed(647)
data(snpRFexample)
result <- snpRFcv(trainx.autosome=autosome.snps,trainx.xchrom=xchrom.snps,
                  trainx.covar=covariates, trainy=phenotype)
with(result, plot(n.var, error.cv, log="x", type="o", lwd=2))

## The following can take a while to run, so if you really want to try
## it, copy and paste the code into R.

## Not run: 
result <- replicate(5,snpRFcv(trainx.autosome=autosome.snps,
                              trainx.xchrom=xchrom.snps,
                              trainx.covar=covariates, trainy=phenotype), 
                    simplify=FALSE)
error.cv <- sapply(result, "[[", "error.cv")
matplot(result[[1]]$n.var, cbind(rowMeans(error.cv), error.cv), type="l",
        lwd=c(2, rep(1, ncol(error.cv))), col=1, lty=1, log="x",
        xlab="Number of variables", ylab="CV Error")

## End(Not run)