Description Usage Arguments Value Note References See Also Examples
Fits the fuzzy forests algorithm. A formula interface for fuzzy forests also exists: ff.formula.
## Default S3 method:
ff(X, y, Z = NULL, module_membership,
   screen_params = screen_control(min_ntree = 500),
   select_params = select_control(min_ntree = 500), final_ntree = 5000,
   num_processors = 1, nodesize, test_features = NULL, test_y = NULL,
   ...)

ff(X, ...)
X | A data.frame. Each column corresponds to a feature vector.
y | Response vector. For classification, y should be a factor. For regression, y should be numeric.
Z | A data.frame. Additional features that are not to be screened out at the screening step.
module_membership | A character vector giving the module membership of each feature.
screen_params | Parameters for the screening step of fuzzy forests. See screen_control.
select_params | Parameters for the selection step of fuzzy forests. See select_control.
final_ntree | Number of trees grown in the final random forest. This random forest contains all selected features.
num_processors | Number of processors used to fit random forests.
nodesize | Minimum terminal node size. Defaults to 1 for classification and 5 for regression. If the sample size is very large, the trees will be grown extremely deep, which may cause memory problems and significantly increase the run time. In that case, it may be useful to increase nodesize.
test_features | A data.frame containing features from a test set. The data.frame should contain the features in both X and Z.
test_y | The responses for the test set.
... | Additional arguments, currently not used.
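The argument requirements above can be checked before calling ff. The following is a minimal base-R sketch and not part of the package: check_ff_inputs and default_nodesize are hypothetical helper names, and the nodesize rule simply mirrors the description given for that argument.

```r
# Hypothetical helpers (not part of fuzzyforest) illustrating the
# argument requirements described above.
check_ff_inputs <- function(X, y, module_membership) {
  stopifnot(is.data.frame(X))
  # One response per row of X.
  stopifnot(NROW(y) == nrow(X))
  # y must be a factor (classification) or numeric (regression).
  stopifnot(is.factor(y) || is.numeric(y))
  # One module label per feature (column of X).
  stopifnot(length(module_membership) == ncol(X))
  invisible(TRUE)
}

# Mirrors the documented rule: 1 for classification, 5 for regression.
default_nodesize <- function(y) {
  if (is.factor(y)) 1 else 5
}

set.seed(1)
X <- as.data.frame(matrix(rnorm(40), nrow = 10, ncol = 4))
y_reg <- rnorm(10)                           # regression response
y_cls <- factor(rep(c("a", "b"), each = 5))  # classification response
# The argument is documented as a character vector, so numeric module
# labels are converted here.
module_membership <- as.character(c(1, 1, 2, 2))

check_ff_inputs(X, y_reg, module_membership)
check_ff_inputs(X, y_cls, module_membership)
default_nodesize(y_reg)
default_nodesize(y_cls)
```

Running checks like these first gives a clearer error message than a failure deep inside the random forest fit.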
An object of type fuzzy_forest. This object is a list containing useful output of fuzzy forests.
In particular, it contains a data.frame listing the selected features.
It also includes a random forest fit using the selected features.
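As an illustration of working with the data.frame of selected features, here is a small base-R sketch. It uses a toy stand-in rather than a real fit: the values are made up, and the column names feature_name, variable_importance, and module_membership are assumptions about the layout, so check names() on the actual object before relying on them.

```r
# Toy stand-in for the data.frame of selected features. The values and
# the column names are assumptions made for illustration only.
feature_list <- data.frame(
  feature_name = c("V2", "V76", "V1", "V30"),
  variable_importance = c(10.1, 9.7, 12.4, 0.8),
  module_membership = c("1", "4", "1", "2"),
  stringsAsFactors = FALSE
)

# Rank the selected features from most to least important.
ord <- order(feature_list$variable_importance, decreasing = TRUE)
ranked <- feature_list[ord, ]

# Count how many selected features come from each module.
per_module <- table(ranked$module_membership)

print(ranked)
print(per_module)
```

Counting selections per module is a quick way to see whether the chosen features are concentrated in a few correlated groups.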
This work was partially funded by NSF IIS 1251151 and AMFAR 8721SC.
Conn, D., Ngun, T., Ramirez, C.M., Li, G. (2019). "Fuzzy Forests: Extending Random Forest Feature Selection for Correlated, High-Dimensional Data." Journal of Statistical Software, 91(9). doi: 10.18637/jss.v091.i09
Breiman, L. (2001). "Random Forests." Machine Learning, 45(1), 5-32. doi: 10.1023/A:1010933404324
Zhang, B. and Horvath, S. (2005). "A General Framework for Weighted Gene Co-Expression Network Analysis." Statistical Applications in Genetics and Molecular Biology, 4(1). doi: 10.2202/1544-6115.1128
ff.formula, print.fuzzy_forest, predict.fuzzy_forest, modplot
# ff requires that the partition of the covariates be previously determined.
# ff is also handy if the user wants to test out multiple settings of WGCNA
# prior to running fuzzy forests.
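library(fuzzyforest)
library(mvtnorm)

# Simulate one module: n draws of p features with common pairwise correlation corr.
gen_mod <- function(n, p, corr) {
  sigma <- matrix(corr, nrow = p, ncol = p)
  diag(sigma) <- 1
  X <- rmvnorm(n, sigma = sigma)
  return(X)
}

# Simulate several independent modules and bind them column-wise.
gen_X <- function(n, mod_sizes, corr) {
  m <- length(mod_sizes)
  X_list <- vector("list", length = m)
  for (i in seq_len(m)) {
    X_list[[i]] <- gen_mod(n, mod_sizes[i], corr[i])
  }
  X <- do.call("cbind", X_list)
  return(X)
}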
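# Simulation settings: 4 modules of 25 features, within-module correlation 0.8.
err_sd <- .5
n <- 500
mod_sizes <- rep(25, 4)
corr <- rep(.8, 4)
X <- gen_X(n, mod_sizes, corr)
# Only features 1-4 and 76-79 affect the response.
beta <- rep(0, 100)
beta[c(1:4, 76:79)] <- 5
y <- X %*% beta + rnorm(n, sd = err_sd)
X <- as.data.frame(X)
# Independent test set drawn from the same model.
Xtest <- gen_X(n, mod_sizes, corr)
ytest <- Xtest %*% beta + rnorm(n, sd = err_sd)
Xtest <- as.data.frame(Xtest)
# Partition the features into 4 modules by hierarchical clustering on 1 - correlation.
cdist <- as.dist(1 - cor(X))
hclust_fit <- hclust(cdist, method = "ward.D")
groups <- cutree(hclust_fit, k = 4)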
screen_c <- screen_control(keep_fraction = .25,
                           ntree_factor = 1,
                           min_ntree = 250)
select_c <- select_control(number_selected = 10,
                           ntree_factor = 1,
                           min_ntree = 250)
ff_fit <- ff(X, y, module_membership = groups,
             screen_params = screen_c,
             select_params = select_c,
             final_ntree = 250)
#extract variable importance rankings
vims <- ff_fit$feature_list
#plot results
modplot(ff_fit)
#obtain predicted values for a new test set
preds <- predict(ff_fit, new_data=Xtest)
#estimate test set error
test_err <- sqrt(sum((ytest - preds)^2)/n)
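
Because the simulation above fixes which coefficients are nonzero, the selected features can be compared with the truth. The sketch below is a base-R illustration under stated assumptions: recovery_summary is a hypothetical helper, not part of fuzzyforest, and the selected vector is made up for illustration rather than taken from a real fit.

```r
# Hypothetical helper (not part of fuzzyforest): compare selected feature
# names with the features that truly have nonzero coefficients.
recovery_summary <- function(selected, truth) {
  list(
    true_positives  = intersect(selected, truth),
    false_positives = setdiff(selected, truth),
    missed          = setdiff(truth, selected)
  )
}

# In the simulation, beta is nonzero for columns 1:4 and 76:79, and
# as.data.frame() names the columns of an unnamed matrix V1, V2, ...
truth <- paste0("V", c(1:4, 76:79))

# Made-up selection for illustration; in practice this would come from
# the feature names stored in ff_fit$feature_list.
selected <- c("V1", "V2", "V3", "V76", "V77", "V50")

res <- recovery_summary(selected, truth)
str(res)
```

With strongly correlated modules, a selected feature from the same module as a true signal may stand in for a missed one, so false positives are worth reading alongside module membership.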