Implements formula interface for ff.
formula | Formula object.
data | Data used in the analysis.
module_membership | A character vector giving the module membership of each feature.
... | Additional arguments.
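The module_membership argument needs one label per feature on the right-hand side of the formula. As a minimal sketch, module labels can be derived with base R clustering and converted to a character vector. The simulated data and the names X, dat, and membership are illustrative assumptions, not part of the package.

```r
# Illustrative data: 6 features in two correlated blocks (assumed for this sketch).
set.seed(1)
n <- 100
z1 <- rnorm(n)
z2 <- rnorm(n)
X <- data.frame(
  a1 = z1 + rnorm(n, sd = 0.3),
  a2 = z1 + rnorm(n, sd = 0.3),
  a3 = z1 + rnorm(n, sd = 0.3),
  b1 = z2 + rnorm(n, sd = 0.3),
  b2 = z2 + rnorm(n, sd = 0.3),
  b3 = z2 + rnorm(n, sd = 0.3)
)
dat <- cbind(y = X$a1 + rnorm(n), X)

# Cluster features on correlation distance and cut the tree into two modules.
groups <- cutree(hclust(as.dist(1 - cor(X)), method = "ward.D"), k = 2)

# One label per feature, stored as a plain character vector.
membership <- as.character(groups)

# The labels should line up with the predictors used by y ~ .
stopifnot(length(membership) == ncol(dat) - 1)
```

The resulting membership vector could then be supplied as module_membership together with a formula such as y ~ . and data = dat.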
An object of type fuzzy_forest. This object is a list containing useful output of fuzzy forests.
In particular, it contains a data.frame listing the selected features.
It also includes the random forest fit using the selected features.
See ff for additional arguments.
Note that the matrix, Z, of features that do not go through the screening step must be specified separately from the formula.
test_features and test_y are not supported in the formula interface.
As in the randomForest package, for large data sets the formula interface may be substantially slower.
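To illustrate the note about Z, the following sketch splits a data set into features that are screened (placed in the formula) and features that bypass screening (passed as Z). It assumes that the formula method forwards Z through ... to ff. The simulated data and all variable names are illustrative, and the fit only runs if the fuzzyforest package is installed.

```r
set.seed(2)
n <- 200
X <- as.data.frame(matrix(rnorm(n * 8), nrow = n))  # 8 features to be screened
Z <- data.frame(age = rnorm(n), dose = rnorm(n))    # features that skip screening
y <- X$V1 + Z$age + rnorm(n, sd = 0.5)

# Only y and the screened features go into the data used with the formula.
dat <- cbind(y = y, X)

# One module label per screened feature (two arbitrary modules of four).
membership <- rep(c("m1", "m2"), each = 4)

if (requireNamespace("fuzzyforest", quietly = TRUE)) {
  library(fuzzyforest)
  # Assumption: Z is passed on to ff via `...`.
  fit <- ff(y ~ ., data = dat,
            module_membership = membership,
            Z = Z,
            screen_params = screen_control(keep_fraction = .25,
                                           ntree_factor = 1,
                                           min_ntree = 250),
            select_params = select_control(number_selected = 3,
                                           ntree_factor = 1,
                                           min_ntree = 250),
            final_ntree = 250)
  print(fit$feature_list)
}
```

Keeping the Z columns out of dat matters here: with y ~ . every column of dat other than y is treated as a screened feature and would need its own entry in module_membership.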
This work was partially funded by NSF IIS 1251151 and AMFAR 8721SC.
Conn, D., Ngun, T., Ramirez, C.M., Li, G. (2019). "Fuzzy Forests: Extending Random Forest Feature Selection for Correlated, High-Dimensional Data." Journal of Statistical Software, 91(9). doi: 10.18637/jss.v091.i09
Breiman, L. (2001). "Random Forests." Machine Learning, 45(1), 5-32. doi: 10.1023/A:1010933404324
Zhang, B. and Horvath, S. (2005). "A General Framework for Weighted Gene Co-Expression Network Analysis." Statistical Applications in Genetics and Molecular Biology, 4(1). doi: 10.2202/1544-6115.1128
ff, print.fuzzy_forest, predict.fuzzy_forest, modplot
#ff requires that the partition of the covariates be previously determined.
#ff is also handy if the user wants to test out multiple settings of WGCNA
#prior to running fuzzy forests.
library(fuzzyforest)
library(mvtnorm)
#simulate one module of p features with common pairwise correlation corr
gen_mod <- function(n, p, corr) {
  sigma <- matrix(corr, nrow=p, ncol=p)
  diag(sigma) <- 1
  X <- rmvnorm(n, sigma=sigma)
  return(X)
}
#simulate several modules and bind them column-wise
gen_X <- function(n, mod_sizes, corr){
  m <- length(mod_sizes)
  X_list <- vector("list", length = m)
  for(i in 1:m){
    X_list[[i]] <- gen_mod(n, mod_sizes[i], corr[i])
  }
  X <- do.call("cbind", X_list)
  return(X)
}
err_sd <- .5
n <- 500
mod_sizes <- rep(25, 4)
corr <- rep(.8, 4)
X <- gen_X(n, mod_sizes, corr)
beta <- rep(0, 100)
beta[c(1:4, 76:79)] <- 5
y <- X%*%beta + rnorm(n, sd=err_sd)
X <- as.data.frame(X)
dat <- as.data.frame(cbind(y, X))
Xtest <- gen_X(n, mod_sizes, corr)
ytest <- Xtest%*%beta + rnorm(n, sd=err_sd)
Xtest <- as.data.frame(Xtest)
cdist <- as.dist(1 - cor(X))
hclust_fit <- hclust(cdist, method="ward.D")
groups <- cutree(hclust_fit, k=4)
screen_c <- screen_control(keep_fraction = .25,
                           ntree_factor = 1,
                           min_ntree = 250)
select_c <- select_control(number_selected = 10,
                           ntree_factor = 1,
                           min_ntree = 250)
ff_fit <- ff(y ~ ., data=dat,
             module_membership = groups,
             screen_params = screen_c,
             select_params = select_c,
             final_ntree = 250)
#extract variable importance rankings
vims <- ff_fit$feature_list
#plot results
modplot(ff_fit)
#obtain predicted values for a new test set
preds <- predict(ff_fit, new_data=Xtest)
#estimate test set error
test_err <- sqrt(sum((ytest - preds)^2)/n)