ff: Fits the fuzzy forest algorithm.


Description

Fits the fuzzy forest algorithm and returns an object of type fuzzy_forest.

Usage

ff(X, y, Z = NULL, module_membership,
  screen_params = screen_control(min_ntree = 5000),
  select_params = select_control(min_ntree = 5000), final_ntree = 5000,
  num_processors = 1, nodesize, test_features = NULL, test_y = NULL)

Arguments

X

A data.frame. Each column corresponds to a feature vector.

y

Response vector. For classification, y should be a factor or a character vector. For regression, y should be numeric.

Z

A data.frame of additional features that are exempt from the screening step, so they cannot be screened out.

module_membership

A vector giving the module membership of each feature in X.

screen_params

Parameters for the screening step of fuzzy forests, given as an object of type screen_control. See screen_control for details.

select_params

Parameters for the selection step of fuzzy forests, given as an object of type select_control. See select_control for details.

final_ntree

Number of trees grown in the final random forest, which is fit using all of the selected features.

num_processors

Number of processors used to fit random forests.

nodesize

Minimum size of terminal nodes. If not supplied, it is 1 for classification and 5 for regression. When the sample size is very large, these values let the trees grow extremely deep, which can greatly increase memory usage and running time. In that case, it may help to increase nodesize.

test_features

A data.frame containing features from a test set. It should contain all of the features in X and Z.

test_y

The responses for the test set.
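The module_membership argument gives one module label per feature. As a minimal sketch, assuming the labels are stored in the same order as the columns of X, the toy code below checks the lengths agree and groups the feature names by module with base R's split(). The data and the module labels are made up for illustration.

```r
# Toy feature data.frame with 5 features (hypothetical data for illustration)
set.seed(1)
X <- data.frame(matrix(rnorm(50), nrow = 10, ncol = 5))
names(X) <- paste0("f", 1:5)

# Assumed layout: one module label per column of X, in column order
module_membership <- c("blue", "blue", "grey", "turquoise", "turquoise")
stopifnot(length(module_membership) == ncol(X))

# Group the feature names by module (list names are the sorted module labels)
modules <- split(names(X), module_membership)
print(modules)
```

Here modules$blue holds f1 and f2, modules$grey holds f3, and modules$turquoise holds f4 and f5.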
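The default for the nodesize argument can be sketched as follows. default_nodesize() is a hypothetical helper, not part of fuzzyforest; it assumes that a factor or character response means classification and a numeric response means regression, as described for y.

```r
# Hypothetical helper (not part of fuzzyforest): pick the default
# minimum terminal node size from the type of the response.
default_nodesize <- function(y) {
  if (is.factor(y) || is.character(y)) {
    1L   # classification
  } else if (is.numeric(y)) {
    5L   # regression
  } else {
    stop("y should be a factor, a character vector, or numeric")
  }
}

default_nodesize(factor(c("a", "b", "a")))  # 1
default_nodesize(c(2.5, 3.1, 4.8))          # 5
```

With a very large sample, a caller would override this by passing a larger nodesize explicitly.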
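The test_features argument must carry every feature in X and Z. A quick check of that requirement before calling ff might look like the sketch below; check_test_features() is a hypothetical helper that only compares column names and does not inspect column types.

```r
# Hypothetical helper: confirm that a test set contains every feature
# used for training (the columns of X plus, if supplied, those of Z).
check_test_features <- function(test_features, X, Z = NULL) {
  needed <- c(names(X), names(Z))  # names(NULL) is NULL, so Z may be omitted
  missing_cols <- setdiff(needed, names(test_features))
  if (length(missing_cols) > 0) {
    stop("test_features is missing: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

X <- data.frame(a = 1:3, b = 4:6)
Z <- data.frame(age = c(30, 40, 50))
test_features <- data.frame(a = 7:8, b = 9:10, age = c(35, 45))
check_test_features(test_features, X, Z)  # passes silently
```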

Value

An object of type fuzzy_forest. This object is a list containing the output of fuzzy forests. In particular, it contains a data.frame listing the selected features, as well as the random forest fit using the selected features.

Note

This work was partially funded by NSF IIS 1251151.

References

Leo Breiman (2001). Random Forests. Machine Learning, 45(1), 5-32.

Daniel Conn, Tuck Ngun, Christina M. Ramirez (2015). Fuzzy Forests: a New WGCNA Based Random Forest Algorithm for Correlated, High-Dimensional Data. Journal of Statistical Software, manuscript in progress.

Bin Zhang and Steve Horvath (2005). A General Framework for Weighted Gene Co-Expression Network Analysis. Statistical Applications in Genetics and Molecular Biology, 4(1), Article 17.

Examples

#ff requires that the partition of the covariates into modules be determined
#beforehand, which makes ff handy when the user wants to try out several
#WGCNA settings before running fuzzy forests.
library(WGCNA)
library(randomForest)
library(fuzzyforest)
data(ctg)
y <- ctg$NSP
X <- ctg[, 2:22]

#run WGCNA with the chosen tuning parameters to partition the covariates into modules
net <- blockwiseModules(X, power = 6, minModuleSize = 1, nThreads = 1)

#extract module membership for each covariate
module_membership <- net$colors

#set tuning parameters for the screening and selection steps
mtry_factor <- 1
min_ntree <- 500
drop_fraction <- .5
ntree_factor <- 1
screen_params <- screen_control(drop_fraction = drop_fraction,
                                keep_fraction = .25, min_ntree = min_ntree,
                                ntree_factor = ntree_factor,
                                mtry_factor = mtry_factor)
select_params <- select_control(drop_fraction = drop_fraction,
                                number_selected = 5,
                                min_ntree = min_ntree,
                                ntree_factor = ntree_factor,
                                mtry_factor = mtry_factor)

#fit fuzzy forests
ff_fit <- ff(X, y, module_membership = module_membership,
             screen_params = screen_params,
             select_params = select_params,
             final_ntree = 500)

#extract variable importance rankings
vims <- ff_fit$feature_list

#plot results
modplot(ff_fit)
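To get a feel for the screening settings in the example above (drop_fraction = .5, keep_fraction = .25), the sketch below traces how a single module might shrink under recursive feature elimination. It assumes, following screen_control, that each round drops drop_fraction of the features still present and that elimination stops once keep_fraction of the module's original size remains. rfe_schedule() is a hypothetical helper, and the exact rounding and stopping rules inside fuzzyforest may differ.

```r
# Hypothetical helper: number of features left in one module after each
# round of recursive feature elimination.
rfe_schedule <- function(module_size, drop_fraction, keep_fraction) {
  target <- ceiling(keep_fraction * module_size)  # features to keep
  sizes <- module_size
  current <- module_size
  while (current > target) {
    # drop a fraction of the surviving features (at least one),
    # but never go below the target
    current <- max(target, current - max(1, floor(drop_fraction * current)))
    sizes <- c(sizes, current)
  }
  sizes
}

rfe_schedule(module_size = 20, drop_fraction = 0.5, keep_fraction = 0.25)
# 20 10 5
```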

OHDSI/FuzzyForest documentation built on May 7, 2019, 8:26 p.m.