gcforest: R for Deep Forest Model (gcForest)
In gcForest: Deep Forest Model



gcforest() is based on a Python Deep Forest application programming interface (API). Reference: https://github.com/pylablanche/gcForest.

gcforest(shape_1X=NA, n_mgsRFtree=30L, window=NA, stride=1L,
    cascade_test_size=0.2, n_cascadeRF=2L, n_cascadeRFtree=101L,
    cascade_layer=Inf, min_samples_mgs=0.1, min_samples_cascade=0.05,
    tolerance=0.0)

`shape_1X`	int, tuple, list, or np.array (default=None) Shape of a single sample element [n_lines, n_cols]. Required when calling mg_scanning! For sequence data a single int can be given.
`n_mgsRFtree`	int (default=30) Number of trees in a Random Forest during Multi Grain Scanning.
`window`	int or list of int (default=None) Window size, or list of window sizes, to use during Multi Grain Scanning. If 'None', no slicing will be done.
`stride`	int (default=1) Step used when slicing the data.
`cascade_test_size`	float or int (default=0.2) Split fraction or absolute number for cascade training set splitting.
`n_cascadeRF`	int (default=2) Number of Random Forests in a cascade layer. For each pseudo Random Forest a complete Random Forest is created, hence the total number of Random Forests in a layer will be 2*n_cascadeRF.
`n_cascadeRFtree`	int (default=101) Number of trees in a single Random Forest in a cascade layer.
`cascade_layer`	int (default=np.inf) Maximum number of cascade layers allowed. Useful to limit the construction of the cascade.
`min_samples_mgs`	float or int (default=0.1) Minimum number of samples in a node to perform a split during the training of a Multi-Grain Scanning Random Forest. If int, the value is used directly as the number of samples. If float, it is the fraction of the initial n_samples to consider.
`min_samples_cascade`	float or int (default=0.05) Minimum number of samples in a node to perform a split during the training of a Cascade Random Forest. If int, the value is used directly as the number of samples. If float, it is the fraction of the initial n_samples to consider.
`tolerance`	float (default=0.0) Accuracy tolerance for the cascade growth. If the improvement in accuracy is not better than the tolerance, the construction is stopped.
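
To make the interplay of `shape_1X`, `window` and `stride` concrete, here is a minimal base-R sketch of how a single sequence sample could be cut into windows. It is an illustration only, not the package's code: the helper names `window_starts` and `slice_sequence` are hypothetical, and the window count `(length - window) %/% stride + 1` is the usual sliding-window formula, assumed here to match what Multi Grain Scanning does for sequence data.

```r
# Start positions (1-based) of the windows cut from one sequence sample.
# seq_len plays the role of shape_1X for sequence data (hypothetical helper).
window_starts <- function(seq_len, window, stride = 1L) {
  stopifnot(window >= 1L, window <= seq_len, stride >= 1L)
  n_windows <- (seq_len - window) %/% stride + 1L
  seq(from = 1L, by = stride, length.out = n_windows)
}

# Cut one sample into its windows; each row of the result is one slice.
slice_sequence <- function(x, window, stride = 1L) {
  starts <- window_starts(length(x), window, stride)
  pieces <- lapply(starts, function(s) x[s:(s + window - 1L)])
  matrix(unlist(pieces), ncol = window, byrow = TRUE)
}

# One iris-like sample with four features, i.e. shape_1X = 4.
x <- c(5.1, 3.5, 1.4, 0.2)
slices <- slice_sequence(x, window = 2L, stride = 1L)
print(slices)
```

With `window=2` and `stride=1` a four-feature sample yields three overlapping slices of two values each; a larger `stride` reduces the number of slices.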

The model object returned by gcforest() exposes several methods, in the style of Python's scikit-learn estimators:

fit(X, y) Trains the gcForest on input data X and associated target y.
predict(X) Predicts the class of unknown samples X.
predict_proba(X) Predicts the class probabilities of unknown samples X.
mg_scanning(X, y=None) Performs a Multi Grain Scanning on input data.
window_slicing_pred_prob(X, window, shape_1X, y=None) Performs a window slicing of the input data and sends the slices through Random Forests. If target values 'y' are provided, the sliced data are used to train the Random Forests.
cascade_forest(X, y=None) Performs prediction with a cascade forest estimator, or trains it if 'y' is not None.
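
The way `tolerance` and `cascade_layer` bound the growth of the cascade can be sketched in a few lines of base R. This is an assumption-laden illustration rather than the package's implementation: `n_layers_kept` is a hypothetical helper, the accuracies are made-up numbers standing in for the validation accuracy measured after each layer, and the exact comparison used by the underlying Python code may differ.

```r
# layer_accuracy: hypothetical validation accuracies, one per candidate layer.
# Returns how many layers are kept under the stopping rule described above.
n_layers_kept <- function(layer_accuracy, tolerance = 0.0, cascade_layer = Inf) {
  stopifnot(length(layer_accuracy) >= 1L)
  kept <- 1L                      # the first layer is always built
  best <- layer_accuracy[1]
  while (kept < length(layer_accuracy) && kept < cascade_layer) {
    candidate <- layer_accuracy[kept + 1L]
    # Stop when the improvement is not better than the tolerance.
    if (candidate <= best + tolerance) break
    best <- candidate
    kept <- kept + 1L
  }
  kept
}

acc <- c(0.90, 0.94, 0.95, 0.95)                          # made-up accuracies
layers_default <- n_layers_kept(acc)                      # stops at the plateau
layers_tol     <- n_layers_kept(acc, tolerance = 0.02)    # stricter improvement
layers_capped  <- n_layers_kept(acc, cascade_layer = 2L)  # hard cap on depth
print(c(layers_default, layers_tol, layers_capped))
```

A larger `tolerance` stops the cascade earlier because small gains no longer count as improvements, while `cascade_layer` caps the depth regardless of accuracy.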

Xu Jing

# Run the example only if the required Python modules are available.
have_numpy <- reticulate::py_module_available("numpy")
have_sklearn <- reticulate::py_module_available("sklearn")

if (have_numpy && have_sklearn) {
    library(gcForest)
    req_py()

    # Import scikit-learn through reticulate.
    sk <- reticulate::import("sklearn", delay_load = TRUE)
    train_test_split <- sk$model_selection$train_test_split

    # Load the iris data and hold out 33% of it for testing.
    iris <- sk$datasets$load_iris()
    X <- iris$data
    y <- iris$target
    data_split <- train_test_split(X, y, test_size = 0.33)

    # train_test_split returns X_train, X_test, y_train, y_test, in that order.
    X_tr <- data_split[[1]]
    X_te <- data_split[[2]]
    y_tr <- data_split[[3]]
    y_te <- data_split[[4]]

    # Each sample has 4 features; scan it with a window of size 2.
    gcforest_m <- gcforest(shape_1X = 4L, window = 2L, tolerance = 0.0)

    gcforest_m$fit(X_tr, y_tr)

    # Predict the classes of the held-out samples.
    pred_X <- gcforest_m$predict(X_te)
    print(pred_X)
} else {
    print("The Python modules numpy and sklearn are required to run this example.")
}