Dforest is a R-implementation of Decision Forest algorithm that combines the predictions of multiple independent decision tree models for a consensus decision. In particular, Decision Forest is a novel pattern-recognition method which can be used to analyze:
In this package, three fundamental functions are provided, as
run Dforest() to see more instructions.
First, simply install and load Dforest package into your R environment:
if (!require(Dforest)){ install.packages("Dforest",repos = "https://cran.r-project.org/") } library(Dforest)
DF_train() is used to train Decision Forest model based on training dataset.
Here is an simple example for using this function.
data(demo_simple) set.seed(188) # for simplicity, we only used two-classes classification. X <- subset(data_dili$X, data_dili$Y!="Less-DILI-Concern") Y <- subset(data_dili$Y, data_dili$Y!="Less-DILI-Concern") names(Y)=rownames(X) random_seq=sample(nrow(X)) split_rate=3 split_sample = suppressWarnings(split(random_seq,1:split_rate)) Train_X = X[-random_seq[split_sample[[1]]],] Train_Y = Y[-random_seq[split_sample[[1]]]] Test_X = X[random_seq[split_sample[[1]]],] Test_Y = Y[random_seq[split_sample[[1]]]] used_model = DF_train(Train_X, Train_Y)
Here is the structure of built Decision Forest Model:
summary(used_model)
Where the performance stored the Fitting performance of training process.
Let's take a look:
- The Accuracy (ACC) is r used_model$performance$ACC
;
- The Matttews' Correlation Coefficient (MCC) is r used_model$performance$MCC
;
- The Balanced Accuracy (bACC) is r used_model$performance$bACC
.
Due to the dataset we used here, we found that DILI is not a well predicted endpoint!
OK, now let's see how about the performance on testing dataset.
Pred_result = DF_pred(used_model,Test_X,Test_Y)
Similarly, let's take a look at the predicting performance:
- The Accuracy (ACC) is r Pred_result$performance$ACC
;
- The Matttews' Correlation Coefficient (MCC) is r Pred_result$performance$MCC
;
- The Balanced Accuracy (bACC) is r Pred_result$performance$bACC
.
Compared the result from training and testing performance:
matrix(c(round(used_model$performance$ACC,3), round(Pred_result$performance$ACC,3), round(used_model$performance$MCC,3), round(Pred_result$performance$MCC,3), round(used_model$performance$bACC,3), round(Pred_result$performance$bACC,3)), nrow=2,ncol=3, dimnames=list(c("Training", "Testing"),c("ACC","MCC","bACC")) )
Sometime, you want to run cross validation analysis on the entire dataset.
CV_result = DF_CV(Train_X, Train_Y) DF_CVsummary(CV_result,plot = T)
Finally, Let's see something Fancy with only one command.
Result = DF_easy(Train_X, Train_Y, Test_X, Test_Y)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.