View source: R/TADrandomForest.R
TADrandomForest | R Documentation |
A wrapper function passed to caret::train to apply a random forest classification algorithm built and tested on user-defined binned domain data from createTADdata.
TADrandomForest(
  trainData,
  testData = NULL,
  tuneParams = list(mtry = ceiling(sqrt(ncol(trainData) - 1)), ntree = 500, nodesize = 1),
  cvFolds = 3,
  cvMetric = "Accuracy",
  verbose = FALSE,
  model = TRUE,
  importances = TRUE,
  impMeasure = "MDA",
  performances = FALSE
)
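The default mtry in the usage above can be worked through by hand. The following is a minimal base R sketch, assuming the "- 1" discounts a single response column; the data frame train_data and its column y are hypothetical stand-ins for the output of createTADdata, not objects from the package.

```r
# Hypothetical training frame: one response column plus 26 predictor columns
# (an assumption chosen to mirror the 26 TFBS predictors in the example).
train_data <- data.frame(
  y = factor(rep(c("No", "Yes"), 5)),
  matrix(0, nrow = 10, ncol = 26)
)

# Number of predictors, assuming exactly one column holds the response
n_predictors <- ncol(train_data) - 1

# Same expression as the default mtry in the usage signature
default_mtry <- ceiling(sqrt(n_predictors))
print(default_mtry)
```

With 26 predictors the square root is about 5.1, so the default rounds up to 6 variables tried at each split.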
trainData |
Data frame, the binned data matrix used to build a random forest
classifier (can be obtained using |
testData |
Data frame, the binned data matrix used to test the random forest
classifier (can be obtained using |
tuneParams |
List, providing |
cvFolds |
Numeric, number of folds (k) to use in k-fold cross-validation when tuning the hyperparameters. Required. |
cvMetric |
Character, performance metric used to choose the optimal tuning parameters (one of "Kappa", "Accuracy", "MCC", "ROC", "Sens", "Spec", "Pos Pred Value", "Neg Pred Value"). Default is "Accuracy". |
verbose |
Logical, controls whether details regarding modeling should be printed out. Default is FALSE. |
model |
Logical, whether to keep the model object. Default is TRUE. |
importances |
Logical, whether to extract variable importances. Default is TRUE. |
impMeasure |
Character, indicates the variable importance measure to use (either "MDA" (mean decrease in accuracy) or "MDG" (mean decrease in Gini)). Ignored if importances = FALSE. |
performances |
Logical, indicates whether various performance metrics should be extracted when validating the model on the test data. Ignored if testData = NULL. |
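To make the cvFolds argument concrete, here is a minimal base R sketch of how rows can be split into k folds. It is an illustration only: caret builds its own (stratified) folds internally, and the names n_rows, cv_folds and fold_id are hypothetical.

```r
set.seed(123)

n_rows   <- 100  # hypothetical number of rows in the training data
cv_folds <- 3    # plays the role of the cvFolds argument

# Assign every row to one of the folds, in random order
fold_id <- sample(rep(seq_len(cv_folds), length.out = n_rows))

# Each fold is held out once while the remaining folds are used for fitting
for (k in seq_len(cv_folds)) {
  held_out <- which(fold_id == k)
  fit_rows <- which(fold_id != k)
  cat("Fold", k, ":", length(fit_rows), "rows to fit,",
      length(held_out), "rows held out\n")
}
```

Each candidate value of mtry would be scored on every held-out fold, and the value with the best average cvMetric would be kept.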
A list containing: 1) a train object from caret with model information, 2) a data.frame of variable importance for each feature included in the model, and 3) a data.frame of various performance metrics.
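Several of the metrics named under cvMetric can be computed from a 2x2 confusion table using their standard definitions. The sketch below uses made-up counts; whether TADrandomForest computes its performance table in exactly this way is not stated here, so treat it as an illustration of the definitions rather than of the package internals.

```r
# Hypothetical confusion counts for a binary boundary classifier
tp <- 40  # boundaries predicted as boundaries
fn <- 10  # boundaries predicted as non-boundaries
fp <- 5   # non-boundaries predicted as boundaries
tn <- 45  # non-boundaries predicted as non-boundaries
n  <- tp + fn + fp + tn

accuracy    <- (tp + tn) / n
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
ppv         <- tp / (tp + fp)  # "Pos Pred Value"
npv         <- tn / (tn + fn)  # "Neg Pred Value"

# Matthews correlation coefficient
mcc <- (tp * tn - fp * fn) /
  sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

# Cohen's kappa: observed agreement corrected for chance agreement
p_exp       <- ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n^2
cohen_kappa <- (accuracy - p_exp) / (1 - p_exp)

print(round(c(Accuracy = accuracy, Sens = sensitivity, Spec = specificity,
              MCC = mcc, Kappa = cohen_kappa), 3))
```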
# Read in ARROWHEAD-called TADs at 5kb
data(arrowhead_gm12878_5kb)

# Extract unique boundaries
bounds.GR <- extractBoundaries(domains.mat = arrowhead_gm12878_5kb,
                               filter = FALSE,
                               CHR = c("CHR21", "CHR22"),
                               resolution = 5000)

# Read in GRangesList of 26 TFBS
data(tfbsList)

# Create the binned data matrix for CHR21 (training) and CHR22 (testing)
# using 5 kb binning, distance-type predictors from 26 different TFBS from
# the GM12878 cell line, and random under-sampling
tadData <- createTADdata(bounds.GR = bounds.GR,
                         resolution = 5000,
                         genomicElements.GR = tfbsList,
                         featureType = "distance",
                         resampling = "rus",
                         trainCHR = "CHR21",
                         predictCHR = "CHR22")

# Perform random forest using TADrandomForest by tuning mtry over 10 values
# using 3-fold CV
tadModel <- TADrandomForest(trainData = tadData[[1]],
                            testData = tadData[[2]],
                            tuneParams = list(mtry = c(2,5,8,10,13,16,18,21,24,26),
                                              ntree = 500,
                                              nodesize = 1),
                            cvFolds = 3,
                            cvMetric = "Accuracy",
                            verbose = TRUE,
                            model = TRUE,
                            importances = TRUE,
                            impMeasure = "MDA",
                            performances = TRUE)