Runs a pipeline similar to Run_RF_Pipeline, but takes random scramblings of the class assignments for each sample (row in the feature table). The results from this function can serve as a null distribution against which to compare models.
get_random_rf_results(
feature_table,
list_of_scrambles,
metric = "ROC",
sampling = NULL,
repeats = 10,
path,
nmtry = 6,
ntree = 1001,
nfolds = 3,
ncrossrepeats = 10,
pro = 0.8,
list_of_seeds
)
feature_table |
The feature table that contains the information to be input into the random forest classifier. Note that this table should not include information about the classes that are being predicted. |
list_of_scrambles |
A list of vectors whose length equals the number of repeats (the number of times cross validation is run). Each vector in the list should contain a random scrambling of the class labels assigned to the samples. |
metric |
A string that indicates whether the pipeline should use AUROC or AUPRC. For AUROC set metric="ROC". For AUPRC set metric="PR". Defaults to "ROC". |
sampling |
A string indicating the type of sampling that should be done in case of imbalanced class designs. Options include: "up", "down", "SMOTE" and NULL. |
repeats |
The number of times data should be split into testing and cross-validation datasets. |
path |
A string representing the path where output files should be saved. |
nmtry |
An integer representing the number of different mtry values to test during cross validation. The values of mtry to test are calculated as follows: mtry <- round(seq(1, number_of_features/3, length=nmtry)). Defaults to 6. |
ntree |
An integer that represents the number of trees to use during random forest construction. Defaults to 1001. |
nfolds |
An integer that represents the number of folds to use during cross validation. Defaults to 3. |
ncrossrepeats |
An integer that represents the number of times to run cross validation on k folds. Defaults to 10. |
pro |
The proportion of samples that should be used for training versus testing during cross validation. Defaults to 0.8. |
list_of_seeds |
A vector of random seeds whose length should equal the number of repeats. |
SEED |
The random seed used to split the samples during cross validation. Defaults to 1995. |
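The arguments described above can be prepared along the following lines. This is a minimal sketch, not the package's own code: the class vector, the feature count, and the seed range are hypothetical, and whether the function expects factors or character vectors for the scrambled classes is an assumption. Only the mtry formula is taken from the argument description.

```r
# Hypothetical inputs: 20 samples with a two-level class label, 30 features.
set.seed(42)
classes <- factor(rep(c("Case", "Control"), each = 10))
number_of_features <- 30
repeats <- 10

# One permuted copy of the class labels per repeat (data split).
# sample() without a size argument returns a random permutation.
list_of_scrambles <- lapply(seq_len(repeats), function(i) sample(classes))

# One seed per repeat (assumed range; any distinct integers would do).
list_of_seeds <- sample.int(10000, repeats)

# The mtry grid from the nmtry description, evaluated for nmtry = 6.
nmtry <- 6
mtry <- round(seq(1, number_of_features / 3, length.out = nmtry))
print(mtry)
```

Each scrambled vector keeps the original class counts and only changes which sample carries which label, which is what makes the resulting models usable as a null reference.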
This function returns a list with the following elements:
Object[[1]] contains the median cross-validation AUCs from each data split, using the best mtry value.
Object[[2]] contains the test AUC values from each data split.
Object[[3]] contains all the tested mtry values and the median ROC for each, from each data split.
Object[[4]] contains the list of important features from the best model selected from each data split.
Object[[5]] contains the caret random forest model from each data split.
This function also writes a CSV file with the cross-validation AUCs and test AUCs to the given path, as well as an RDS file that contains the object returned by this function.
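One way to use the test AUCs as a null distribution is an empirical one-sided p-value. The sketch below uses placeholder numbers only: in practice null_auc would be taken from the second element of the list returned by this function (possibly after unlist(), since its exact structure is not specified here), and real_auc from a model trained on the true class labels. Both variable names are hypothetical.

```r
# Placeholder values for illustration; these are not real results.
null_auc <- c(0.48, 0.55, 0.51, 0.43, 0.60, 0.52, 0.47, 0.50, 0.57, 0.45)
real_auc <- c(0.81, 0.78, 0.85, 0.79, 0.83, 0.80, 0.77, 0.84, 0.82, 0.86)

# Summarise the real model by its median test AUC across data splits.
observed <- median(real_auc)

# Proportion of null AUCs at least as large as the observed value,
# with a +1 correction so the estimate is never exactly zero.
p_empirical <- (sum(null_auc >= observed) + 1) / (length(null_auc) + 1)
print(p_empirical)
```

With only 10 scrambles the smallest attainable value is 1/11, so a finer null distribution requires more repeats.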