Random_Gene_Pipeline: Random_gene_pipeline


View source: R/RF_Utilities.R

Description

Random_gene_pipeline

Usage

Random_Gene_Pipeline(
  feature_table,
  classes,
  metric = "ROC",
  sampling = NULL,
  repeats = 10,
  path,
  nmtry = 6,
  ntree = 1001,
  nfolds = 3,
  ncrossrepeats = 10,
  pro = 0.8,
  list_of_seeds,
  list_of_random_gene_seeds
)

Arguments

feature_table

The feature table that contains the information to be input into the random forest classifier. Note that this table should not include information about the classes that are being predicted.

classes

A vector giving the class of each sample (row) in the feature table. This can be coded as Case (factor level 1) and Control (factor level 2). Make sure the factor levels are correct when using AUPRC, or the results will not always be correct.
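Because the level order matters for AUPRC, it can help to set the levels explicitly instead of relying on R's default alphabetical ordering. A minimal base R sketch, assuming the labels are stored as the strings "Case" and "Control" (the vector `labels` is hypothetical and not part of the package):

```r
# Hypothetical sample labels, one per row of the feature table.
labels <- c("Control", "Case", "Case", "Control", "Case")

# Set the level order explicitly: "Case" becomes level 1, "Control" level 2.
classes <- factor(labels, levels = c("Case", "Control"))

levels(classes)      # "Case" "Control"
as.integer(classes)  # 2 1 1 2 1
```

Checking `levels(classes)` before calling the pipeline is a cheap way to confirm the coding.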

metric

A string that indicates whether the pipeline should use AUROC or AUPRC. For AUROC set metric="ROC". For AUPRC set metric="PR". Defaults to "ROC".

sampling

A string indicating the type of sampling that should be done in case of imbalanced class designs. Options include: "up", "down", "SMOTE" and NULL.

repeats

The number of times the data should be split into testing and cross-validation datasets. Defaults to 10.

path

A string representing the path where output files should be saved.

nmtry

An integer representing the number of different mtry values that you want to test during cross validation. The values of mtry to test are calculated as follows: mtry <- round(seq(1, number_of_features/3, length=nmtry)). Defaults to 6.
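To see what grid this formula produces, here is a small worked example. The feature count of 30 is a made-up value for illustration; `nmtry = 6` is the default shown in Usage.

```r
# Hypothetical feature count; nmtry = 6 matches the default in Usage.
number_of_features <- 30
nmtry <- 6

# Evenly spaced candidates from 1 to number_of_features / 3, rounded to integers.
mtry <- round(seq(1, number_of_features / 3, length.out = nmtry))
mtry
# 1 3 5 6 8 10
```

With very few features the rounding can yield repeated values (for example, 9 features gives 1 1 2 2 3 3), so the number of distinct mtry values may be smaller than nmtry.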

ntree

An integer that represents the number of trees that you want to use during random forest construction. Defaults to 1001.

nfolds

An integer that represents the number of folds to use during cross validation. Defaults to 3.

ncrossrepeats

An integer that represents the number of times to run cross validation on k folds. Defaults to 10.

pro

The proportion of samples that should be used for training versus testing during cross validation. Defaults to 0.8.
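As an illustration of what a proportion of 0.8 means, the following base R sketch performs a simple random 80/20 split of sample indices. It is not the package's implementation, which may split differently (for example, stratified by class); `n_samples` and the seed are made-up values.

```r
set.seed(1)      # arbitrary seed, only to make the sketch reproducible
n_samples <- 50  # hypothetical number of rows in feature_table
pro <- 0.8

# Draw 80% of the row indices for training; the remainder is held out for testing.
train_idx <- sample(seq_len(n_samples), size = round(pro * n_samples))
test_idx  <- setdiff(seq_len(n_samples), train_idx)

length(train_idx)  # 40
length(test_idx)   # 10
```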

list_of_seeds

A vector of seeds. Its length should be equal to the number of repeats.

list_of_random_gene_seeds

A matrix containing rows that correspond to the column number of the gene you want included for each repeat.
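The two seed arguments could be prepared along the following lines. This is a sketch under stated assumptions, not a documented recipe: it assumes one row per repeat in the matrix, with each row holding the column indices of the randomly chosen genes. `n_features` and `n_genes` are hypothetical values; check the package source for the layout it actually expects.

```r
repeats    <- 10   # matches the default in Usage
n_features <- 200  # hypothetical number of columns in feature_table
n_genes    <- 5    # hypothetical number of random genes per repeat

set.seed(42)  # arbitrary seed, only to make the sketch reproducible

# One seed per repeat.
list_of_seeds <- sample.int(1e6, repeats)

# Assumed layout: one row per repeat, each row holding column indices
# drawn without replacement from the feature table.
list_of_random_gene_seeds <- t(replicate(repeats, sample.int(n_features, n_genes)))

length(list_of_seeds)           # 10
dim(list_of_random_gene_seeds)  # 10 5
```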

SEED

The random seed used to split the samples during cross validation. Defaults to 1995.

Value

This function returns a list with the following elements:

Object[[1]] contains the median cross-validation AUCs from each data split, using the best mtry value.

Object[[2]] contains the test AUC values from each data split.

Object[[3]] contains all the tested mtry values and the median ROC for each value, from each data split.

Object[[4]] contains the list of important features from the best model selected in each data split.

Object[[5]] contains the caret random forest model from each data split.

This function also writes a CSV file with the cross-validation AUCs and test AUCs to the given path, as well as an RDS file that contains the object returned by this function.


nearinj/RandomForestUtils documentation built on July 30, 2020, 9:51 a.m.