xMLrandomforest: Function to integrate predictor matrix in a supervised manner...


View source: R/xMLrandomforest.r

Description

xMLrandomforest integrates a predictor matrix in a supervised manner using the random forest machine learning algorithm. It requires three inputs: 1) Gold Standard Positive (GSP) targets; 2) Gold Standard Negative (GSN) targets; 3) a predictor matrix with genes in rows and predictors in columns, holding the predictive scores. It returns an object of class 'sTarget'.
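As a rough illustration of the three inputs described above, the sketch below builds toy data in base R. The gene symbols, predictor names and scores are invented, and it assumes that GSP and GSN are vectors of gene symbols matching the row names of the predictor data frame.

```r
# Toy inputs only: gene symbols, predictor names and scores are invented.
set.seed(825)
genes <- paste0("GENE", 1:20)

# Predictor matrix: genes in rows, predictors in columns, scores as values.
df_predictor <- data.frame(
  predictorA = runif(20),
  predictorB = runif(20),
  predictorC = runif(20),
  row.names = genes
)

# Gold Standard Positive and Negative targets (assumed to be gene symbols).
GSP <- genes[1:5]
GSN <- genes[6:15]

# Sanity checks one might run before calling xMLrandomforest().
stopifnot(
  all(GSP %in% rownames(df_predictor)),
  all(GSN %in% rownames(df_predictor)),
  length(intersect(GSP, GSN)) == 0
)
```

With inputs of this shape, the data frame would be passed as df_predictor rather than as the first positional argument, which is list_pNode.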

Usage

xMLrandomforest(
  list_pNode = NULL,
  df_predictor = NULL,
  GSP,
  GSN,
  nfold = 3,
  nrepeat = 10,
  seed = 825,
  mtry = NULL,
  ntree = 1000,
  fold.aggregateBy = c("logistic", "Ztransform", "fishers",
    "orderStatistic"),
  verbose = TRUE,
  RData.location = "http://galahad.well.ox.ac.uk/bigdata",
  guid = NULL,
  ...
)

Arguments

list_pNode

a list of "pNode" objects or a "pNode" object

df_predictor

a data frame containing genes (in rows) and predictors (in columns), with their predictive scores inside it. This data frame must have gene symbols as row names

GSP

a vector containing Gold Standard Positive (GSP) targets

GSN

a vector containing Gold Standard Negative (GSN) targets

nfold

an integer specifying the number of folds for cross-validation. Each fold is a balanced split of the data that preserves the overall distribution of each class (GSP and GSN), so the cross-validation training sets and testing sets are balanced. By default, it is 3, meaning 3-fold cross-validation

nrepeat

an integer specifying the number of repeats for cross-validation. By default, it is 10, meaning the cross-validation is repeated 10 times

seed

an integer specifying the seed for random number generation, so that results can be reproduced
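The interplay of nfold, nrepeat and seed can be sketched in base R as follows. This is a minimal illustration of stratified, repeated k-fold splitting under a fixed seed, not the package's own implementation; the helper names assign_folds and stratified_repeated_cv and the toy labels are hypothetical.

```r
# Assign each item of one class to one of nfold folds, as evenly as possible.
assign_folds <- function(n, nfold) sample(rep(seq_len(nfold), length.out = n))

# Build nrepeat x nfold train/test splits, stratified by class label.
stratified_repeated_cv <- function(labels, nfold = 3, nrepeat = 10, seed = 825) {
  set.seed(seed)  # the same seed reproduces the same splits
  splits <- list()
  for (r in seq_len(nrepeat)) {
    fold_id <- integer(length(labels))
    for (cls in unique(labels)) {
      idx <- which(labels == cls)
      fold_id[idx] <- assign_folds(length(idx), nfold)
    }
    for (k in seq_len(nfold)) {
      splits[[sprintf("Fold%d.Rep%02d", k, r)]] <- list(
        train = which(fold_id != k),
        test  = which(fold_id == k)
      )
    }
  }
  splits
}

# Hypothetical labels: 6 GSP and 12 GSN targets.
labels <- c(rep("GSP", 6), rep("GSN", 12))
splits <- stratified_repeated_cv(labels, nfold = 3, nrepeat = 10, seed = 825)
length(splits)                   # 30 train/test pairs (3 folds x 10 repeats)
table(labels[splits[[1]]$test])  # each test fold holds 2 GSP and 4 GSN
```

Because folds are drawn within each class, every training and testing set keeps the 1:2 ratio of GSP to GSN present in the full data.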

mtry

an integer specifying the number of predictors randomly sampled as candidates at each split. If NULL, it will be tuned by 'randomForest::tuneRF', with starting value as sqrt(p) where p is the number of predictors. The minimum value is 3

ntree

an integer specifying the number of trees to grow. By default, it is set to 1000
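To make the mtry and ntree arguments concrete, here is a small sketch that tunes mtry with 'randomForest::tuneRF' starting from sqrt(p) and then grows a forest. The training data are random and hypothetical, rounding sqrt(p) down is an assumption, and the stepFactor and improve values are illustrative choices, since the settings used internally are not stated here.

```r
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(825)
  # Hypothetical training data: 60 genes, 9 predictors, binary class labels.
  x <- matrix(runif(60 * 9), nrow = 60,
              dimnames = list(NULL, paste0("pred", 1:9)))
  y <- factor(rep(c("GSN", "GSP"), each = 30))

  # Start the search at sqrt(p), where p is the number of predictors
  # (rounded down here, which is an assumption).
  mtry_start <- floor(sqrt(ncol(x)))
  tuned <- randomForest::tuneRF(x, y, mtryStart = mtry_start, ntreeTry = 500,
                                stepFactor = 2, improve = 0.01,
                                trace = FALSE, plot = FALSE)

  # tuneRF returns a matrix of mtry values and OOB errors; keep the best mtry.
  mtry_best <- tuned[which.min(tuned[, "OOBError"]), "mtry"]

  # Grow the forest with the tuned mtry and the chosen number of trees.
  rf <- randomForest::randomForest(x, y, mtry = mtry_best, ntree = 1000)
  print(rf)
}
```

Supplying mtry directly skips the tuning step, which can save time when the number of predictors is small.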

fold.aggregateBy

the method used to aggregate results from k-fold cross-validation. It can be "orderStatistic" for the method based on the order statistics of p-values, "fishers" for Fisher's method, "Ztransform" for the Z-transform method, or "logistic" for the logistic method. In general, the Z-transform method does well when evidence against the combined null is spread widely (on an equal footing) across the individual tests or when the total evidence is weak; Fisher's method does best when the evidence is concentrated in a relatively small fraction of the individual tests or when the evidence is at least moderately strong; the logistic method provides a compromise between these two. Notably, the aggregate methods 'Ztransform' and 'logistic' are preferred here
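The contrast between these methods can be seen with textbook formulas in base R. The functions below are a sketch of the standard Fisher, Z-transform (Stouffer) and logistic (logit) combinations of one-sided p-values, not the code used internally by xMLrandomforest; the function names and p-values are hypothetical, and all p-values must lie strictly between 0 and 1.

```r
# Fisher's method: -2 * sum(log p) is chi-squared with 2k df under the null.
combine_fisher <- function(p) {
  pchisq(-2 * sum(log(p)), df = 2 * length(p), lower.tail = FALSE)
}

# Z-transform (Stouffer) method: sum of normal quantiles scaled by sqrt(k).
combine_ztransform <- function(p) {
  z <- sum(qnorm(p, lower.tail = FALSE)) / sqrt(length(p))
  pnorm(z, lower.tail = FALSE)
}

# Logistic (logit) method: scaled sum of logits, approximately t with 5k + 4 df.
combine_logistic <- function(p) {
  k <- length(p)
  scale <- sqrt(k * pi^2 * (5 * k + 2) / (3 * (5 * k + 4)))
  pt(-sum(log(p / (1 - p))) / scale, df = 5 * k + 4, lower.tail = FALSE)
}

p_spread <- c(0.04, 0.06, 0.05)           # evidence spread evenly across folds
p_concentrated <- c(0.0005, 0.60, 0.70)   # evidence concentrated in one fold

methods <- list(fishers = combine_fisher,
                Ztransform = combine_ztransform,
                logistic = combine_logistic)
sapply(methods, function(f) c(spread = f(p_spread),
                              concentrated = f(p_concentrated)))
```

Comparing the two rows of the result shows how each method weighs evidence that is spread across folds against evidence that comes mostly from a single fold.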

verbose

a logical indicating whether messages are displayed on the screen. By default, it is set to TRUE to display them

RData.location

a character string specifying the location of the built-in RData files. See xRDataLoader for details

guid

a valid (5-character) Global Unique IDentifier for an OSF project. See xRDataLoader for details

...

additional parameters. Please refer to 'randomForest::randomForest' for the complete list.

Value

an object of class "sTarget", a list with following components:

Note

none

See Also

xPierMatrix, xSparseMatrix, xPredictROCR, xPredictCompare, xSymbol2GeneID

Examples

RData.location <- "http://galahad.well.ox.ac.uk/bigdata"
## Not run: 
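# name the arguments: the first positional argument is list_pNode, not df_predictor
sTarget <- xMLrandomforest(df_predictor=df_prediction, GSP=GSP, GSN=GSN, RData.location=RData.location)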

## End(Not run)

Pi documentation built on Nov. 26, 2020, 2:01 a.m.