xMLcaret: Function to integrate predictor matrix in a supervised manner...


View source: R/xMLcaret.r

Description

xMLcaret integrates a predictor matrix in a supervised manner via machine learning algorithms using caret, a package that streamlines model building and performance evaluation. It requires three inputs: 1) Gold Standard Positive (GSP) targets; 2) Gold Standard Negative (GSN) targets; 3) a predictor matrix containing genes in rows and predictors in columns, with their predictive scores inside it. It returns an object of class 'sTarget'.
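To make the three inputs concrete, the following is a minimal sketch of toy data in the expected shape. The gene symbols, predictor names and scores are made up, and it assumes GSP and GSN are character vectors of gene symbols matching the row names of the predictor matrix; the labelling step only illustrates the supervised setting and is not taken from the xMLcaret source.

```r
# Toy inputs; all gene symbols and scores below are made up for illustration.
set.seed(825)
genes <- paste0("GENE", 1:20)

# Predictor matrix: genes in rows, predictors in columns, scores inside.
df_predictor <- data.frame(
  predictor_A = runif(20),
  predictor_B = runif(20),
  predictor_C = runif(20),
  row.names = genes
)

# Gold standards, assumed here to be character vectors of gene symbols.
GSP <- genes[1:5]
GSN <- genes[11:18]

# Keep the genes covered by either gold standard and attach a class label.
in_gs <- rownames(df_predictor) %in% c(GSP, GSN)
df_train <- df_predictor[in_gs, ]
df_train$class <- factor(
  ifelse(rownames(df_train) %in% GSP, "GSP", "GSN"),
  levels = c("GSN", "GSP")
)
table(df_train$class)
```

Genes outside both gold standards (GENE6 to GENE10, GENE19 and GENE20 in this toy set) carry no label and would only be scored, not used for training.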

Usage

xMLcaret(
  list_pNode = NULL,
  df_predictor = NULL,
  GSP,
  GSN,
  method = c("gbm", "svmRadial", "rda", "knn", "pls", "nnet", "rf",
    "myrf", "cforest",
    "glmnet", "glm", "bayesglm", "LogitBoost", "xgbLinear", "xgbTree"),
  nfold = 3,
  nrepeat = 10,
  seed = 825,
  aggregateBy = c("none", "logistic", "Ztransform", "fishers",
    "orderStatistic"),
  verbose = TRUE,
  RData.location = "http://galahad.well.ox.ac.uk/bigdata",
  guid = NULL
)

Arguments

list_pNode

a list of "pNode" objects or a "pNode" object

df_predictor

a data frame containing genes (in rows) and predictors (in columns), with their predictive scores inside it. This data frame must have gene symbols as row names

GSP

a vector containing Gold Standard Positive (GSP) targets

GSN

a vector containing Gold Standard Negative (GSN) targets

method

machine learning method. It can be one of "gbm" for Gradient Boosting Machine (GBM), "svmRadial" for Support Vector Machines with Radial Basis Function Kernel (SVM), "rda" for Regularized Discriminant Analysis (RDA), "knn" for k-nearest neighbor (KNN), "pls" for Partial Least Squares (PLS), "nnet" for Neural Network (NNET), "rf" for Random Forest (RF), "myrf" for customised Random Forest (RF), "cforest" for Conditional Inference Random Forest, "glmnet" for glmnet, "glm" for Generalized Linear Model (GLM), "bayesglm" for Bayesian Generalized Linear Model (BGLM), "LogitBoost" for Boosted Logistic Regression (BLR), "xgbLinear" for eXtreme Gradient Boosting as linear booster (XGBL), "xgbTree" for eXtreme Gradient Boosting as tree booster (XGBT)

nfold

an integer specifying the number of folds for cross-validation. Each fold is a balanced split of the data that preserves the overall distribution of each class (GSP and GSN), so the cross-validation training and testing sets are balanced. By default, it is 3, meaning 3-fold cross-validation

nrepeat

an integer specifying the number of repeats for cross-validation. By default, it is 10, meaning the cross-validation is repeated 10 times

seed

an integer specifying the seed for random number generation. By default, it is 825

aggregateBy

the method used to aggregate results from repeated cross-validation. It can be "none" for no aggregation (meaning the best model based on all data used for cross-validation is used), "orderStatistic" for the method based on the order statistics of p-values, "fishers" for Fisher's method, "Ztransform" for the Z-transform method, or "logistic" for the logistic method. In general, the Z-transform method does well in problems where evidence against the combined null is spread widely (equal footings) or when the total evidence is weak; Fisher's method does best in problems where the evidence is concentrated in a relatively small fraction of the individual tests or when the evidence is at least moderately strong; the logistic method provides a compromise between these two. Notably, the aggregate methods 'Ztransform' and 'logistic' are preferred here

verbose

logical to indicate whether messages will be displayed on the screen. By default, it is set to TRUE for display

RData.location

a character string specifying the location of built-in RData files. See xRDataLoader for details

guid

a valid (5-character) Global Unique IDentifier for an OSF project. See xRDataLoader for details
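The difference between two of the aggregateBy options can be illustrated with a small base-R sketch that combines p-values from repeated cross-validation using Fisher's method and the Z-transform (Stouffer's) method. The p-values are hypothetical, and this page does not state how xMLcaret derives the p-values it aggregates, so the sketch shows the textbook combination formulas only, not the function's internals.

```r
# Hypothetical p-values for one gene across 10 cross-validation repeats.
p <- c(0.04, 0.10, 0.03, 0.20, 0.08, 0.05, 0.12, 0.02, 0.07, 0.15)
k <- length(p)

# Fisher's method: -2 * sum(log(p)) follows a chi-squared distribution
# with 2k degrees of freedom under the combined null.
fisher_stat <- -2 * sum(log(p))
p_fisher <- pchisq(fisher_stat, df = 2 * k, lower.tail = FALSE)

# Z-transform method: convert each p-value to a standard normal deviate,
# sum the deviates and rescale by sqrt(k) so the result is again N(0, 1).
z <- qnorm(p, lower.tail = FALSE)
p_ztransform <- pnorm(sum(z) / sqrt(k), lower.tail = FALSE)

c(fisher = p_fisher, ztransform = p_ztransform)
```

Because the evidence here is spread fairly evenly across the repeats, this is the situation in which the Z-transform method is described as doing well.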

Value

an object of class "sTarget", a list with following components:

Note

This function requires the package "caret" and its suggested packages to be installed. They can be installed via: BiocManager::install(c("caret","e1071","gbm","kernlab","klaR","pls","nnet","randomForest","party","glmnet","arm","caTools","xgboost")).
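Before calling the function, it may help to check which of these packages are already available. The following sketch uses only base R; the variable names are illustrative.

```r
# Packages listed in the install command above.
pkgs <- c("caret", "e1071", "gbm", "kernlab", "klaR", "pls", "nnet",
          "randomForest", "party", "glmnet", "arm", "caTools", "xgboost")

# requireNamespace() returns TRUE when a package can be loaded,
# without attaching it to the search path.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```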

See Also

xPierMatrix, xPredictROCR, xPredictCompare, xSparseMatrix, xSymbol2GeneID

Examples

RData.location <- "http://galahad.well.ox.ac.uk/bigdata"
## Not run: 
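# name the arguments: the first positional argument is list_pNode, not df_predictor
sTarget <- xMLcaret(df_predictor=df_prediction, GSP=GSP, GSN=GSN, method="myrf")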

## End(Not run)
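The repeated, class-balanced splitting controlled by nfold and nrepeat can be pictured with the base-R sketch below. It is an illustration on made-up labels, not the code xMLcaret or caret runs internally, and the helper stratified_folds is hypothetical.

```r
# Toy class labels: 6 GSP and 12 GSN genes (made up for illustration).
labels <- factor(c(rep("GSP", 6), rep("GSN", 12)))
nfold <- 3
nrepeat <- 10
set.seed(825)

# Hypothetical helper: assign fold ids within each class separately,
# so every fold keeps the overall GSP:GSN ratio.
stratified_folds <- function(labels, nfold) {
  fold_id <- integer(length(labels))
  for (cls in levels(labels)) {
    idx <- which(labels == cls)
    fold_id[idx] <- sample(rep_len(seq_len(nfold), length(idx)))
  }
  fold_id
}

# One column of fold assignments per repeat.
folds <- replicate(nrepeat, stratified_folds(labels, nfold))

# Class composition of each fold in the first repeat.
table(labels, fold = folds[, 1])
```

In each repeat, one fold serves as the testing set while the remaining folds form the training set, and the shuffling differs between repeats.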

Pi documentation built on Nov. 29, 2021, 3 p.m.