pipeMostSig: Evaluate 'glmnet' Prediction on Methylation Data


View source: R/pipeMostSig.R

Description

Reads the data described by one row of information_df, selects the most significant CpGs as predictors, fits an elastic net, random forest, or support vector machine model, and evaluates its prediction performance.

Usage

pipeMostSig(rowNum, Beta_df, beta2M = TRUE, respCol_index, designInfo_df,
  alphaValue = seq(0, 1, by = 0.1), ncores = 2, npredictors = 5000,
  predictMethod = c("glmnet", "randomForest", "svm"),
  outcome_type = "binomial", save = FALSE, resultPath = NULL)

Arguments

rowNum

Row number in information_df that describes the data to read.

Beta_df

A data frame in which each row is a CpG probe, each column is a sample ID, and each cell is a Beta value. The first column is the phenodata; make sure this column is a factor with levels, otherwise the accuracy of the results cannot be ensured.
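The factor requirement on the phenodata column can be checked before calling the function. The following is a minimal sketch on a toy data frame; the name toy_df, its column names, and its layout are illustrative assumptions and not part of PredictMisc.

```r
# Toy stand-in for Beta_df; names and layout are illustrative only.
toy_df <- data.frame(
  status = c("case", "control", "case", "control"),
  cg0001 = c(0.12, 0.85, 0.40, 0.66),
  stringsAsFactors = FALSE
)

# Convert the first (phenodata) column to a factor if it is not one already.
if (!is.factor(toy_df[[1]])) {
  toy_df[[1]] <- factor(toy_df[[1]])
}

is.factor(toy_df[[1]])  # confirm the column is now a factor
levels(toy_df[[1]])     # inspect the levels before modelling
```

Inspecting the levels up front also shows which level R treats as the reference, which matters when reading sensitivity and specificity.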

beta2M

Whether to transform Beta values to M values before prediction.
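For orientation, the conventional Beta-to-M transform is M = log2(Beta / (1 - Beta)). The sketch below implements that convention; the helper name beta_to_m and the offset argument are assumptions for illustration, and the transform that pipeMostSig applies internally may differ in detail.

```r
# Conventional logit (base 2) transform; not taken from the PredictMisc source.
beta_to_m <- function(beta, offset = 1e-6) {
  # Clamp Beta away from 0 and 1 so the logarithm stays finite.
  beta <- pmin(pmax(beta, offset), 1 - offset)
  log2(beta / (1 - beta))
}

# Beta values of 0.2, 0.5 and 0.8 map to roughly -2, 0 and 2.
beta_to_m(c(0.2, 0.5, 0.8))
```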

respCol_index

Column number of the response variable in the Beta data frame.

designInfo_df

Information data frame generated by the summaryInfo function.

alphaValue

Vector of alpha values (the elastic net mixing parameter in glmnet) to evaluate.

ncores

Number of cores to use for parallel computing.

npredictors

Number of CpGs chosen as predictors (default 5,000; suggested values are 5k, 10k, 20k, or 50k).
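The idea of keeping only the most significant CpGs can be sketched as follows. This page does not state which significance test pipeMostSig uses, so the two-sample t-test here is an assumption, as are the simulated data and all variable names.

```r
set.seed(1)
n_samples <- 20
n_cpgs <- 100
group <- factor(rep(c("case", "control"), each = n_samples / 2))

# Simulated M-value matrix: rows are samples, columns are CpGs.
m_values <- matrix(
  rnorm(n_samples * n_cpgs),
  nrow = n_samples,
  dimnames = list(NULL, paste0("cg", seq_len(n_cpgs)))
)
# Shift the first five CpGs in cases so that they carry signal.
m_values[group == "case", 1:5] <- m_values[group == "case", 1:5] + 3

# One p-value per CpG from a two-sample t-test (an assumed criterion).
p_values <- apply(m_values, 2, function(x) t.test(x ~ group)$p.value)

# Keep the npredictors CpGs with the smallest p-values.
npredictors <- 10
top_cpgs <- names(sort(p_values))[seq_len(npredictors)]
top_cpgs
```

Only the columns named in top_cpgs would then be passed on to the model-fitting step.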

predictMethod

Prediction method to use: "glmnet", "randomForest", or "svm".
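A default written as a character vector of choices is commonly resolved with match.arg(), which falls back to the first choice when the argument is not supplied. Whether pipeMostSig resolves predictMethod this way is an assumption; choose_method below is a hypothetical helper that only illustrates the idiom.

```r
# Hypothetical helper, not part of PredictMisc.
choose_method <- function(predictMethod = c("glmnet", "randomForest", "svm")) {
  # match.arg() returns the first choice when the default vector is left as is.
  match.arg(predictMethod)
}

choose_method()       # returns "glmnet"
choose_method("svm")  # returns "svm"
```

Passing a single method explicitly, as in the Examples section, avoids relying on this behaviour.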

outcome_type

Type of outcome variable: gaussian, binomial, poisson, etc.

save

whether to save the results

resultPath

Path where the results are stored.

Details

predictMethod1:

Elastic net, fitted with the glmnet function.

predictMethod2:

Random forest, fitted with the train function (requires the "randomForest" package to be installed first).

predictMethod3:

Support vector machine, fitted with the train function (requires the "kernlab" package to be installed).
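To illustrate how a vector of alpha values relates to the elastic net, the sketch below fits one cross-validated glmnet model per alpha on simulated data. It is not the pipeline's own code: the data, the variable names, and the use of cv.glmnet with five folds are assumptions, and the block only fits models when the glmnet package is installed.

```r
alpha_values <- c(0, 0.5, 1)  # ridge, an even mix, and lasso
cv_error <- rep(NA_real_, length(alpha_values))

if (requireNamespace("glmnet", quietly = TRUE)) {
  set.seed(1)
  # Simulated predictors and a binary outcome driven by the first column.
  x <- matrix(rnorm(100 * 20), nrow = 100)
  y <- factor(rbinom(100, 1, plogis(2 * x[, 1])))

  for (i in seq_along(alpha_values)) {
    fit <- glmnet::cv.glmnet(x, y, family = "binomial",
                             alpha = alpha_values[i], nfolds = 5)
    # Record the lowest mean cross-validated error over the lambda path.
    cv_error[i] <- min(fit$cvm)
  }
}

# One row per alpha value.
data.frame(alpha = alpha_values, cv_error = cv_error)
```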

Value

Returns a list with three elements:

  1. The first element contains the fitted model results of the different prediction methods.

  2. The second element is a data frame that contains evaluation parameters of the prediction methods' performance:
    for glmnet, the data frame has one row per alpha value given in the function argument and 16 columns of evaluation parameters, including NumOfRep, NumOfCv, auc_results, Sensitivity, Specificity, etc.;
    for random forest and support vector machine, the data frame has one row and 14 columns of evaluation parameters, including NumOfRep, NumOfCv, auc_results, Sensitivity, Specificity, etc.

  3. The third element is a vector that indicates the number of predictors used.
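The three elements can be pulled out of the returned list by position. The sketch below uses a mock list in place of a real result, so its contents, the alpha column name, and the variable names are placeholders; only the shape (three elements, one row per alpha value for glmnet) follows the description above.

```r
alpha_values <- seq(0, 1, by = 0.1)

# Mock return value; a real result has more columns and fitted models.
mock_result <- list(
  list(),                                                  # fitted models
  data.frame(alpha = alpha_values, auc_results = NA_real_), # evaluation table
  5000                                                      # predictors used
)

fit_models <- mock_result[[1]]
eval_df    <- mock_result[[2]]
n_used     <- mock_result[[3]]

nrow(eval_df)  # 11, one row per alpha value in seq(0, 1, by = 0.1)
```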

Examples

## Not run: 
 data(Example_df)
 data(pfcInfo_df)

 test <- pipeMostSig(
   rowNum = 10,
   Beta_df = Example_df,
   beta2M = TRUE,
   respCol_index = 1,
   designInfo_df = pfcInfo_df,
   alphaValue = seq(0, 1, by = 0.1),
   ncores = 2,
   npredictors = 5000,
   predictMethod = "glmnet",
   outcome_type = "binomial",
   save = FALSE,
   resultPath = NULL
 )

## End(Not run)

lizhongliu1996/PredictMisc documentation built on Aug. 23, 2019, 5:55 a.m.