ForeseeTrain: Train a Drug Efficacy Prediction Model


View source: R/ForeseeTrain.R

Description

ForeseeTrain uses the data of the TrainObject to train a black box model that can later be applied to new data in order to predict drug efficacy. The CellResponseProcessor prepares the response data of the TrainObject for prediction. Duplicates in the gene names are removed using the Foresee DuplicationHandler. The Homogenizer function reduces batch effects between training and test data. The FeatureSelector restricts the input features of the TrainObject to a specific set that is to be used for the model. The FeaturePreprocessor converts the original features into predictive features. The SampleSelector restricts the training samples to those of a specific tissue. The FeatureCombiner combines features of all specified data types into one feature matrix. The BlackBox applies a machine learning algorithm to the preprocessed data to train a model that is predictive of drug response.

Usage

ForeseeTrain(TrainObject, TestObject, DrugName, CellResponseType,
  CellResponseTransformation = "powertransform",
  InputDataTypes = "GeneExpression", TrainingTissue = "all",
  TestingTissue = "all", DuplicationHandling = "first",
  HomogenizationMethod = "ComBat", GeneFilter = "all",
  FeaturePreprocessing = "none", BlackBox = "ridge",
  nfoldCrossvalidation = 1, ...)
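
A call could be assembled as in the following sketch. The drug name "Docetaxel", the response type "IC50", and the objects `my_train_object` and `my_test_object` are placeholders chosen for illustration and are not confirmed by this page; every other value repeats a default from the signature above. The call is only attempted if the FORESEE package is installed and both placeholder objects exist.

```r
# Arguments for ForeseeTrain; the optional ones repeat the documented defaults.
train_args <- list(
  DrugName                   = "Docetaxel",  # assumption: a drug present in your data
  CellResponseType           = "IC50",       # assumption: a response type present in your data
  CellResponseTransformation = "powertransform",
  InputDataTypes             = "GeneExpression",
  TrainingTissue             = "all",
  TestingTissue              = "all",
  DuplicationHandling        = "first",
  HomogenizationMethod       = "ComBat",
  GeneFilter                 = "all",
  FeaturePreprocessing       = "none",
  BlackBox                   = "ridge",
  nfoldCrossvalidation       = 1
)

# my_train_object and my_test_object are hypothetical names for objects you supply.
if (requireNamespace("FORESEE", quietly = TRUE) &&
    exists("my_train_object") && exists("my_test_object")) {
  do.call(
    FORESEE::ForeseeTrain,
    c(list(TrainObject = my_train_object, TestObject = my_test_object), train_args)
  )
}
```

Spelling out the defaults makes it easier to see which pipeline step each argument controls before changing one of them.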

Arguments

TrainObject

Object that contains all data needed to train a model, including molecular data (such as gene expression, mutation, copy number variation, methylation, cancer type, etc.) and drug response data.

TestObject

Object that contains all data that the model is to be tested on, including molecular data (such as gene expression, mutation, copy number variation, methylation, cancer type, etc.) and drug response data.

DrugName

Name of the drug whose efficacy is to be predicted with the model. You can get all possible values with listDrugs(OBJ) or listInputOptions("DrugName", OBJ), where OBJ is the object you want to use as TrainObject. If both TrainObject and TestObject are ForeseeCell objects and you want to use two different drugs for the train and test objects (or the same drug has different names in the two objects), you can pass a character vector of length 2 as DrugName. In this case, the first value is used for the TrainObject and the second for the TestObject.
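
The length-2 rule can be illustrated with a small sketch. The helper `resolve_drug_names` and the drug names are hypothetical and only mirror the behaviour described above; they are not part of the package.

```r
# Hypothetical helper: split a DrugName input into the name used for the
# TrainObject (first value) and the name used for the TestObject (last value).
resolve_drug_names <- function(DrugName) {
  stopifnot(is.character(DrugName), length(DrugName) %in% c(1, 2))
  list(train = DrugName[1], test = DrugName[length(DrugName)])
}

same_name  <- resolve_drug_names("DrugA")              # one name for both objects
two_names  <- resolve_drug_names(c("DrugA", "Drug_A")) # different names per object
```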

CellResponseType

Format of the drug response data of the TrainObject, such as IC50, AUC, GI50, etc. You can get all possible values with listInputOptions("CellResponseType", OBJ), where OBJ is the object you want to use as TrainObject.

CellResponseTransformation

Method that is to be used to transform the drug response data of the TrainObject, such as power transform, logarithm, binarization, user defined functions, etc. Get all possible values with listInputOptions("CellResponseProcessor").
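
As a rough illustration of two of the transformation families named above, the sketch below applies a logarithm and a median-based binarization to made-up IC50 values. The values, the median cutoff, and the "1 = sensitive" coding are assumptions; the option names the package actually accepts should be taken from listInputOptions("CellResponseProcessor").

```r
# Made-up IC50 values for five cell lines (illustration only).
ic50 <- c(0.05, 0.4, 1.2, 8.5, 30)

# Logarithm: compresses the long right tail typical of IC50 values.
log_ic50 <- log(ic50)

# Binarization: label cell lines at or below the median IC50 as sensitive (1).
sensitive <- as.integer(ic50 <= median(ic50))
```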

InputDataTypes

Data types of the TrainObject that are to be used to train the model, such as GeneExpression, Mutation, CopyNumberVariation, Methylation, Cancertype, etc. You can get all possible values with listInputOptions("InputDataTypes", OBJ), where OBJ is the object you want to use as TrainObject.
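
When more than one data type is selected, the features have to end up in one matrix. The following sketch shows one way to do that in base R, assuming samples are stored in rows and only samples present in every data type are kept; the matrices, names, and orientation are invented for illustration and may differ from how the FeatureCombiner works internally.

```r
# Toy expression matrix: 3 cell lines x 2 genes (values are made up).
expr <- matrix(c(5.1, 6.3, 4.8, 2.2, 3.1, 2.9), nrow = 3,
               dimnames = list(c("C1", "C2", "C3"), c("geneA", "geneB")))

# Toy mutation matrix: 3 cell lines x 1 binary mutation feature.
mut <- matrix(c(0, 1, 1), nrow = 3,
              dimnames = list(c("C2", "C3", "C4"), "geneA_mut"))

# Keep only cell lines measured in both data types, then bind the columns.
common   <- intersect(rownames(expr), rownames(mut))
combined <- cbind(expr[common, , drop = FALSE], mut[common, , drop = FALSE])
```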

TrainingTissue

Tissue type to which the cell lines of the TrainObject are restricted, such as pancreas or lung. Default is "all" for pancancer analysis. You can get all possible values with listInputOptions("TrainingTissue", OBJ), where OBJ is the object you want to use as TrainObject.

TestingTissue

Tissue type to which the cell lines or samples of the TestObject are restricted, such as pancreas or lung. Default is "all" for analysis of all samples. You can get all possible values with listInputOptions("TestingTissue", OBJ), where OBJ is the object you want to use as TestObject.
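
The effect of a tissue argument can be pictured as a simple sample filter. In this sketch the helper `select_samples` and the tissue annotation are hypothetical; only the convention that "all" keeps every sample comes from the text above.

```r
# Made-up tissue annotation, one entry per cell line.
tissue <- c(C1 = "lung", C2 = "pancreas", C3 = "lung", C4 = "breast")

# Hypothetical helper: return the names of the samples of the wanted tissue,
# or all sample names when wanted is "all".
select_samples <- function(tissue, wanted = "all") {
  if (identical(wanted, "all")) {
    return(names(tissue))
  }
  names(tissue)[tissue == wanted]
}

all_samples  <- select_samples(tissue)          # pancancer: every cell line
lung_samples <- select_samples(tissue, "lung")  # lung cell lines only
```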

DuplicationHandling

Method for handling duplicates of gene names, such as considering none, the mean, the first hit, etc. Get all possible values with listInputOptions("DuplicationHandler").
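
Two of the strategies mentioned above, keeping the first hit and taking the mean, can be sketched in base R as follows. The matrix is made up and genes are assumed to be in rows; this is an illustration of the idea, not the package's DuplicationHandler code.

```r
# Toy expression matrix with a duplicated gene name (genes in rows).
expr <- matrix(c(1, 3, 5, 2, 4, 6), nrow = 3,
               dimnames = list(c("TP53", "TP53", "EGFR"), c("S1", "S2")))

# "first": keep the first row encountered for each gene name.
first_hit <- expr[!duplicated(rownames(expr)), , drop = FALSE]

# "mean": sum rows that share a gene name, then divide by the group size.
sums     <- rowsum(expr, group = rownames(expr))
counts   <- as.vector(table(rownames(expr))[rownames(sums)])
mean_hit <- sums / counts
```

Note that `rowsum` returns the groups in sorted order, so the row order of `mean_hit` can differ from that of `first_hit`.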

HomogenizationMethod

Method for homogenizing data of the TrainObject and TestObject, such as ComBat, quantile normalization, limma, RUV, etc. Get all possible values with listInputOptions("Homogenizer").
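
To give a feel for what homogenization does, the sketch below implements a plain quantile normalization in base R on the joined train and test samples. It assumes genes in rows, samples in columns, and no missing values; the function `quantile_normalize` and the data are illustrative and not taken from the package. ComBat itself is provided by the Bioconductor package sva and is not shown here.

```r
# Force every sample (column) to share the same empirical distribution.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank of each gene per sample
  sorted <- apply(x, 2, sort)                         # sorted values per sample
  target <- rowMeans(sorted)                          # mean value at each rank
  out <- apply(ranks, 2, function(r) target[r])       # map ranks to target values
  dimnames(out) <- dimnames(x)
  out
}

set.seed(1)
# Made-up data: two training samples and one test sample with a shifted mean.
expr <- cbind(train_S1 = rnorm(6, mean = 0),
              train_S2 = rnorm(6, mean = 0.5),
              test_S1  = rnorm(6, mean = 3))
rownames(expr) <- paste0("gene", 1:6)

normalized <- quantile_normalize(expr)
```

After normalization every column contains the same set of values, so the shift of the test sample is removed while the within-sample gene ranking is preserved.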

GeneFilter

Set of genes to be considered for training the model, such as all, a certain percentage based on variance or p-value, specific gene sets like landmark genes, gene ontologies or pathways, etc. Get all possible values with listInputOptions("FeatureSelector").
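
A variance-based percentage filter can be sketched as below. The 20% fraction, the random data, and the genes-in-rows layout are assumptions made for illustration; the filter names accepted by the package come from listInputOptions("FeatureSelector").

```r
set.seed(42)
# Made-up expression matrix: 20 genes (rows) x 10 samples (columns).
expr <- matrix(rnorm(200), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("S", 1:10)))

keep_fraction <- 0.2                                   # keep the top 20% of genes
gene_var  <- apply(expr, 1, var)                       # variance of each gene
n_keep    <- ceiling(keep_fraction * nrow(expr))       # number of genes to keep
top_genes <- names(sort(gene_var, decreasing = TRUE))[seq_len(n_keep)]

filtered <- expr[top_genes, , drop = FALSE]
```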

FeaturePreprocessing

Method for preprocessing the inputs of the model, such as z-score, principal component analysis, PhysioSpace similarity, etc. Get all possible values with listInputOptions("FeaturePreprocessor").
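
Two of the preprocessing methods named above have direct base R counterparts, shown in the following sketch. The data are random and samples are assumed to be in rows; keeping two principal components is an arbitrary choice for the example.

```r
set.seed(7)
# Made-up feature matrix: 10 samples (rows) x 6 features (columns).
features <- matrix(rnorm(60, mean = 5, sd = 2), nrow = 10)

# z-score: center each feature to mean 0 and scale it to unit variance.
z <- scale(features)

# Principal component analysis on the standardized features;
# keep the scores of the first two components as new predictive features.
pca <- prcomp(features, center = TRUE, scale. = TRUE)
pcs <- pca$x[, 1:2]
```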

BlackBox

Modeling algorithm for training, such as linear regression, elastic net, lasso regression, ridge regression, tandem, support vector machines, random forests, user defined functions, etc. Get all possible values with listInputOptions("BlackBoxFilter").
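
For the default value "ridge", the underlying idea can be sketched with the closed-form ridge estimate in base R. This is a minimal illustration on simulated data with a fixed penalty `lambda`; it is not necessarily how the package fits its ridge model, and all names in the block are invented for the example.

```r
set.seed(123)
n <- 30
p <- 5

# Simulated, standardized features and a response driven by three of them.
X <- scale(matrix(rnorm(n * p), nrow = n, ncol = p))
beta_true <- c(2, -1, 0, 0, 0.5)
y <- as.vector(X %*% beta_true + rnorm(n, sd = 0.1))

# Ridge estimate: (X'X + lambda * I)^-1 X'(y - mean(y)); the intercept is mean(y).
lambda   <- 1
beta_hat <- solve(crossprod(X) + lambda * diag(p), crossprod(X, y - mean(y)))

# Prediction for new (already standardized) features.
predict_ridge <- function(X_new) as.vector(X_new %*% beta_hat + mean(y))
fitted_values <- predict_ridge(X)
```

The penalty shrinks the coefficients toward zero, which stabilizes the fit when the number of features is large relative to the number of samples.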

nfoldCrossvalidation

Number of folds to use for cross-validation while training the model. If set to 1, the complete data of the TrainObject is used for training.
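
The fold logic can be sketched as follows. The helper `make_folds` is hypothetical and only illustrates the rule stated above: with one fold all samples are used for training, otherwise each fold is held out once.

```r
# Hypothetical helper: return, for each fold, the training and validation indices.
make_folds <- function(n_samples, nfold) {
  if (nfold == 1) {
    # No cross-validation: train on everything, validate on nothing.
    return(list(list(train = seq_len(n_samples), validation = integer(0))))
  }
  # Randomly assign samples to folds of (nearly) equal size.
  fold_id <- sample(rep(seq_len(nfold), length.out = n_samples))
  lapply(seq_len(nfold), function(k) {
    list(train = which(fold_id != k), validation = which(fold_id == k))
  })
}

set.seed(1)
folds_5 <- make_folds(n_samples = 20, nfold = 5)
folds_1 <- make_folds(n_samples = 20, nfold = 1)
```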

Value

ForeseeModel

A black box model trained on the TrainObject data that can be applied to new test data.

TrainObject

The TrainObject with preprocessed and filtered features.

TestObject

The TestObject with preprocessed and filtered features.


JRC-COMBINE/FORESEE documentation built on Jan. 24, 2020, 1:19 a.m.