calcPhenotype: Generate predicted drug sensitivity scores
In oncoPredict: Drug Response Modeling and Biomarker Discovery

calcPhenotype

R Documentation

Generate predicted drug sensitivity scores

Description

This function predicts a phenotype (drug sensitivity score) from microarray or bulk RNA-seq gene expression data, which may come from different platforms. The imputations are performed using ridge regression, trained on a gene expression matrix for which the phenotype is already known. The function integrates the training and test datasets via a user-defined procedure and can power transform the known phenotype.
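The ridge regression idea described above can be sketched in base R. This is only an illustration under stated assumptions, not oncoPredict's internal code: the object names (train_expr, test_expr, train_pheno) and the fixed penalty lambda are hypothetical, and no homogenization or power transformation is applied.

```r
# Illustrative ridge regression in base R; NOT oncoPredict's implementation.
set.seed(1)
genes <- paste0("gene", 1:30)
train_expr <- matrix(rnorm(30 * 8), nrow = 30,
                     dimnames = list(genes, paste0("train", 1:8)))
test_expr <- matrix(rnorm(30 * 3), nrow = 30,
                    dimnames = list(genes, paste0("test", 1:3)))
train_pheno <- setNames(rnorm(8), colnames(train_expr))

# Samples become rows and genes become predictor columns.
x_train <- t(train_expr)
x_test <- t(test_expr)[, colnames(x_train), drop = FALSE]

# Centre predictors and response using training statistics only.
x_mean <- colMeans(x_train)
y_mean <- mean(train_pheno)
x_c <- sweep(x_train, 2, x_mean)
y_c <- train_pheno - y_mean

# Hypothetical fixed penalty; a real fit would choose this value from the data.
lambda <- 1

# Closed-form ridge solution: (X'X + lambda * I)^-1 X'y
beta <- solve(crossprod(x_c) + lambda * diag(ncol(x_c)), crossprod(x_c, y_c))

# Predict the phenotype for the test samples.
predicted <- drop(sweep(x_test, 2, x_mean) %*% beta) + y_mean
print(predicted)
```

The penalty term keeps the system solvable even though there are more genes than training samples, which is the usual situation with expression data.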

Usage

calcPhenotype(
  trainingExprData,
  trainingPtype,
  testExprData,
  batchCorrect,
  powerTransformPhenotype = TRUE,
  removeLowVaryingGenes = 0.2,
  minNumSamples,
  selection = 1,
  printOutput,
  pcr = FALSE,
  removeLowVaringGenesFrom,
  report_pc = FALSE,
  cc = FALSE,
  percent = 80,
  rsq = FALSE,
  folder = FALSE,
  parallel = FALSE,
  cores = 1
)

Arguments

`trainingExprData`	The training data. A matrix of expression levels. rownames() are genes, colnames() are samples (cell line names, COSMIC IDs, etc.). rownames() must be specified and must contain the same type of gene ids as "testExprData".
`trainingPtype`	The known phenotype for "trainingExprData". This must be a matrix with training samples as rows and drugs or phenotypes as columns. The matrix may contain NA values; they are removed inside calcPhenotype().
`testExprData`	The test data for which the phenotype will be estimated. A matrix of expression levels with genes as rows and samples as columns. rownames() must be specified and must contain the same type of gene ids as "trainingExprData".
`batchCorrect`	How should training and test data matrices be homogenized. Choices are "eb" (default) for ComBat, "qn" for quantile normalization, "standardize" for within-dataset z-score standardization, "rank", "rank_then_eb", or "none" for no homogenization.
`powerTransformPhenotype`	Should the phenotype be power transformed before the regression model is fitted? Defaults to TRUE; set to FALSE if the phenotype is already known to be approximately normally distributed.
`removeLowVaryingGenes`	What proportion of low-varying genes should be removed? 20 percent by default.
`minNumSamples`	How many training and test samples are required. An error is printed if the number falls below this threshold.
`selection`	How should duplicate gene ids be handled? The Usage signature above sets the default to 1. Use -1 to be asked interactively, 1 to summarize duplicates by their mean, or 2 to discard all duplicates.
`printOutput`	Set to FALSE to suppress output.
`pcr`	Indicates whether to use principal component regression (pcr) for feature (gene) reduction. Options are 'TRUE' and 'FALSE'. If you set 'report_pc=TRUE', you must also set 'pcr=TRUE'.
`removeLowVaringGenesFrom`	Determines which data the low-varying genes are identified and removed from. Options are 'homogenizeData' and 'rawData'.
`report_pc`	Indicates whether you want to output the training principal components. Options are 'TRUE' and 'FALSE'.
`cc`	Indicates whether you want correlation coefficients for biomarker discovery. These are the correlations between the expression of a given gene across all samples and a given drug response across the same samples. The correlations can be ranked to identify highly correlated drug-gene associations.
`percent`	The percentage of variability in the training data that the principal components should capture when pcr=TRUE. Default is 80, for 80%.
`rsq`	Indicates whether to output the R^2 values computed from the true and predicted values of the training data. These values represent the proportion of variance in the training data that the optimal model accounts for. Options are 'TRUE' and 'FALSE'.
`folder`	If TRUE, write calcPhenotype outputs to calcPhenotype_Output in the current working directory. The default is FALSE.
`parallel`	If TRUE, fit drug models in parallel after the shared homogenization and gene-filtering steps are complete. The default is FALSE.
`cores`	The number of cores to use when parallel is TRUE. Parallel execution uses forked processes via parallel::mclapply, which is not available for multicore execution on Windows PCs; on Windows, calcPhenotype will warn and run serially.
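The layout requirements in the argument list above can be checked before calling the function. The following base R sketch uses simulated data and hypothetical helper objects (shared_genes, kept_genes, drop_fraction). The variance filter only illustrates what removing a proportion of low-varying genes could look like; how calcPhenotype applies this step internally is not shown here.

```r
# Simulated inputs in the layout the argument list describes.
set.seed(1)
genes <- paste0("gene", 1:10)
trainingExprData <- matrix(rnorm(10 * 6), nrow = 10,
                           dimnames = list(genes, paste0("train", 1:6)))
testExprData <- matrix(rnorm(10 * 3), nrow = 10,
                       dimnames = list(rev(genes), paste0("test", 1:3)))
trainingPtype <- matrix(rnorm(6), ncol = 1,
                        dimnames = list(colnames(trainingExprData), "drug1"))

# Both expression matrices need gene ids as row names.
stopifnot(!is.null(rownames(trainingExprData)),
          !is.null(rownames(testExprData)))

# Gene ids must be of the same type, so the two sets should overlap.
shared_genes <- intersect(rownames(trainingExprData), rownames(testExprData))
stopifnot(length(shared_genes) > 0)

# Phenotype rows are training samples, i.e. columns of the training matrix.
stopifnot(all(rownames(trainingPtype) %in% colnames(trainingExprData)))

# Illustrative filter: drop the 20% of shared genes with the lowest variance.
drop_fraction <- 0.2
gene_var <- apply(trainingExprData[shared_genes, , drop = FALSE], 1, var)
n_drop <- floor(drop_fraction * length(gene_var))
kept_genes <- names(gene_var)[rank(gene_var, ties.method = "first") > n_drop]
print(kept_genes)
```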

Value

A matrix of predicted drug response values. If rsq, cc, or report_pc is TRUE, returns a list containing the predictions and requested optional outputs. If folder is TRUE, the same object is returned invisibly after files are written.
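To make the optional rsq and cc outputs more concrete, the quantities they describe can be computed by hand in base R. This is a sketch on simulated data, not the package's code: the object names are hypothetical, Pearson correlation is an assumption (the correlation method is not stated above), and fitted_vals is a stand-in for real model predictions.

```r
set.seed(1)
expr <- matrix(rnorm(5 * 10), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("s", 1:10)))
response <- setNames(rnorm(10), colnames(expr))

# Correlate each gene's expression across samples with the drug response.
# Pearson correlation is assumed here.
gene_cor <- apply(expr, 1, function(g) cor(g, response))

# Rank genes by absolute correlation to surface candidate associations.
ranked <- sort(abs(gene_cor), decreasing = TRUE)
print(ranked)

# Stand-in for model predictions on the training samples.
fitted_vals <- response + rnorm(10, sd = 0.5)

# R^2 as the proportion of variance in the observed values explained.
ss_res <- sum((response - fitted_vals)^2)
ss_tot <- sum((response - mean(response))^2)
r_squared <- 1 - ss_res / ss_tot
print(r_squared)
```

When the function returns a list, inspecting it with str(result, max.level = 1) is a safe way to see which of these components are present.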

Examples

set.seed(1)
genes <- paste0("gene", 1:30)
trainingExprData <- matrix(rnorm(30 * 8), nrow=30,
                           dimnames=list(genes, paste0("train", 1:8)))
testExprData <- matrix(rnorm(30 * 3), nrow=30,
                       dimnames=list(genes, paste0("test", 1:3)))
trainingPtype <- matrix(rnorm(8), ncol=1,
                        dimnames=list(colnames(trainingExprData), "drug1"))
predictions <- calcPhenotype(trainingExprData, trainingPtype, testExprData,
                             batchCorrect="none",
                             powerTransformPhenotype=FALSE,
                             removeLowVaryingGenes=0,
                             minNumSamples=0,
                             selection=1,
                             printOutput=FALSE,
                             pcr=FALSE,
                             removeLowVaringGenesFrom="rawData")
head(predictions)

oncoPredict documentation built on June 29, 2026, 5:07 p.m.