prog.pop.selection: Determine populations with prognostic value

View source: R/prog.pop.selection.R

prog.pop.selection    R Documentation

Determine populations with prognostic value

Description

Determine, using different machine learning approaches, which populations have prognostic value, either with raw percentages (continuous variables) or with values categorized according to cutoff.type (categorical variables).

Usage

prog.pop.selection(
  fcs.SCE,
  assay.i = "normalized",
  cell.clusters,
  variables,
  cutoff.type = "maxstat",
  time.var,
  event.var,
  condition.col,
  cell.value = "percentage",
  method,
  method.params,
  plot = T,
  return.ML.object = F,
  train.index
)

Arguments

fcs.SCE

A fcs.SCE object generated through FlowCT::fcs.SCE().

assay.i

Name of the matrix stored in the fcs.SCE object to be used for the calculations. Default = "normalized".

cell.clusters

Name of column containing clusters identified through FlowCT::clustering.flow().

variables

Vector of variables for which prognostic relevance is calculated. If not specified (default), all immune populations in cell.clusters are considered.

cutoff.type

Method for calculating survival cutoffs. Available methods are "maxstat" (default), "ROC", "quantiles" (i.e., terciles) and "median". If "none" is specified, raw percentages (or counts) are used instead of categorical variables.

time.var

Survival time variable.

event.var

Variable with event censoring. Important note: positive and negative events should be coded as 1 and 0, respectively.

condition.col

Variable with differential condition (only needed if method = "biosign").

cell.value

String specifying whether final results should be proportions ("percentage", default) or raw counts ("counts").

method

Machine learning approaches available for variable selection. Possible values are: "biosign", "random_forest" and "survboost".

method.params

Internal options for tuning the selected method; see each package's help for more information and default values.

plot

Whether results should be plotted. Default = TRUE.

return.ML.object

Logical indicating whether the machine learning object should be returned (for later predictions). Default = FALSE.

train.index

Vector (based on the "filename" variable) of samples selected as the training dataset (needed for later predictions).
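
A minimal sketch of how a 70%/30% training/validation split might be prepared for train.index. It assumes that train.index takes "filename" values, as the description above suggests; the package example instead draws integer positions, so check which form your version expects. The file names below are hypothetical stand-ins, not data shipped with FlowCT.

```r
# Toy stand-in for the per-sample "filename" values (hypothetical names)
filenames <- paste0("sample_", sprintf("%02d", 1:10), ".fcs")

set.seed(123)                               # make the random split reproducible
n_train <- floor(length(filenames) * 0.7)   # 70% of samples go to training
train_idx <- sample(filenames, n_train)     # training samples, drawn without replacement
test_idx <- setdiff(filenames, train_idx)   # remaining samples form the validation set
```

Using floor() keeps the training size an integer, and setdiff() guarantees that no sample lands in both sets.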

Details

This wrapper function currently comprises three different methods. Please check each package's help for further details.

  • biosigner. It includes three classification algorithms: PLS-DA, RF and SVM; it only works with the censoring event variable (not with survival time).

  • randomForestSRC. Random Forests for survival (and regression and classification).

  • SurvBoost. A high dimensional variable selection method for stratified proportional hazards model.

The returned object depends on the chosen arguments:

  • if return.ML.object = FALSE, only the variables' importance/coefficients are shown;

  • if return.ML.object = TRUE and train.index is empty, the object (list-type) includes the machine learning object and a double data.frame with the variables' importance/coefficients; and

  • if return.ML.object = TRUE and train.index contains a vector of samples, the list-type object also stores a double data.frame with the training and validation (test) datasets (with percentage or categorized data).

Important note: to use SurvBoost's method, you MUST load it BEFORE FlowCT to avoid internal conflicts.
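
To picture what "categorized data" means for the simpler cutoff.type options, the base R sketch below splits a toy vector of percentages at the median and at the terciles. It is an illustration only, not FlowCT's internal implementation: the values are made up, and the group labels and the side on which ties fall are assumptions. The "maxstat" and "ROC" options additionally depend on the survival data and are not shown.

```r
# Toy percentages of one cell population across 9 samples (made-up values)
pct <- c(2.1, 5.4, 3.3, 8.0, 1.2, 6.7, 4.5, 9.9, 7.3)

# "median"-style cutoff: two groups split at the median
med_cut <- median(pct)
grp_median <- ifelse(pct > med_cut, "high", "low")

# "quantiles"-style cutoff (terciles): three groups split at the 1/3 and 2/3 quantiles
ter_cut <- quantile(pct, probs = c(1/3, 2/3))
grp_tercile <- cut(pct, breaks = c(-Inf, ter_cut, Inf),
                   labels = c("low", "mid", "high"))

# Group sizes under each scheme
table(grp_median)
table(grp_tercile)
```

With cutoff.type = "none" no such grouping would be applied and the continuous values would be used directly.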

Examples

## Not run: 
# eg1: only return the most relevant populations, after cutoff calculation
ml1 <- prog.pop.selection(fcs.SCE = fcs, cell.clusters = "SOM_named",
          time.var = "PFS", event.var = "PFS_c", cutoff.type = "quantiles",
          method = "survboost", method.params = list(rate = 0.4))

# eg2: apply predict (with training and validation datasets, 70%/30%), no cutoffs
train_idx <- sample(length(fcs$patient_id), length(fcs$patient_id)*0.7)
ml2 <- prog.pop.selection(fcs.SCE = fcs, cell.clusters = "SOM_named",
          time.var = "PFS", event.var = "PFS_c", cutoff.type = "none",
          method = "random_forest", train.index = train_idx, return.ML.object = TRUE)
ml2_pr <- predict(ml2$ML.object, newdata = ml2$survival.data$test)

biosigner::predict(ml2$ML.object, #predict for biosigner
           newdata = ml2$survival.data$test[,-c(1:2,32)]) #delete survival and condition cols

SurvBoost::predict.boosting(ml2$ML.object, newdata = ml2$survival.data$test) #survboost

## End(Not run)

jgarces02/FlowCT documentation built on March 28, 2023, 12:42 p.m.