predict_smlc: Prediction based on the hidden genome sparse multinomial...

View source: R/fit_predict_smlc.R

predict_smlc R Documentation

Prediction based on the hidden genome sparse multinomial logistic classifier

Description

Prediction based on the hidden genome sparse multinomial logistic classifier

Usage

predict_smlc(
  fit,
  Xnew,
  Ynew = NULL,
  return_lin_pred = FALSE,
  normalize_rows = NULL,
  ...
)

predict_mlogit(
  fit,
  Xnew,
  Ynew = NULL,
  return_lin_pred = FALSE,
  normalize_rows = NULL,
  ...
)

Arguments

fit

fitted hidden genome mlogit classifier, an output of fit_smlc.

Xnew

test data design (or meta-design) matrix, with observations across rows and variables (predictors/features) across columns, for which predictions are to be made from a fitted model. For a typical hidden genome classifier this is a matrix whose rows correspond to the test set tumors and whose columns correspond to predictors such as binary 1-0 presence/absence indicators of raw variants, counts of mutations at specific genes, and counts of mutations corresponding to specific mutation signatures, each possibly normalized by some function of the total mutation burdens in tumors.

Ynew

the actual cancer categories for the test samples. This is not used in computation, but is returned as a component of the output for easier post-processing.

normalize_rows

vector of the same length as nrow(Xnew) to be used to normalize the rows of Xnew. If NULL (default), no normalization is performed.
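
The page does not spell out the normalizing operation. The sketch below assumes that "normalize" means dividing each row of Xnew by the matching entry of normalize_rows; this is an assumption, not the package's confirmed behavior, and the toy values (for example, total mutation burdens of 2 and 4) are hypothetical.

```r
# Toy design matrix: 2 tumors (rows) by 2 hypothetical predictors (columns).
Xnew <- matrix(
  c(2, 4,
    6, 8),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("tumor1", "tumor2"), c("geneA", "geneB"))
)

# Hypothetical per-tumor normalizers, one per row of Xnew.
normalize_rows <- c(2, 4)
stopifnot(length(normalize_rows) == nrow(Xnew))

# ASSUMPTION: normalization divides row i by normalize_rows[i].
# R recycles the vector down the columns, so this is a row-wise division.
Xnew_normalized <- Xnew / normalize_rows
```

If the intended operation differs (for example, a transformation of the burden before dividing), the vector passed as normalize_rows would need to be prepared accordingly.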

Value

a list with entries (a) probs_predicted: an nrow(Xnew) by n_cancer (determined from fit) matrix of multinomial probabilities, giving the predicted probability of each sample unit in Xnew being classified into each cancer site; (b) predicted: a character vector of hard classes based on the predicted multinomial probabilities (obtained by assigning each individual to the class with the highest predicted probability); and, optionally, (c) observed: if Ynew is supplied, it is returned as is.
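
The relation between probs_predicted and predicted can be illustrated with a small base R sketch. The probability matrix and the cancer site labels below are made up for illustration, and the code is not taken from the package; it only mirrors the "highest predicted probability" rule described above, with one row per sample.

```r
# Toy probability matrix: 3 samples (rows) by 3 hypothetical cancer sites (columns).
probs_predicted <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.3, 0.6,
    0.2, 0.5, 0.3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("tumor1", "tumor2", "tumor3"),
                  c("breast", "lung", "colon"))
)

# Hard class for each sample: the column label with the largest probability
# in that row (ties broken by taking the first maximum).
predicted <- colnames(probs_predicted)[
  max.col(probs_predicted, ties.method = "first")
]
names(predicted) <- rownames(probs_predicted)
```

The tie-breaking rule here is a choice made for the sketch; the page does not say how the package resolves ties.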

Note

Predictors in Xnew that are not present in the training set design matrix (stored in fit) are dropped, and predictors present in the training set design matrix but not included in Xnew are all assumed to have zero values. This is convenient for a typical hidden genome classifier, where most predictors are (some normalized versions of) counts (e.g., for genes and mutation signatures) or binary presence/absence indicators (e.g., for raw variants), so that a zero predictor value essentially indicates some form of "absence". However, care must be taken with predictors whose zero values do not indicate absence.
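
The column matching described in this note can be sketched in base R as follows. The helper align_to_training and the predictor names are hypothetical, and a dense matrix is used for simplicity; the package's internal implementation may differ.

```r
# Hypothetical helper: align the columns of Xnew to the training predictors.
# Columns absent from training are dropped; training columns absent from
# Xnew are filled with zeros.
align_to_training <- function(Xnew, train_cols) {
  X_aligned <- matrix(
    0,
    nrow = nrow(Xnew), ncol = length(train_cols),
    dimnames = list(rownames(Xnew), train_cols)
  )
  shared <- intersect(train_cols, colnames(Xnew))
  X_aligned[, shared] <- Xnew[, shared, drop = FALSE]
  X_aligned
}

# Test data with one training predictor (geneA) and one unseen predictor (geneZ).
Xnew <- matrix(
  c(1, 0, 2, 3),
  nrow = 2,
  dimnames = list(c("tumor1", "tumor2"), c("geneA", "geneZ"))
)

# geneZ is dropped; geneB (seen only in training) becomes a zero column.
X_aligned <- align_to_training(Xnew, train_cols = c("geneA", "geneB"))
```

For a predictor where zero is a meaningful value rather than "absent" (for example, a centered covariate), the zero fill in this sketch would silently assign that value to every test sample, which is the situation the note warns about.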

See Also

fit_mlogit


c7rishi/hidgenclassifier documentation built on June 14, 2024, 11:10 a.m.