fit_svmc: Hidden genome SVM classifier (svmc)

View source: R/fit_predict_svmc.R

Description

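Fits a multiclass support vector machine (SVM) classifier for hidden genome cancer type prediction. fit_svm() takes the same arguments as fit_svmc().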

Usage

fit_svmc(
  X,
  Y,
  backend = "liquidSVM",
  scale = TRUE,
  scale_fn = function(x) 2 * sd(x),
  ...
)

fit_svm(
  X,
  Y,
  backend = "liquidSVM",
  scale = TRUE,
  scale_fn = function(x) 2 * sd(x),
  ...
)
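The usage gives scale = TRUE and a default scale_fn of function(x) 2 * sd(x), but these two arguments are not described under Arguments. The sketch below only illustrates what that default function computes. The helper scale_columns and the assumption that scaling divides each column of X by scale_fn(column) are hypothetical, not taken from the package source.

```r
# Default scale function from the usage: twice the column standard deviation
scale_fn <- function(x) 2 * sd(x)

# Hypothetical helper (not part of hidgenclassifier): divide each column of X
# by scale_fn(column). Zero-spread or non-finite scales are replaced by 1 so
# constant columns are left unchanged instead of becoming NaN.
# Whether fit_svmc also centers the columns is not stated in this page.
scale_columns <- function(X, scale_fn) {
  s <- apply(X, 2, scale_fn)
  s[!is.finite(s) | s == 0] <- 1
  sweep(X, 2, s, FUN = "/")
}

X <- cbind(v1 = c(0, 1, 0, 1), v2 = c(2, 2, 2, 2))
scale_fn(X[, "v1"])  # 2 * sqrt(1/3), about 1.155
Xs <- scale_columns(X, scale_fn)
```

Dividing by two standard deviations leaves every non-constant column with standard deviation 0.5.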

Arguments

X

data design matrix with observations across rows and predictors across columns. For a typical hidden genome classifier, each row represents a tumor and the columns represent binary 1-0 presence/absence indicators of raw variants, counts of mutations at specific genes, counts of mutations corresponding to specific mutation signatures, and similar features, possibly normalized by some function of the tumor's total mutation burden.

Y

character vector or factor denoting the cancer type of tumors whose mutation profiles are listed across the rows of X.

backend

the backend to use: either "e1071" or "liquidSVM". Defaults to "liquidSVM". NOTE: these packages are not installed automatically; install the one you use separately.

...

additional arguments passed to e1071::tune.svm or liquidSVM::svm.
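The X and Y arguments expect one row per tumor and a response aligned to those rows. As a rough base R illustration, a binary presence/absence design can be built from a long table of (tumor, variant) pairs. The toy data, tumor IDs and cancer labels are made up; in the examples, extract_design() and extract_cancer_response() do this job for real MAF data.

```r
# Toy long-format mutation table (made up): one row per (tumor, variant) pair
muts <- data.frame(
  patient_id = c("t1", "t1", "t2", "t3", "t3", "t3"),
  Variant    = c("KRAS G12D", "TP53 R175H", "KRAS G12D",
                 "BRAF V600E", "TP53 R175H", "KRAS G12D")
)

# Tumors x variants count table, collapsed to binary 1-0 presence/absence
# (repeated pairs would give counts > 1, which the > 0 step maps back to 1)
X <- unclass(table(muts$patient_id, muts$Variant))
X[] <- as.numeric(X > 0)

# Response: one cancer type per tumor, reordered to match the rows of X
cancer_type <- c(t3 = "SKCM", t1 = "LUAD", t2 = "COAD")
Y <- cancer_type[rownames(X)]
stopifnot(identical(names(Y), rownames(X)))
```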

Details

A light wrapper around e1071::svm or liquidSVM::mcSVM for use in hidden genome classification.
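To see roughly what the e1071 backend wraps, here is a hedged sketch that calls e1071::svm directly on made-up data. It is not the package's internal code: the choice of probability = TRUE and the default radial kernel are assumptions, and the liquidSVM path is not shown.

```r
# Toy data (made up): 60 tumors, 5 binary variant indicators, 3 cancer types
set.seed(1)
n <- 60
X <- matrix(rbinom(n * 5, size = 1, prob = 0.3), nrow = n,
            dimnames = list(paste0("t", seq_len(n)), paste0("v", 1:5)))
Y <- factor(sample(c("LUAD", "COAD", "SKCM"), n, replace = TRUE))

# The backend package is not installed automatically, so check for it first
if (requireNamespace("e1071", quietly = TRUE)) {
  # A factor response gives C-classification (radial kernel by default);
  # probability = TRUE also fits class-probability estimates
  fit <- e1071::svm(x = X, y = Y, probability = TRUE)

  # Predicted labels for the first 10 tumors, with class probabilities attached
  pred <- predict(fit, X[1:10, ], probability = TRUE)
  probs <- attr(pred, "probabilities")
  print(head(probs))
} else {
  message("Package 'e1071' is not installed; skipping the sketch.")
}
```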

Examples

data("impact")
top_v <- variant_screen_mi(
  maf = impact,
  variant_col = "Variant",
  cancer_col = "CANCER_SITE",
  sample_id_col = "patient_id",
  mi_rank_thresh = 50,
  return_prob_mi = FALSE
)
var_design <- extract_design(
  maf = impact,
  variant_col = "Variant",
  sample_id_col = "patient_id",
  variant_subset = top_v
)

canc_resp <- extract_cancer_response(
  maf = impact,
  cancer_col = "CANCER_SITE",
  sample_id_col = "patient_id"
)
pid <- names(canc_resp)
# create five stratified random folds
# based on the response cancer categories
set.seed(42)
folds <- data.table::data.table(
  resp = canc_resp
)[,
  foldid := sample(rep(1:5, length.out = .N)),
  by = resp
]$foldid

# 80%-20% stratified separation of training and
# test set tumors
idx_train <- pid[folds != 5]
idx_test <- pid[folds == 5]

## Not run: 
# train a classifier on the training set
# using only variants (will have low accuracy
# -- no meta-feature information used)
fit0 <- fit_svmc(
  X = var_design[idx_train, ],
  Y = canc_resp[idx_train]
)

pred0 <- predict_svmc(
  fit = fit0,
  Xnew = var_design[idx_test, ]
)

## End(Not run)
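The stratified folds in the examples are built with data.table. The same idea (deal fold labels 1 to 5 round-robin within each cancer type, then shuffle) can be sketched in base R with ave(). The toy canc_resp below is made up; with real data, use the output of extract_cancer_response(). Because random numbers are drawn in a different order, the folds will not match the data.table version for the same seed.

```r
# Toy response standing in for canc_resp: cancer type named by tumor ID
set.seed(42)
canc_resp <- setNames(rep(c("LUAD", "COAD", "SKCM"), times = c(23, 17, 10)),
                      paste0("t", 1:50))
pid <- names(canc_resp)

# Within each cancer type, assign folds 1..5 as evenly as possible, shuffled
folds <- ave(seq_along(canc_resp), canc_resp,
             FUN = function(i) sample(rep(1:5, length.out = length(i))))

# Hold out fold 5 as the test set (roughly 80%-20% split within each type)
idx_train <- pid[folds != 5]
idx_test  <- pid[folds == 5]

# Per-type fold sizes differ by at most one
tab <- table(canc_resp, folds)
print(tab)
```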


c7rishi/hidgenclassifier documentation built on June 14, 2024, 11:10 a.m.