View source: R/fit_predict_nnc.R
fit_nnc | R Documentation |
Fits a neural network classifier. The function first splits the data into training and validation sets and tunes hyperparameters on the validation set using Bayesian optimization (similar to the approach used in Jiao et al. 2020). It then uses the best hyperparameters to train a final model on the entire dataset.
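The tuning step starts by holding out a fraction val_split of the rows as a validation set, whose indices are returned as ind_val. The base-R sketch below shows one way such a split could be drawn. The helper name split_indices is hypothetical, and the plain random split and rounding rule are assumptions: the package's own code may differ, for example by stratifying on cancer type.

```r
# Hypothetical helper (not part of the package): draw a random hold-out split
# of n rows, with a fraction val_split going to the validation set.
# Assumption: plain random sampling, no stratification by class.
split_indices <- function(n, val_split = 1/3, seed = 1) {
  set.seed(seed)                              # reproducible split (resets the global RNG)
  n_val <- round(n * val_split)               # assumed rounding rule
  ind_val <- sort(sample.int(n, n_val))       # validation rows (cf. ind_val)
  ind_train <- setdiff(seq_len(n), ind_val)   # rows used for fitting during tuning
  list(ind_val = ind_val, ind_train = ind_train)
}

s <- split_indices(n = 30)
length(s$ind_val)    # 10
length(s$ind_train)  # 20
```

During tuning, each candidate network would be fit on ind_train and scored on ind_val; the final model is then refit on all rows, as described above.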
fit_nnc(
X,
Y,
val_split = 1/3,
trials = 200,
epochs = 50,
batch_size = 128,
verbose_mbo = T,
seed = 1
)
fit_nn(
X,
Y,
val_split = 1/3,
trials = 200,
epochs = 50,
batch_size = 128,
verbose_mbo = T,
seed = 1
)
X | Data design matrix with observations across rows and predictors across columns. For a typical hidden genome classifier, each row represents a tumor and the columns represent binary (1/0) presence/absence indicators of raw variants, counts of mutations at specific genes, counts of mutations corresponding to specific mutation signatures, etc., possibly normalized by some function of each tumor's total mutation burden. |
Y | Character vector or factor giving the cancer types of the tumors whose mutation profiles are listed across the rows of X. |
val_split | Fraction of the data held out as a validation set for hyperparameter tuning. |
trials | Number of trials for hyperparameter tuning. |
epochs | Number of training epochs. |
batch_size | Mini-batch size (number of samples per gradient update) used when training the network. |
verbose_mbo | Verbosity mode for Bayesian optimization (logical). |
seed | Random seed. |
... | Unused. |
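In Keras-style training, each epoch is one pass over the training rows in mini-batches of batch_size samples, so epochs and batch_size together set how many gradient updates a fit performs. The arithmetic below is an illustration only: updates_per_epoch is a hypothetical helper, n = 3000 is an invented dataset size, and the validation-set size assumes the default val_split = 1/3 with simple rounding.

```r
# Hypothetical helper: number of mini-batch gradient updates in one epoch,
# i.e. one pass over n_rows in batches of batch_size (last batch may be partial).
updates_per_epoch <- function(n_rows, batch_size = 128) {
  ceiling(n_rows / batch_size)
}

n <- 3000                        # illustrative number of tumors (rows of X)
n_val <- round(n * 1/3)          # assumed validation-set size for val_split = 1/3
updates_per_epoch(n - n_val)     # tuning fits on 2000 rows: 16 updates per epoch
updates_per_epoch(n)             # final fit on all 3000 rows: 24 updates per epoch
updates_per_epoch(n) * 50        # with epochs = 50: 1200 updates in total
```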
An object of class "nn": a named list of length 7 containing the following components of the neural network training process:
X | Input matrix |
Y | Response vector |
map_df | Data frame with columns "original" and "numeric". The "original" column contains the original class names in Y, and the "numeric" column contains the numeric codes for the classes used during training. |
model | Final Keras model trained on X and Y (see https://keras.rstudio.com/articles/about_keras_models.html for more details) |
ind_val | Vector of indices of the rows of X that formed the validation set used to tune hyperparameters |
tuning_results | Named list with the results of the hyperparameter search (output of mbo() from mlrMBO). Its elements include "x", a named list with the best hyperparameters found, and "y", the validation accuracy corresponding to those hyperparameters. See the description of MBOSingleObjResult in mlrMBO for more details. |
preproc | Named list with the parameters of the min-max pre-processing transformation applied to X prior to training (output of preProcess() from caret) |
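Two of the returned components, map_df and preproc, are bookkeeping that must be reused consistently on new data. The base-R sketch below illustrates the idea under stated assumptions. make_label_map, fit_minmax and apply_minmax are hypothetical helpers. The 0-based class codes are an assumption (Keras' sparse categorical losses expect 0-based labels), and the min-max functions only mimic a range rescaling; they are not caret's preProcess(method = "range") implementation, which is what the package actually stores.

```r
# 1. Map class names to consecutive integer codes (0-based: an assumption).
make_label_map <- function(Y) {
  classes <- sort(unique(as.character(Y)))
  data.frame(original = classes, numeric = seq_along(classes) - 1L,
             stringsAsFactors = FALSE)
}

# 2. Min-max scaling: learn per-column min and max on the training matrix ...
fit_minmax <- function(X) {
  list(min = apply(X, 2, min), max = apply(X, 2, max))
}

# ... and apply the same transformation to any matrix with the same columns.
# Constant columns are divided by 1 instead of 0 to avoid NaN (illustrative choice).
apply_minmax <- function(pp, Xnew) {
  rng <- pp$max - pp$min
  rng[rng == 0] <- 1
  sweep(sweep(Xnew, 2, pp$min, "-"), 2, rng, "/")
}

Y <- c("Lung", "Breast", "Lung", "Colon")
map_df <- make_label_map(Y)
y_num <- map_df$numeric[match(Y, map_df$original)]  # 2 0 2 1

X <- matrix(c(0, 2, 4, 1, 1, 3), nrow = 3)
pp <- fit_minmax(X)
X_scaled <- apply_minmax(pp, X)  # columns become (0, 0.5, 1) and (0, 0, 1)
```

Storing the training minima and maxima, rather than rescaling each new matrix by its own range, keeps test tumors on the same scale as the tumors the network was trained on.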
The function uses the keras and tensorflow packages to fit neural networks; these require a Python environment in the backend. See the installation notes for the keras R package for more details.
In addition to keras and tensorflow, the function uses several functions from the packages caret, mlrMBO, lhs, ParamHelpers, smoof, and mlr under the hood. These packages must be installed separately before using fit_nnc.
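Because these dependencies must be installed separately, it can help to check for them before calling fit_nnc. A minimal base-R sketch follows; it tests installation with system.file() so that no namespace is loaded. The install commands are left commented out, and keras::install_keras() is given only as one possible way to set up the Python backend.

```r
# Packages that fit_nnc relies on, per the notes above.
pkgs <- c("keras", "tensorflow", "caret", "mlrMBO", "lhs",
          "ParamHelpers", "smoof", "mlr")

# system.file(package = p) returns "" when p is not installed.
installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
missing_pkgs <- pkgs[!installed]
missing_pkgs

# To install anything missing (uncomment to run):
# install.packages(missing_pkgs)
# keras also needs a Python backend, e.g. set up with keras::install_keras()
```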
Zoe Guan. Email: guanZ@mskcc.org
Jiao W, Atwal G, Polak P, Karlic R, Cuppen E, Danyi A, De Ridder J, van Herpen C, Lolkema MP, Steeghs N, Getz G. A deep learning system accurately classifies primary and metastatic cancers using passenger mutation patterns. Nature communications. 2020 Feb 5;11(1):1-2.
data("impact")
top_v <- variant_screen_mi(
maf = impact,
variant_col = "Variant",
cancer_col = "CANCER_SITE",
sample_id_col = "patient_id",
mi_rank_thresh = 50,
return_prob_mi = FALSE
)
var_design <- extract_design(
maf = impact,
variant_col = "Variant",
sample_id_col = "patient_id",
variant_subset = top_v
)
canc_resp <- extract_cancer_response(
maf = impact,
cancer_col = "CANCER_SITE",
sample_id_col = "patient_id"
)
pid <- names(canc_resp)
# create five stratified random folds
# based on the response cancer categories
set.seed(42)
folds <- data.table::data.table(
resp = canc_resp
)[,
foldid := sample(rep(1:5, length.out = .N)),
by = resp
]$foldid
# 80%-20% stratified separation of training and
# test set tumors
idx_train <- pid[folds != 5]
idx_test <- pid[folds == 5]
## Not run:
# train a classifier on the training set
# using only variants (accuracy will be low because
# no meta-feature information is used)
fit0 <- fit_nnc(
X = var_design[idx_train, ],
Y = canc_resp[idx_train],
trials = 10,
epochs = 5
)
pred0 <- predict_nnc(
fit = fit0,
Xnew = var_design[idx_test, ]
)
## End(Not run)
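The data.table code in the example assigns tumors to five folds separately within each cancer type, so every fold gets roughly the same mix of cancer types. The same idea can be sketched in base R as follows. stratified_folds is a hypothetical helper and the toy labels are invented; with the same seed it will not reproduce the exact fold labels of the data.table version, because the random draws happen in a different order.

```r
# Hypothetical helper: assign each observation to one of k folds, balancing
# each class across folds (per-class fold sizes differ by at most one).
stratified_folds <- function(resp, k = 5, seed = 42) {
  set.seed(seed)
  foldid <- integer(length(resp))
  for (cls in unique(resp)) {
    idx <- which(resp == cls)
    # cycle 1..k over the members of this class, then shuffle within the class
    foldid[idx] <- sample(rep(seq_len(k), length.out = length(idx)))
  }
  foldid
}

resp <- rep(c("A", "B", "C"), times = c(10, 25, 7))  # toy cancer labels
folds <- stratified_folds(resp, k = 5)
table(resp, folds)   # A: 2 per fold, B: 5 per fold, C: 2, 2, 1, 1, 1

idx_train <- which(folds != 5)  # about 80% of each class
idx_test  <- which(folds == 5)  # about 20% of each class
```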